Cory Slater @corslater - Twitter Profile

@AlexiGlad

12 months ago

How can we unlock generalized reasoning?

⚡️Introducing Energy-Based Transformers (EBTs), an approach that out-scales (feed-forward) transformers and unlocks generalized reasoning/thinking on any modality/problem without rewards.
TLDR:
- EBTs are the first model to outscale the Transformer++ during pretraining across modalities and with respect to data, parameters, FLOPs, depth, etc
- EBTs achieve a +29% improvement over the Transformer++ at test-time via thinking longer
- EBTs exhibit better generalization than existing models during inference

🧵Thread:


corslater retweeted

Emmett Shear

@eshear

about 1 year ago

The way that OpenAI uses user feedback to train the model is misguided and will inevitably lead to further issues like this one.
Supervised fine-tuning (SFT) on "ideal" responses is simply teaching the model via imitation, which is fine as far as it goes. But it's not enough... https://t.co/OOTb4guB55


corslater retweeted

Active Inference Institute

@InferenceActive

over 1 year ago

Awesome work from @ReactiveBayes . We took this new notebook example and made it into Julia script with more visualizations and animations. https://t.co/1XWTrKfzhe


corslater retweeted

Ali Behrouz

@behrouz_ali

over 1 year ago

Attention has been the key component for most advances in LLMs, but it can’t scale to long context. Does this mean we need to find an alternative?

Presenting Titans: a new architecture with attention and a meta in-context memory that learns how to memorize at test time. Titans are more effective than Transformers and modern linear RNNs, and can effectively scale to larger than 2M context window, with better performance than ultra-large models (e.g., GPT4, Llama3-80B).



corslater retweeted

Johannes Drever 🌫➰💎 @comandingo

about 2 years ago

https://t.co/611wEgFoYB


corslater retweeted

Carlos E. Perez

@IntuitMachine

over 2 years ago

LLMs and optimizing an organization's OODA loop is going to be huge. Are you ready?


corslater retweeted

Xiaolong Wang

@xiaolonw

almost 2 years ago

The TTT layer, as a new mechanism to compress information and model memory, can be a simple replacement for the self-attention layer in Transformer.

Recall Transformer explicitly stores all input tokens. If you believe that training neural networks is a good way to compress information in general, then it will make sense to train a neural network to compress all these tokens.


Cory Slater @corslater

about 2 years ago

There it is. 'First Principles AI' to emulate a biological process at scale. More coming. Sci-fi becoming sci-fact @Lux_Capital

Josh Wolfe

@wolfejosh

about 2 years ago

6/ We’re excited to unveil historic new company EvolutionaryScale and their release of ESM3—a frontier language model for the life sciences that advances our ability to program and create with the code of life. ESM3 takes a step towards the future where AI is a tool to engineer biology from first principles in the same way we engineer structures, machines and microchips, and write computer programs.


corslater retweeted

Alex Rives

@alexrives

about 2 years ago

We have trained ESM3 and we're excited to introduce EvolutionaryScale. ESM3 is a generative language model for programming biology. In experiments, we found ESM3 can simulate 500M years of evolution to generate new fluorescent proteins. Read more: https://t.co/iAC3lkj0iV


corslater retweeted

henry

@arithmoquine

about 2 years ago

remember when a study in the nature journal deconstructed all of politics and it really just came down to this?


corslater retweeted

Josh Wolfe

@wolfejosh

about 2 years ago

1/ Quick thread 🧵important new paper 📜from @sfiscience on the THERMODYNAMICS of COMPUTATION...

Ever wondered why your gadgets get warm after using them for a while? As with cells🧫, brains🧠 and laptops💻––it's all about energy use and heat🔥.... https://t.co/uEpIgysTSc


corslater retweeted

Josh Wolfe

@wolfejosh

about 2 years ago

1/ NEW–– just released. Lux Q1 2024 Letter to LPs…


Cory Slater @corslater

about 2 years ago

This type of dark pattern should be illegal. Shame on you @TaxAct for making it impossible to unsubscribe from marketing emails. I tried in multiple browsers with multiple sessions. The only button that worked was "No".


Cory Slater @corslater

about 2 years ago

Ground breaking. Speechless.

bioRxiv Evobio @biorxiv_evobio

over 2 years ago

Natural Induction: Spontaneous adaptive organisation without natural selection https://t.co/7cXGiDmJZw #biorxiv_evobio


corslater retweeted

Ben Nowack ☀️🌎🪞

@bennbuilds

over 2 years ago

Sharing a bit more about Reflect Orbital today. @4TristanS and I are developing a constellation of revolutionary satellites to sell sunlight to thousands of solar farms after dark. We think sunlight is the new oil and space is ready to support energy infrastructure. This airborne test was the last piece needed before we launch above the atmosphere. 🧵🧵(1/6)


corslater retweeted

Gill Verdon

@GillVerd

over 2 years ago

Fun little paper to appear tonight on the arXiv.

How to do Hamiltonian Monte Carlo on digital Quantum Computers.

As physics-based probabilistic ML accelerators are on the horizon, important to test how QC's could try to compete.

Best way to predict future is to invent it.🙂 https://t.co/hGdiMZutFI


corslater retweeted

Kyle Baranko

@kyle__cb

over 2 years ago · Queens

We in energy should look to aerospace and defense as inspiration on how to defeat a regulatory paradigm that inflates costs

Rate basing = cost-plus government contracts

Exiting incumbent incentive structures are the only way to drastically lower costs and make things like private space flight or reliable high VRE delivery possible


corslater retweeted

Josh Wolfe

@wolfejosh

over 2 years ago

Take a look at Lux family co @VariantBio… Partnering with growing number of tribes, indigenous groups, local populations for some of the most interesting as yet unknown undiscovered druggable targets from OUTLIER humans with OUTLIER traits in OUTLIER parts of the world…


corslater retweeted

Josh Wolfe

@wolfejosh

over 2 years ago

Extraordinary paper by Joana Xavier @joanarcxavier + longtime @sfiscience Stuart Kauffman on

ORIGIN OF LIFE

via auto-catalytic networks (from increasing complexity of combinatorial possibilities of elements > molecules > chemical (auto)catalysis…

full paper via @royalsociety https://t.co/siYZKazI8W


corslater retweeted

Charles Wang

@charleswangb

over 2 years ago

If you think the world model is nothing but action and state pairs, or that modeling physics is merely 'scene generation,' you are clueless as to how this creature operates in the wild👇

