How can we unlock generalized reasoning?
⚡️Introducing Energy-Based Transformers (EBTs), an approach that out-scales (feed-forward) transformers and unlocks generalized reasoning/thinking on any modality/problem without rewards.
TLDR:
- EBTs are the first model to outscale the Transformer++ during pretraining across modalities and with respect to data, parameters, FLOPs, depth, etc.
- EBTs achieve a +29% improvement over the Transformer++ at test-time via thinking longer
- EBTs exhibit better generalization than existing models during inference
🧵Thread:
The way that OpenAI uses user feedback to train the model is misguided and will inevitably lead to further issues like this one.
Supervised fine-tuning (SFT) on "ideal" responses is simply teaching the model via imitation, which is fine as far as it goes. But it's not enough...
Awesome work from @ReactiveBayes. We took this new notebook example and made it into a Julia script with more visualizations and animations. https://t.co/1XWTrKfzhe
Attention has been the key component for most advances in LLMs, but it can’t scale to long context. Does this mean we need to find an alternative?
Presenting Titans: a new architecture with attention and a meta in-context memory that learns how to memorize at test time. Titans are more effective than Transformers and modern linear RNNs, and can effectively scale to larger than 2M context window, with better performance than ultra-large models (e.g., GPT4, Llama3-80B).
The TTT layer, a new mechanism for compressing information and modeling memory, can be a simple replacement for the self-attention layer in a Transformer.
Recall that a Transformer explicitly stores all input tokens. If you believe that training neural networks is a good way to compress information in general, then it makes sense to train a neural network to compress all these tokens.
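To make the "train a network to compress the tokens" idea concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation: the function name `ttt_linear_scan`, the plain reconstruction loss `||W x - x||^2`, the zero initialization, and the single gradient step per token are all simplifying assumptions chosen for illustration.

```python
def ttt_linear_scan(tokens, lr=0.1):
    """Toy test-time-training scan (illustrative assumption, not the paper's code).

    The hidden state is the weight matrix W of a linear model. Each incoming
    token x triggers one gradient step on the loss ||W x - x||^2, so the
    sequence is compressed into W instead of being stored token by token.
    """
    dim = len(tokens[0])
    # Hidden state: a dim x dim weight matrix, started at zero (assumption).
    W = [[0.0] * dim for _ in range(dim)]
    outputs, losses = [], []
    for x in tokens:
        # Reconstruction with the current state, before the update.
        pred = [sum(W[i][j] * x[j] for j in range(dim)) for i in range(dim)]
        err = [pred[i] - x[i] for i in range(dim)]
        losses.append(sum(e * e for e in err))
        # One gradient step; the gradient of the loss w.r.t. W is 2 * err * x^T.
        for i in range(dim):
            for j in range(dim):
                W[i][j] -= lr * 2.0 * err[i] * x[j]
        # The layer's output for this token uses the updated state.
        outputs.append(
            [sum(W[i][j] * x[j] for j in range(dim)) for i in range(dim)]
        )
    return outputs, losses, W


if __name__ == "__main__":
    # Feeding the same token repeatedly: the pre-update loss should shrink.
    _, losses, _ = ttt_linear_scan([[1.0, 0.0]] * 5)
    print(losses)
```

Note the contrast with self-attention: the state `W` stays the same size no matter how many tokens are processed, whereas attention keeps every token around.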
6/ We’re excited to unveil a historic new company, EvolutionaryScale, and their release of ESM3, a frontier language model for the life sciences that advances our ability to program and create with the code of life.
ESM3 takes a step towards the future where AI is a tool to engineer biology from first principles in the same way we engineer structures, machines and microchips, and write computer programs.
We have trained ESM3 and we're excited to introduce EvolutionaryScale.
ESM3 is a generative language model for programming biology. In experiments, we found ESM3 can simulate 500M years of evolution to generate new fluorescent proteins.
Read more: https://t.co/iAC3lkj0iV
1/ Quick thread 🧵important new paper 📜from @sfiscience on the THERMODYNAMICS of COMPUTATION...
Ever wondered why your gadgets get warm after using them for a while? As with cells🧫, brains🧠 and laptops💻––it's all about energy use and heat🔥....
This type of dark pattern should be illegal. Shame on you @TaxAct for making it impossible to unsubscribe from marketing emails.
I tried in multiple browsers with multiple sessions. The only button that worked was "No".
Sharing a bit more about Reflect Orbital today. @4TristanS and I are developing a constellation of revolutionary satellites to sell sunlight to thousands of solar farms after dark.
We think sunlight is the new oil and space is ready to support energy infrastructure. This airborne test was the last piece needed before we launch above the atmosphere.
🧵🧵(1/6)
Fun little paper to appear tonight on the arXiv.
How to do Hamiltonian Monte Carlo on digital Quantum Computers.
As physics-based probabilistic ML accelerators are on the horizon, it's important to test how QCs could try to compete.
The best way to predict the future is to invent it.🙂
We in energy should look to aerospace and defense as inspiration on how to defeat a regulatory paradigm that inflates costs
Rate basing = cost-plus government contracts
Exiting incumbent incentive structures is the only way to drastically lower costs and make things like private space flight or reliable high-VRE delivery possible
Take a look at Lux family co @VariantBio…
Partnering with a growing number of tribes, indigenous groups, local populations for some of the most interesting, as yet undiscovered druggable targets from OUTLIER humans with OUTLIER traits in OUTLIER parts of the world…
Extraordinary paper by Joana Xavier @joanarcxavier + longtime @sfiscience Stuart Kauffman on
ORIGIN OF LIFE
via auto-catalytic networks (from increasing complexity of combinatorial possibilities of elements > molecules > chemical (auto)catalysis…
full paper via @royalsociety https://t.co/siYZKazI8W
If you think the world model is nothing but action and state pairs, or that modeling physics is merely 'scene generation,' you are clueless as to how this creature operates in the wild👇