@apocsnap @specs People don’t realise they are not competing with normal glasses, but with Apple Vision Pro. I would wear this at work, but not the Vision Pro.
Diffusion (or flow) models make excellent policies, but training them with RL is notoriously hard: BPTT is unstable, and running RL over the denoising steps blows up the horizon. In our new paper, we show how to optimize flow matching actors with "one weird trick": approximate the Jacobian of the flow denoising process with the identity matrix. 👇
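To make the identity-Jacobian idea concrete, here is a toy scalar sketch. It is one possible reading of the trick, not the paper's algorithm: the velocity field `theta - a`, the critic `Q(a) = -(a - target)^2`, and all function names are assumptions made up for illustration. The exact gradient chains the per-step Jacobian `(1 - h)` through every Euler step (what BPTT would do); the approximation replaces that Jacobian with 1.

```python
def q_grad(a, target):
    """Gradient of the toy critic Q(a) = -(a - target)^2 with respect to a."""
    return -2.0 * (a - target)


def policy_grad(theta, a0, steps, target, identity_jacobian):
    """dQ/dtheta for an Euler-integrated toy flow a_{k+1} = a_k + h * (theta - a_k).

    identity_jacobian=False: exact chain rule through all denoising steps.
    identity_jacobian=True:  per-step Jacobian d a_{k+1} / d a_k replaced by 1.
    """
    h = 1.0 / steps
    step_jac = 1.0 if identity_jacobian else (1.0 - h)
    a, da_dtheta = a0, 0.0
    for _ in range(steps):
        # d a_{k+1}/d theta = (d a_{k+1}/d a_k) * d a_k/d theta + h * d v/d theta,
        # and d v/d theta = 1 for this toy velocity field.
        da_dtheta = step_jac * da_dtheta + h
        a = a + h * (theta - a)
    return q_grad(a, target) * da_dtheta


exact = policy_grad(0.0, 0.5, 10, 1.0, identity_jacobian=False)
approx = policy_grad(0.0, 0.5, 10, 1.0, identity_jacobian=True)
print(exact, approx)
```

In this toy problem the two gradients differ in magnitude but share a sign, so both push `theta` in the same direction. Whether that holds in general is exactly what the paper would have to argue.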
🧵1/9 Rex has been selected for an oral presentation at #ICML2026! 🎉
We make diffusion/flow model solvers exactly reversible (i.e., bijective), for both ODEs and SDEs.
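For intuition on what "exactly reversible" means for a solver, here is a minimal sketch of a generic coupled update that can be undone algebraically. This is not Rex: the coupling scheme, the toy drift `f`, and the function names are assumptions for illustration, and only the ODE case is shown. Rational arithmetic is used so the round trip is exact; with floats the same algebra recovers the input only up to roundoff.

```python
from fractions import Fraction


def f(x):
    """Hypothetical drift for the toy ODE dx/dt = -x."""
    return -x


def forward_step(x, y, h):
    """One coupled step: update x using y, then y using the new x."""
    x_new = x + h * f(y)
    y_new = y + h * f(x_new)
    return x_new, y_new


def backward_step(x_new, y_new, h):
    """Undo forward_step by reversing the two updates in opposite order."""
    y = y_new - h * f(x_new)
    x = x_new - h * f(y)
    return x, y


def solve(x0, h, steps):
    x, y = x0, x0  # both copies start from the same state
    for _ in range(steps):
        x, y = forward_step(x, y, h)
    return x, y


def invert(x, y, h, steps):
    for _ in range(steps):
        x, y = backward_step(x, y, h)
    return x, y


x0, h = Fraction(3, 2), Fraction(1, 10)
xT, yT = solve(x0, h, 20)
print(invert(xT, yT, h, 20) == (x0, x0))
```

The inverse needs no stored trajectory, only the final pair, which is what makes this kind of solver a bijection between start and end states.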
Last project of my Ph.D. w/ Chen Liu @ClarksonUniv → first project of my postdoc at #AITHYRA!
For over a decade, we’ve accepted that end-to-end backprop is the only way to train deep networks. But holding the entire network in memory all at once is why AI training is hitting a resource wall.
We found a new way to break the network into blocks and train them independently. The trick? Treating the network’s forward pass like a diffusion model denoising a signal.
This reinterpretation slashes the memory needed to train deep models. In our #ICLR2026 paper (https://t.co/PK5h0mqQSo), we matched end-to-end performance across ViTs, DiTs, and LLMs. We did this while training just one isolated block at a time.
Today we're announcing ESMFold2, an open scientific engine to power prediction, design, and discovery across protein biology.
The new model delivers state of the art performance on protein interactions, especially antibodies, a critical modality for therapeutics.
We have designed and validated miniprotein binders and single chain antibodies across five therapeutic targets that are important in cancer and immunology. We are seeing very high success rates, and affinities at levels consistent with therapeutic activity.
We’re also releasing an atlas of 6.8 billion proteins, and 1.1 billion predicted structures.
ESMFold2 is built on a state of the art language model that has been trained on billions of protein sequences.
A world model of protein biology emerges through language modeling.
We’ve applied mechanistic interpretability techniques, originally developed for large language models, to understand the concepts ESM uses to represent proteins.
The model’s representation space has a compositional organization of features across scales, levels of complexity, and abstraction that mirrors the understanding of protein biology developed through a century of empirical science.
This understanding emerges without prior knowledge, just from language modeling of protein sequences.
Language models are becoming a powerful substrate to understand and program biology.
The design of protein interactions is one of the most fundamental problems in biophysics, and has critical implications for the discovery of new medicines. A simple gradient-based search with the model was able to discover high-affinity protein binders.
I'm excited by the potential this has to accelerate basic science and the understanding of proteins. And especially for the new avenues it opens up for therapeutic design and medicine.
🚨New Optimizer Paper
AMUSE: Anytime MUon with Stable gradient Evaluation
AMUSE combines Muon with Schedule-Free-style gradient evaluation for stable anytime training without LR decay.
• Stronger 124M / 720M / 1B pretraining
• Strong ImageNet / ViT fine-tuning performance
RWKV-7 G1g is here: the world's best pure RNN LLM, and a competitive LLM in general. Try https://t.co/k0AivnxCwP for bsz16 7B inference. G1h in June 🙂 p.s. const 15000+tps decoding on single 5090: https://t.co/pHP6bXITjQ
🚨 New Paper 🚨
ScheduleFree+: Scaling Learning-Rate-Free & Schedule-Free Learning to Large Language Models
A few modifications to Schedule-Free Learning make it completely LR tuning free, and allow it to greatly outperform schedules for long duration training!
https://t.co/LzjIIsOlG8
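For readers new to the idea, here is a minimal sketch of the base Schedule-Free SGD update that this line of work builds on. It is not ScheduleFree+ and not AMUSE, and it omits warmup and the learning-rate-weighted averaging used in practice. The toy objective and the values `lr=1.0`, `beta=0.5` are assumptions chosen for illustration. The gradient is evaluated at an interpolation `y` between the raw iterate `z` and the running average `x`, and `x` is the point you would evaluate or deploy at any time.

```python
def grad(w):
    """Gradient of the toy objective f(w) = 0.5 * (w - 3)^2."""
    return w - 3.0


def schedule_free_sgd(w0, lr, beta, steps):
    z = w0  # base SGD iterate
    x = w0  # running average of z, the "anytime" solution
    for t in range(1, steps + 1):
        y = (1.0 - beta) * z + beta * x   # gradient evaluation point
        z = z - lr * grad(y)              # constant step size, no decay
        c = 1.0 / (t + 1)
        x = (1.0 - c) * x + c * z         # uniform average of the z iterates
    return x


print(schedule_free_sgd(w0=0.0, lr=1.0, beta=0.5, steps=50))
```

Note there is no learning-rate schedule anywhere in the loop. The averaging in `x` plays the role that decay would otherwise play.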
[1/3] Excited to share Winfree Oscillatory Neural Network (WONN): a synchronization-based neural architecture built on Winfree dynamics.
WONN evolves representations through oscillatory synchronization on a toroidal phase space (S^1)^d.
Project page: https://t.co/5dkz6W73jc
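As background, here is a small sketch of the classical Winfree model that the name refers to, integrated with explicit Euler steps on the torus. It shows only the textbook mean-field dynamics with the common choices P(θ) = 1 + cos θ and R(θ) = -sin θ, not the WONN architecture. The function names, step size, and coupling value are assumptions for illustration.

```python
import cmath
import math


def winfree_step(theta, omega, coupling, dt):
    """One Euler step of d(theta_i)/dt = omega_i + coupling * mean_j P(theta_j) * R(theta_i)."""
    n = len(theta)
    pulse = sum(1.0 + math.cos(t) for t in theta) / n  # mean-field influence P
    new_theta = []
    for th, om in zip(theta, omega):
        dtheta = om + coupling * pulse * (-math.sin(th))  # sensitivity R
        # Wrap each phase back onto the circle so the state stays on (S^1)^d.
        new_theta.append((th + dt * dtheta) % (2.0 * math.pi))
    return new_theta


def order_parameter(theta):
    """Magnitude of the mean phasor: 1 means fully synchronized phases."""
    return abs(sum(cmath.exp(1j * t) for t in theta) / len(theta))


theta = [0.3, 2.0, 4.5, 5.9]
omega = [1.0, 1.0, 1.0, 1.0]
for _ in range(2000):
    theta = winfree_step(theta, omega, coupling=0.5, dt=0.01)
print(order_parameter(theta))
```

The order parameter is a convenient scalar for watching whether the phases pull together over time.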
Gated DeltaNet has been one of my favorite "hybrid attention" newcomers in the good old transformer stack.
Excited to see Gated DeltaNet-2. Adding it to my reading stack. In the meantime, I have a primer on Gated DeltaNet here: https://t.co/FoicOLtFE6
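As a quick refresher, here is a sketch of the single-head gated delta rule as I understand it from the original Gated DeltaNet formulation: S_t = α_t S_{t-1} (I - β_t k_t k_tᵀ) + β_t v_t k_tᵀ. It shows one recurrent step in plain Python, with no chunked parallel form and nothing specific to Gated DeltaNet-2. The function names are my own.

```python
def gated_delta_step(S, k, v, alpha, beta):
    """Update the d_v x d_k memory S with one (key, value) pair.

    alpha in [0, 1] decays the old memory (the gate);
    beta in [0, 1] sets how strongly the old value stored under k is replaced by v.
    """
    # Current read-out of the memory at key k: S @ k.
    Sk = [sum(row[j] * k[j] for j in range(len(k))) for row in S]
    # Expanded form of alpha * S (I - beta k k^T) + beta v k^T.
    return [
        [alpha * S[i][j] + beta * (v[i] - alpha * Sk[i]) * k[j] for j in range(len(k))]
        for i in range(len(v))
    ]


def read(S, q):
    """Query the memory: o = S @ q."""
    return [sum(row[j] * q[j] for j in range(len(q))) for row in S]


S = [[0.0, 0.0], [0.0, 0.0]]
S = gated_delta_step(S, k=[1.0, 0.0], v=[3.0, 4.0], alpha=1.0, beta=1.0)
S = gated_delta_step(S, k=[1.0, 0.0], v=[5.0, 6.0], alpha=1.0, beta=1.0)
print(read(S, [1.0, 0.0]))
```

With a unit-norm key, `alpha = 1`, and `beta = 1`, the second write fully overwrites the first value stored under that key, which is the "delta" part. Lowering `alpha` fades the whole memory, which is the "gated" part.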