We wrote a book about representation learning!
It’s fully open source, available and readable online, and covers everything from theoretical foundations to practical algorithms.
👷‍♂️ We’re hard at work updating the content for v2.0, and would love your feedback and contributions!
Very excited to release Terminal-Bench 2.1!
Coding agents are among the most economically consequential deployments of LLMs to date. As agents improve, benchmark reliability matters more.
We audited TB2.0 and found and corrected issues in 28 of 89 tasks. That's over 30% of the benchmark!
The rankings survived, but absolute scores moved by up to 12pp!
@sirbayes @druv_pai @pengwang2003 @YiMaTweets Thank you for your thorough reading of and feedback on the first version, Kevin! Many significant improvements to the new version are downstream of it, like a discussion of latent diffusion in Ch6 :-)
This version of the book also involved huge efforts on the infrastructure side, for the .pdf and web versions of the book, in English and Chinese. Thank you to my awesome collaborators: @druv_pai, @robinwuzy, @TianzheC, @YiMaTweets, Peng Wang, @qu_1006, and many more!
We've released an updated "v2.0" of our book on deep representation learning! We've reorganized and improved many sections for better pedagogical clarity, and added many new examples and applications throughout the book. Massive thanks are due to folks in the community who submitted feedback and corrections on the first version, including @sirbayes :-)
📕Read: https://t.co/gLoWpLRicB
🛠️Contribute: https://t.co/Fj0REoP1HZ
I am delighted to see a new version of the book by @_sdbuchanan, @druv_pai, @pengwang2003 and @YiMaTweets. This is the best book on the foundations of deep representation learning! In this era of coding agents, the math is all you need to learn :)
https://t.co/3IvoZeFUYA
@graceluo_ @feng_jiahai @trevordarrell @AlecRad @JacobSteinhardt Cool work! I was wondering -- is it essential for the interpretability results that the denoiser is a FiLM SwiGLU MLP? Would a different activation, or a DiT or a U-Net work too? I'm curious if the denoiser is doing something like LISTA (https://t.co/RE72U7FqPD).
Thrilled to have contributed to Terminal-Bench, a benchmark for real-time evaluation of autonomous agents on tasks ranging from debugging system configs to developing protein engineering workflows.
My core contribution focused on analyzing agent behavior: how they reason, where they get stuck, and why they fail.
A consistent finding? Large models tend to break in similar ways. To build better agents, we don't just need better models, we need to innovate the worlds they learn in!
Check out the paper for details. More coming soon.
Escape the tyranny of the KV cache at large context lengths via end-to-end test-time training!
I had the privilege to work with this team at the beginning of last year. The rigor and vision that went into this is remarkable (metalearning a transformer!?) -- check it out!
LLM memory is considered one of the hardest problems in AI.
All we have today are endless hacks and workarounds. But the root solution has always been right in front of us.
Next-token prediction is already an effective compressor. We don’t need a radical new architecture. The missing piece is to continue training the model at test-time, using context as training data.
Our full release of End-to-End Test-Time Training (TTT-E2E) with @NVIDIAAI, @AsteraInstitute, and @StanfordAILab is now available.
Blog: https://t.co/woCpiIrq0T
Arxiv: https://t.co/3VkFlS3wx3
This has been over a year in the making with @arnuvtandon and an incredible team.
Sharing our recent work on understanding the mechanisms underlying the empirical success of hyperparameter transfer using μP! (1/11)
with Denny Wu and @albertobietti
It's been inspiring to see @brenthyi grow this project over the past three years!!
The best library I know for bootstrapping research code into the terminal with zero friction 🫡
tyro 1.0 is out 🐣
This has been a pet project/niche interest of mine for ~4 years now, so it's a bit of a sentimental moment...
https://t.co/bAibP3RjxE
Excited to be @ #Neurips2025 presenting Weaver, our approach for combining multiple weak verifiers to narrow the generation-verification gap. Come talk to us today from 11:00 AM to 2:00 PM PST in Exhibit Hall C,D,E #3714. If you're working on reliable agents, evaluation, or self-verification, this is for you, and I would love to connect.