It’s deployment time!
You’ve done the pre-deployment evals. You THINK your model is safe, so you ship it 🚀
🚨 After deployment, reports of misbehavior start trickling in
What happened?? How could you have caught it?? 🤔
@icmlconf 2026 Spotlight!
🧵
Our paper "Autoregressive Language Models are Secretly Energy-Based Models: Insights into the Lookahead Capabilities of Next-Token Prediction" was accepted for publication at #ICML2026 https://t.co/DUTSj4Sz7u
We made LLM inference a lot faster.
1/n
Speculative decoding throws away draft tokens when they don't match. What if you re-weighted your samples instead using importance sampling?
We introduce SMC-SD, sequential Monte Carlo speculative decoding, an approximate sampler that speeds up inference by 2.36× over SOTA speculative decoding, while producing a controllable Pareto frontier of speed and approximation accuracy.
ArXiv: https://t.co/XWY0jKvbyj
minimal repo release: https://t.co/lI53Qu16Mo
Joint work with Mauricio Barba da Costa (@mauricio_b25181), Chi-Chih Chang (@CCCCC1009CCC), Cameron Freer, Tim Vieira (@xtimv), Ryan Cotterell, and Mohamed Abdelfattah (@mohsaied).
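To make the re-weighting idea concrete, here is a rough sketch of generic self-normalized importance weighting followed by multinomial resampling. It is not the SMC-SD algorithm from the paper or the repo; the function names and the toy numbers are assumptions made purely for illustration.

```python
import math
import random

def importance_weights(target_logprobs, draft_logprobs):
    """Self-normalized weights, w_i proportional to p_target(x_i) / p_draft(x_i)."""
    log_w = [t - d for t, d in zip(target_logprobs, draft_logprobs)]
    m = max(log_w)  # subtract the max before exponentiating, for numerical stability
    unnorm = [math.exp(lw - m) for lw in log_w]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def effective_sample_size(weights):
    """ESS = 1 / sum(w_i^2); equals n for uniform weights, 1 if one weight dominates."""
    return 1.0 / sum(w * w for w in weights)

def resample(particles, weights, rng):
    """Multinomial resampling: draw len(particles) items in proportion to weight."""
    return rng.choices(particles, weights=weights, k=len(particles))

if __name__ == "__main__":
    # Hypothetical toy setup: three draft continuations with made-up probabilities.
    drafts = ["the cat", "the dog", "the eel"]
    draft_lp = [math.log(0.5), math.log(0.3), math.log(0.2)]
    target_lp = [math.log(0.2), math.log(0.3), math.log(0.5)]
    w = importance_weights(target_lp, draft_lp)
    print("weights:", w)
    print("ESS:", effective_sample_size(w))
    print("resampled:", resample(drafts, w, random.Random(0)))
```

Instead of rejecting a draft that the target model disagrees with, each draft keeps a weight that says how much it should count, and resampling concentrates the population on the drafts the target model prefers.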
Some more vibe-coding fun - every math major's favorite party trick: the wobbly table theorem as an interactive 3D visualization.
https://t.co/P30394hpFe
Ok, AI-assisted coding is pretty rad. Here's something I built to explore the stuff I've written (blog posts and publications). It's still a bit of work in progress, but it's pretty fun, especially the "semantic" tab + sliders.
I'd love some feedback!
https://t.co/8pAIRCaeWP
@PromptSlinger @nomic_ai Source code: https://t.co/Lvku612xf3
The whole thing is built from a centralized YAML file, so it should be pretty modular in case you want to try it on different content. I have been meaning to try it out for exploring the 15 years of PDFs that have accumulated on my hard drive.
@srush_nlp I am fine to be in the "didn't provide gains" camp. The implementation is simple, we give code, it's short, and it's online. So that comment seems misplaced.
@srush_nlp @mnoukhov (LOL - did not mean to imply that all papers are careful or that all blog posts are not careful. I do have my issues with this specific blog post. And I have issues when Claude Code's WebFetch commands land 99% of the time on blog posts rather than books and research papers.)
@mnoukhov @srush_nlp k3 with the calibration parameter works fine, but without it, it's like getting on a roller coaster without a harness.
On the other hand, RB just completely kicks ass & has the nonnegativity property that Schulman wanted.
In your settings is there something that keeps k3 from exploding?
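For anyone following along, a rough sketch of what the estimators look like, assuming "k3" refers to the naming in Schulman's KL-approximation note: with x sampled from q and r = p(x)/q(x), k1 = -log r and k3 = (r - 1) - log r both estimate KL(q || p). The calibration parameter mentioned above is not specified here, so it is left out. The `exact_kl` helper only illustrates the general idea of replacing a sampled term with its exact expectation over a small vocabulary, which is an assumption about what an "RB" (Rao-Blackwellized) estimator builds on, not that estimator's implementation.

```python
import math
import random

def k1(logp, logq):
    """k1 = -log r with r = p(x)/q(x): unbiased, but individual samples can be negative."""
    return logq - logp

def k3(logp, logq):
    """k3 = (r - 1) - log r: unbiased and nonnegative per sample.
    The r term can be huge when q(x) is much smaller than p(x), which is where
    the variance can blow up."""
    log_r = logp - logq
    return math.expm1(log_r) - log_r

def exact_kl(q, p):
    """Exact KL(q || p) for two categorical distributions given as lists."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def estimate(estimator, q, p, n, rng):
    """Monte Carlo estimate of KL(q || p) using n samples drawn from q."""
    xs = rng.choices(range(len(q)), weights=q, k=n)
    return sum(estimator(math.log(p[x]), math.log(q[x])) for x in xs) / n

if __name__ == "__main__":
    # Hypothetical toy distributions, not taken from any experiment.
    q = [0.7, 0.2, 0.1]
    p = [0.5, 0.3, 0.2]
    rng = random.Random(0)
    print("exact:", exact_kl(q, p))
    print("k1   :", estimate(k1, q, p, 20000, rng))
    print("k3   :", estimate(k3, q, p, 20000, rng))
```

On a toy example like this both estimators land near the exact value; the interesting regime is when p and q disagree sharply, so that r has a heavy tail.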