Check out our latest work (that @rasbt talks about in much detail in the post below) on "Concise Reasoning via Reinforcement Learning":
Paper: https://t.co/l2nMOOjA58
Blog: https://t.co/bcV0qkoCZl
As we all know by now, reasoning models often generate longer responses, which raises compute costs. Now, this new paper (https://t.co/UbBv4rzM09) shows that this behavior comes from the RL training process, not from an actual need for long answers for better accuracy. The RL loss tends to favor longer responses when the model gets negative rewards, which I think explains the "aha" moments and longer chains of thought that arise from pure RL training.
That is, if the model gets a negative reward (i.e., the answer is wrong), the math behind PPO causes the average per-token loss to become smaller when the response is longer. So, the model is indirectly encouraged to make its responses longer. This is true even if those extra tokens don't actually help solve the problem.
What does the response length have to do with the loss? When the reward is negative, longer responses can dilute the penalty per individual token, which results in lower (i.e., better) loss values (even though the model is still getting the answer wrong).
So the model "learns" that longer responses reduce the punishment, even though they are not helping correctness.
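A toy sketch of the dilution effect described above, not the paper's actual derivation. The function name `avg_per_token_loss` and all of its simplifications are assumptions made for illustration: only the final token receives the reward, value estimates are zero, the discount is 1, the PPO probability ratio is 1, and advantages decay GAE-style with a hypothetical `lam = 0.95`.

```python
def avg_per_token_loss(reward: float, length: int, lam: float = 0.95) -> float:
    """Toy average per-token policy loss for a single response.

    Assumptions (illustrative, not taken from the paper): the reward arrives
    only at the final token, value estimates are zero, the discount is 1,
    and the probability ratio is 1, so each token's loss is simply minus
    its GAE-style advantage.
    """
    # A token k steps before the end gets an advantage of reward * lam**k.
    advantages = [reward * lam**k for k in range(length)]
    # Averaging over all tokens is what spreads the penalty across the response.
    return -sum(advantages) / length


if __name__ == "__main__":
    for length in (10, 100, 1000):
        wrong = avg_per_token_loss(-1.0, length)   # negative reward
        right = avg_per_token_loss(+1.0, length)   # positive reward
        print(f"length={length:5d}  wrong={wrong:.4f}  right={right:.4f}")
```

Under these assumptions the loss for a wrong answer stays positive but shrinks as the response gets longer, while the loss for a correct answer is lowest when the response is short, which matches the intuition in the thread.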
In addition, the researchers show that a second round of RL (using just a few problems that are sometimes solvable) can shorten responses while preserving or even improving accuracy. This has big implications for deployment efficiency.
I will be speaking on this panel this afternoon at 12:15 pm PST on technical debt in agentic AI systems and the reality of deployments, especially at enterprise scale.
The conference is free to attend; you can register below:
https://t.co/ZviIo0cPTT
See you at the conference!
A recent trend that I'm noticing with the emergence of LLMs: Engineers are increasingly isolating themselves from talking to outside teams and companies. That vacuum is now filled with enterprising product types who can now code a lot more, and want to build for customers' needs.
Call for ACM SIGAI Autonomous Agents Research Award 2026
The award is made for research excellence in autonomous agents, to recognise researchers whose current work is an important influence
Deadline: 15th Dec 2026
Nominate: https://t.co/B86Niw1NcL
#SIGAIAward
I think this award is tremendous, and I am inspired by this year's awardee @shakir_za and past recipients like Milind Tambe and Stuart Russell for their efforts to advance AI for humanity. #aaai
What a tour de force and gracious #NeurIPS2025 Test of Time talk by Kaiming He! 🙏
Faced with a choice of a "prophet talk" or a "realistic talk", he says he went for the latter, and we are all the richer for it.
"I feel like I am in a ship in Atlantic. Everything ahead is unknown. There is no oracle. No prophet. And when it is discovered, I hope it becomes common knowledge."
Wondering how to attend an ML conference the right way?
ahead of NeurIPS 2025 (30k attendees!) here are ten pro tips:
1. Your main goals:
(i) meet people
(ii) regain excitement about work
(iii) learn things
– in that order.
2. Make a list of papers you like and seek them out at poster sessions. Try to talk to the authors: you can learn much more from them than from a PDF.
3. Pick one workshop and one tutorial that sounds most interesting. Skip the rest.
4. Cold email people you want to meet but haven't. Check Twitter and the accepted papers list. PhD students are especially responsive.
5. Practice a concise pitch of unpublished research you're working on for "what are you interested in rn?". Focus on big unanswered questions and exciting new directions, *not* papers.
6. Skip the orals. Posters are higher-bandwidth, more engaging, more invigorating. Orals are a good time to go for a walk or talk in the hallway.
7. for the love of god, do NOT work on other research in your hotel room. Save mental bandwidth for the conference. (This may seem obvious; you'd be surprised.)
8. Talk to people outside your area. There are many smart people working on niches <10 people understand. Learn about one or two that won't help your own work.
9. Attend one social each night. Don't overthink it or get caught up in status games. They're all fun.
10. Take breaks. You can't go to everything, and conferences consume more energy than a normal workweek.
hope this helps, and sad i'm not attending neurips, have fun :)
🎙️ A new episode of The Information Bottleneck podcast!
This time we're trying something different, just AI news & paper discussions (no guest interview).
We talked about:
🏥 GPT-5 in medicine & healthcare AI risks
📦 Stanford's "Cartridges" paper on compressing KV caches
🔄 Continuous Autoregressive Language Models paper
📚 The Smol Training Playbook
Let us know what you think of this experimental format!
*open app*
"We've just raised a $50M pre-seed to help your toaster talk to your microwave."
"We just raised a $230M pre-pre seed to agenticly agent your AI agents."
"I'm 4 and I just dropped out of preschool to go all-in on AI-enabled candles."
*close app*
first AI came for stackoverflow
and i did not speak out
due to their unpleasant moderators
then AI came for quora
and i did not speak out
because i never use quora
then AI came for Wikipedia
and i did not speak out
because i did not care
then AI came for AI research
and there was no one left
to speak for me