Not even much to say, I think the government way overstepped but we’ll see if they can substantiate the evidence (in which case Anthropic would tell us).
Anthropic’s messaging was pushing government action, but this is insane and a bad action by USG for the AI trajectory.
MAI-Thinking-1 is out!
Excited to share what we are building and how climbing from scratch (no distillation) actually works: simple recipes, rigorous science, self-distillation, patience, and great infra.
Check out our tech report, which has the full story of our RL climbs.
https://t.co/aLW40sWz4d
My time at Ai2 / @allen_ai has come to an end.
Ai2 is a wonderful place. The last 2.5+ years building Olmo, Tulu, and other projects will be one of the peaks of my entire career. I'm extremely thankful for my teammates and the open community who made this work possible.
For me, it's time to try something different. I will still be working in the open model & open science spaces (more news on that soon). In the meantime I'll be spending a few months learning, chatting with a broader network, getting married (!!) and most importantly recharging from pouring my soul into this place.
I've attached the note I shared with the team and some fun photos from our time together. I'll keep cheering for Ai2 and am excited to see what you build next.
@icwsm is an amazing conference and an even better community (not just because they named me a best SPC member lol)! Definitely consider submitting next cycle!
A sincere thank you to all reviewers and Senior Program Committee chairs whose dedication makes this conference possible.
We're proud to recognize the very best among them with our Best Reviewer and SPC Awards🏆
For a long time, academic researchers being at the cutting edge of new technologies has been a great social equilibrium. Neutral, unbiased technologists have been the people to spread new ideas to the world.
As AI research takes off in velocity, it is also going behind closed doors. The tech industry has sowed distrust, and now they are the ones trying to tell the world about incredible changes coming. It's a big loss to a form of social contract in America.
There's been a history of scientists helping society understand new technologies. There is a public service in the culture of science that I want to see continue.
It's being exacerbated by feelings of FOMO, especially financially driven, where I'm seeing many people who previously wanted to be professors -- and likely still do deep down -- feel a need to conform and chase money, in a pocket of industry. I get it, I grapple with this.
For those with a safety net, there will be great returns to some who choose to zag, and try to build something good, for people who need something different. For me, this is building interesting, fully-open models, to show what you can do with a variety of open weight sizes.
Yes, AI's immediate future is dictated by the frontier, but its long-term trajectory still deeply includes academic institutions and open science. Knowledge will always diffuse, but to whom?
As of today, I think China is positioned to be the global home of AI research in a few years. The home of research is where ideas are accessible, spread rapidly, and are nurtured. The U.S. seems to be unwinding many institutions and relationships.
The largest returns go to people who build something differentiated, at least in reputation, and a lot of people are not being shown that this path exists.
I did an interview with the Pittsburgh Post-Gazette about my research, my life and my graduation.
Buy the local newspaper this weekend to read it!
I never thought I’d make it to the city/state newspaper, but dreams come true 😀
Now on to pursuing my trod towards faculty (at CMU??)
🎉 Thrilled to have two papers accepted to ACL 2026 main!
1. Graph-based models match LLMs on close-ended human simulation tasks with far less compute & greater transparency
2. (oral) How to allocate human samples towards fine-tuning vs post-hoc rectification in simulation
1/ "New tokenizer" does not imply "new base model," and "new base model" is not the simplest explanation. There are much simpler explanations that fit Anthropic's public description of Opus 4.7 equally well.
Introducing Claude Opus 4.7, our most capable Opus model yet.
It handles long-running tasks with more rigor, follows instructions more precisely, and verifies its own outputs before reporting back.
You can hand off your hardest work with less supervision.
Research we co-authored on subliminal learning—how LLMs can pass on traits like preferences or misalignment through hidden signals in data—was published today in @Nature.
Read the paper: https://t.co/b1BYwcW9dH
Human opinions are complex and diverse. What do LLMs understand about them?
In our new #ICLR paper, we find that LLMs know far more about human opinions than is revealed in their outputs, and develop SAE methods to bring this knowledge to the surface + steer to different groups.
Excited to launch the accompanying free RLHF Course for my book. To kick it off, I've released:
- Welcome video
- Lecture 1: Overview of RLHF & Post-training
- Lecture 2: IFT, Reward Models, Rejection Sampling
- Lecture 3: RL Math
- Lecture 4: RL Implementation
I'm going to add question & answer videos throughout the lectures to go deeper on topics that need it, and potentially cover some topics that are too recent and in flux to go in print. I expect 10-15 videos in total over the next few months.
At the same time, development around the code for the book is picking up. It's a great time to build the foundation for post-training methods.
YT playlist and course landing page below.
People are too obsessed with benchmarks for open models. The core determining factors of success are often:
1. Immediate & long-term tooling support
2. Finetunability
Tbh Gemma has struggled here in the past. Qwen has excelled at it. It's where the winners are crowned.
I never thought that I would be this excited about humans taking the first steps to go back to the moon. Seeing it actually happen is so flipping cool!
Google dropped 4 different Gemma open-weight models! I'm most excited that they're finally adopting a standard Apache 2.0 open source license. This'll massively boost adoption. The standard of better licenses was set mostly by Chinese open model labs, and now U.S. labs are following suit.
The models are really like 31B dense, 26B-4B active MoE, 8B, 5B dense (called smaller for some reason). Base models too. Good sizes for tinkering, some local uses, and research (8/5B). 30B is particularly a great size range for building useful tools (which is why we made Olmo 3 that size too).
Gemini doesn't release bad models so I'm excited to try these!
Congrats Googlers.