I’m a founding member @AsariAILabs and PhD researcher @Caltech @RPI @NEC working on LLM agents, reasoning, RL, test-time scaling, and computer-use agents
Post-training LLMs is like mixing a cocktail:
Too much easy data → no learning
Too much hard data → instability
Wrong balance → collapse
And today, we mix it by hand.
What if the data mixture could be learned instead of hand-tuned?
https://t.co/2EntaqvO2G
🧵👇
Our second continual learning dinner + community gathering is coming up next Thursday evening.
We cut off the guest list on Tuesday to give names to security. Sign up before then.
Will be very fun.
https://t.co/uVVMYjDUAu
AI runs on data.
But… data is hard to buy.
❌ How much is a dataset worth?
❌ Will it actually help your model?
❌ What about privacy / trust?
What if we never priced data directly…
and instead priced what actually matters: model improvement?
https://t.co/7L2j7GMS31
🧵👇
Personal update: I've joined Anthropic. I think the next few years at the frontier of LLMs will be especially formative. I am very excited to join the team here and get back to R&D. I remain deeply passionate about education and plan to resume my work on it in time.
📣 The May speaker of our researcher meetup on continual learning will be Caltech's @lightetal Jonathan Li! 📣
Jonathan will share recent work on structured, language-native search methods such as DISC and SFS, which enable agents to explore diverse reasoning paths, strategies, and solutions beyond what is directly represented in their training data.
Space is limited, and last time we had a waitlist of 200+, so please register soon if you'd like to join.
@sarahookr @mralbertchun @NikzadAfshin
https://t.co/lm1pkfdAk0
The big takeaway:
👉 Don’t sell data.
👉 Sell what data does.
By pricing model improvement, we can:
• unlock data markets
• align incentives
• make ML systems more accessible
• ensure privacy
This is a step toward real AI economies.
🧵 8/n