Language models are becoming our default interface to facts. Yet their ability to *verify* facts can differ from their ability to *generate* them.
We trace this "generation-verification gap" (GV-gap) across the lifecycle of a fact — w/ @AnjaSurina + @caglarml 🧵
From "System of Record" to "System of Intelligence"
In the next decade, you want to own the system of intelligence that pulls from the system of record, becomes the user’s one-stop shop for gaining context and taking action, and turns the SoR into something that’s primarily consumed at the API layer.
The reasoning layer that sits above the database is where a new generation of companies is being built, and it’s where the majority of the next decade’s enterprise value of GTM software will end up.
Full piece from a16z's Gio Ahern, Steph Zhang, and Alex Immerman: https://t.co/2udG6l6SSx
X has the best information on the internet and the worst incentives & culture.
meet noscroll — the AI that doomscrolls it for you and texts you just the things that matter.
no feed. no brainrot. no ragebait. just signal.
try it for free → https://t.co/XqdExWR13j 🙅🏼‍♂️
🏆🏆🏆 Thrilled to share that our paper “The AI Review Lottery: Widespread AI-Assisted Peer Reviews Boost Paper Scores and Acceptance Rates” received an Honorable Mention Award at @ACM_CSCW 2025 🎉
By analyzing thousands of ICLR peer reviews, we show that papers receiving AI-assisted reviews systematically receive higher scores and are more likely to be accepted.
Thanks to my amazing coauthors for making this possible! @manoelribeiro, @im_td, @VminVsky, @cervisiarius
🚨New paper alert! 🚨
Tandem Training for Language Models
https://t.co/Emzcgf1KHx
Actions & thoughts of AI w/ superhuman skills will be hard for humans to follow, undermining human oversight of AI. We propose a new way to make AI produce human-understandable solutions. How?👉🧵
📣New paper: Rigorous AI agent evaluation is much harder than it seems.
For the last year, we have been working on infrastructure for fair agent evaluations on challenging benchmarks.
Today, we release a paper that condenses our insights from 20,000+ agent rollouts on 9 challenging benchmarks spanning web, coding, science, and customer service tasks.
Our key insight: Benchmark accuracy hides many important details. Take claims of agents' accuracy with a huge grain of salt. 🧵
Introducing RND1, the most powerful base diffusion language model (DLM) to date.
RND1 (Radical Numerics Diffusion) is an experimental DLM with 30B params (3B active) and a sparse MoE architecture.
We are making it open source, releasing weights, training details, and code to catalyze further research on DLM inference and post-training.
We are researchers and engineers (DeepMind, Meta, Liquid, Stanford) building the engine for recursive self-improvement (RSI) — and using it to accelerate our own work. Our goal is to let AI design AI.
We are hiring.
🌱✨ Life update: I just started my PhD at Princeton University!
I will be supervised by @manoelribeiro and affiliated with @PrincetonCITP.
It's only been a month, but the energy feels amazing. Very grateful for such a welcoming community. Excited for what’s ahead! 🚀
Two papers on agent safety got accepted to NeurIPS 2025! 🥳
1) Dynamic Risk Assessments for Offensive Cybersecurity Agents
https://t.co/kLWCDUOxRG
2) Safety Devolution in AI Agents
https://t.co/aqAxX0Fh9D
Selling your stuff sucks.
With @sellwithatext you can sell it with a text. Just send us a photo of what you want to sell and then we'll do the rest. List, negotiate, answer questions, handle delivery.
Launching in the Bay Area for now (text your zip code if you're outside)!
Rid (@sellwithatext) is making selling easier than buying. Just text them a photo of what you want to sell and they'll do the rest. Finding buyers, negotiating, answering questions, and picking up the item.
Congrats on the launch, @vminvsky & @benediktstroebl!
https://t.co/n6OdMdkc7A
This work got accepted to ACL 2025 main! 🎉
In this updated version, we extended our results to several models and showed they can actually generate good definitions of mean concept representations across languages.🧵