Exciting times at @reflection_ai!
Science moves faster when researchers can inspect, adapt, reproduce, and build. That is why open models matter.
If you want to build models that move science forward, come join us.
Our open models are designed to support the Genesis Mission by giving the scientists in our national labs the flexibility and sovereignty to work on their own terms. Learn more ⤵️
AGI is in its first stages of take-off.
Every country is realizing that AI sovereignty is existential, which requires open models.
We’ve signed a deal with Shinsegae Group to build South Korea’s sovereign cloud on a US open model built by Reflection.
More to come.
Proud to share that Reflection is partnering with Shinsegae Group to build a 250MW AI factory for Korea’s sovereign AI 🇰🇷
Excited to keep pushing the frontiers of RL, reasoning, and open models with this team!
https://t.co/1e09vRFLnX
Excited to announce that, after finishing my PhD a couple of months ago, I will continue to do *open* science at @reflection_ai on @_ghorbani's new team!
And we are still looking for exceptional individuals to join us 😉
Most approaches to “agentic AI” focus on post-training fixes.
In this conversation, @achowdhery, a member of our technical staff, argues the bottleneck is pre-training itself. Drawing on her work on PaLM and early Gemini, she explains why next-token prediction breaks down for long-horizon planning -- and how objectives, attention, and training data must evolve to support true agentic behavior.
I am deeply grateful to my colleagues at OpenAI. It has been a privilege to be there from the early days of ChatGPT and to learn from so many brilliant people, especially the reasoning team, which has been my home these past few years and a constant source of insight, collaboration, and support.
Thank you for everything we built together. I am excited for what comes next.
Hi friends, after three incredible years at OpenAI I am excited to share that I am starting a new chapter at @reflection_ai, where I will be leading the Science of Scaling team.
Our mission is to deepen the scientific understanding of large scale learning and to turn compute into intelligence as efficiently and predictably as possible.
In Science of Scaling we will focus on three pillars: understanding LLM training dynamics at scale, the role of real and synthetic data, and the science of RL. I am especially excited to pursue this mission together with @MishaLaskin and @real_ioannis at Reflection.
I am building a small, high trust team that cares deeply about open research, careful measurement, and engineering excellence. If you are interested in the science of pretraining, data, and RL at scale and want to help push the frontier with a focused, tight knit group, my DMs are open. I will also be at NeurIPS this week (https://t.co/vRcIBgK3rn).
Generalists are useful, but it’s not enough to be smart.
Advances come from specialists, whether human or machine.
To have an edge, agents need specific expertise, within specific companies, built on models trained on specific data.
We call this Specific Intelligence.
It's what we're building at Applied Compute.
We unlock the latent knowledge inside a company, use it to train custom models, and deploy an in-house agent workforce that reports to your team.
We work with sophisticated companies that have already captured early gains from general models, like @cognition, @DoorDash, and @mercor_ai. They’re pulling even further ahead with proprietary in-house agents that don’t need to wait for the next public model release.
Together, we are building and validating models and agents in days instead of months, achieving state-of-the-art performance on customer evals.
Our team has high density and low latency. Our founders all worked on different parts of this problem while they were researchers at OpenAI — @ypatil125 as a key member on the agentic software engineer effort (Codex), @rhythmrg as a core contributor to the first RL-trained reasoning model (o1), and @lindensli as a core contributor on ML systems and infrastructure for RL training.
Two-thirds of the team are former founders, and everyone brings a deep technical background, from top AI researchers to Math Olympiad winners.
We are backed by $80M in funding from Benchmark, Sequoia, Lux, Elad Gil, Victor Lazarte, Omri Casspi, and others. With their support, we are growing the team, scaling deployments, and bringing to market the first generation of agent workforces built on specific models.
In short:
1. We are building Specific Intelligence for specific work at specific companies.
2. That will power in-house agent workforces to support their human bosses.
3. That in turn will unlock AI’s full potential through humanity’s greatest engine of progress: thriving corporations in a free market.
Understanding the capabilities of AI models is important to me. To forecast how AI models might affect labor, we need methods to measure their real-world work abilities. That’s why we created GDPval.
the distance between category leaders and stragglers in frontier AI starts with talent and culture
by the time the revenue and valuation signals show up, it’s too late
🚀 We’re hiring at NVIDIA!
Our team is pushing the frontier of LLM / DLM post-training and system optimization. We are looking for exceptional people with large-scale LLM + systems experience to join us (full time only).
🔹 Focus areas include:
• Post-training of large models
• Systems for LLM/DLM training & inference at scale
• Efficiency, scaling, and evaluation frameworks of LLMs
At NVIDIA, you’ll work with world-class researchers and engineers on cutting-edge foundation models at unprecedented scale.
👉 If you’re passionate about LLMs, systems, and building the next generation of AI, we’d love to hear from you.
📩 If you’re interested, please send me your CV!
@nvidia #LLM #AI #Systems #PostTraining #DeepLearning
In the era of pretraining, what mattered was internet text. You'd primarily want a large, diverse, high quality collection of internet documents to learn from.
In the era of supervised finetuning, it was conversations. Contract workers are hired to create answers for questions, a bit like what you'd see on Stack Overflow / Quora, etc., but geared towards LLM use cases.
Neither of the two above is going away (imo), but in this era of reinforcement learning, it is now environments. Unlike the above, they give the LLM an opportunity to actually interact - take actions, see outcomes, etc. This means you can hope to do a lot better than statistical expert imitation. And they can be used both for model training and evaluation. But just like before, the core problem now is needing a large, diverse, high quality set of environments, as exercises for the LLM to practice against.
In some ways, I'm reminded of OpenAI's very first project (gym), which was exactly a framework hoping to build a large collection of environments in the same schema, but this was way before LLMs. So the environments were simple academic control tasks of the time, like cartpole, ATARI, etc. The @PrimeIntellect environments hub (and the `verifiers` repo on GitHub) builds the modernized version specifically targeting LLMs, and it's a great effort/idea. I pitched that someone build something like it earlier this year:
https://t.co/ANHhasxzD8
Environments have the property that once the skeleton of the framework is in place, in principle the community / industry can parallelize across many different domains, which is exciting.
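To make the "take actions, see outcomes" loop concrete, here is a minimal sketch of what a shared environment schema could look like. It is only an illustration in the spirit of gym's reset/step convention, not the actual API of gym, the environments hub, or `verifiers`. The names `StepResult`, `ArithmeticEnv`, `rollout`, and `toy_policy` are all made up for this example, and the policy is a plain function standing in for an LLM call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StepResult:
    observation: str  # text shown to the policy after its action
    reward: float     # scalar score assigned by the environment
    done: bool        # True once the episode is over


class ArithmeticEnv:
    """Hypothetical single-turn environment: answer an addition question."""

    def __init__(self, a: int, b: int) -> None:
        self.a = a
        self.b = b

    def reset(self) -> str:
        # The initial observation is the prompt the policy must respond to.
        return f"What is {self.a} + {self.b}? Reply with a number only."

    def step(self, action: str) -> StepResult:
        # The outcome is checked by the environment rather than imitated from data.
        correct = action.strip() == str(self.a + self.b)
        feedback = "correct" if correct else "incorrect"
        return StepResult(feedback, 1.0 if correct else 0.0, True)


def rollout(env, policy: Callable[[str], str], max_steps: int = 8) -> float:
    """Run one episode and return the total reward."""
    observation = env.reset()
    total = 0.0
    for _ in range(max_steps):
        result = env.step(policy(observation))
        total += result.reward
        if result.done:
            break
        observation = result.observation
    return total


def toy_policy(prompt: str) -> str:
    # Stand-in for an LLM call (assumption): add every integer found in the prompt.
    words = prompt.replace("?", " ").split()
    return str(sum(int(w) for w in words if w.isdigit()))
```

Because every environment exposes the same `reset` / `step` shape, the same `rollout` function can serve both training (collecting rewards) and evaluation (averaging them), and new domains only need to supply a new environment class.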
Final thought - personally and long-term, I am bullish on environments and agentic interactions but I am bearish on reinforcement learning specifically. I think that reward functions are super sus, and I think humans don't use RL to learn (maybe they do for some motor tasks etc, but not intellectual problem solving tasks). Humans use different learning paradigms that are significantly more powerful and sample efficient and that haven't been properly invented and scaled yet, though early sketches and ideas exist (as just one example, the idea of "system prompt learning", moving the update to tokens/contexts not weights and optionally distilling to weights as a separate process a bit like sleep does).
Huge congratulations to @AIatMeta and to @shengjia_zhao! Shengjia is one of the most brilliant and kind researchers I’ve had the privilege to work with.
We're excited to have @shengjia_zhao at the helm as Chief Scientist of Meta Superintelligence Labs. Big things are coming! 🚀
See Mark's post: https://t.co/SL7h4sGfwx
To summarize this week:
- we released a general-purpose computer-using agent
- got beaten by a single human in the AtCoder heuristics competition
- solved 5/6 new IMO problems with natural language proofs
All of those are based on the same single reinforcement learning system