The next wave of AI will not be won by better prompts. It will be won by systems that learn from experience.
Today, Prime Intellect Lab is out of beta, open for you to start training your own models.
The era of self-improving agents is here.
We are so back!
Future looking bright to post-train, serve, and continuously improve your own model on top of models like GLM-5.2 using https://t.co/BfmlaxxJaE 🫡
Great RL systems deep dive by @SemiAnalysis_
Scaling RL is as much an infra problem as an algorithmic one
SemiAnalysis ran experiments on our stack: Prime RL + Sandboxes. System efficiency ultimately comes down to queue health: keeping generator and trainer throughput matched
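To make the "queue health" point concrete, here is a toy discrete-time sketch of a rollout queue sitting between a generator and a trainer. It is not Prime RL code and not the SemiAnalysis setup: the function `simulate_queue`, its parameters, and the staleness rule are all hypothetical, chosen only to show what happens when the two rates drift apart.

```python
from collections import deque


def simulate_queue(gen_rate, train_rate, steps, max_staleness=None):
    """Toy model (assumption, not a real system): each step the generator
    pushes `gen_rate` rollouts and the trainer pulls up to `train_rate`."""
    queue = deque()          # holds the step at which each rollout was generated
    trained = dropped = idle_steps = 0

    for t in range(steps):
        # Generator side: enqueue fresh rollouts stamped with the current step.
        for _ in range(gen_rate):
            queue.append(t)

        # Optionally discard rollouts that are too old (too off-policy) to use.
        if max_staleness is not None:
            while queue and t - queue[0] > max_staleness:
                queue.popleft()
                dropped += 1

        # Trainer side: consume the oldest rollouts first.
        consumed = 0
        while queue and consumed < train_rate:
            queue.popleft()
            consumed += 1
        trained += consumed

        # The trainer is starved whenever it could not fill its batch.
        if consumed < train_rate:
            idle_steps += 1

    return {
        "trained": trained,
        "dropped": dropped,
        "idle_steps": idle_steps,
        "backlog": len(queue),
    }


if __name__ == "__main__":
    print("balanced      ", simulate_queue(4, 4, 10))
    print("generator-bound", simulate_queue(2, 4, 10))
    print("trainer-bound  ", simulate_queue(4, 2, 10, max_staleness=2))
```

In this toy model a slow generator shows up as trainer idle steps, while a slow trainer shows up as backlog and dropped stale rollouts. Either way, throughput is set by the slower side.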
been beating this drum since early 2025, seems like people are starting to see why it's so important :)
RL works -> "train or get trained on" -> open models + post-training infra are the path to institutional flywheels + democratization of AI progress
Satya is perfectly describing the why and what behind @primeintellect since 2023 🫡
> AI needs to be open & sovereign
> Let every company create its own self-improving agents and own the loop that makes them better
> A rich open AI ecosystem creates far more abundance than a future locked down by a few closed labs
> Every company is becoming an AI company, so every company needs to own its own product <> model improvement loop
@primeintellect enables this today:
> Your own evals + rl envs for the outcomes you care about
> models self-improving in production from your real traces
> don't cede your moat to a handful of labs. This self-improvement loop is the IP and it compounds
Open self-improving agents for everyone 🫡
By performing SFT on tool outputs and RL on the assistant tokens, we can efficiently teach the model the environment dynamics. This happens on-policy: the LLM models the environment not in a vacuum but in response to its own actions.
We show strong results in the under-resourced programming language Forth and evaluate generalization to unrelated environments.
We also characterize what aspects of an environment lead to overfitting when using ECHO, how model behavior is impacted, and much more.
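The split described above (SFT on tool outputs, RL on assistant tokens) can be pictured as one rollout scored with two role-based token masks. The sketch below is an illustration only, not the ECHO objective as published: the function `combined_loss`, the `"assistant"`/`"tool"` role labels, the REINFORCE-style surrogate, and the `sft_weight` knob are all assumptions.

```python
def _mean(values):
    """Mean of a list, defined as 0.0 when the list is empty."""
    return sum(values) / len(values) if values else 0.0


def combined_loss(logprobs, roles, advantage, sft_weight=1.0):
    """Hypothetical per-rollout loss.

    logprobs:  per-token log-probabilities under the current policy
    roles:     "assistant" (tokens the model chose) or "tool" (environment output)
    advantage: scalar advantage assigned to the whole rollout
    """
    if len(logprobs) != len(roles):
        raise ValueError("logprobs and roles must have the same length")

    # RL term: policy-gradient surrogate, applied only to the agent's own actions.
    rl_terms = [-advantage * lp for lp, r in zip(logprobs, roles) if r == "assistant"]

    # SFT term: plain negative log-likelihood on what the environment returned,
    # so the model learns to predict environment responses to its own actions.
    sft_terms = [-lp for lp, r in zip(logprobs, roles) if r == "tool"]

    rl_loss = _mean(rl_terms)
    sft_loss = _mean(sft_terms)
    return {
        "rl": rl_loss,
        "sft": sft_loss,
        "total": rl_loss + sft_weight * sft_loss,
    }


if __name__ == "__main__":
    # Two assistant tokens followed by two tool-output tokens.
    print(combined_loss(
        logprobs=[-1.0, -2.0, -0.5, -0.5],
        roles=["assistant", "assistant", "tool", "tool"],
        advantage=2.0,
    ))
```

Because the tool tokens come from rollouts the model itself produced, the SFT term here is on-policy in the sense the thread describes: the environment is modeled in response to the model's own actions.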
True agents model the world.
Current training provides no separation between agent and environment: pre-training only trains world modeling, RL only agentic actions. We combine both using ECHO by @DimitrisPapail and @VaishShrivas.
this is the biggest wake-up call to protect and nourish open source AI
if you don't build out sovereign and independent models + infra, closed labs will patronize you to an insulting degree
This is why Prime Intellect must exist.
We must diffuse the tools of recursive self-improving AI, otherwise Anthropic will build the singleton and concentrate power until they run the world government.