What is mid-training?
The stage between pre-training and post-training
A base model is trained further on a smaller, curated data mixture chosen to strengthen capabilities that the original pre-training run under-covered, such as multilinguality, domain knowledge, or long-context extension.
It usually keeps a pre-training-like objective, but uses higher-quality or more targeted data so later instruction tuning, preference tuning, or RL can shape behavior on top of stronger capabilities.
Learn more here: https://t.co/WhpYkyGlv8
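As a rough illustration of the "curated data mixture" idea above, here is a minimal sketch that draws training examples from several sources according to fixed weights. The source names and weights are hypothetical and chosen only for illustration; real mid-training recipes choose them empirically.

```python
import random
from collections import Counter

# Hypothetical mixture weights (assumption, not taken from any published recipe).
MIXTURE = {
    "general_web": 0.40,       # replay of pre-training-style data
    "multilingual": 0.25,      # targeted at an under-covered capability
    "domain_knowledge": 0.20,
    "long_context": 0.15,
}

def sample_sources(mixture, n, seed=0):
    """Pick the data source for each of n training examples, proportional to weight."""
    rng = random.Random(seed)
    names = list(mixture)
    weights = [mixture[name] for name in names]
    return rng.choices(names, weights=weights, k=n)

# Count how often each source is drawn over 10,000 examples.
counts = Counter(sample_sources(MIXTURE, 10_000))
```

The training objective itself is unchanged in this sketch; only the distribution the batches are drawn from shifts toward the targeted data.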
Interested in learning how to run RL at scale? Here are the best resources to read…
Research on Scaling RL
1. The Art of Scaling RL Compute for LLMs: https://t.co/PGjI6Gwgv0
2. Scaling Behaviors of LLM RL Post-Training: https://t.co/2u2saB3C0h
3. Optimally Scaling Sampling Compute for LLM RL: https://t.co/rUSdUvJyNH
4. Scaling up RL: https://t.co/O8vV6z8ymx
5. ProRL V2 - Prolonged Training Validates RL Scaling Laws: https://t.co/vu72juvRW4
6. Polaris - A Recipe for Scaling RL with Reasoning Models: https://t.co/rMibSAeJbg
RL Frameworks
1. Hybrid Flow (early outline of the verl framework): https://t.co/GnWXx131uD
a. More up-to-date info can be found here: https://t.co/j801HcJmPP
2. AReal - Large-Scale Async RL: https://t.co/qhOvsQK09N
3. PipelineRL - Fast On-Policy RL: https://t.co/iRM7KzySXe
4. AsyncFlow - Async Streaming RL: https://t.co/YwmzFtiU2q
RL for Agents
1. DeepSWE - Open Coding Agent Trained w/ RL: https://t.co/GHQHcmtE6F
2. AutoForge - Environment Synthesis for Agentic RL: https://t.co/mr3WDIL5vq
3. Agent-R1 - Training Agents w/ End-to-End RL: https://t.co/xpfQJGgzEv
4. AgentRL - Scaling RL for Multi-Turn, Multi-Task Agents: https://t.co/7fbVl0RWXG
5. The Landscape of Agentic RL: https://t.co/OMnSV4rgdW
6. Training SWE Agents with RL: https://t.co/YqMqySbyXS
Case Studies & Tech Reports
1. Kimi tech reports:
a. Kimi K2 - Open Agentic Intelligence: https://t.co/aAw17SXrIw
b. Kimi End-to-end Agentic RL: https://t.co/ProBpOPIiI
c. Kimi K1.5 - Scaling RL for LLMs: https://t.co/kRGOxY9Jvp
2. Composer series from Cursor:
a. Composer 2: https://t.co/K0v8rNCE6Z
b. Composer 2.5: https://t.co/D9PYimfOMU
3. Olmo 3 (also has open code / data): https://t.co/khetJFvp6N
4. MiniMax tech reports:
a. MiniMax-M2: https://t.co/HApb0OB80S
b. MiniMax-M1: https://t.co/mZj9UQsrnC
5. Nemotron 3 (NVIDIA): https://t.co/lCpE1GzxSi
New on the Engineering Blog: The access and permissions we grant agents should evolve with their capabilities. In our own products, we set these parameters through sandboxing, which limits the scope of any potentially destructive actions.
Read more: https://t.co/KfBKW8O9kP
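Just came across a freshly released AI film. It really is impressive!
《代理人》 (The Agent) runs 46 minutes, and I think it probably represents the ceiling of AI video at this stage.
Visual consistency, camera movement, style: you simply don't notice any of it. Not because it was done carelessly; on the contrary, because these are no longer problems at all. The plot pulls your attention along and leaves you no spare moment to nitpick the technical side. Honestly, I've never had this experience watching AI video before.
The script also has a touch of Black Mirror, with the kind of feel that leaves you thinking it over, and faintly uneasy, after watching.
It happens to be the weekend and the film isn't long, so I recommend giving it a watch.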
Not all AI agents are built the same. So what sets them apart?
Here’s a breakdown of 10 core types of AI agents you’ll come across in real-world systems, from simple reactive agents to complex multi-agent systems.
1. Task-Specific AI Agent
Built for one focused task like summarizing or translating. It follows a fixed process with no learning or adaptation.
2. Reactive Agent
Responds to immediate input without using memory or history. Think of it like a reflex: it reacts rather than plans.
3. Model-Based Agent
Builds an internal map of its environment. Simulates outcomes before acting to make smarter, context-aware decisions.
4. Goal-Based Agent
Starts with a goal and works backward. It plans steps, simulates paths, and selects the route that achieves the goal.
5. Utility-Based Agent
Chooses actions based on how beneficial they are. It weighs all options and picks the one with the highest value.
6. Learning Agent
Improves over time by learning from past actions. Adjusts its strategy using feedback and stores new knowledge.
7. Planning Agent
Focuses on long-term strategy. It defines a goal, maps out steps, and adjusts based on progress, not just reaction.
8. Reflex Agent with Memory
Uses preset rules but with added memory of past inputs. Helps respond better when situations repeat or evolve.
9. Multi-Agent System Agent
Works with or against other agents. They share environments, negotiate roles, and coordinate to reach a bigger goal.
10. Rational Agent
Always selects the most logical option. It analyzes the full picture, predicts outcomes, and chooses the smartest path.
Save this if you're exploring Agentic AI or designing intelligent decision-making systems.
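To make a few of the types above concrete, here is a minimal sketch contrasting a reactive agent, a reflex agent with memory, and a utility-based agent. The percepts, actions, and rules are hypothetical toy values, not taken from any particular framework.

```python
from typing import Dict, List

def reactive_agent(percept: str) -> str:
    """Reactive agent: maps the current input straight to an action, no memory."""
    rules = {"obstacle": "turn", "clear": "forward"}  # hypothetical rule table
    return rules.get(percept, "wait")

class ReflexAgentWithMemory:
    """Same preset rules, plus a record of past inputs that can override them."""

    def __init__(self) -> None:
        self.history: List[str] = []

    def act(self, percept: str) -> str:
        self.history.append(percept)
        # Two obstacles in a row: turning did not help, so back up instead.
        if self.history[-2:] == ["obstacle", "obstacle"]:
            return "reverse"
        return reactive_agent(percept)

def utility_based_agent(action_utilities: Dict[str, float]) -> str:
    """Utility-based agent: picks the action with the highest estimated value."""
    return max(action_utilities, key=action_utilities.get)
```

For example, `ReflexAgentWithMemory` returns the same action as `reactive_agent` on a first obstacle but a different one when the obstacle repeats, which is the difference the memory makes.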