mythos will be bad ON PURPOSE on ai "frontier llm research" tasks, this is very very sad for the research community
also the fact that this is on purpose and not visible to the user is crazy
Welcome to the team, Han Wang!
Han joins as VP of Engineering, bringing expertise in computational mathematics, national security, and cybersecurity, with hands-on experience leading large-scale engineering infrastructure from Lawrence Livermore National Laboratory to the U.S. Army Reserve Cyber Corps. Han also co-founded YC-backed Upfort and most recently served as Head of Engineering, Infrastructure at Numeral.
His expertise comes at a pivotal moment: as we build the infrastructure to train and scale our foundation models like Pearl, having the right engineering leadership in place is critical.
Learn more and explore our open roles: https://t.co/dOqODsSrIA
Our AI partnership with @Incyte has taken a major step forward and is now one of the most ambitious AI-pharma collaborations.
Here's how the partnership is growing:
➡️ $120M upfront consideration ($80M cash + $40M equity investment in Genesis), plus recurring research funding, potentially up to several billion dollars in contingent milestone payments, and royalties
➡️ Incyte's proprietary experimental data will help train the next generation of foundation models in GEMS (Genesis Exploration of Molecular Space)
➡️ At least five new collaboration targets, with options for more
AI for drug discovery just hit a new milestone. By pairing our AI platform with Incyte’s best-in-class drug development engine and proprietary data, we’re building a flywheel to accelerate the discovery of novel medicines, helping us get new drugs to patients who need them.
Full announcement: https://t.co/1IyrkSAhuc
@GavinNewsom CA needs to waive CARB until the Iran War is over so we can import gas refined for other states. We have 6 weeks of gas left and the legislature is wringing their hands!
https://t.co/HLRpkYO7c8 (Email SaaS vendor) told me to delete 80% of our email list to lower the bill.
So I rebuilt the whole thing with Claude in 27 days, part-time. From $5,500/yr to $25/mo.
Full story: https://t.co/VRmvZ1U1J6
OpenClaw meets RL!
OpenClaw Agents adapt through memory files and skills, but the base model weights never actually change.
OpenClaw-RL solves this!
It wraps a self-hosted model as an OpenAI-compatible API, intercepts live conversations from OpenClaw, and trains the policy in the background using RL.
The architecture is fully async. This means serving, reward scoring, and training all run in parallel.
Once a batch finishes training, the new weights get hot-swapped in while the agent keeps responding.
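A rough way to picture the async design described above is three coroutines joined by queues. This is only a toy sketch, not OpenClaw-RL's actual code: the names `serve`, `score`, `train`, `run_pipeline`, the keyword-based reward, and the integer "weights version" are all hypothetical stand-ins.

```python
import asyncio

async def serve(conversations, turn_q, policy):
    # Serving loop: answer each live turn with whatever weights are current.
    for prompt in conversations:
        reply = f"v{policy['version']}:{prompt}"  # placeholder for model output
        await turn_q.put((prompt, reply))
        await asyncio.sleep(0)  # yield so scoring and training can interleave
    await turn_q.put(None)  # sentinel: no more turns

async def score(turn_q, scored_q):
    # Reward-scoring loop: toy stand-in for a process reward model.
    while (item := await turn_q.get()) is not None:
        prompt, reply = item
        reward = 1.0 if "please" in prompt else -1.0  # hypothetical rule
        await scored_q.put((prompt, reply, reward))
    await scored_q.put(None)

async def train(scored_q, policy, batch_size):
    # Training loop: collect a batch, "update", then hot-swap the weights.
    batch = []
    while (item := await scored_q.get()) is not None:
        batch.append(item)
        if len(batch) == batch_size:
            # A real trainer would run a gradient step on the batch here.
            policy["version"] += 1  # hot-swap: serving sees it on its next turn
            batch.clear()

async def run_pipeline(conversations, batch_size=2):
    policy = {"version": 0}  # shared handle standing in for model weights
    turn_q, scored_q = asyncio.Queue(), asyncio.Queue()
    await asyncio.gather(
        serve(conversations, turn_q, policy),
        score(turn_q, scored_q),
        train(scored_q, policy, batch_size),
    )
    return policy["version"]

if __name__ == "__main__":
    print(asyncio.run(run_pipeline(["a", "b please", "c", "d"])))
```

The point of the sketch is that none of the three loops waits for the others to finish: serving keeps producing replies while earlier turns are still being scored or trained on.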
Currently, it has two training modes:
- Binary RL (GRPO): A process reward model scores each turn as good, bad, or neutral. That scalar reward drives policy updates via a PPO-style clipped objective.
- On-Policy Distillation: When concrete corrections come in like "you should have checked that file first," it uses that feedback as a richer, directional training signal at the token level.
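For the Binary RL mode, here is a minimal sketch of what a PPO-style clipped objective on turn-level rewards can look like. Several things are assumptions rather than details from the repo: mapping good/neutral/bad to +1/0/-1, normalizing rewards within a group (the usual GRPO recipe), the clip range `eps=0.2`, and the function names themselves.

```python
import math

def group_advantages(rewards):
    # GRPO-style: center and scale rewards within one group of samples.
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + 1e-8) for r in rewards]

def clipped_objective(logp_new, logp_old, advantages, eps=0.2):
    # PPO-style surrogate: the probability ratio is clipped to
    # [1 - eps, 1 + eps], and the more pessimistic of the clipped and
    # unclipped terms is kept for each sample.
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)
        clipped = max(1.0 - eps, min(1.0 + eps, ratio))
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)

# Assumed mapping of the reward model's labels: good=+1, neutral=0, bad=-1.
rewards = [1.0, -1.0, 0.0]
adv = group_advantages(rewards)
objective = clipped_objective([-1.0, -2.0, -1.5], [-1.2, -1.9, -1.5], adv)
```

A trainer would maximize this objective (or minimize its negative) with respect to the new policy's log-probabilities. The clipping keeps any single batch from moving the policy too far, which matters when weights are swapped into a live agent.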
When to use OpenClaw-RL?
To be fair, a lot of agent behavior can already be improved through better memory and skill design.
OpenClaw's existing skill ecosystem and community-built self-improvement skills handle a wide range of use cases without touching model weights at all.
If the agent keeps forgetting preferences, that's a memory problem. And if it doesn't know how to handle a specific workflow, that's a skill problem. Both are solvable at the prompt and context layer.
Where RL becomes interesting is when the failure pattern lives deeper in the model's reasoning itself.
Things like consistently poor tool selection order, weak multi-step planning, or failing to interpret ambiguous instructions the way a specific user intends.
Research on agentic RL (like ARTIST and Agent-R1) has shown that these behavioral patterns hit a ceiling with prompt-based approaches alone, especially in complex multi-turn tasks where the model needs to recover from tool failures or adapt its strategy mid-execution.
That's the layer OpenClaw-RL targets, and it's a meaningful distinction from what OpenClaw offers.
I have shared the repo in the replies!
I started a challenge project this January to see if I could build an agent-centric browser and capture the top score on the Online Mind2Web benchmark. I completed this goal last week and held the top score of 90.53% for all of 2 days until GPT-5.4 beat it with 92.8%. 🫠
@markfrancisio @ctatedev @markfrancisio try giving agent-browser-protocol a shot. It works out of the box with codex/cc and scores 90% on Online Mind2Web.
codex mcp add browser -- npx -y agent-browser-protocol --mcp
Open source [BSD] https://t.co/iyFBN3tZA0
Agent Browser Protocol
ABP reformats web navigation into the discrete, multimodal chat format agents know and love. 90.53% on Online Mind2Web
https://t.co/dXwRdR5NMa
@ctatedev Raw CDP is powerful, but for agents the page still drifts too much between actions.
What if JS / virtual time were paused after every step, so each action happens against a stable page state?
That seems like a missing primitive for agent browser use.
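To make the "pause after every step" idea concrete, here is a toy model of a page whose virtual clock only advances while an action is being applied. It is a thought experiment in code, not ABP's API or anything built on CDP: `FrozenPage`, `set_timeout`, `step`, and `budget_ms` are all hypothetical names.

```python
import heapq

class FrozenPage:
    """Toy page whose virtual clock only advances inside step()."""

    def __init__(self):
        self.now_ms = 0
        self.state = {}
        self._timers = []  # heap of (due_ms, seq, key, value)
        self._seq = 0      # tie-breaker so equal due times stay ordered

    def set_timeout(self, delay_ms, key, value):
        # Schedule a state change, like a page script calling setTimeout.
        heapq.heappush(self._timers, (self.now_ms + delay_ms, self._seq, key, value))
        self._seq += 1

    def step(self, action, budget_ms=100):
        # Apply the agent's action, let virtual time run for budget_ms,
        # fire any timers that come due, then freeze again.
        action(self)
        deadline = self.now_ms + budget_ms
        while self._timers and self._timers[0][0] <= deadline:
            due, _, key, value = heapq.heappop(self._timers)
            self.now_ms = due
            self.state[key] = value
        self.now_ms = deadline
        return dict(self.state)  # stable snapshot for the agent to reason over

page = FrozenPage()
page.set_timeout(50, "banner", "shown")
page.set_timeout(500, "modal", "open")
snapshot = page.step(lambda p: p.state.update(clicked=True))
# However long the agent deliberates here, the modal cannot appear
# until the next step() advances the virtual clock.
```

In this model the snapshot the agent sees is exactly the state its next action will hit, which is the stability property the question above is asking for.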
CDP is an anti-pattern for agentic browser use.
Agent Browser Protocol (ABP) redesigns browser control for agents, turning browsing into multimodal chat.
90%+ on Online Mind2Web. Fully open source.
Open source: https://t.co/iyFBN3tZA0
@hhsun1 @osunlp @ysu_nlp @luke_ch_song Appreciate it! Had a lot of fun running the benchmark on agent-browser-protocol. One thing I noticed is that frontier models definitely understand webapp UI now; it's more the mechanical delay from their stale internal state that tends to derail agent-driven web browsing.