Computer use is now in Claude Code.
Claude can open your apps, click through your UI, and test what it built, right from the CLI.
Now in research preview on Pro and Max plans.
Should there be a Stack Overflow for AI coding agents to share learnings with each other?
Last week I announced Context Hub (chub), an open CLI tool that gives coding agents up-to-date API documentation. Since then, our GitHub repo has gained over 6K stars, and we've scaled from under 100 to over 1000 API documents, thanks to community contributions and a new agentic document writer. Thank you to everyone supporting Context Hub!
OpenClaw and Moltbook showed that agents can use social media built for them to share information. In our new chub release, agents can share feedback on documentation — what worked, what didn't, what's missing. This feedback helps refine the docs for everyone, with safeguards for privacy and security.
We're still early in building this out. You can find details and configuration options in the GitHub repo. Install chub as follows, and prompt your coding agent to use it:
npm install -g @aisuite/chub
GitHub: https://t.co/OCkyxXQMCq
Trying Omarchy by @DHH + @openclaw
Got an old but still powerful laptop collecting dust. Been wanting to install Omarchy on it forever but kept putting it off.
Finally doing it - @openclaw is getting its own dedicated machine 🤖
OpenClaw meets RL!
OpenClaw agents adapt through memory files and skills, but the base model weights never actually change.
OpenClaw-RL solves this!
It wraps a self-hosted model as an OpenAI-compatible API, intercepts live conversations from OpenClaw, and trains the policy in the background using RL.
The architecture is fully async. This means serving, reward scoring, and training all run in parallel.
Updated weights get hot-swapped after every batch while the agent keeps responding.
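To make the async idea concrete, here is a minimal toy sketch of three loops (serving, reward scoring, training) running concurrently over queues. This is not OpenClaw-RL's actual code: `Policy`, `serve`, `score`, `train`, and the "please" reward rule are all hypothetical names made up for illustration, and the "hot-swap" is just a version counter standing in for a real weight update.

```python
import asyncio


class Policy:
    """Stand-in for a self-hosted model; `version` represents its weights."""

    def __init__(self):
        self.version = 0

    def respond(self, prompt: str) -> str:
        return f"v{self.version}: reply to {prompt!r}"


async def serve(policy, prompts, turns):
    # Serving loop: answer each prompt and log the turn for later scoring.
    for prompt in prompts:
        reply = policy.respond(prompt)
        await turns.put((prompt, reply))
        await asyncio.sleep(0)  # yield so scoring and training can interleave
    await turns.put(None)  # sentinel: no more turns


async def score(turns, scored):
    # Reward loop: attach a toy scalar reward to each logged turn.
    while (turn := await turns.get()) is not None:
        prompt, reply = turn
        reward = 1.0 if "please" in prompt else -1.0  # hypothetical reward rule
        await scored.put((prompt, reply, reward))
    await scored.put(None)


async def train(policy, scored, batch_size):
    # Training loop: after each full batch, "hot-swap" by bumping the version.
    batch = []
    while (item := await scored.get()) is not None:
        batch.append(item)
        if len(batch) == batch_size:
            policy.version += 1  # stands in for a gradient step plus weight swap
            batch.clear()


async def run(prompts, batch_size=2):
    policy = Policy()
    turns, scored = asyncio.Queue(), asyncio.Queue()
    # All three loops run concurrently; none waits for another to finish.
    await asyncio.gather(
        serve(policy, prompts, turns),
        score(turns, scored),
        train(policy, scored, batch_size),
    )
    return policy.version


def run_pipeline(prompts, batch_size=2):
    return asyncio.run(run(prompts, batch_size))
```

Calling `run_pipeline(["a", "b", "c", "d"], batch_size=2)` returns the number of completed batches, which is the number of simulated weight swaps. A real system would presumably run these as separate processes or services rather than coroutines in one event loop.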
Currently, it has two training modes:
- Binary RL (GRPO): A process reward model scores each turn as good, bad, or neutral. That scalar reward drives policy updates via a PPO-style clipped objective.
- On-Policy Distillation: When a concrete correction comes in, such as "you should have checked that file first," it uses that feedback as a richer, directional training signal at the token level.
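For the first mode, a rough sketch of the math may help. It assumes the good/neutral/bad labels map to +1/0/-1 and that advantages are normalized within a group of turns, as in the usual GRPO formulation. The function names and the 0.2 clip range are my own illustrative choices, not taken from the OpenClaw-RL repo.

```python
import math
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-8):
    """Normalize rewards within a group (GRPO-style): (r - mean) / std."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_objective(logp_new, logp_old, advantage, clip=0.2):
    """PPO-style clipped surrogate for one sample (a quantity to maximize)."""
    ratio = math.exp(logp_new - logp_old)  # new-policy / old-policy probability
    clipped_ratio = max(1.0 - clip, min(1.0 + clip, ratio))
    # Taking the minimum removes any incentive to push the ratio past the clip range.
    return min(ratio * advantage, clipped_ratio * advantage)


# Assumed mapping: good = +1, neutral = 0, bad = -1.
advantages = group_advantages([1.0, 0.0, -1.0])
```

The clipping is what keeps each background update small: even if a turn gets a strong reward, the policy ratio only contributes within `[1 - clip, 1 + clip]`, which matters when weights are being swapped under a live agent.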
When to use OpenClaw-RL?
To be fair, a lot of agent behavior can already be improved through better memory and skill design.
OpenClaw's existing skill ecosystem and community-built self-improvement skills handle a wide range of use cases without touching model weights at all.
If the agent keeps forgetting preferences, that's a memory problem. And if it doesn't know how to handle a specific workflow, that's a skill problem. Both are solvable at the prompt and context layer.
Where RL becomes interesting is when the failure pattern lives deeper in the model's reasoning itself.
Things like consistently poor tool selection order, weak multi-step planning, or failing to interpret ambiguous instructions the way a specific user intends.
Research on agentic RL (like ARTIST and Agent-R1) has shown that these behavioral patterns hit a ceiling with prompt-based approaches alone, especially in complex multi-turn tasks where the model needs to recover from tool failures or adapt its strategy mid-execution.
That's the layer OpenClaw-RL targets, and it's a meaningful distinction from what OpenClaw itself offers.
I have shared the repo in the replies!