A month ago @pravsels and I set out to reproduce @physical_int’s RL Token paper. Today, I am sharing our research notes from the journey we’ve been on.
https://t.co/sdE7TicwEH
"RL Token" looks like a great and surprisingly simple post-training methodology for optimising robot models for dexterous tasks in the real world!
Over the next few weeks, @pravsels and I will be attempting to reproduce the results (and open source the code)
Stay tuned 👀
While the labs and their agents will run at the speed of compute, the physical world will not
not until we solve robotics
and after we do, physical actions are just another tool call
in 10 years we will be talking about vibe construction and vibe cooking
Anthropic is questioning whether AI may turn out to be altogether useless. This is the single most honest thing Anthropic has ever written.
“But achieving recursive improvement alone does not suggest an immediate change in how industrial production occurs, societies organize, or markets function. More intelligence can’t learn what a drug does over decades of use, can’t hold elections sooner than a constitution dictates, and can’t turn a stranger into an old friend in a weekend. For most people, the felt pace of this future will still be set by the bottlenecks, even if the laboratory upstream runs at the speed of compute. That collision, where recursive intelligence building itself ever faster meets the world of humans, relationships, and governance, is another part of this future we can’t predict.”
@chris_j_paxton The real question is why ABB YuMi is not in the conversation. It's one of the first robots in this form factor from a leading OEM, and nobody seems to care
Excited to introduce SOLE-R1, a video-language reasoning model for zero-shot reward prediction for robot manipulation tasks!
SOLE-R1 reasoning can serve as the SOLE signal for learning new tasks (completely from scratch) through online RL - i.e., robots start with random actions and learn previously unseen tasks guided only by SOLE-R1 rewards, without any demonstrations, ground-truth rewards, success indicators, or task-specific tuning.
SOLE-R1 significantly outperforms strong baselines (e.g., Robometer, RoboReward, TOPReward, GPT-5, Gemini-3-Pro) in zero-shot online RL when evaluated across 40 tasks - including a real-world tabletop manipulation setting and 4 sim environments (LIBERO, ManiSkill, Meta-World, RoboSuite).
We open source all models, training data, and code.
Website, demos, and paper at: https://t.co/WCSX8gQl3e
🧵 (1/6)
noticing a trend where robotics startups take photos/videos of other people's robots, slap their logo on them, and use them in launch videos and ads
am I the only one thinking this is misleading?
🤖 Another zero-shot reward model is now in LeRobot: ROBOMETER.
A general-purpose, zero-shot video-language reward model from @UofSC, @UT_Dallas, @MIT, @UW, @allen_ai, and @nvidia that predicts frame-level task progress.
Trained on 1M+ trajectories from 21 robot embodiments, generalizes zero-shot to unseen tasks, scenes, and robots. 2.4–4.5x better downstream success rates across online RL, offline RL, data filtering, failure detection, and data retrieval for IL.
Project: https://t.co/rkKUcYamYT
Paper: https://t.co/gIIwNKdnzv
Right now we're just:
- Scoring episodes 1-5 during collection for later filtering
- Rating episodes on task-specific grading scales.
We'd like to:
- Grab a frame per episode from context cam and annotate static objects with SAM to compare positional distributions in a dataset.
- Add subtask annotations.
- See if we can build something like a UMAP plot of episodes for a given dataset (not sure yet how best to implement this or how well it will work).
- Overlay reward model scores
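A minimal sketch of the first steps on that list, assuming each episode carries its 1-5 collection score and, hypothetically, an (x, y) pixel centroid per static object taken from a SAM mask on one context-cam frame. The `Episode` type, its field names, and the `min_score` threshold are illustrative assumptions, not part of any existing tool.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Episode:
    episode_id: int
    score: int  # 1-5 rating assigned during collection
    # Hypothetical: object name -> (x, y) pixel centroid, e.g. from a SAM mask
    centroids: dict[str, tuple[float, float]]


def filter_episodes(episodes: list[Episode], min_score: int = 4) -> list[Episode]:
    """Keep only episodes rated at or above min_score (threshold is an assumption)."""
    return [ep for ep in episodes if ep.score >= min_score]


def positional_stats(episodes: list[Episode]) -> dict[str, dict[str, tuple[float, float]]]:
    """Per-object mean and population std of centroid x/y across the given episodes."""
    per_object: dict[str, tuple[list[float], list[float]]] = {}
    for ep in episodes:
        for name, (x, y) in ep.centroids.items():
            xs, ys = per_object.setdefault(name, ([], []))
            xs.append(x)
            ys.append(y)
    return {
        name: {"mean": (mean(xs), mean(ys)), "std": (pstdev(xs), pstdev(ys))}
        for name, (xs, ys) in per_object.items()
    }


if __name__ == "__main__":
    episodes = [
        Episode(0, 5, {"cup": (100.0, 200.0)}),
        Episode(1, 2, {"cup": (300.0, 50.0)}),
        Episode(2, 4, {"cup": (120.0, 220.0)}),
    ]
    kept = filter_episodes(episodes)
    print([ep.episode_id for ep in kept])  # [0, 2]
    print(positional_stats(kept))
```

Comparing these per-object stats between the full dataset and the filtered subset is one cheap way to see whether low-rated episodes cluster around particular object placements.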
@DominiqueCAPaul @HKydlicek @rerundotio May I ask what exactly you are labelling? Success/failure? Partial rewards? Object segmentation/bboxes? Trajectory quality? Task descriptions in language?
lots to think about for robotics folks
go talk about your work to people outside the Valley. can you make the case for how what you are building will help them?
we need our fellow citizens to *want* robots in their homes and workplaces, to view them as helpers not job-stealers
I've said it for a while: Silicon Valley is the worst messenger for technological progress, and a big reason why the general public across the West has turned against AI
notice how AI has pretty much disappeared from consumer advertisements, and how prolific AI ad campaigns like the one for Apple Intelligence flopped?
AI is still front and centre in ad campaigns in China btw. I saw lots of AI ads there in April; they haven't conditioned the public to hate tech
robotics might be the most multi-disciplinary engineering field today
add installing and repairing the robot, wrapping it in bubble wrap, and loading it into the back of a van for customer deployments (+ sometimes driving the van as well)