More folks should follow the work from the @Ed__Johns lab at Imperial. The data efficiency they are going for is incredible.
Their Instant Policy and Learning 1000 Tasks in a Day papers are definitely worth reading.
Photo: taken at Beyond Teleop workshop #icra2026
i was visiting a hackathon where 80+ participants were training pi0/0.5, gr00t, smolvla, ACT, DP, etc. on lerobot arms
the best and most sample efficient policies were trained *from scratch*
we still do not have an open source x-embodied GPT-2, but i'm hopeful for this year
Here’s the windmill assembly demo we showed at CES 2026 — the one no one saw coming.
North executes a fully autonomous, long-horizon dexterous sequence with sustained hand–eye–tactile coordination and assembly-level precision enabled by tactile feedback.
It’s also robust to disturbance: you can reposition the objects, and North will still identify them and recover the task.
This is powered by CraftNet (VTLA) — using tactile feedback to continuously fine-tune the last-millimeter interaction, enabling reliable execution across 30+ steps.
Read more about CraftNet: https://t.co/RS0hCu1bh9
#Sharpa #SharpaWave #SharpaNorth #CraftNet #System0 #CES2026
I'm on a singular mission to solve the Physical Turing Test for robotics. It's the next, or perhaps THE last grand challenge of AI. Super-intelligence in text strings will win a Nobel prize before we have chimpanzee-intelligence in agility & dexterity. Moravec's paradox is a curse to be broken, a wall to be torn down. Nothing can stand between humanity and exponential physical productivity on this planet, and perhaps some day on planets beyond.
We started a small lab at NVIDIA and grew to 30 strong very recently. The team punches way above its weight. Our research footprint spans foundation models, world models, embodied reasoning, simulation, whole-body control, and many flavors of RL - basically the full stack of robot learning.
This year, we launched:
- GR00T VLA (vision-language-action) foundation models: open-sourced N1 in Mar, N1.5 in June, and N1.6 this month;
- GR00T Dreams: video world model for scaling synthetic data;
- SONIC: humanoid whole-body control foundation model;
- RL post-training for VLAs and RL recipes for sim2real.
These wouldn't have been possible without the numerous collaborating teams at NVIDIA, strong leadership support, and coauthors from university labs. Thank you all for believing in the mission.
Thread on the gallery of milestones:
Robots can now learn 1,000 manipulation tasks in a day, and we're still in 2025.
"Researchers at the Robot Learning Lab at Imperial College London recently developed a new imitation learning approach that could allow robots to successfully learn new tasks faster and without requiring substantial training data."
"Using this method, which was introduced in a paper published in Science Robotics, they were able to train a robotic arm to complete 1,000 different tasks in a single day."
So excited to finally talk about this work!
Veo is a surprisingly strong world simulator. We fine-tuned Veo on action-conditioned, multi-view robotics data.
Key result: running a policy in the world model is strongly correlated with real-world results.
A few important take-aways:
1) Veo Robotics models real-world physics and robot interactions
2) The base model's world knowledge is retained after fine-tuning and can model OOD scenarios not seen in the robotics data
3) The world model can be used to score task success or failure for a given policy
4) This proves useful for predictive red teaming: simulate dangerous or rare scenarios that would be difficult or irresponsible to execute on the real robot, and judge its performance
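To make the "strongly correlated" claim concrete, here is a minimal sketch of how one might check agreement between world-model scores and real-world results across a set of policies. The function name and all success rates below are hypothetical placeholders for illustration, not numbers from this work.

```python
import numpy as np

def pearson(sim_rates, real_rates):
    """Pearson correlation between per-policy success rates."""
    sim = np.asarray(sim_rates, dtype=float)
    real = np.asarray(real_rates, dtype=float)
    return float(np.corrcoef(sim, real)[0, 1])

# Hypothetical success rates for five policies (illustrative only).
sim_success = [0.20, 0.35, 0.50, 0.70, 0.85]   # scored inside the video world model
real_success = [0.15, 0.40, 0.45, 0.75, 0.80]  # measured on the real robot

r = pearson(sim_success, real_success)
print(f"Pearson r = {r:.2f}")
```

A value of r close to 1 would mean that ranking policies in the world model roughly predicts their ranking on the real robot, which is what makes the simulator usable as an evaluation signal.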
I couldn't be more excited about where generalist video models are headed.
Scalable evaluation is one of the most challenging problems in robotics today. Expensive + slow real-world deployments have been the only gold standard.
But now, generative video models can provide *predictive and scalable* evaluation signal for real-world robots!🌎
Okay let's talk about this. What happens:
- operator forgets to "clutch out" and tell the robot to stop tracking his hands
- the robot hands go up to its head
- the robot doesn't move its legs to stay balanced, which seems to imply it's not doing whole-body control like we would expect
- he turns off the robot while it's in this unbalanced state, and its arms go to a neutral position close to where the video started. But because the arms are out of position, it makes a huge, fast movement and absolutely smashes that water bottle, sending water spraying everywhere
A lot of people are blaming the operator here, but I think forgetting to press a button is, again, pretty human, and you could build a much more robust system here. No reason imo this robot should have fallen.
That’s exactly right, the infrastructure for handling data, training models and setting up robots is immense and the more ambitious the project the worse it gets.
Amazing initiative from @Neuracore_AI and @stepjamUK!
𝗧𝗵𝗶𝘀 𝗶𝘀 𝗲𝘅𝗮𝗰𝘁𝗹𝘆 𝘄𝗵𝘆 𝘄𝗲'𝗿𝗲 𝗴𝗶𝘃𝗶𝗻𝗴 𝗳𝗿𝗲𝗲 𝗮𝗰𝗰𝗲𝘀𝘀 𝘁𝗼 𝗮𝗰𝗮𝗱𝗲𝗺𝗶𝗮.
Imperial’s Robot Learning Lab published some remarkable work last month: teaching a robot 1,000 manipulation tasks in 24 hours from a single demonstration each.
What the paper doesn’t spotlight is the hidden cost behind that result: 5,650+ real-world rollouts and the massive infrastructure required to support them - data collection, sensor sync, format wrangling, storage, and training.
At Neuracore, we see brilliant researchers spending months building pipelines instead of advancing what robots can learn. We see PhD students writing their 15th format converter instead of running their next experiment.
𝗧𝗵𝗮𝘁’𝘀 𝘄𝗵𝘆 𝘄𝗲’𝗿𝗲 𝗼𝗳𝗳𝗲𝗿𝗶𝗻𝗴 𝗳𝗿𝗲𝗲 𝗽𝗹𝗮𝘁𝗳𝗼𝗿𝗺 𝗮𝗰𝗰𝗲𝘀𝘀 𝘁𝗼 𝗮𝗰𝗮𝗱𝗲𝗺𝗶𝗰 𝗿𝗼𝗯𝗼𝘁𝗶𝗰𝘀 𝗹𝗮𝗯𝘀.
Not as a giveaway, but rather 𝗮𝘀 𝗮𝗻 𝗶𝗻𝘃𝗲𝘀𝘁𝗺𝗲𝗻𝘁 𝗶𝗻 𝘁𝗵𝗲 𝗲𝗰𝗼𝘀𝘆𝘀𝘁𝗲𝗺:
• Researchers focus on algorithms, not infrastructure
• Breakthroughs happen faster when data → policy takes days, not months
• Students graduate with battle-tested workflows
• Strong academic validation accelerates industry adoption
Work like this shows the future of robot learning: extreme data efficiency. Our job is to provide the infrastructure that makes discovering the next MT3 frictionless.
If your lab is working on imitation learning, manipulation, or embodied AI, let’s talk.
Credit: @imperialcollege @Kamil__Dre @pitvit_ @vitalisvos19 @Ed__Johns
Paper: https://t.co/oSGhVtaxwl
Even large VLAs can play ping-pong in real time! 🏓⚡️
In practice, VLAs struggle with fast, dynamic tasks:
• slow reactions, jittery actions.
• demos often shown at 5-10× speed to look “smooth”.
We introduce VLASH:
• future-state-aware asynchronous inference with >30Hz inference frequency for PI0.5
• drop-in to existing VLAs with no extra overhead
• enables PI0.5 / PI0 to play ping-pong and other highly dynamic tasks in real time
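One plausible reading of "future-state-aware" asynchronous inference: while the model is busy computing the next action chunk, the robot keeps executing the tail of the previous chunk, so the policy should be conditioned on the state the robot will be in when the new chunk starts, not the state at query time. The sketch below illustrates that general idea only. It is not the VLASH implementation, and the function names, the additive delta-action assumption, and the latency numbers are all hypothetical.

```python
import math
import numpy as np

def delay_in_steps(latency_ms, control_hz):
    """Number of control steps that elapse while one inference call runs."""
    return math.ceil(latency_ms * control_hz / 1000)

def predict_execution_state(state, pending_actions, delay_steps):
    """Roll the current state forward through the actions that will still
    execute before the new chunk is ready (assumes additive delta actions)."""
    pending = np.asarray(pending_actions, dtype=float)
    return np.asarray(state, dtype=float) + pending[:delay_steps].sum(axis=0)

# Hypothetical numbers: 100 ms inference latency, 50 Hz control loop.
steps = delay_in_steps(latency_ms=100, control_hz=50)
state_now = np.zeros(2)
remaining_chunk = np.tile([0.01, 0.0], (10, 1))  # 10 pending delta actions

state_at_execution = predict_execution_state(state_now, remaining_chunk, steps)
# The policy would then be queried with state_at_execution instead of state_now.
print(steps, state_at_execution)
```

Without the roll-forward, the new chunk would be planned from a stale state, which is one source of the jitter described above.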
📄 Paper: https://t.co/01bKQmMCKs
🔧 Code: https://t.co/NfQ80ASZOK
I've been working on deformable object manipulation since my PhD. It was totally a nightmare years ago and my PhD advisor was telling me not to work on it for my own good.
Today, at ByteDance Seed, we are dropping GR-RL, a new VLA+RL system that manages long-horizon precise dexterous manipulation of deformable objects.
This is probably the first real-world RL system to make a robot:
✅ Lace up your shoes end to end
✅ Hit millimeter tolerance repeatedly
✅ Recover from mistakes (See video!)
✅ And complete continuous shoelace threading on a real bimanual platform
📈 Success rate: ↑ from 45.7% → 83.3%
Yes, robots can now actually do this.
Project page: https://t.co/JfOGajAXiY
ArXiv: https://t.co/xFfAE9isdo
“The thing that happened with AGI and pretraining is that in some sense they overshot the target.
You will realize that a human being is not an AGI. Because a human being lacks a huge amount of knowledge. Instead, we rely on continual learning.
If I produce a super intelligent 15-year-old, they don't know very much at all. A great student, very eager. [You can say,] ‘You go and be a programmer. You go and be a doctor. Go and learn.’
So you could imagine that the deployment itself will involve some kind of a learning trial and error period. It's a process as opposed to, you drop the finished thing.”
@ilyasut
The usual deep tech mistakes:
• Talk to 5 customers, not 50
• Build the “perfect” product before selling
• Stay in one country and accept 18-month cycles
• Treat deployment as an afterthought
The @in_bolt founders built by rejecting all of this.
3 years ago we could showcase AI's frontier w. a unicorn drawing. Today we do so w. AI outputs touching the scientific frontier: https://t.co/ALJvCFsaie
Use the doc to judge for yourself the status of AI-aided science acceleration, and hopefully be inspired by a couple examples!
Selling 100 robots is hard. Maintaining them across 10 sites is harder.
Who fixes it at 2 AM? Where are spare parts? Remote updates? Different firmware versions? Service infrastructure costs more than R&D.
I believe that in terms of navigation the policy predicts something similar to a velocity or a waypoint.
If you change hardware (hw) you should be able to use the same policy output as long as you have a controller for that hw. A controller is by no means easy to design, but much less of a problem than having to retrain the policy with new data.
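A minimal sketch of that separation, assuming the policy emits a body velocity command (linear `v`, angular `omega`) and each base has its own controller. The `DiffDriveBase` class and its parameters are hypothetical, and a differential-drive base is used only because its kinematics are simple.

```python
from dataclasses import dataclass

@dataclass
class DiffDriveBase:
    """Hypothetical differential-drive base; lengths are in metres."""
    wheel_radius: float
    track_width: float

    def wheel_speeds(self, v, omega):
        """Map a body velocity command (m/s, rad/s) to
        (left, right) wheel speeds in rad/s."""
        left = (v - omega * self.track_width / 2) / self.wheel_radius
        right = (v + omega * self.track_width / 2) / self.wheel_radius
        return left, right

# The same policy output is sent to two different bases.
policy_command = (0.5, 0.2)  # (v, omega), illustrative values
small_base = DiffDriveBase(wheel_radius=0.05, track_width=0.30)
large_base = DiffDriveBase(wheel_radius=0.10, track_width=0.50)

for base in (small_base, large_base):
    print(base, base.wheel_speeds(*policy_command))
```

Only the controller changes between the two bases; the command coming out of the policy stays the same, which is the point being made here.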
In terms of manipulation, a humanoid would mean different torso kinematics, but you should be able to simply adjust the kinematic chain/controller and reuse the policy outputs here too.
The only problem would be if the robot's body is often in the camera view. In that case the policy would be seeing something different on different hw. That said, they were already able to map human to robot (maybe with some kind of masking or inpainting), so that shouldn't be a problem.
These are only my opinions though. I could be wrong
I'm incredibly impressed. I'd say this is the best home robotics demo we've seen ... by a mile.
I was a bit sceptical of the gripper design, but actually, after seeing it together with the human hand, it makes sense. It actually seems to be providing the necessary dexterity without the excess complexity of the 3-million-degrees-of-freedom humanoid hands that some people are going for.
You could argue that the objects could always have been placed in basically the same spot. But that would be unfounded: with that much navigation, the robot is bound to be in a different relative pose to the objects regardless.
Great demo! Keep it up
Today, we present a step-change in robotic AI @sundayrobotics.
Introducing ACT-1: A frontier robot foundation model trained on zero robot data.
- Ultra long-horizon tasks
- Zero-shot generalization
- Advanced dexterity
🧵->
Learning a thousand tasks in a day
Imagine teaching a robot 1,000 different tricks in a single day—each from just one human demo. That’s the promise behind new work by Kamil Dreczkowski and coauthors: moving robot learning a bit closer to how humans pick up new skills quickly, instead of needing thousands of repetitions per task.
The authors take a very pragmatic route. Instead of one giant end-to-end model, they break manipulation into two simple phases. First, the robot figures out how to line up its gripper with the right object and pose (using language and geometry to find the closest past demo, then classic pose estimation and motion planning).
Second, once it’s in the right place, it replays the fine-grained motion of the original demonstration in the gripper’s own coordinate system. With that recipe, they get a real robot to perform around 1,000 distinct tasks on more than 400 objects, with just one demonstration per task, collected in under a day.
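The two phases can be sketched with homogeneous transforms. This is an illustrative reconstruction of the idea as described here, not the paper's code: poses are assumed to be 4x4 matrices in the world frame, the object pose is assumed to come from some pose estimator, and every function name and number is hypothetical.

```python
import numpy as np

def make_pose(yaw, xyz):
    """4x4 homogeneous transform: rotation about z by yaw, then translation."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = xyz
    return T

def align_pose(T_wo_demo, T_we_demo_start, T_wo_new):
    """Phase 1: gripper pose with the same object-relative pose as in the demo."""
    return T_wo_new @ np.linalg.inv(T_wo_demo) @ T_we_demo_start

def gripper_frame_deltas(demo_poses):
    """Relative motion between consecutive demo poses, in the gripper's frame."""
    return [np.linalg.inv(a) @ b for a, b in zip(demo_poses[:-1], demo_poses[1:])]

def replay(start_pose, deltas):
    """Phase 2: apply the recorded gripper-frame motions from the aligned pose."""
    poses = [start_pose]
    for delta in deltas:
        poses.append(poses[-1] @ delta)
    return poses

# One demonstration: object pose and three gripper waypoints (made-up values).
T_wo_demo = make_pose(0.0, [0.5, 0.0, 0.0])
demo_poses = [
    make_pose(0.0, [0.5, 0.0, 0.20]),
    make_pose(0.1, [0.5, 0.0, 0.10]),
    make_pose(0.2, [0.5, 0.0, 0.05]),
]

# At test time the object has moved and rotated.
T_wo_new = make_pose(np.pi / 2, [0.3, 0.4, 0.0])

start = align_pose(T_wo_demo, demo_poses[0], T_wo_new)
new_poses = replay(start, gripper_frame_deltas(demo_poses))
```

Because the deltas are expressed in the gripper's own frame, the replayed trajectory keeps the same pose relative to the object as in the demonstration, wherever the object ends up.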
What I like here is the message: you don’t always need a bigger model, you need the right structure. By separating “where to go” from “how to move once you’re there,” MT3 becomes data-efficient, interpretable, and easier to debug—and in the low-data regime, it outperforms standard imitation learning baselines.
For those of us dreaming about robots that set up experiments, handle samples, or reconfigure lab equipment almost on demand, this kind of approach—smart inductive bias plus minimal supervision—looks like a very interesting step forward.
Paper: https://t.co/VkjPTMRHob
This is a fair doubt. Personally I think this is a good idea. I agree that one of the few places where I see the humanoid form factor being a good choice is the home.
This being said, I believe that in an industry such as robotics, the most pressing matter is the usefulness of the robot. Aside from staircases, this form factor is very effective and has much easier control compared to a legged humanoid.
Not having to worry about legs means more time spent on solving actual tasks, potentially shortening the time-to-market.
If I am right, this means realizing revenues faster and deploying robots earlier than others, ultimately getting even more data.
Once you have proven your worth and potentially secured a few customers you can design a humanoid alternative if you really deem it necessary.
Bear in mind that the policy would probably transfer no matter what the lower half of the robot is; the main difference would be changing the controller and kinematics. The biggest problem would surely be designing and manufacturing the humanoid itself, though.
After taking a look at the blog post I should add something:
They condition their policy on a map of the environment the robot is being deployed in.
I still don’t think they are using dense tactile sensors but they mention the importance of force when manipulating some objects so they could be using force sensors.
I think it mainly is, yes. The hand seems a bit too stiff to have any tactile sensor, and honestly I have the impression the team is being very mindful of all the complexity that they add to the pipeline. Having tactile at this point would provide little benefit while making the learning problem potentially harder.
I also don't think they are using depth. 4 out of 5 cameras are on the wrist, where depth is notoriously bad. The head-cam could be, but I seriously doubt it.
I do however think they have some proprioception. Probably just where the hands are relative to each other or to the head cam.
I also think language is a potential input. Seeing such a long-horizon task makes me think there is an internal VLM that comes up with a plan and passes each stage of that plan to a lower-level policy to complete. Similar to @Figure_robot and @physical_int