@R2rule1 @chetan_ It possibly learned some hand dominance from humans. Or the stochastic flow matching happened to choose this modality.
If the former, and this is something that has a production impact (unlikely), we could balance this out in the data mix and you'd get ambidextrous behavior.
A closer look at Operator.
It reaches across the entire work cell, up to 10 feet high and all the way down to the floor. Operator handles bags, mailers, and boxes with the dexterity to adapt to an endless variety of items on the fly. [1/4]
Physical Intelligence (@physical_int) is building a foundation model that can control any robot to do any task — what the team describes as the GPT moment for robotics. The company's cross-embodiment approach trains across many different robot platforms, and recent results show tasks being performed zero-shot that last year required hundreds of hours of data collection.
In this episode of the @LightconePod, co-founder Quan Vuong (@QuanVng) sat down with @garrytan, @snowmaker, @sdianahu, and @harjtaggar to talk about why robotics is finally ready for its scaling moment, how PI runs its models in the cloud rather than on-device, and the playbook for what Quan sees as a Cambrian explosion of vertical robotics companies.
00:00 — Robotics just got cheaper
00:41 — The GPT moment for robotics
02:24 — Why robots didn’t work before
05:30 — The breakthrough that changed everything
09:12 — The data problem
13:33 — Robots learning without data
15:05 — Robots folding laundry (for real)
22:18 — From engineering problem → ops problem
29:12 — The startup playbook
38:46 — Thousands of robotics startups are coming
Introducing Operator, our newest industrial AI robot built to work, not demo.
Operator handles your warehouse's most repetitive tasks: packing, sorting, and kitting. Up to 24 hours a day, with flexibility and consistency that allows businesses to scale quickly.
This is what we've been building. ↓ [1/4]
Presenting Meridian: a line to connect deterministic compute and language model AI.
From Neural Turing Machines and Differentiable Transformers to The Neural Computer, there’s a rich history of trying to combine traditional deterministic computation with the wildly different architecture of Artificial Intelligence.
I’ve spent the last 4 weeks creating a single neural network that has the combined capabilities of a 4B param language model and a deterministic computation engine based on WebAssembly. It gives the AI deterministic integer computation on values up to 2^32, control flow (while loops and if statements), and a basic filesystem, all implemented as part of the transformer neural network with no external tool calls.
With this architecture, which adds fewer than 1 million parameters to an existing 4B param language model, I can take it from <20% accuracy on arithmetic with 4-digit numbers to 100% accuracy on 4-digit numbers and 99% accuracy on arithmetic up to 2^32, without adversely affecting the language model’s performance on non-mathematical tasks.
The combined model can precisely execute a range of algorithms, including checking a number for primality, finding the GCD of two integers, and sorting arrays.
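To make "deterministic integer computation plus control flow" concrete, here is a minimal sketch of a stack machine running Euclid's GCD algorithm in plain Python. It is an ordinary interpreter, not the neural implementation described in this post. The opcode names, the `run` and `gcd` helpers, and the raw `jump`/`jump_if` instructions are hypothetical simplifications; real WebAssembly uses structured blocks and loops rather than jumps. The one property taken from WebAssembly is that 32-bit integer arithmetic wraps modulo 2^32.

```python
MASK = 0xFFFFFFFF  # 32-bit integers wrap modulo 2**32


def run(program, local_vars):
    """Execute a list of (op, arg) tuples on a tiny stack machine.

    Hypothetical instruction set, loosely modelled on WebAssembly i32 ops.
    """
    stack, pc = [], 0
    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        if op == "const":
            stack.append(arg & MASK)
        elif op == "get":
            stack.append(local_vars[arg])
        elif op == "set":
            local_vars[arg] = stack.pop()
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append((a * b) & MASK)  # wraparound on overflow
        elif op == "rem_u":
            b, a = stack.pop(), stack.pop()
            stack.append(a % b)
        elif op == "eqz":
            stack.append(1 if stack.pop() == 0 else 0)
        elif op == "jump":
            pc = arg
        elif op == "jump_if":
            if stack.pop() != 0:
                pc = arg
        else:
            raise ValueError(f"unknown op: {op}")
    return stack


# Euclid's algorithm: while b != 0: a, b = b, a % b
GCD_PROGRAM = [
    ("get", "b"),      # 0
    ("eqz", None),     # 1: push 1 if b == 0
    ("jump_if", 10),   # 2: leave the loop when b == 0
    ("get", "a"),      # 3
    ("get", "b"),      # 4
    ("rem_u", None),   # 5: push a % b
    ("get", "b"),      # 6
    ("set", "a"),      # 7: a = old b
    ("set", "b"),      # 8: b = old a % old b
    ("jump", 0),       # 9: back to the loop test
    ("get", "a"),      # 10: result is left on the stack
]


def gcd(a, b):
    return run(GCD_PROGRAM, {"a": a & MASK, "b": b & MASK})[-1]


if __name__ == "__main__":
    print(gcd(48, 18))
    # 65536 * 65536 == 2**32, which wraps around in 32-bit arithmetic
    print(run([("const", 65536), ("const", 65536), ("mul", None)], {}))
```

Each instruction here is a single discrete step with an exact result, which is the kind of behaviour a language model on its own tends to approximate rather than guarantee.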
@_joe_harris_ We definitely generate huge amounts of data and have very early stage infrastructure, but that doesn’t stop us training on all of the data we want to
I’ve been working on combining a language model and a basic computer (based on WebAssembly) into a single AI model. One outcome is that if the model generates programs or compute instructions, I don’t have to do round trips to the CPU, start a process, run the program/calculation, and serialize and tokenize the result before feeding the answer into the AI to get its response. I can do it in a continuous loop on the GPU. That’s already a pretty interesting performance characteristic for a certain kind of tool use BUT I realised I can do something even more interesting.
When the machine outputs a compute instruction token, “multiply the two numbers at the top of the stack” for example (that’s a single token), I don’t need to wait for the compute to happen, print a response, and enter it into the input before I can generate the response token. Since the stack machine and the language model are part of the same neural network, running on a single GPU, I can start the model generating the next token immediately after the “multiply” token, and the language model’s first layers will run in parallel with the multiply computation (or whatever instruction it is). No waiting at all for it to compute. The output of the compute subnetwork goes into the mid layers of the language model, allowing it to steer the next token generation even before the result has been emitted from the GPU and converted to human-readable output.
Concretely, in my setup the neural Wasm implementation runs in parallel with the first 10 layers of the language model, and it’s working pretty well.
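A rough way to see why the overlap helps is a back-of-the-envelope latency model. The sketch below is illustrative arithmetic only: the function names, the 36-layer depth, and every timing value are made-up assumptions, not measurements from this setup. The only figure taken from the post is that compute overlaps with the first 10 layers.

```python
def sequential_step_us(n_layers, layer_us, compute_us):
    """Wait for the compute result first, then run every layer."""
    return compute_us + n_layers * layer_us


def overlapped_step_us(n_layers, layer_us, compute_us, overlap_layers=10):
    """Run the compute subnetwork alongside the first `overlap_layers` layers.

    The mid layers need the compute result, so they start once both the
    early layers and the compute have finished.
    """
    early = max(overlap_layers * layer_us, compute_us)
    late = (n_layers - overlap_layers) * layer_us
    return early + late


if __name__ == "__main__":
    # Hypothetical numbers: 36 layers at 100 us each, 400 us of compute.
    print(sequential_step_us(36, 100, 400))
    print(overlapped_step_us(36, 100, 400))
```

Under this model the compute is effectively free as long as it finishes before the first 10 layers do. Past that point, the extra time shows up in the step latency again.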
This week we shipped the first two builds of our newest robot to a customer.
This moment has been in the making for close to a year now, and we can't wait to see our new robots working hard in the real world. Stay tuned for the full reveal soon.