@DuinoDu Well, the LoRA you see in the viz is a semi-failed run. It goes to the cupcake but fails the grasp most of the time. I'm working on a better way of visualizing it but it's not easy...
Using the same reprojection system: a live preview of the motion planning output of a VLA in 3D.
pi0.5's action expert plans 50 steps in advance and replans before the end of the trajectory. This is what the planned end effector trajectory looks like.
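A rough sketch of that plan-then-replan loop, in case it helps. Assumptions: the 25-step execution window is made up (all I said above is that it replans before the end of the 50 steps), and `toy_policy` / `toy_env_step` are hypothetical stand-ins, not the openpi API.

```python
from typing import Callable, List, Tuple

HORIZON = 50  # steps planned per chunk, as described above
TARGET = 1.0  # toy 1D goal position (assumption, for illustration only)


def toy_policy(observation: float) -> List[float]:
    """Hypothetical stand-in for the action expert: plans HORIZON equal steps toward TARGET."""
    step = (TARGET - observation) / HORIZON
    return [step] * HORIZON


def toy_env_step(observation: float, action: float) -> float:
    """Hypothetical stand-in for the robot: a 1D position that moves by the action."""
    return observation + action


def run_with_replanning(
    policy: Callable[[float], List[float]],
    env_step: Callable[[float, float], float],
    observation: float,
    total_steps: int,
    execute_per_chunk: int = 25,  # assumed value; must be smaller than the horizon
) -> Tuple[List[float], int]:
    """Execute only the first part of each planned chunk, then replan from the new observation."""
    executed: List[float] = []
    replans = 0
    while len(executed) < total_steps:
        chunk = policy(observation)  # full planned chunk (this is what the viz draws)
        replans += 1
        for action in chunk[:execute_per_chunk]:
            observation = env_step(observation, action)
            executed.append(action)
            if len(executed) >= total_steps:
                break
    return executed, replans
```

With these toy numbers, 100 executed steps need 4 planning calls, and the unexecuted tail of each chunk is thrown away when the next plan arrives.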
From a single ArUco marker I got the relative poses of my wrist and global cameras and of my robot's kinematic chain.
The ArUco is flat on the table, so I can project the wrist camera's field of view (from its intrinsics) onto the table plane and get an estimate of the wrist cam image from global camera pixels only.
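For the curious, the geometry behind the table-plane projection is roughly a ray-plane intersection. A minimal sketch, assuming a pinhole camera with no distortion and a marker frame whose z = 0 plane is the table; the function names and the pose convention (`R`, `t` map camera coordinates into the marker frame) are my own choices for illustration, not from any library.

```python
from typing import List, Optional, Sequence, Tuple


def pixel_to_table(
    u: float, v: float,
    fx: float, fy: float, cx: float, cy: float,
    R: Sequence[Sequence[float]],  # 3x3 rotation, camera frame -> marker frame
    t: Sequence[float],            # camera origin expressed in the marker frame
) -> Optional[Tuple[float, float]]:
    """Intersect the viewing ray of pixel (u, v) with the table plane z = 0."""
    # Back-project the pixel to a ray direction in the camera frame.
    d_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)
    # Rotate the direction into the marker (table) frame.
    d = [sum(R[i][j] * d_cam[j] for j in range(3)) for i in range(3)]
    if abs(d[2]) < 1e-9:
        return None  # ray is parallel to the table
    s = -t[2] / d[2]
    if s <= 0:
        return None  # intersection would be behind the camera
    return (t[0] + s * d[0], t[1] + s * d[1])


def image_footprint(
    width: int, height: int,
    fx: float, fy: float, cx: float, cy: float,
    R: Sequence[Sequence[float]], t: Sequence[float],
) -> List[Optional[Tuple[float, float]]]:
    """Project the four image corners onto the table: the camera's footprint."""
    corners = [(0, 0), (width, 0), (width, height), (0, height)]
    return [pixel_to_table(u, v, fx, fy, cx, cy, R, t) for u, v in corners]
```

Once you have the wrist camera's footprint on the table, the same intersection run in reverse through the global camera tells you which global pixels to sample for each wrist pixel. It only holds for things lying on the plane, so anything with height (the cupcakes, the gripper) gets smeared.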
@John_lussier_ For an individual probably a cleaned up log of all your conversations with agents, but not sure this is worth it for one person. For a company idk man it depends, internal documentation, agent chat logs of the best employees in the company, sky is the limit!
I'm pretty sure a GLM-5.2 LoRA'd on an internal dataset is by far the best model in the world right now and would not cost that much compared to opus/fable/5.5 pro API costs over time. If you're in a company, pitch the idea!
GLM-5.2 from @Zai_org on ARC-AGI (Verified)
- ARC-AGI-2: 22.8%, $0.25
- ARC-AGI-1: 77.0%, $0.19
Performance is comparable with GPT-5.4 & 5.5 (Low Reasoning Effort)
Did you know?
Pangram learns the difference between Claude, ChatGPT, and Gemini in its internal representations, even without being trained on it!
This signal is increasingly recoverable throughout the network, reaching 91% accuracy on a simple linear probe!
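If "linear probe" is unfamiliar: it is just a linear classifier trained on frozen activations. A toy sketch of the idea below. Everything in it is an assumption for illustration: the activations are synthetic 3D clusters, and the probe is a multiclass perceptron rather than whatever Pangram actually used.

```python
import random
from typing import List, Tuple

LABELS = ["claude", "chatgpt", "gemini"]
Sample = Tuple[List[float], int]


def make_fake_activations(n_per_class: int, seed: int) -> List[Sample]:
    """Synthetic stand-in for hidden states: one noisy cluster per source model."""
    rng = random.Random(seed)
    data: List[Sample] = []
    for k in range(len(LABELS)):
        for _ in range(n_per_class):
            x = [rng.uniform(-0.5, 0.5) for _ in LABELS]
            x[k] += 3.0  # shift the cluster along its own axis
            data.append((x, k))
    rng.shuffle(data)
    return data


def predict(W: List[List[float]], x: List[float]) -> int:
    """Pick the class whose weight vector scores highest on x."""
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in W]
    return max(range(len(W)), key=scores.__getitem__)


def train_probe(data: List[Sample], dim: int, n_classes: int,
                max_epochs: int = 100) -> List[List[float]]:
    """Multiclass perceptron: only the linear weights are learned, the features stay fixed."""
    W = [[0.0] * dim for _ in range(n_classes)]
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in data:
            p = predict(W, x)
            if p != y:
                mistakes += 1
                for i in range(dim):
                    W[y][i] += x[i]  # pull the true class toward x
                    W[p][i] -= x[i]  # push the wrongly predicted class away
        if mistakes == 0:
            break  # every training sample is classified correctly
    return W


def accuracy(W: List[List[float]], data: List[Sample]) -> float:
    return sum(predict(W, x) == y for x, y in data) / len(data)
```

In a real probing setup you would fit one such classifier per layer on held-out activations and compare accuracies across depth, which is what "increasingly recoverable throughout the network" refers to.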
@pbshgthm My friend is using SAM3D for hands only with pretty good results! Surprisingly fast if you spend a bit of time on basic inference optimization (he got it running at 20Hz on a 3080 iirc)
@VMises76153 pi0.5 outputs joint angles, not sure if you can finetune it on Cartesian positions easily. I get the end effector position just by computing the position of the end of the FK chain
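Roughly what "end of the FK chain" means in code. This is a generic sketch, not the Nero's real kinematics: the `(axis, offset)` link description and the two-link example are hypothetical, and a real arm would take its link parameters from the URDF.

```python
import math
from typing import List, Sequence, Tuple

Link = Tuple[str, Sequence[float]]  # (rotation axis 'x'|'y'|'z', offset to the next joint)


def rotation(axis: str, angle: float) -> List[List[float]]:
    """3x3 rotation matrix about a principal axis."""
    c, s = math.cos(angle), math.sin(angle)
    if axis == "x":
        return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]
    if axis == "y":
        return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]
    if axis == "z":
        return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    raise ValueError(f"unknown axis: {axis}")


def matmul(a: List[List[float]], b: List[List[float]]) -> List[List[float]]:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def end_effector_position(joint_angles: Sequence[float], links: Sequence[Link]) -> List[float]:
    """Walk the chain from the base: rotate at each joint, then move along the link offset."""
    R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    p = [0.0, 0.0, 0.0]
    for angle, (axis, offset) in zip(joint_angles, links):
        R = matmul(R, rotation(axis, angle))
        for i in range(3):
            p[i] += sum(R[i][j] * offset[j] for j in range(3))
    return p


def planned_trajectory(chunk: Sequence[Sequence[float]], links: Sequence[Link]) -> List[List[float]]:
    """Map each planned joint configuration in a chunk to an end effector position."""
    return [end_effector_position(q, links) for q in chunk]
```

Running `planned_trajectory` on the 50 planned joint configurations gives the 3D polyline shown in the preview.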
STOP HOLDING BACK WHEN PROMPTING you can literally one shot whatever feature in one prompt, just yap for longer. aim to describe everything you can possibly imagine in ONE prompt
and obviously use voice. i often talk for 15 minutes straight
Arm: AgileX Nero
Model: pi0.5 LoRA (openpi default setup, 100 demos of picking one of 3 colored cupcakes and placing it in the plate)
Prompt: "Place the pink cupcake in the plate"
Man, these language models suck at programming. I asked it to reverse engineer this entire mystery wifi blob with two physical devices as test harnesses and after three days and a few billion tokens it only figured out how to work around all of the upper layer wifi stack