you don’t need a robot to see physical ai in action, you just need a browser.
this week I finally tried something I've wanted to do for a while: design an SO-101 scene in three.js, connect it to a real leader arm, collect teleoperation data, train an ACT model, then port it to run in the browser.
the result is, as far as I know, the first-ever demo of ACT running locally on the web.
try it yourself now at https://t.co/2smZGyVx66
LIBERO-Safety introduces the first large-scale benchmark for evaluating safety in Vision-Language-Action (VLA) models. The benchmark spans 500+ tasks covering both physical safety risks (collisions, spills, unsafe motions) and semantic safety failures (harmful, ambiguous, or unsafe instructions), exposing major gaps in current robot foundation models.
Results show that state-of-the-art VLAs achieve strong task success but struggle with safety. Models frequently execute unsafe actions, misinterpret risky instructions, and fail to reject dangerous requests, highlighting that capability does not automatically translate to safe deployment.
The benchmark provides standardized safety evaluation protocols, metrics, and task suites for future VLA research. LIBERO-Safety establishes safety as a first-class objective for robot foundation models and offers a reproducible framework for measuring progress toward trustworthy real-world robotic systems.
I've been reading a ton of agentic RL papers recently. Across all of that work, action masking is one of the few commonly used tricks, but the approach is evolving with RL + world modeling papers like ECHO / PaW.
🧵 [1/N]
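For anyone who hasn't seen action masking before, here is a minimal sketch of the basic idea: invalid actions get a logit of negative infinity before the softmax, so the policy can never sample them. The function name and the toy logits are made up for illustration and are not taken from ECHO, PaW, or any specific paper.

```python
import math

def masked_softmax(logits, mask):
    """Softmax over logits where mask[i] is False for invalid actions."""
    if not any(mask):
        raise ValueError("at least one action must be valid")
    # Invalid actions get -inf so they end up with exactly zero probability.
    masked = [l if m else -math.inf for l, m in zip(logits, mask)]
    peak = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(l - peak) for l in masked]  # exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]

# Toy example: four actions, the second one is not allowed in this state.
logits = [2.0, 1.0, 0.5, -1.0]
valid = [True, False, True, True]
probs = masked_softmax(logits, valid)
```

The remaining probability mass is renormalized over the valid actions only, so their relative ordering is unchanged.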
How do you get perceptive locomotion over rough terrain without brittle terrain classifiers?
Excited to share CTS-MoE, a framework for implicit terrain adaptation via Mixture-of-Experts for perceptive locomotion. No selectors or per-task policies; the policy adapts end-to-end straight from vision.
TL;DR:
→ Perception-driven routing handles diverse, discontinuous terrain implicitly; no high-level task selector or per-task policy distillation.
→ Big gains on hard tasks (climbing, gaps) under multi-task RL (MTRL), with smooth transitions on both seen and unseen terrain.
🧵Thread:
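To make "perception-driven routing" concrete, here is a rough sketch of generic soft Mixture-of-Experts gating: a gate computes weights from perception features and the action is a weighted blend of expert outputs, with no discrete terrain classifier. This is not the CTS-MoE architecture itself. The feature layout, gate weights, and the two toy experts are all assumptions made up for illustration.

```python
import math

def softmax(xs):
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def moe_action(features, gate_weights, experts):
    """Blend expert outputs using gate weights computed from perception features."""
    gate = softmax([dot(w, features) for w in gate_weights])
    outputs = [expert(features) for expert in experts]
    action_dim = len(outputs[0])
    action = [
        sum(g * out[i] for g, out in zip(gate, outputs))
        for i in range(action_dim)
    ]
    return action, gate

# Hypothetical features: [step height ahead, gap width ahead].
gate_weights = [[4.0, 0.0], [0.0, 4.0]]  # one row of gate weights per expert
experts = [
    lambda f: [1.0, 0.0],  # toy "climb" expert
    lambda f: [0.0, 1.0],  # toy "gap" expert
]
action, gate = moe_action([1.0, 0.0], gate_weights, experts)
```

Because the gate is a soft weighting rather than a hard switch, the blended action changes continuously as the features change, which is one plausible reading of the "smooth transitions" claim.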
since there's a lot of discourse going on around "how to do research", these pieces are well worth a read.
https://t.co/pA0MkOMlKS
https://t.co/rw9uMiwlCj
https://t.co/H1AGvnb7LP
https://t.co/FTyAabr9Rx
1/8
Can Visual Language Models actually plan tasks for robots in a real greenhouse? 🌱🤖
We built a modular framework where a VLM guides a horticultural robot through crop monitoring — interleaving visual queries with action primitives.
The results show both real promise and a critical degradation mode. 🧵
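As a rough illustration of what "interleaving visual queries with action primitives" could look like, here is a minimal control-loop sketch. It is not the framework from this thread. The planner is a scripted stand-in for a VLM, and every name here (`Step`, `scripted_planner`, `move_to_next_plant`, the query strings) is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Step:
    kind: str      # "query" (ask about the image) or "action" (run a primitive)
    content: str

def scripted_planner(history):
    """Hypothetical stand-in for a VLM: picks the next step from the history so far."""
    script = [
        Step("query", "Is a plant visible in the current view?"),
        Step("action", "move_to_next_plant"),
        Step("query", "Does the plant show signs of disease?"),
        Step("action", "done"),
    ]
    return script[len(history)]

def run_episode(planner, answer_query, execute_primitive, max_steps=10):
    """Alternate between visual queries and primitives until the planner says done."""
    history = []
    for _ in range(max_steps):
        step = planner(history)
        if step.kind == "query":
            result = answer_query(step.content)
        elif step.content == "done":
            history.append((step, "finished"))
            break
        else:
            result = execute_primitive(step.content)
        history.append((step, result))
    return history

history = run_episode(
    scripted_planner,
    answer_query=lambda q: "yes",
    execute_primitive=lambda name: f"executed {name}",
)
```

In this sketch the whole history is fed back to the planner at every step, so the context grows with the length of the task.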
7/8
The takeaway: VLMs aren't yet reliable enough to run a farm on their own.
But the framework is deployable today, and the benchmark gives the community a concrete way to measure progress on long-horizon visual reasoning in the real world.