Excited to share the Qwen-VLA paper, our exploration of generalist Vision-Language-Action models.
It extends Qwen’s multimodal backbone from visual understanding and reasoning to continuous action generation and trajectory prediction.
Paper:
https://t.co/9jvRW0nI8B
Introducing KimoLab: Kimodo + mjlab for prompt-to-physics-based motion matching on the Unitree G1.
I accidentally had Claude make this and it's actually pretty cool.
Hey guys!
Yesterday I was talking to friends from my last job and remembered that we had a DENSO VS050 model.
They've also added some cool stuff for controlling it via ROS2.
So I thought "well.. why not do it in MuJoCo?"
Also, I wanted to test @antigravity 😂
https://t.co/KSRgRx4G6g
In a world of PPO everything for reinforcement learning, I've been tinkering with SAC for training a quadruped gait.
This gait is trained purely on CPU (on one of the Dell GB10s) with a single environment. Any particular run is obviously slower than PPO on an RTX Pro 6000 with 8092 envs, if you already know the exact hyperparams/reward function for your PPO algo... but if we're honest with ourselves, we usually spend days tuning our PPO algo and fighting it to do what we want.
In contrast, SAC has been kind of a breath of fresh air: very amenable to changing the reward function to tune behavior. So far, my first attempts at tuning have consistently just worked immediately, rather than taking 15 different variations of reward hacking only to find that previously tuned behaviors got lost in the process. There is also FastSAC, which I've not yet tried, but which could potentially speed things up and bring scale back into the equation.
My main pain point in getting SAC to work for gait was actually getting it to learn to step. It seems as though SAC is not as good as PPO at significant exploration on its own. In phase 1, I started with a sinusoidal gait (basically just a rule to make the legs swing) as training wheels and blended it out over the course of training; after that, I began working on smoothing things out.
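For anyone curious what the "training wheels" idea can look like, here's a rough sketch. To be clear, this is not my actual training code: the linear decay schedule, the trot phase offsets, the frequency/amplitude values, and the function names (`sinusoidal_gait`, `blend_weight`, `blended_action`) are all assumptions made up for illustration. It mixes an open-loop sinusoidal reference with the policy's action and fades the reference out as training progresses.

```python
import math

def sinusoidal_gait(t, freq_hz=1.5, amplitude=0.3):
    """Hypothetical open-loop reference: one swing target per leg (4 legs).

    Diagonal leg pairs share a phase (a trot-like pattern); the values
    here are illustrative, not tuned for any real robot.
    """
    offsets = [0.0, math.pi, math.pi, 0.0]
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * t + off)
            for off in offsets]

def blend_weight(step, blend_steps):
    """Weight on the reference gait: decays linearly from 1 to 0 (assumed schedule)."""
    return max(0.0, 1.0 - step / blend_steps)

def blended_action(policy_action, t, step, blend_steps=200_000):
    """Mix the reference gait with the policy action for the current training step."""
    w = blend_weight(step, blend_steps)
    ref = sinusoidal_gait(t)
    # w = 1 -> pure reference gait, w = 0 -> pure policy output
    return [w * r + (1.0 - w) * a for r, a in zip(ref, policy_action)]
```

Early in training the legs mostly follow the scripted swing, so the agent sees what stepping looks like; once `step` passes `blend_steps`, the policy is fully in control.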
I think if we look at end-to-end dev time rather than any particular run that finally managed to work, SAC may actually be the "faster" algorithm to train. Quadruped gaits are inherently easier than bipedal ones, and maybe there are areas where SAC falls short, but I'll definitely be spending more time with SAC.