Stumbled on Reversal Q-Learning today and got a little obsessed — it's one of those ideas that's so elegant it feels obvious after you see it.
Spent the whole afternoon building an interactive explainer to do it justice. 12 little demos you can drag and play with 👇
https://t.co/6CMxL9gozS
@E0M @chris_j_paxton Impressive! Could the passive elasticity of this gripper eliminate the need for force-feedback control signals, since force can in some cases be measured visually?
🚀 VAMPO is accepted to ICML 2026!
Robots should not just predict plausible videos — they need to predict the right visual dynamics for action.
We turn video diffusion denoising into policy optimization, improving future prediction for robot control.
Paper: https://t.co/VPcPimdwdN
Website: https://t.co/IMwfI9M1m7
Crazy thought: Has anyone tried using Classical Chinese (文言文) to maximize LLM token efficiency? 📜🤔
It’s essentially an ancient, extreme form of data compression. In the agentic era where context windows and token costs dictate performance, could this ultra-dense language be the ultimate practical hack for agent-to-agent communication?
Would love to read any research or experiments on this! 👇
#LLMs #AIAgents #NLP #TokenEfficiency #MachineLearning #AI
As a $20 Claude user, I feel like I have hired an amazing engineer who frequently takes long breaks and always has a 4-day weekend. 🤷‍♂️
Source: jepace (Reddit)
If you missed @chichengcc's guest lecture on "Robotics: Beyond Algorithms" from my @ETH robot learning course, check it out on YouTube! He shares insights that are rarely taught & hard to learn in academia.
📽️ YouTube: https://t.co/coRqOHHB3n
📚 Course: https://t.co/QJcfXJRfX8
Robots do not care about success or failure. They care about learning the world. Success is just human preference imposed on top of dynamics.
Human preference is not the foundation of robot learning—it is a bottleneck. The robot should first learn the world, then learn what we want.
Why do some continual learning methods forget everything while others don't?
We now have a single number that explains it: Context Channel Capacity (C_ctx).
📐 Zero forgetting ⟺ C_ctx ≥ H(T)
🔺 Impossibility Triangle: zero forgetting + online learning + finite params → pick 2
🧠 HyperNets bypass the triangle entirely by redefining params as functions, not states
Validated across 1,130+ experiments on 8 CL methods. C_ctx perfectly predicts who forgets and who doesn't.
📄 https://t.co/MyIXCQJORM
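To make the "Zero forgetting ⟺ C_ctx ≥ H(T)" criterion concrete, here is a minimal sketch. It assumes T is the task-identity variable and H(T) is its Shannon entropy in bits. The tweet does not define either term, and it does not say how C_ctx is measured. For that reason the sketch takes C_ctx as a plain number in bits. The function names are hypothetical and do not come from the paper.

```python
import math

def task_entropy_bits(probs):
    """Shannon entropy H(T), in bits, of a discrete task distribution (assumed meaning of H(T))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def predicts_zero_forgetting(c_ctx_bits, probs):
    """Hypothetical check of the stated criterion: zero forgetting iff C_ctx >= H(T).
    c_ctx_bits is supplied by the caller; estimating it is not reproduced here."""
    return c_ctx_bits >= task_entropy_bits(probs)

# Example: 8 equally likely tasks give H(T) = log2(8) = 3 bits.
uniform8 = [1 / 8] * 8
print(task_entropy_bits(uniform8))              # 3.0
print(predicts_zero_forgetting(2.5, uniform8))  # False: context channel too narrow
print(predicts_zero_forgetting(3.0, uniform8))  # True: channel can carry task identity
```

Under these assumptions, the intuition is that a method needs enough context capacity to identify which task it is in. If it falls even slightly short of H(T), the criterion predicts forgetting.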
@bercankilic Learning how to forget is very hard. I've run 1,800+ experiments with my agent teams and failed again and again. Here are my failure summaries, if you want to waste your time reading them: https://t.co/nR72EYNUWo