Use both:
https://t.co/BQBBunr9IO
TO is a Newton step on the Bellman equation. Policies and value functions are "memories" of past solutions; TO should be optimizing over them at inference time. Best of both worlds. Some of the strongest RL methods do this.
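A minimal sketch of the "use both" idea, reading TO as trajectory optimization: a stored policy warm-starts the action sequence, a stored value function supplies the terminal cost, and the plan is refined at inference time. Everything here is an illustrative assumption rather than anything from the linked post: the scalar dynamics, the cost weights, the "learned" policy gain and value coefficient, and the use of plain gradient descent in place of a Newton step.

```python
import numpy as np

# Hypothetical scalar system x' = A*x + B*u with stage cost Q*x^2 + R*u^2.
A, B = 1.0, 0.5
Q, R = 1.0, 0.1
K_POLICY = 0.5   # assumed "learned" policy: u = -K_POLICY * x (deliberately suboptimal)
P_VALUE = 1.5    # assumed "learned" value function: V(x) = P_VALUE * x^2


def policy(x):
    return -K_POLICY * x


def value(x):
    return P_VALUE * x ** 2


def rollout_cost(x0, actions):
    """Sum of stage costs along the plan, plus the value function at the final state."""
    x, cost = x0, 0.0
    for u in actions:
        cost += Q * x ** 2 + R * u ** 2
        x = A * x + B * u
    return cost + value(x)


def policy_rollout(x0, horizon):
    """Warm start: the action sequence the stored policy would produce."""
    x, actions = x0, []
    for _ in range(horizon):
        u = policy(x)
        actions.append(u)
        x = A * x + B * u
    return np.array(actions)


def refine(x0, actions, steps=200, lr=0.05, eps=1e-5):
    """Inference-time refinement by gradient descent (central finite differences)."""
    actions = actions.copy()
    for _ in range(steps):
        grad = np.zeros_like(actions)
        for i in range(len(actions)):
            up, down = actions.copy(), actions.copy()
            up[i] += eps
            down[i] -= eps
            grad[i] = (rollout_cost(x0, up) - rollout_cost(x0, down)) / (2 * eps)
        actions -= lr * grad
    return actions


x0 = 1.0
warm = policy_rollout(x0, horizon=5)
refined = refine(x0, warm)
print("policy-only cost:", rollout_cost(x0, warm))
print("refined cost:    ", rollout_cost(x0, refined))
```

The refined plan starts from what the policy already knows and can only lower the planning objective from there, which is the sense in which the policy and value function act as memories that the optimizer builds on.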
Looking for new projects this year? ARCTIC (https://t.co/rQTypaxCJB) is a dataset of articulated object manipulation that includes accurate body/hand/object poses and multi-view RGB videos. Several directions have emerged since its release that are worth following up. See 🧵
HandNeRF
Official PyTorch Implementation of "HandNeRF: Learning to Reconstruct Hand-Object Interaction Scene from a Single RGB Image", ICRA 2024
https://t.co/U85rJVEV5P
Excited to share our work in ICCV23 (Oral)!
tl;dr: 3D-fy everyday hand-object interaction clips, no template required. w/ @Poorvi_rh, @shubhtuls, and Abhinav. Project page: https://t.co/jBNhFHIIK5
This result from "RoDynRF: Robust Dynamic Radiance Fields" is not trivial. The method uses Mask-RCNN, motion from epipolar lines, coarse-to-fine optimization, a motion loss (RAFT), a depth ordering loss (MiDaS), a disparity smoothness loss, and a new NeRF design. Hard work!
Avatars Grow Legs: Generating Smooth Human Motion from Sparse Tracking Inputs with Diffusion Model
abs: https://t.co/7PukJpWw2V
project page: https://t.co/q5NzKS9us5
The robot climbs stairs🏯, steps over stones 🪨, and runs in the wild🏞️, all in one policy, without any remote control!
Our #CVPR2023 Highlight paper achieves this by using RL + a 3D Neural Volumetric Memory (NVM) trained with view synthesis!
https://t.co/cOUf5RPZiu
Introducing 𝗥𝗼𝗯𝗼𝗣𝗶𝗮𝗻𝗶𝘀𝘁 🎹🤖, a new benchmark for high-dimensional robot control! Solving it requires mastering the piano with two anthropomorphic hands.
This has been one year in the making, and I couldn’t be happier to release it today! Some highlights below:
Excited to share my internship project at NVIDIA! w/ @SifeiL @shubhtuls @XueT_Li, Abhinav, et al.
We revisit the classic affordance problem with diffusion models. With recent advances in image synthesis, we can now vividly hallucinate rich human interactions.
"near real-time method for 6-DoF tracking of an unknown object from a monocular RGBD video sequence [...] a Neural Object Field that is learned concurrently with a pose graph optimization process in order to robustly accumulate information into a consistent 3D representation"