Can you guess the object by its sound? 🔊
Humans learn to correlate sight and sound through exploration from infancy. We want robots to learn the same way.
Meet CAVER: the first Curious Audio-Visual Exploring Robot, learning about the world through interactive exploration. 🤖✨
CAVER builds multimodal knowledge across environments. In kitchen, garage, & playroom tests, it hit 87% material classification accuracy and beat humans at sound-based action recognition! It can even play tunes by ear on drums & xylophones 🥁🎹.
CAVER enables active multimodal data collection via:
1️⃣ 3D-printed tool for consistent audio 🔊
2️⃣ Curiosity-driven policy for high-uncertainty regions 🔍
3️⃣ Multi-scale visual-acoustic alignment ⚖️
Together, these make dataset building autonomous and efficient.
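Curious what "going for high-uncertainty regions" could look like in code? Here is a minimal sketch, not CAVER's actual policy: it assumes uncertainty is measured as the entropy of a model's predicted material distribution, and the region names, class count, and probabilities are all made up for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def pick_most_uncertain(region_probs):
    """Return the region whose predicted distribution has the highest entropy."""
    return max(region_probs, key=lambda name: entropy(region_probs[name]))


# Hypothetical per-region predictions over three made-up material classes.
region_probs = {
    "mug": [0.90, 0.05, 0.05],  # model is fairly confident
    "pan": [0.40, 0.35, 0.25],  # model is unsure, so this is worth exploring
    "toy": [0.70, 0.20, 0.10],
}

print(pick_most_uncertain(region_probs))  # prints "pan"
```

Entropy is only one possible uncertainty signal; ensemble disagreement or prediction error would slot into the same selection loop.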