@DominiqueCAPaul, I love following your adventures. Thanks for sharing them openly. We are building https://t.co/t9o6XLnezr to drive robot fleets. Would love to meet with Zurich-based founders.
Meet MapAnything: a transformer that directly regresses factored metric 3D scene geometry (from images, calibration, poses, or depth) in an end-to-end way. No pipelines, no extra stages. Just 3D geometry & cameras, straight from any type of input, delivering new state-of-the-art results.
One universal model enables SoTA for:
- Mono Depth Estimation
- Multi-View SfM
- Multi-View Stereo
- Depth Completion
- Registration
… and many more possibilities! Plus, everything is metric 💯
We release code for data processing, training, benchmarking & ablations, everything Apache 2.0!
Details & Links
You know you are an old CV fart when you see SFM and think "Structure from Motion", but they mean "Spatial Foundation Models"
Cool stuff: https://t.co/ZWiwYqF08S
Gemini + π0 = actually useful robots! (Similar to what @physical_int did with "Hi Robot")
I can now verbally tell the robot that I'm building a red Lego wall or wooden tower, and it will infer the next steps by itself and pass me the necessary pieces, tools, or materials, ha!
You can also just ask it to bring you things!
The pipeline works as follows:
- OpenAI Whisper (local) → speech to text
- Gemini → makes sense of user requests, converts to robot tasks, bounding boxes, grasping points, etc. (System 2 thinking FTW!)
- π0 → robotic actions
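A minimal sketch of how the three stages above could be chained. Everything here is hypothetical: `RobotTask`, `run_pipeline`, and the `fake_*` stand-ins are illustrative names, not the actual Whisper, Gemini, or policy APIs. Each stage is passed in as a plain callable so a real model could be swapped in later.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RobotTask:
    """Hypothetical container for one task handed to the action policy."""
    instruction: str  # short language command, e.g. "pick up red brick"


def run_pipeline(
    audio: bytes,
    transcribe: Callable[[bytes], str],       # speech-to-text stage
    plan: Callable[[str], List[RobotTask]],   # request-interpretation stage
    act: Callable[[RobotTask], str],          # action stage
) -> List[str]:
    """Run speech -> text -> tasks -> actions and return one result per task."""
    text = transcribe(audio)
    tasks = plan(text)
    return [act(task) for task in tasks]


# Stand-in stages so the sketch runs without any model (assumptions only).
def fake_transcribe(audio: bytes) -> str:
    # Pretend the audio bytes already contain the spoken text.
    return audio.decode("utf-8")


def fake_plan(text: str) -> List[RobotTask]:
    # Treat each comma-separated item as one pick-up task.
    return [RobotTask(f"pick up {item.strip()}") for item in text.split(",") if item.strip()]


def fake_act(task: RobotTask) -> str:
    # A real policy would emit motor commands; here we only report the task.
    return f"executed: {task.instruction}"


if __name__ == "__main__":
    for line in run_pipeline(b"red brick, blue brick", fake_transcribe, fake_plan, fake_act):
        print(line)
```

Keeping the stages as interchangeable callables means the planner or the policy can be replaced independently, which fits the split between the "thinking" stage and the action stage described in the list.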
π0 was fine-tuned only on pick-and-place of Lego bricks, yet it generalizes beautifully to all kinds of tasks. However, there's lots of room for improvement when it comes to grasping & accuracy.
Things that could help:
- Conditioning on grasping points
- Better data collection (I'm not that great at teleop)
- Lots more synthetic data and simulations
Thought about generating realistic 3D urban neighbourhoods from maps, dawn to dusk, rain or shine? Putting heavy snow on the streets of Barcelona? Or making Paris look like NYC? We built a Streetscapes system that does all these. See https://t.co/pOafp2fWlv. (Showreel.)
@sellan_s @YouJiacheng Cool, thanks! I couldn't avoid seeing a sphere that was inside the curve in one drawing and outside in another and had to ask.
The wait is over 📢 MAST3R is out! DUSt3R + dense local feature maps & metric depth - 1st in #MapFreeReloc leaderboard, can handle 1000s of images!!
Blog: https://t.co/gX7ez0fk93
Code: https://t.co/qPcOYTD3Hk
Paper: https://t.co/iq750lQIJK
The Quest v64 update brought two undocumented major new features: furniture recognition on Quest 3 and simultaneous hands & controllers in the home space:
https://t.co/Kw4ZAhqwLB
Now available on Mapillary: Neural Radiance Fields (NeRFs)!
NeRFs allow for the transformation of a collection of 2D images into detailed, immersive 3D reconstructions.
Read our blog post to learn more and see how you can get started with NeRFs: https://t.co/UwcvHpuFYz