I built a system that turns one 2D image into 3D clinical insight for pre-hospital trauma assessment
🔗 Medium Article: https://t.co/edlj1Ld6tC
💻 GitHub: https://t.co/4ZFlWiLShb
#ComputerVision#MedicalAI#3DPose
A bit late but I'm happy to share that our paper, "From Flat Images to Clinical Intelligence: A 3D Trauma-Aware Digital Twin Pipeline for Preclinical Emergency Assessment", has been accepted at KES 2026 (CORE Class B), in Dublin, Ireland 🇮🇪
🎉Our ForeHOI has been accepted to CVPR 2026! We tackle 3D reconstruction from video under severe occlusion in hand-object interactions, with an end-to-end feed-forward pipeline. Dataset(400k synthetic samples): https://t.co/t8yRFPH5GP Code(SAM3D version): https://t.co/6fXT4hKbEt
Arbor: Control 3D Generation with Explicit Geometry
Text-to-3D makes plausible assets, but spatial control is still hard. Arbor adds typed 3D constraint: hull, avoidance, and touch.
Paper w/ @markb_boss@AjlEngelhardt @Simondonne @CG_Tuebingen Intern @StabilityAI
More 👇
Took a deep dive into @UnrealEngine’s new Markerless MoCap plugin just released and am really impressed. It is a game changer for this quality of mocap to be accessible to everyone for FREE.
The 30 sec video took 6 mins to process on a 5090 GPU so patience is key. The raw data you get is impressive though! Captures nice nuance and even doesn’t break when my hand goes out of frame.
I’ll be continuing to do some experiments and share results. Can’t wait to hook it up to @impulsetool to start to build dances at scale!
Congrats to @Michael_J_Black & team for the excellent work and commitment to bring mocap to the masses.
NVIDIA just published ArtiFixer. It tackles one of the most frustrating parts of 3D Gaussian Splatting: what happens in areas you didn't capture.
Standard reconstruction falls apart in under-observed regions. Floaters, blurry holes, missing geometry. ArtiFixer patches those gaps with an autoregressive diffusion model that stays consistent with what exists while generating plausible content where there's nothing.
1-3 dB PSNR improvement across all MipNeRF 360 scenes. On some of them, it's the only approach that produces usable output at all. Open source: https://t.co/PKbG6rZNgM
Incredible to see what a professional animator can do with AI video.
He created a 3D previz and then had Seedance render the actual anime - with the motion and camera control preserved!
(made by @craftcapitallab)
I made a tool called Build Buddy.
It's an AI that lives on top of any game engine.
Paste any YouTube tutorial and it follows along with you, pauses the video when you need to do something, and points exactly where to click.
Been using it to learn UE5 stuff I always kept putting off..
It's free, check it out!
Google just released the dense prediction TIPSv2 models on Hugging Face
A vision encoder with DPT heads for depth estimation, surface normals, and semantic segmentation — all trained on TIPSv2 B/14.
I've migrated the old Mast3r-SLAM example I had made last year to the latest version of @rerundotio and made a bunch of improvements! I wanted to spend some time with agents to modernize it. Here's an example of me walking around with my iPhone and getting a dense reconstruction at about 10FPS on a 5090.
Heres the following improvements I made.
Brought it into the monorepo with proper packaging:
• Using @prefix_dev pixi-build to get rid of all the mast3r/asmk/lietorch vendored code with just a few small patches. This let me remove so 60k lines of code from the repo!
• Don't have to build the lietorch code on my machine anymore, which was taking ~10 minutes to compile (and also made it work on blackwell when it previously did not)
Rebuilt the @Gradio interface:
• Fixed incremental updates, .MOV uploads, and stop behavior
• Made the CLI + Gradio interface share the same entry point so updates automatically propagate
Upgraded the @rerundotio integration:
• Switched to a multiprocessing async logging strategy
• Added video/pointmap/confidence logging
• Improved blueprint layout and hid noisy entities from 3D view
• Biggest perf win was the async background logger - documented about a ~2.5x speedup from decoupling logging from tracking
The newest and most interesting part was my attempt to replace the CUDA kernels for Gauss-Newton ray matching with a @Modular Mojo backend. As a Python dev, every time I look at CUDA code I basically shy away as it's pretty difficult for me to understand. Mojo let me rewrite the matching logic in a syntax I'm more comfortable with while still getting near-CUDA performance. Mojo is now the default matching backend with CUDA fallback. One major piece that's missing is the custom PyTorch op path, but I'll eventually do that as well.
I heavily leaned on Claude Code to do the CUDA → Mojo migration, and I have no doubt it's not the cleanest or most idiomatic, BUT it's way more readable for me and helps me better understand the underlying algorithm.
This was a ton of work, and a large part of why I'm doing it is how the monorepo compounds. This becomes an artifact for the next example I want to build with Claude that I can point to, which will make it even faster to implement. The compounding nature of this is really interesting and part of why I'm spending so much time trying to make things nice and readable.
Today we release Boxer, a new lightweight approach that lifts open-world 2D bounding boxes to *metric* 3D: https://t.co/5IZ0tPlqvr
Here we show Boxer in action on an egocentric sequence captured from smart glasses:
Excited to share our work, Know3D, which connects LLMs' reasoning ability and knowledge to 3D generative models. This increases the controllability and plausibility of unseen parts in the generated 3D shapes.
Paper: https://t.co/qPi4Pmkzgg
Project page: https://t.co/nHnP5O5NWy
3D-RE-GEN
3D Reconstruction of Indoor Scenes with a Generative Framework
https://t.co/xJQRiWLA3G
We propose single-image 3D scene reconstruction for producing complete, editable scenes from a single photograph. Our method reconstructs individual objects and the surrounding background as textured 3D assets, enabling coherent scene assembly from minimal input. We combine instance segmentation, context-aware generative inpainting, 2D-to-3D asset creation, and constrained optimization to recover physically plausible geometry, materials, and lighting. The resulting scenes preserve correct spatial relationships, lighting consistency, and material fidelity, making them suitable for production-ready workflows.
back when I taught 1st year mechanics, I pulled up Steam and launched Gunpoint to talk about projectile motion. And the developer's issue on having to plot out the trajectory in a single frame, which necessitates the t-independent equation
NVIDIA's Kimodo is the release of the week 🔥
Prompt the timeline whatever your want like: "a person walks forward" → "a person starts jumping", hit Generate, and watch a 3D character do it in seconds
(700hrs of pro mocap training. Works on human + robot skeletons. Super fast + free to use on HF)
𝗞-𝗺𝗲𝗮𝗻𝘀 𝗶𝘀 𝘀𝗶𝗺𝗽𝗹𝗲. 𝗠𝗮𝗸𝗶𝗻𝗴 𝗶𝘁 𝗳𝗮𝘀𝘁 𝗼𝗻 𝗚𝗣𝗨𝘀 𝗶𝘀𝗻’𝘁.
That’s why we built Flash-KMeans — an IO-aware implementation of exact k-means that rethinks the algorithm around modern GPU bottlenecks.
By attacking the memory bottlenecks directly, Flash-KMeans achieves 30x speedup over cuML and 200x speedup over FAISS — with the same exact algorithm, just engineered for today’s hardware. At the million-scale, Flash-KMeans can complete a k-means iteration in milliseconds.
A classic algorithm — redesigned for modern GPUs.
Paper: https://t.co/z0z0d3vrlp
Code: https://t.co/BqRfVKGH0K