Yay, finally! Introducing Vision Banana🍌 from @GoogleDeepMind, our unified model that outperforms SoTA specialist models on various vision tasks!
By treating 2D/3D vision tasks as image generation, we unlock a new foundation for CV.
Project page: https://t.co/GQgRi6mWwC
(1/5)
Glad to see followups to https://t.co/Up4jNYdD6K, but disappointed that neither the blog (with 34 refs) nor the code repo acknowledged NeuralOS, even tho the released data code appears to build directly on top of ours. That omission is hard to understand given our shared vision.
@peach2k2 So, I think I solved it. You just find a 1-boxer and bet him 10k that the Predictor is gonna get it wrong. Then, it becomes rationally optimal to just take B; you do that and go home with 990k.
New paradigm from Kaiming He's team: Drifting Models!
With this approach, you can generate a high-quality image in a single step.
The team trains a "drifting field" that smoothly moves samples toward equilibrium with the real data distribution.
The result? A one-step generator that sets a new SOTA on ImageNet 256x256, beating complex multi-step models.
At SIGGRAPH Asia in December, I presented our latest work on generating arbitrarily large, tileable textures with irregular features, developed together with @timweyrich.
🎨 Today I am excited to announce we added a Blender plugin to our code release: https://t.co/WqsJsg50zC
TL;DR: I made a Transformer that conditions its generation on latent variables.
To do so, a decoder Transformer only needs a source of randomness during generation, but it then needs an encoder for training, as in a [conditional] VAE.
1/5
If you want to know how we also improve the detection accuracy (QFCA+), check out our paper 😁 https://t.co/JTd3kmFY5s
-- Work done together with Patrick Rückbeil and Tim Weyrich --
Our VCE group organized VMV 2025 last week. It was a great conference, with impressive research by some really cool people; the quality of the presentations genuinely exceeded my expectations!
We also presented our work that makes zero-shot anomaly detection blazing fast 🧵
Surprisingly, commonly used ML libraries (including PyTorch, TensorFlow, and JAX) have a suboptimal implementation of local average pooling. We reimplement it using summed-area tables to obtain constant complexity w.r.t. patch size, which significantly reduces our overall runtime.
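For anyone curious about the summed-area-table trick, here is a minimal pure-Python sketch. It is not the authors' implementation: the function names `summed_area_table` and `local_average_pool`, the stride of 1, and the lack of padding are all assumptions made for illustration. The point is that each output value needs only four table lookups, whatever the patch size `k`.

```python
def summed_area_table(img):
    """Return sat with sat[i][j] = sum of img over rows < i and columns < j."""
    h, w = len(img), len(img[0])
    # One extra row and column of zeros removes the border special cases.
    sat = [[0.0] * (w + 1) for _ in range(h + 1)]
    for i in range(h):
        row_sum = 0.0
        for j in range(w):
            row_sum += img[i][j]
            sat[i + 1][j + 1] = sat[i][j + 1] + row_sum
    return sat


def local_average_pool(img, k):
    """Stride-1 average pooling with a k x k window and no padding (assumed layout)."""
    h, w = len(img), len(img[0])
    sat = summed_area_table(img)
    out = []
    for i in range(h - k + 1):
        out_row = []
        for j in range(w - k + 1):
            # Window sum from four corner lookups, independent of k.
            total = (sat[i + k][j + k] - sat[i][j + k]
                     - sat[i + k][j] + sat[i][j])
            out_row.append(total / (k * k))
        out.append(out_row)
    return out


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
pooled = local_average_pool(image, 2)
```

Building the table costs one pass over the image, so the total work is proportional to the number of pixels rather than pixels times patch area.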
We reimplement the FCA algorithm by finding the 1D optimal transport between the histograms and tracking the contribution of each bin to the overall error. The algorithm is now linear w.r.t. the number of distinct values after quantization.
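To make the 1D optimal transport idea concrete, here is a hedged sketch of the generic sorted (monotone) transport between two histograms on shared bins. It is not the FCA algorithm from the paper: the function name `transport_1d`, the integer counts, the absolute-difference ground cost, and the choice to attribute cost to the source bin are all assumptions. It only shows why one pass over the bins is enough.

```python
def transport_1d(hist_a, hist_b, values):
    """Monotone optimal transport between two histograms over the same sorted bins.

    hist_a, hist_b: non-negative integer counts with equal totals.
    values: sorted bin values (e.g. quantization levels).
    Returns (total_cost, per_bin_cost), where per_bin_cost[i] is the cost of
    the mass that leaves bin i of hist_a.
    """
    if sum(hist_a) != sum(hist_b):
        raise ValueError("histograms must have equal total mass")
    remaining_b = list(hist_b)
    per_bin_cost = [0.0] * len(hist_a)
    j = 0  # current target bin; it only ever moves forward
    for i, mass in enumerate(hist_a):
        while mass > 0:
            while remaining_b[j] == 0:
                j += 1
            moved = min(mass, remaining_b[j])
            per_bin_cost[i] += moved * abs(values[i] - values[j])
            mass -= moved
            remaining_b[j] -= moved
    return sum(per_bin_cost), per_bin_cost


total, per_bin = transport_1d([2, 0, 0], [0, 1, 1], [0, 1, 2])
```

Every inner step either empties a source bin or fills a target bin, so the number of steps is at most the sum of the two bin counts.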
Methods like NeRF and Gaussian Splats model the world as radioactive fog, rendered using alpha blending. This produces great results... but are volumes the only way to get there? 🤔 Our new SIGGRAPH'25 paper directly reconstructs surfaces without heuristics or regularizers.
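For context, the alpha blending that volumetric methods rely on can be sketched in a few lines. This is the standard front-to-back compositing rule, not code from the paper; the function name `alpha_composite` and the plain-list RGB layout are assumptions for illustration.

```python
def alpha_composite(colors, alphas):
    """Front-to-back alpha blending of samples along one ray.

    colors: RGB triples ordered from nearest to farthest sample.
    alphas: per-sample opacities in [0, 1].
    Returns (pixel_rgb, leftover_transmittance).
    """
    transmittance = 1.0  # fraction of light not yet blocked
    pixel = [0.0, 0.0, 0.0]
    for color, alpha in zip(colors, alphas):
        weight = transmittance * alpha
        for c in range(3):
            pixel[c] += weight * color[c]
        transmittance *= 1.0 - alpha
    return pixel, transmittance


# A half-transparent red sample in front of an opaque green one.
pixel, leftover = alpha_composite([(1, 0, 0), (0, 1, 0)], [0.5, 1.0])
```

Every sample along the ray contributes to the pixel, which is why the resulting geometry behaves like a semi-transparent volume rather than a hard surface.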
Is the time for new architectures over? Not quite! SeNaTra, a native segmentation backbone, is waiting. Let's see how it works 🧵 https://t.co/2I9nuLBsSz
Had a great experience presenting our work on 3D scene reconstruction from a single image with @VisionBernie at #3DV2025 🇸🇬
https://t.co/zrqu9pnVEM
Reach out if you're interested in discussing our research or exploring international postdoc opportunities @CogCoVi @UniFAU
@Hesamation Here is a simple prompt:
"""
As a math specialist, write a Manim program that explains the following problem with precision and engaging, easy-to-understand animations.
<problem>{your problem}</problem>
<solution>{you can also include a solution}</solution>
"""
Meshtron: High-Fidelity, Artist-Like 3D Mesh Generation at Scale, from @nvidia
TL;DR: Autoregressive mesh generator based on the Hourglass architecture and using sliding window attention; point cloud to mesh; txt2mesh; mesh2mesh
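As a rough illustration of sliding window attention in an autoregressive model, here is a sketch of the attention mask. It is not Meshtron's code: the function name `sliding_window_mask`, the causal convention, and the window size are assumptions.

```python
def sliding_window_mask(seq_len, window):
    """Boolean mask where mask[i][j] is True if token i may attend to token j.

    Causal sliding window (assumed convention): token i sees itself and the
    window - 1 tokens before it, and nothing in the future.
    """
    return [[0 <= i - j < window for j in range(seq_len)]
            for i in range(seq_len)]


mask = sliding_window_mask(4, 2)
```

Each row holds at most `window` True entries, so attention cost grows linearly with sequence length instead of quadratically.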
GausSurf: Geometry-Guided 3D Gaussian Splatting for Surface Reconstruction
Contributions:
• We introduce an efficient framework for high-quality surface reconstruction using 3D Gaussians.
• We integrate patch matching, a traditional MVS algorithm, and normal priors within our framework to enhance reconstruction fidelity and improve computational efficiency.
• We demonstrate that our method, GausSurf, has superior speed and quality compared to the state-of-the-art GS-based surface reconstruction methods.
Following over 1.5 years of hard work (w/ @njroussel & Rami Tabbara), we just released a brand-new version of Dr.Jit (v1.0), my lab's differentiable rendering compiler, along with an updated Mitsuba (v3.6). The list of changes is insanely long; here is what we're most excited about 🧵
Happy to share our paper: "Refinement of Monocular Depth Maps via Multi-View Differentiable Rendering"
TL;DR: Monocular depth estimate ➡️ SfM absolute depth scale ➡️ coarse and local mesh optimization for accurate depths
📖: https://t.co/9xjUgz4HBq
📜: https://t.co/KTAe7mq4eu
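The second step of the TL;DR, giving a monocular depth map an absolute scale from SfM, can be illustrated with a simple robust ratio. This is a common generic approach and an assumption on my part; the paper's actual procedure may differ. The function name `align_depth_scale` and the median estimator are hypothetical choices.

```python
from statistics import median


def align_depth_scale(mono_depths, sfm_depths):
    """Estimate one scale factor mapping relative monocular depths to SfM depths.

    mono_depths: monocular depth values sampled at sparse SfM point locations.
    sfm_depths: corresponding absolute depths from the SfM reconstruction.
    Uses the median ratio so that a few bad SfM points do not dominate.
    """
    ratios = [s / m for m, s in zip(mono_depths, sfm_depths) if m > 0 and s > 0]
    if not ratios:
        raise ValueError("no valid depth pairs")
    return median(ratios)


scale = align_depth_scale([1.0, 2.0, 4.0], [2.0, 4.0, 100.0])
```

Multiplying the monocular depth map by `scale` gives a metric initialization that a later optimization stage could refine.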