Can AI image detectors keep up with new fakes?
Mostly, no. Existing detectors are trained using a handful of models. But there are thousands in the wild!
Our work, Community Forensics, uses 4800+ generators to train detectors that generalize to new fakes.
#CVPR2025 🧵 (1/5)
🧐A question I've long been interested in: how can we learn from human hands and transfer that directly to robots?
Our new work, HUG, makes it possible in three simple steps: (1) collect human grasps at scale, (2) learn from them, and (3) retarget for deployment.
Hello! If you are interested in dynamic 3D or 4D, don't miss the oral session 3A at 9 am on Saturday:
@zhengqi_li will be presenting "MegaSaM", I'll be presenting "Stereo4D", and @QianqianWang5 will be presenting "CUT3R"
Excited to share our CVPR 2025 paper on cross-modal space-time correspondence!
We present a method to match pixels across different modalities (RGB-Depth, RGB-Thermal, Photo-Sketch, and cross-style images) — trained entirely using unpaired data and self-supervision.
Our approach learns correspondences through contrastive random walks across visual modalities.
#CVPR2025 (1/6)
Ever wondered how a scene sounds👂 when you interact👋 with it?
Introducing our #CVPR2025 work "Hearing Hands: Generating Sounds from Physical Interactions in 3D Scenes" -- we make 3D scene reconstructions audibly interactive!
https://t.co/tIcFGJtB7R
The data is available on Hugging Face, as well as the pipeline code!
Come chat with us at #CVPR2025! We’ll be presenting Friday afternoon at poster #274. (work w/ @andrewhowens)
📄 Project Page: https://t.co/vWLKi8DCYC
💾 Dataset/Code: https://t.co/EssZshboqe
🧵 (5/5)
Each image is labeled with detailed metadata, enabling more than just fake detection. We are excited to see what the community can build with this data! 🧵 (4/5)
Hello! If you like pretty images and videos and want a rec for a CVPR oral session, you should def go to Image/Video Gen, Friday at 9am:
I'll be presenting "Motion Prompting", @RyanBurgert will be presenting "Go with the Flow", and @ChangPasca1650 will be presenting "LookingGlass"
Ever wish YouTube had 3D labels?
🚀Introducing🎥DynPose-100K🎥, an Internet-scale collection of diverse videos annotated with camera pose!
Applications include camera-controlled video generation🤩and learned dynamic pose estimation😯
Download: https://t.co/iL3iqqzYL8
We present Global Matching Random Walks, a simple self-supervised approach to the Tracking Any Point (TAP) problem, accepted to #ECCV2024. We train a global matching transformer to find cycle-consistent tracks through video via contrastive random walks (CRW).
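For readers new to contrastive random walks, here is a rough sketch of the cycle-consistency idea in plain Python. It is not the paper's implementation: the names `transition` and `cycle_loss`, the dot-product similarity, and the temperature value are all assumptions made for illustration, and hand-written feature vectors stand in for the transformer's outputs.

```python
import math

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def transition(src, dst, temperature):
    """Row-stochastic matrix: softmax over dot-product similarity
    from each source node to every destination node."""
    return [
        softmax([sum(a * b for a, b in zip(u, v)) / temperature for v in dst])
        for u in src
    ]

def matmul(A, B):
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]

def cycle_loss(frames, temperature=0.1):
    """Walk forward through the frames and back again, then penalize
    walkers that fail to return to their start node.
    Assumes at least two frames, each a list of feature vectors."""
    path = frames + frames[-2::-1]  # palindrome: 0, 1, ..., T, ..., 1, 0
    walk = None
    for src, dst in zip(path, path[1:]):
        step = transition(src, dst, temperature)
        walk = step if walk is None else matmul(walk, step)
    n = len(walk)
    # Cross-entropy against the identity: each walker should end where it began.
    return -sum(math.log(walk[i][i]) for i in range(n)) / n

# Hypothetical toy features: three points per frame.
distinct = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
ambiguous = [[1.0, 0.0, 0.0]] * 3

loss_distinct = cycle_loss([distinct, distinct])
loss_ambiguous = cycle_loss([ambiguous, ambiguous])
```

When the points are easy to tell apart, the round trip lands back on the start node and the loss is near zero. When every point looks the same, the walk spreads uniformly and the loss rises to log(3).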
📢Presenting 𝐃𝐄𝐏𝐈𝐂𝐓: Diffusion-Enabled Permutation Importance for Image Classification Tasks #ECCV2024
We use permutation importance to compute dataset-level explanations for image classifiers using diffusion models (without access to model parameters or training data!)
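As background, here is a minimal sketch of classic permutation importance on tabular features. This is the textbook version, not the diffusion-based procedure from the paper, and `toy_model`, the random data, and the function names are hypothetical. The idea is to shuffle one feature across the dataset and measure how much a black-box model's accuracy drops.

```python
import random

def accuracy(model, rows, labels):
    """Fraction of rows the model labels correctly."""
    correct = sum(1 for row, y in zip(rows, labels) if model(row) == y)
    return correct / len(rows)

def permutation_importance(model, rows, labels, feature, n_repeats=20, seed=0):
    """Mean drop in accuracy when one feature column is shuffled across rows."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    drops = []
    for _ in range(n_repeats):
        column = [row[feature] for row in rows]
        rng.shuffle(column)
        shuffled = [
            row[:feature] + [value] + row[feature + 1:]
            for row, value in zip(rows, column)
        ]
        drops.append(baseline - accuracy(model, shuffled, labels))
    return sum(drops) / n_repeats

# Hypothetical black-box classifier that only looks at feature 0.
def toy_model(row):
    return 1 if row[0] > 0.5 else 0

data_rng = random.Random(1)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
labels = [toy_model(row) for row in rows]

imp_used = permutation_importance(toy_model, rows, labels, feature=0)
imp_unused = permutation_importance(toy_model, rows, labels, feature=1)
```

Only calls to `model` are needed, which is why this style of explanation works without access to model parameters. Shuffling the feature the toy model ignores leaves accuracy unchanged, while shuffling the one it relies on causes a large drop.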
This year I'm organizing the ML4H Outreach program, and want to highlight our Author Mentorship program. Whether you're a mentee looking for guidance or a more experienced researcher with time to mentor, we'd love to have you be a part of this program! Deadline to apply is July 5!
These spectrograms look like images, but can also be played as a sound! We call these images that sound.
How do we make them?
Look and listen below to find out, and to see more examples!
NeRF captures visual scenes in 3D👀. Can we capture their touch signals🖐️, too?
In our #CVPR2024 paper Tactile-Augmented Radiance Fields (TaRF), we estimate both visual and tactile signals for a given 3D position within a scene.
Website: https://t.co/PChKoXIF9c
arXiv: https://t.co/nsl93v5tIx
Huge thanks to my collaborators Fengyu Yang, Yi Liu and advisors @andrewhowens @antoniloq!!!
What do you see in these images?
These are called hybrid images, originally proposed by Aude Oliva et al. They change appearance depending on size or viewing distance, and are just one kind of perceptual illusion that our method, Factorized Diffusion, can make.
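For intuition, the classic hybrid image recipe can be sketched as: keep the low frequencies of one image and add the high frequencies of another. The sketch below is only that classic construction on tiny grayscale grids, not Factorized Diffusion itself. The box blur is an assumed stand-in for a proper Gaussian low-pass filter, and all names are made up for illustration.

```python
def box_blur(img, radius):
    """Low-pass filter: mean over a (2*radius+1)^2 window,
    with the window truncated at the image borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            total = sum(img[j][i] for j in ys for i in xs)
            out[y][x] = total / (len(ys) * len(xs))
    return out

def hybrid_image(far_img, near_img, radius=2):
    """Low frequencies of far_img plus high frequencies of near_img.
    Both inputs are same-sized 2D lists of grayscale values."""
    far_low = box_blur(far_img, radius)
    near_low = box_blur(near_img, radius)
    h, w = len(far_img), len(far_img[0])
    return [
        [far_low[y][x] + (near_img[y][x] - near_low[y][x]) for x in range(w)]
        for y in range(h)
    ]

# Hypothetical toy inputs: a smooth gradient and a fine checkerboard.
size = 8
gradient = [[x / (size - 1) for x in range(size)] for _ in range(size)]
checker = [[float((x + y) % 2) for x in range(size)] for y in range(size)]
flat = [[0.5] * size for _ in range(size)]

hybrid = hybrid_image(gradient, checker)
hybrid_flat = hybrid_image(gradient, flat)
```

Up close the fine checkerboard detail dominates; from far away (or at small size) the eye blurs it out and the smooth gradient takes over. With a featureless `flat` image as the high-frequency source, nothing is added and the result is just the blurred gradient.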
Can you make a jigsaw puzzle with two different solutions? Or an image that changes appearance when flipped?
We can do that, and a lot more, by using diffusion models to generate optical illusions!
Continue reading for more illusions and method details 🧵