Paper accepted to #ICLR2026!
MaskInversion: Localized Embeddings via Optimization of Explainability Maps
Paper: https://t.co/zMPhLRXjSZ
Project: https://t.co/fzrzrajLKW
Code: https://t.co/RMXPYDU5TS
With @BousselhamWalid, @HildeKuehne, @chrirupp
Last week I attended #ICIAP2025 in Rome, where I delivered a keynote talk. The conference was great, with exciting talks, delicious food, and lively social events. Plus, the auditorium had a grand painting in the background! I thank the organizers for inviting me! Grazie Roma
My favourite project from my time at Synthesia is now ready!
Avatars now move their hands matching the meaning and intonation of the words they are saying. This body language makes the videos even more engaging!
https://t.co/K3Oc2VVoK0
I am happy to announce that I have joined Meta Reality Labs as a Principal Research Scientist, working on Spatial AI to power AR/MR experiences on Meta's wearable devices. It's the start of another adventure, and I thank all my new colleagues for making me feel welcome!
Paper accepted to the “Multimodal Algorithmic Reasoning” NeurIPS workshop!
HAMMR: Hierarchical multimodal agents for handling many diverse VQA tasks in a single system
https://t.co/qIk5J597R1
@LluisCastrejon @tejmensink @howardzzh @andrefaraujo @JRRU
Come to poster 354 at #CVPR2024 to see our work! 10:30am today, Arch 4A-E
"Grounding Everything: Emerging Localization Properties in Vision-Language Transformers"
Paper: https://t.co/DCUEi5STV3
Demo: https://t.co/ZdcFtoiCKn
Code: https://t.co/QmWp4UPzUt
Our EXPRESS-1 AI model enables @Synthesiaio avatars to understand and adjust to the script automatically 💥
This is a big milestone, so tune in tomorrow for a pre-launch chat with @MattNiessner, @jnstrck, @vriparbelli and @AlexVoica
X Spaces event link: https://t.co/z2sNytp54q
AI Avatars have learned to interpret text now. 😬
Our soon-to-be-public EXPRESS-1 AI model enables Synthesia avatars to understand and adjust to the script automatically. 🤯
Join the pre-launch tech chat with: @vriparbelli, @MattNiessner & @jnstrck 👀
https://t.co/W0lSSyV64G
Introducing HAMMR: hierarchical multimodal agents that handle a broad range of VQA tasks within a single system (counting, spatial reasoning, OCR, visual pointing, external knowledge, and more).
https://t.co/qIk5J597R1
@LluisCastrejon @tejmensink @howardzzh @andrefaraujo @JRRU
We are running the Vision and Sports Summer school again this year! Prague, July 22-27.
We offer a broad range of lectures on state-of-the-art computer vision techniques, as well as exciting sports activities such as volleyball, frisbee, and table tennis.
https://t.co/zAMVS7S3CU
Will AI ever take over humanity? 🤔
We’ve got a theory about this in our Social Media team, but let’s double check with an actual expert. Introducing our Director of Science, @VittoFerrariCV, who recently joined Synthesia!
And btw, he’s already looking for new hires, here: https://t.co/PEisdpyOZA
Three papers accepted to #NeurIPS
3/3
NAVI: a dataset of image collections of objects, along with high-quality 3D object scans, near-perfect 2D-3D alignments, and accurate camera parameters.
https://t.co/wNnjHzYsJl
https://t.co/bvJY3SDx44
With @jampani_varun, @kmaninis, others
Three papers accepted to #NeurIPS
2/3
"Estimating Generic 3D Room Structures from 2D Annotations"
3D room layout annotations for 2246 videos (part of the CAD-Estate dataset).
https://t.co/huVIkw0fGX
https://t.co/eflyzSzKz8
With @DRozumnyi, @StefanPopovCV, @kmaninis, @MattNiessner
Three papers accepted to #NeurIPS
1/3
StoryBench: a new benchmark for text-to-video generation of stories to guide progress in assistive technology for filmmaking 🧑🎨
https://t.co/xB12UOwk7C
https://t.co/Di5cUZN7BF
https://t.co/hi80AMJ0pg
With @ebugliarello, @hhm, many others
Wouldn’t it be cool if AI could help us generate movies?🎬
We built a new benchmark to measure progress in this direction🍿
“StoryBench: A Multifaceted Benchmark for Continuous Story Visualization”
📄 https://t.co/2lFhsMm1R3
👩💻 https://t.co/KkpJIn3kmP
📈 https://t.co/kV0EZE9aD3
Our R&D team just got a major boost - please welcome @VittoFerrariCV, our new Director of Science! 👋
He joins us from Google, where he was a Principal Scientist leading research in computer vision and machine learning. Before that, he built and led teams at ETH Zurich and the University of Edinburgh.
Get a glimpse into Vittorio's extraordinary work, his role at Synthesia, and his vision for AI video here: https://t.co/sHXxJ2N94k
Check out CAD-Estate: a large dataset with 3D object and room layout annotations on RGB videos of complex multi-object scenes (101k objects in total!).
https://t.co/eflyzSzKz8
https://t.co/e8320RTvDy
https://t.co/huVIkw0fGX
With @StefanPopovCV, @kmaninis, @MattNiessner
We released our new “Encyclopedic VQA” dataset, which contains visual questions about detailed properties of fine-grained categories (1M VQA triplets total!). These pose a hard challenge for large foundation models.
https://t.co/UfzNIpus8w
https://t.co/75yTCIxBlL
Four papers accepted to #ICCV2023!
2/4
CAD-Estate: Large-scale CAD Model Annotation in RGB Videos
>100k 3D objects annotated on RGB videos of complex scenes
Dataset release coming soon!
https://t.co/e8320RTvDy
@StefanPopovCV, @kmaninis, @MattNiessner