Great news from #CVPR2024 🎉🎉🎉
Happy to share that our paper ElasticDiffusion: Training-free Arbitrary Size Image Generation was accepted @CVPR.
Big thanks to my collaborators @bluevincent and Guha Balakrishnan.
Checkout more details from here: https://t.co/Ch8uelO4Mf
@Peldaaaaaaa @Telegraph Bruh idk what kind of arabic you speak, but we do say that.
Never heard the phrase "jihad one's self"? which means fighting your inner self?
https://t.co/3sACwt5rAa
Happy to announce AlignVLM📏: a novel approach to bridging vision and language latent spaces for multimodal understanding in VLMs! 🌍📄🖼️
🔗 Read the paper: https://t.co/czaL8NrlZL
🧵👇 Thread
Introducing
⚗️ Video Alchemist
Our new video model supporting
👪 Multi-subject open-set personalization
🏞️ Foreground & background personalization
🚀 Without the need of inference-time tuning
https://t.co/0bH4oXleIR
[Results]
1. Sora girl rides a dinosaur on a savanna
🧵👇
Diffusion models are very strong and robust feature extractors, but recent works were only using them for recognition tasks. In our recent work (led by @MoayedHaji), we harness them for video2audio generation: they by far outperform conventional video feature extractors for audio/video temporal alignment and allow to achieve SotA results in sound quality as well: https://t.co/tmmH0oYOjf
Can we use the intermediate representations of pretrained video generation models to generate audio, and vice versa? 🤔🔀
It turns out that this approach can be powerful.
Introducing AV-Link, an approach to connect video and audio diffusion models in a self-contained framework
6/6 Check out qualitative examples in the project website (https://t.co/qrOIyMT9le). and our preprint (https://t.co/SRvkVOL6GM) for more details about the architecture.
1/6 Introducing AV-Link, an approach to connect video and audio diffusion models in a self-contained framework to enable video-to-audio and audio-to-video generation with superb audio-video synchronization.
Project page: https://t.co/fus7YWoyPJ
5/6 For the first time, our approach shows audio-synchronized video generation for “in-the-wild” scenarios, surpassing existing methods in both generation quality and semantic and temporal alignment (https://t.co/w34F1I6mrS).
🧵 1/4 Introducing ColFlor: An Efficient, OCR-Free Vision-Language Document Retrieval Model 🌟
Earlier this year, ColPali transformed document retrieval by removing error-prone OCR pipelines, creating embeddings directly from images. However, its 3 billion parameters make it computationally expensive. That’s where ColFlor comes in! ⚡👇 #AI #NLP
https://t.co/4Hk7FEgPRW
I am excited to share that two of our research works will be presented at ECCV 2024. #ECCV2024 They focus on augmenting language models with fine-grained visual recognition ability.
AutoVER made successful attempts at generative visual recognition. It was accepted to the ECCV 2024 main conference and was invited to the ILR Workshop as an oral presentation. Collaboration w/ @pcascanteb@vislang #Microsoft
Extractive Reranker was accepted to the ILR Workshop as a poster. We explored how the long-context sequence modeling ability of language models can benefit image retrieval, a fundamental computer vision problem.
📢Excited to share our recent work ✂️CLIPAway, a plug-and-play solution to simple object removal using latent diffusion models. It works like a charm!
📄Paper: https://t.co/yYh3TYpohi
🔗Project website: https://t.co/cjElZ8niMq
🤗@huggingface Spaces Demo: https://t.co/BwxO2ZxfWM