Such a wonderful collaboration with @songguanyu0316 under @iamborisi's mentorship. Our paper has been selected for an Oral presentation at #CVPR2024!!!! Extremely proud of our team and looking forward to seeing you all in Seattle!!
Even prouder advisor moment: Our work has been selected for an Oral presentation at CVPR 2024!!! Looking forward to presenting it in Seattle in June! Check out the paper on arXiv: https://t.co/ocW2t4HIOL
MinT: Temporally-Controlled Multi-Event Video Generation
https://t.co/gEnm4DnkAC
TL;DR: We identify a fundamental failure mode of existing video generators: they cannot produce videos with sequential events. MinT unlocks this capability with temporal grounding of events.
Looking forward to attending #NeurIPS2024 next week. I will be looking for postdoctoral researchers in Computer Vision, AI, and Robotics. Interested in joining? Feel free to reach out for a chat during the conference.
Check out our work on "Accelerating Online Mapping and Behavior Prediction via Direct BEV Feature Attention" that @XunjiangGu presented at #ECCV2024 today. 1/n
This week we will be presenting 3 papers @eccvconf, on online mapping (@iamborisi), traffic scenario generation, and VLM-based AV stacks (@ChaoweiX):
Paper 1: https://t.co/fMcYLSIHeL
Paper 2: https://t.co/2dCMpE8CFr
Paper 3: https://t.co/aEqTZ3KHCF
More on online mapping:
Excited to share our ECCV 2024 work: Accelerating Online Mapping and Behavior Prediction via Direct BEV Feature Attention
Online map estimation shows great potential in guiding downstream tasks, as seen in our recent best paper finalist at CVPR 2024. But can we speed this up further by directly using the BEV features rather than processed map information? Check out our new paper!
Paper: https://t.co/mWNBtyFSeb
Code: https://t.co/jq1NJkeKjF [1/6]
Check out our new work (https://t.co/jgBqDTbUur) on reconstructing in-the-wild scenes with any actor. Ziyu was dedicated enough to bring the framework to six datasets (Waymo, PandaSet, Argoverse2, KITTI, nuScenes, nuPlan). We packaged everything as an open-source project, DriveStudio (https://t.co/tNhwuvewhg). To all my in-the-wild GS friends: there is no excuse to evaluate future work on a single dataset :)
Introducing Wolf: a mixture-of-experts video captioning framework that outperforms GPT-4V and Gemini-Pro-1.5 in general scenes, autonomous driving, and robotics videos.
https://t.co/cOEfUvRL0m
Agent-Driver has been accepted to @COLM_conf with review scores (7, 7, 8, 9) in the top 1% of all submissions. We sincerely thank the reviewers for their super constructive comments, which helped us improve this paper. We incorporated all feedback from the rebuttal and released a final version: https://t.co/GRIyRIoLir. Hopefully this 40-page report provides a comprehensive study of LLM agents applied to autonomous driving and inspires future research. Huge thanks to student leads @PointsCoder, @JunjieYe9, and James Qian, who spent days and nights improving every single bit of this paper, and to @drmapavone for endless support of this project. Most importantly, thank you to the @COLM_conf organizers for putting together this timely conference and for all the effort that made the review process so smooth!
The new Italian Foundation on Artificial Intelligence for Industry (AI4I) is looking for a Director! A unique opportunity to shape the R&D AI agenda in Italy and in Europe (I serve as a member of the scientific committee). Apply here: https://t.co/ewcCMK7tSR
@FabioPammolli #AI
Very excited to get this out: "DVT: Denoising Vision Transformers". We've identified and combated those annoying positional patterns in many ViTs. Our approach denoises them, achieving SOTA results and stunning visualizations! Learn more on our website: https://t.co/RFEiZQx7ZZ
How can we best use LLMs in an autonomy stack? An exciting prospect is to exploit their generalist experience to reason about anomalies. And one can do this in real time by leveraging their embeddings in a fast-and-slow decision-making architecture. Work led by @RohanSinhaSU. #RSS2024
By directly leveraging BEV features, our proposed methods achieve up to 73% faster system inference and up to a 29% increase in downstream prediction accuracy! We also evaluate the benefits of different BEV encoding strategies. [5/6]
Super excited to present Spatially Aware Multiview Diffusers (SPAD) at #CVPR2024!
SPAD enables 3D-consistent multi-view image generation from text or image inputs. It is trained on a high-quality Objaverse subset using 32 H100s!
Code & Paper links at the end!
It is my greatest pleasure to work with such a wonderful team and under the supervision of @iamborisi and @igilitschenski. Hope to see you all around in Seattle!!
Excited to share our #CVPR2024 work (oral & award candidate) on integrating map uncertainty into trajectory prediction, led by @XunjiangGu.
Our key insight is simple: Scene uncertainties matter for agent trajectories. 1/n