➡️ For workshop specifics, including the evaluation package, baseline code, and more, visit: https://t.co/dCKaQaDB4V
➡️ Eval server (sign-up now!): https://t.co/dGLqqFvBbo
#ICCV2025
Exciting news! We're happy to announce our challenge / workshop at this year @ICCVConference focusing on Spatiotemporal Action Grounding in Videos. Here are the details:
🔷 Watch the video below for a demo.
🔷 The eval server is open until 09/19!
🔷 Links incl. code below.
#ICCV
🎯 Challenge Launch Announcement
We are pleased to announce the launch of the MOT25 Challenge, to be held in conjunction with ICCV 2025.
🔗 Workshop website:
https://t.co/YGg9wphKnT
🧪 The MOT25 Challenge is now live on Codabench:
https://t.co/63pPdJZrqw
@KyleSargentAI Also, if you ever figure out why someone established that gFID is measured against the train (!) set statistics, while rFID is measured against Val set statistics, let me know!
Soo what alternative can we use on ImageNet?
Heading off to ICLR to present our work on image generation and latent spaces! If you're interested in tokenization or generation drop by at our poster.
Also, if you'd like to chat about any of these topics, feel free to ping me! #ICLR2025
🧵1/9 Happy to share our paper "MaskBit: Embedding-free Image Generation via Bit Tokens" got published in TMLR with featured (aka spotlight) & reproducibility certifications!
I'm especially excited about the disentangled visual concepts in our shared latent space. Details below!
A beautiful article by D. Graham Burnett
https://t.co/nQqK2nPQjw
´The A.I. is huge. A tsunami. But it’s not me. It can’t touch my me-ness. It doesn’t know what it is to be human, to be me.’
‘Historians have long extolled the “power of the archive.” Little did we know that the engineers would come along and plug it in. And it turns out that a huge amount of what we seek from a human person can be simulated through this Frankensteinian reanimation of our collective dead letters.’
Thanks for sharing @mustafasuleyman
@sedielem The disentangled latent space then allows us to directly train the generator network taking the latent bit tokens as input, in contrast to learning a new vocabulary. We found this unified representation to be very efficient and strong for generation.
https://t.co/tLdRWeJi31
@sedielem Thanks for the great post! In our study about training VQGANs, we made an observation that might be of interest to you and your readers.
When using LFQ, our tokenizer is able to model certain visual properties (like exposure, smoothness, color palette) into different channels.
Our exclusive with @JDVance ahead of @MunSecConf: -On Ukraine, he says there will be a good peace deal that will guarantee the country’s long-term sovereignty - and Putin will face sanctions and military measures if he doesn’t play ball.
-On Europe, he will tell mainstream leaders in Munich that they’ve become Soviet-style enemies of free speech and democracy who ignore voters and fail to stop mass migration. Some Germans will be particularly shocked when he calls for ending the firewall against the far-Right @AfD and embracing the populist vote.
With @alexbward via @WSJ https://t.co/xHCmQmTKIz
Everything you love about generative models — now powered by real physics!
Announcing the Genesis project — after a 24-month large-scale research collaboration involving over 20 research labs — a generative physics engine able to generate 4D dynamical worlds powered by a physics simulation platform designed for general-purpose robotics and physical AI applications.
Genesis's physics engine is developed in pure Python, while being 10-80x faster than existing GPU-accelerated stacks like Isaac Gym and MJX. It delivers a simulation speed ~430,000 faster than in real-time, and takes only 26 seconds to train a robotic locomotion policy transferrable to the real world on a single RTX4090 (see tutorial: https://t.co/bEkIlCKqdf).
The Genesis physics engine and simulation platform is fully open source at https://t.co/DhBv7NdyqH. We'll gradually roll out access to our generative framework in the near future.
Genesis implements a unified simulation framework all from scratch, integrating a wide spectrum of state-of-the-art physics solvers, allowing simulation of the whole physical world in a virtual realm with the highest realism.
We aim to build a universal data engine that leverages an upper-level generative framework to autonomously create physical worlds, together with various modes of data, including environments, camera motions, robotic task proposals, reward functions, robot policies, character motions, fully interactive 3D scenes, open-world articulated assets, and more, aiming towards fully automated data generation for robotics, physical AI and other applications.
Open Source Code: https://t.co/DhBv7NdyqH
Project webpage: https://t.co/SBNyhFB0yn
Documentation: https://t.co/3yuBoaealV
1/n
7/9 MaskBit achieves state-of-the-art performance with up to 1.52 FID on ImageNet 256×256, using just 305M parameters. That's better than prior diffusion and autoregressive models! 🔥
5/9 Our analysis reveals fascinating properties of bit tokens: Most channels appear to capture different visual concepts, making the representation more interpretable! Flipping individual bits leads to systematic changes in attributes like texture, color, and style. 🎨
6/9 For our generation model MaskBit, the key innovation is: We are the first to utilise the SAME (bit) token representation for both tokenizer and generator, unlike prior methods that require separate embedding tables! No need to (re-)learn codebooks anymore!
4/9 Our tokenizer learns a semantically structured latent space! Checkout what happens when we flip bits in each channel. Having a consist visual interpretable latent space could be a gamechanger for control!
3/9 We carefully revisit the VQGAN design and provide a complete, reproducible recipe for building a modern tokenizer. Our VQGAN+ improves reconstruction FID from 7.94 to 1.66 in the low vocabulary, low resolution setting - a huge 6.28 gain! 📈 See chapter 2 for details.