[openings] I'm hiring postdoctoral researchers to join our @FunAILab at UTN through the Alexander von Humboldt Research Fellowship (@AvHStiftung), via the Henriette Herz Scouting Programme.
As a Henriette Herz Scout, I can nominate outstanding international researchers for this fellowship route. I'm especially keen to hear from candidates working on multimodal learning, video and image pretraining, and post-training.
Fellows would be hosted in our lab at UTN and work closely with us on these topics.
Key requirements:
* you finished your doctoral studies less than 4 years ago, or will finish within the next 6 months
* you have not lived or worked in Germany in the last 10 years
* applications from female, trans* and/or non-binary candidates are highly encouraged!
Interested? Please send a short note with your CV, PhD year, current affiliation, 2–3 key publications, and a few lines on how your work connects.
Please share!
Are all videos worth the same number of tokens? Whether rich in motion or visually minimal, standard 3D-grid tokenizers treat them equally. We present VideoFlexTok, which represents videos using a flexible-length, coarse-to-fine sequence of tokens.
Page: https://t.co/aDbvsz2Arw
Demo: https://t.co/aM0BrPzfSq
Paper: https://t.co/e8g7nXrLCn
1/n
Flow-LLM Blogpost :D https://t.co/0HiyNPJHsk
In the last few weeks, a bunch of work on flows for language came out!
That is exciting, because it makes truly parallel text generation feel real: generation where models can keep refining the whole response during inference, instead of committing token by token.
I wrote an intuitive and animated introduction to the area: why autoregression has a structural ceiling, why discrete diffusion only partly escapes it, and why flows may be the first genuinely parallel alternative.
Here's an overview of the key parts of the blog - and let's chat at #ICLR2026 :)
i delude myself into thinking i can remove the EMA encoder from SSL training (without regularization) every 6 months, and it gives me ~3 weeks of mental illness every time.
Why does contrastive learning produce Gaussian representations?
My colleague Roy Betser shows it's not accidental, just geometry + probability. A simple perspective on InfoNCE from our #ICLR2026 oral presentation. Worth a read:
https://t.co/eDBmrnf4Y4
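For reference, here is a minimal sketch of the standard InfoNCE objective the post refers to, assuming L2-normalized embeddings, in-batch negatives and a temperature of 0.1 (all illustrative choices, not the setup from the paper). It shows only the loss itself, not the geometric/probabilistic argument for why the representations end up Gaussian.

```python
import math
from typing import List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Project a vector onto the unit sphere (L2 normalization)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def info_nce(anchors: List[Vector], positives: List[Vector], temperature: float = 0.1) -> float:
    """Mean InfoNCE loss: anchor i must pick out positive i among all positives in the batch.

    Every other positive in the batch acts as a negative for anchor i (in-batch negatives).
    """
    z_a = [normalize(a) for a in anchors]
    z_p = [normalize(p) for p in positives]
    total = 0.0
    for i, a in enumerate(z_a):
        # cosine similarities scaled by the temperature
        logits = [dot(a, p) / temperature for p in z_p]
        # log-sum-exp with max subtraction for numerical stability
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(s - m) for s in logits))
        # cross-entropy with the matching positive as the target class
        total += log_denominator - logits[i]
    return total / len(z_a)
```

Correctly paired views give a loss close to 0, swapped pairs are penalized heavily, and a batch where every similarity is equal gives exactly log(N).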
High-resolution image and video generation is hitting a wall because attention in DiTs scales quadratically with token count. But does every pixel need to be in full resolution?
Introducing Foveated Diffusion: a new approach for efficient diffusion-based generation that allocates compute where it matters most.
1/7🧵
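The quadratic wall is easy to check with back-of-the-envelope arithmetic. The sketch below counts patch tokens and pairwise attention scores for one self-attention layer, then compares against a hypothetical foveated layout (fine 16-pixel patches inside a square fovea, 4x coarser patches everywhere else). The `foveated_tokens` helper and its parameters are illustrative assumptions, not the token allocation used by Foveated Diffusion.

```python
def num_tokens(height: int, width: int, patch: int) -> int:
    """Number of non-overlapping square patch tokens for an image of the given size."""
    return (height // patch) * (width // patch)


def attention_pairs(tokens: int) -> int:
    """Attention scores one self-attention layer computes: every token attends to every token."""
    return tokens * tokens


def foveated_tokens(height: int, width: int, patch: int, fovea: int, coarse_factor: int) -> int:
    """Hypothetical foveated layout: fine patches inside a fovea x fovea square,
    coarse patches (patch * coarse_factor pixels) over the rest of the image."""
    coarse_patch = patch * coarse_factor
    # The fovea must align with the coarse grid so coarse and fine tokens don't overlap.
    if fovea % coarse_patch != 0:
        raise ValueError("fovea size must be a multiple of the coarse patch size")
    fine = num_tokens(fovea, fovea, patch)
    coarse_total = num_tokens(height, width, coarse_patch)
    coarse_in_fovea = num_tokens(fovea, fovea, coarse_patch)
    return fine + coarse_total - coarse_in_fovea


if __name__ == "__main__":
    full = num_tokens(1024, 1024, 16)
    fov = foveated_tokens(1024, 1024, 16, fovea=256, coarse_factor=4)
    print(full, attention_pairs(full))  # 4096 16777216
    print(fov, attention_pairs(fov))    # 496 246016
```

With these numbers, going from 1024x1024 to 2048x2048 quadruples the token count (4096 to 16384) and multiplies the attention scores by 16, while the hypothetical foveated layout for a 1024x1024 image needs 496 tokens instead of 4096.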
Today we opened the Google AI Center Berlin, a new hub that brings together leading AI researchers and developers from @GoogleDeepMind, @GoogleResearch and @GoogleCloud with our partners from politics, business and science, building on our legacy of research in Germany.
@Pseudomanifold, @olgazaghen, Kavir, @erikjbekkers and I ask:
What are the limitations of the WL metric, and what is an informative metric?
We answer these questions with our Graph Homomorphism Distortion.
https://t.co/icrYQYt4Uq
so @MistralAI just opened in Zurich and Lausanne lol
With Paris, Zurich (ETH), Lausanne (EPFL), Warsaw (UW), they're sucking like 70% of EU talent, only Tübingen missing
Phillip Isola @phillip_isola, Saining Xie @sainingxie, and I @zamir_ar are hiring joint postdocs in machine learning with a focus on multimodal learning. What brings us together is our shared interest in multimodality and our intention to move the boundaries of current approaches in this area. Our team has access to substantial compute resources through the Swiss National Supercomputing Centre Swiss AI initiative and our industry partners. The postdocs will work at the intersection of our groups. For now the positions will be based at EPFL with visiting stays at MIT and NYU, and will be co-advised by two or all three of us.
Apply here if interested: https://t.co/4vkUCuPT0Z
🤫 Something's been brewing in stealth. Our SDK team's side project, codenamed W&B LEET, is being unleashed.
We are releasing a full Terminal UI (TUI) for live, interactive W&B monitoring right in your terminal.
No browser, no internet, no problem.
✨CAMERA READY UPDATE✨ with cool new plots showing how our Equivariant Neural Eikonal Solver can be used for path planning on Riemannian manifolds
Check out our paper here: https://t.co/3QnB40p3tE
And see you at NeurIPS 🥰
Bai et al., "Positional Encoding Field"
Make your RoPE encoding 3D by including a z axis, then manipulate your image by simply manipulating your positional encoding in 3D --> novel view synthesis. Neat idea.
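A minimal sketch of the idea, assuming the common RoPE variant that rotates consecutive feature pairs, with the feature dimension split into three equal chunks, one per axis (x, y, z). The chunking scheme and the function names are illustrative assumptions, not the construction from Bai et al. The property that matters is that the query-key score depends only on the relative 3D offset, so editing a token's (x, y, z) position changes how it attends to the others while leaving its content features untouched.

```python
import math
from typing import List, Sequence, Tuple

Position3D = Tuple[float, float, float]


def rope_1d(x: Sequence[float], pos: float, base: float = 10000.0) -> List[float]:
    """Standard 1D RoPE on one chunk: rotate each feature pair (2k, 2k+1) by pos * theta_k."""
    d = len(x)
    if d % 2 != 0:
        raise ValueError("chunk size must be even")
    out: List[float] = []
    for k in range(d // 2):
        theta = base ** (-2.0 * k / d)  # per-pair frequency
        c, s = math.cos(pos * theta), math.sin(pos * theta)
        x0, x1 = x[2 * k], x[2 * k + 1]
        out.extend([x0 * c - x1 * s, x0 * s + x1 * c])
    return out


def rope_3d(x: Sequence[float], pos: Position3D, base: float = 10000.0) -> List[float]:
    """Hypothetical 3D RoPE: features split into x/y/z chunks, each rotated by its own coordinate."""
    d = len(x)
    if d % 6 != 0:
        raise ValueError("feature size must be divisible by 6 (three even-sized chunks)")
    chunk = d // 3
    out: List[float] = []
    for axis in range(3):
        out.extend(rope_1d(x[axis * chunk:(axis + 1) * chunk], pos[axis], base))
    return out


def attention_score(q: Sequence[float], k: Sequence[float], pos_q: Position3D, pos_k: Position3D) -> float:
    """Dot product of a 3D-RoPE-rotated query and key; depends only on pos_q - pos_k."""
    rq, rk = rope_3d(q, pos_q), rope_3d(k, pos_k)
    return sum(a * b for a, b in zip(rq, rk))
```

Because each chunk is a pure rotation, moving both tokens by the same 3D offset leaves their score unchanged, while changing only the z coordinate of one token changes it.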
As promised after our great discussion, @chaitanyakjoshi! Your inspiring post led to our formal rejoinder: the Platonic Transformer.
What if the "Equivariance vs. Scale" debate is a false premise? Our paper shows you can have both.
Preprint: https://t.co/kd8MFiOmuG
1/9