This was a great collaboration with @ShuangqiLi, @MathieuSalzmann, @pafrossard
Code: https://t.co/eht4nNiXng
Paper: https://t.co/jBG9HpxzJt
9/9
SFT's token-by-token imitation can overfit to fixed demonstrations, raising a key question: should every token be trusted equally?
We introduce PriFT, an SFT framework that uses a frozen pretrained model to guide token reweighting and align SFT with prior knowledge.
1/9
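To make the token-reweighting idea concrete, here is a minimal sketch. It is not PriFT's actual objective, which this thread does not spell out. It assumes a hypothetical rule: each demonstrated token's loss is weighted by the probability the frozen pretrained model assigns to that token, so tokens the prior finds implausible count less. The name `reweighted_nll` and the `temperature` parameter are made up for illustration.

```python
import math
from typing import Sequence


def reweighted_nll(
    policy_logprobs: Sequence[float],
    prior_logprobs: Sequence[float],
    temperature: float = 1.0,
) -> float:
    """Weighted negative log-likelihood over one demonstration.

    policy_logprobs[t]: log-prob the model being fine-tuned gives token t.
    prior_logprobs[t]:  log-prob the frozen pretrained model gives token t.
    ASSUMPTION: weight_t = exp(prior_logprobs[t] / temperature). This rule is
    illustrative only and is not taken from the PriFT paper.
    """
    if len(policy_logprobs) != len(prior_logprobs):
        raise ValueError("need one prior log-prob per token")
    if not policy_logprobs:
        raise ValueError("empty sequence")
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    weights = [math.exp(lp / temperature) for lp in prior_logprobs]
    total = sum(weights)
    if total == 0.0:
        # Every weight underflowed: fall back to uniform weights (plain SFT).
        weights = [1.0] * len(weights)
        total = float(len(weights))

    # Dividing by the total makes the weights sum to 1, so equal prior
    # scores recover the ordinary per-token mean used in plain SFT.
    return sum(w * -lp for w, lp in zip(weights, policy_logprobs)) / total


policy = [-1.0, -3.0]
uniform = reweighted_nll(policy, [math.log(0.5), math.log(0.5)])
skewed = reweighted_nll(policy, [math.log(0.9), math.log(0.1)])
```

With equal prior scores the result is the plain mean of the token losses. With a skewed prior, the second token (which the frozen model finds unlikely) contributes less, so `skewed` comes out lower than `uniform`.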
If you're interested in weight interpolation / extrapolation, here are a few great related reads that inspired me:
• Rewarded Soups: https://t.co/dXAlncR6b5
• ExPO / Model Extrapolation: https://t.co/NjHUs8rpLq
• AlphaRL: https://t.co/9XF5UeANx5
I learnt a lot from these works!
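For readers new to the idea, weight interpolation and extrapolation both move along the line between two checkpoints: `base + alpha * (tuned - base)`. The sketch below shows only that generic operation, not the specific recipe of any of the papers above. Checkpoints are plain dicts of float lists here to keep it dependency-free, and `blend_weights` is a hypothetical helper name.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def blend_weights(base: Weights, tuned: Weights, alpha: float) -> Weights:
    """Return base + alpha * (tuned - base), parameter by parameter.

    0 <= alpha <= 1 interpolates between the two checkpoints;
    alpha > 1 extrapolates past `tuned`, away from `base`.
    """
    if base.keys() != tuned.keys():
        raise ValueError("checkpoints must share parameter names")
    blended: Weights = {}
    for name in base:
        if len(base[name]) != len(tuned[name]):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        blended[name] = [
            b + alpha * (t - b) for b, t in zip(base[name], tuned[name])
        ]
    return blended


base = {"w": [0.0, 1.0]}
tuned = {"w": [1.0, 3.0]}
midpoint = blend_weights(base, tuned, 0.5)  # {"w": [0.5, 2.0]}
beyond = blend_weights(base, tuned, 1.5)    # {"w": [1.5, 4.0]}
```

With real models the same arithmetic would be applied to each tensor in the two state dicts. Which `alpha` helps is an empirical question.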
Attending #ICML2026 in Seoul? 🇰🇷
If it's your first time in Korea, I've put together a (personal) mini-guide to help you navigate and make the most of your trip!
Check it out here: https://t.co/IGWFOkhyLN :)
Okay, even more interesting: DiffusionGemma is a "loopholed" diffusion model!
Discrete diffusion usually hits the sampling wall:
the model has a rich distribution over tokens, then at each step, sampling crushes it into one hard token.
A lot of previously computed belief disappears. But DiffusionGemma keeps the previous logits alive.
So it denoises from the token AND from the belief behind the token.
That's the idea behind «Loopholed Discrete Diffusion», a paper I was playing with this week. Exciting to see this at scale!
Binder design has come of age thanks to generative models, but how can we access the wider array of dynamic, multistate protein functions, so elegantly employed by nature?
@mihirbafna14 and I are excited to share SwitchCraft, a framework for designing such functions. (1/7)
Check out CoFRe, a complete training-to-inference framework for fixed-point masked generation!
Improved quality vs. cost tradeoff for both text and visual data.
Amazing work led by @andreamiele_
New paper: Fixed-Point Masked Generative Modeling
Masked generative models are becoming a very exciting alternative to autoregressive generation, especially for language.
They decode in parallel, but every denoising step still runs a full bidirectional Transformer.
We make them cheaper and stronger with fixed-point denoisers 🧵
w/ @qinym710, @AlbaCbCs, @jdeschena and @pafrossard
(1/12)
Looking forward to this workshop on ML4molecules at the ELLIS unconference (followed by EurIPS). Please submit your abstracts! The deadline will be extended to 15 October 2025.
A bit late for the announcement, but very happy to share that MEMOIR has been accepted to NeurIPS 2025! Great collaboration with @qinym710, @nikdimitriadis, @alesfav, @pafrossard! See you in San Diego!
Thrilled to share: our paper FANTOM with Prof. @pafrossard, Flow-based approach for Dynamic Temporal Causal models with non-Gaussian or Heteroscedastic Noises, has been accepted at NeurIPS 2025! (1/6)
Presenting #DeFoG: our discrete flow-matching framework for graph generation! Catch our #ICML2025 oral presentation today (3:30–3:45 PM, in West Exhibition Hall C) and drop by the poster right after (4:30–7:00).
Come chat graphs & generative models! @manuelmlmadeira
Preprint alert! Did you know there is a new reasoning benchmark where leading models like o3 still fall flat?
(i.e. 0% accuracy or random perf. on hard sub-tasks)
Meet MARBLE, a 🪨-hard benchmark for multimodal reasoning and planning under complex spatial constraints!
Inspired by well-known challenges such as the ARC challenge, we thought:
Can we devise a new challenging benchmark for MLLMs that puts their reasoning and multi-step planning abilities to the test through complex multimodal problems?
Turns out this is still super hard even for the latest models!
MARBLE offers 2000+ multimodal-reasoning problems split into two domains:
M-Portal: multi-step spatial-planning puzzles modelled on levels from Portal 2.
M-Cube: here, models need to plan 3D cube assemblies from jigsaw pieces, inspired by Happy Cube puzzles.
MARBLE provides a tough benchmark for testing advanced reasoning in MLLMs. For these tasks, the genie is out of the bottle now: expect to see rapid improvements in model performance over the coming months, so let's not over-interpret them by then!
Website: https://t.co/N1cIXQFm0g
Paper: https://t.co/Yc6YBgjE8f
Code: https://t.co/UYnh0xDOyW
Dataset: https://t.co/XJsjKDtfzc
Fun collab with @YulunJiang @ychai1224 @mariabrbic!
🧬 New roadmap out in Nature Reviews Molecular Cell Biology!
We show how RNA-LMs + GNNs can come together to model the RNA interactome & uncover new roles for non-coding RNA.
Clinical links to RNA therapies for cancer & neuro diseases.
Read it: https://t.co/JICDv1LRd9
@boknilev While we do not have results for MEMIT editing all 3000 facts at once, prior work (WISE) evaluated MEMIT with batch sizes up to 100, and even then performance was still suboptimal when the total number of edits equals 100.
How can we inject new knowledge into LLMs without full retraining, forgetting, or breaking past edits?
We introduce MEMOIR, a scalable framework for lifelong model editing that reliably rewrites thousands of facts sequentially using a residual memory module.
🧵 1/7
@boknilev Thanks for your interest! Yes, the plot shows sequential edits. We follow the lifelong model editing setting, where each edit is applied immediately upon arrival. This reflects a more practical scenario than batching and editing multiple facts at once.