When you prompt an LLM for code, you get one deterministic program. However, the LLM actually defines a distribution over many programs, and existing methods discard it‼️ PPoT uses this distribution to extract free performance and efficiency gains. 🧵👇
We just implemented trai (try + AI) (https://t.co/50VH8V9CgR), a Claude plugin that can help you isolate 🫷 changes done in your file system by tool calls (like pip install), and only commit them if they are intended ☺️. Try it out (pun intended) and share your feedback!
One of the biggest promises of Diffusion LLMs is parallel generation: predicting multiple tokens at once to bypass the sequential bottleneck of autoregressive models.
However, parallel generation comes with a price. For example:
Should the sentence “He is from [MASK] [MASK]” be filled with [New] [York] or [San] [Diego]?
If a diffusion model predicts both at the exact same time, it assumes independence and may produce... [San] [York]. 🤦♂️
We argue this arises from a structural misspecification: models are restricted to fully factorized outputs because parameterizing the full joint distribution would require a prohibitively massive output head.
This is the Factorization Barrier crippling parallel generation. Here is how we broke it with CoDD.
Check out our recent work offering a principled way to perform parallel prediction (few-step generation) in Diffusion LLMs with minimal performance degradation!
One of the biggest promises of Diffusion LLMs is parallel generation: predicting multiple tokens at once to bypass the sequential bottleneck of autoregressive models.
However, parallel generation comes with a price. For example:
Should the sentence “He is from [MASK] [MASK]” be filled with [New] [York] or [San] [Diego]?
If a diffusion model predicts both at the exact same time, it assumes independence and may produce... [San] [York]. 🤦♂️
We argue this arises from a structural misspecification: models are restricted to fully factorized outputs because parameterizing the full joint distribution would require a prohibitively massive output head.
This is the Factorization Barrier crippling parallel generation. Here is how we broke it with CoDD.
📢Feb 2 (Mon): Planned Diffusion
🙅Diffusion language models are capable of parallelizing text generation but can struggle with coherence in low time-step regimes.
💡Planned Diffusion unlocks a new axis of parallelism: Token-level parallelism ➡️ semantic parallelism
✍️Planned diffusion first generates a structured plan, then diffuses semantically independent spans of text in parallel according to the plan.
This Monday, Daniel Israel (UCLA) (@danielmisrael) and Tian Jin (MIT) (@jintian) will discuss their exciting Planned Diffusion paper as joint first authors.
Collaborators: Ellie Cheng (https://t.co/qdy0TAVHCD), Guy Van den Broeck (@guyvdb), Aditya Grover (@adityagrover_), Suvinay Subramanian (@suvinay), Michael Carbin (@mcarbin)
Paper link: https://t.co/ptRpLFXHDh
@RealAAAI AAAI2021Conference - Neuro-Symbolic AI Panel, during the COVID-19 crisis. With @kerstingAIML @guyvdb @mattbotvinick Marta Kwiatkowska, Leslie Pack Kaelbling. Just a picture 🙂
Recordings of the NeSy 2025 keynotes are now available! 🎥
Check out insightful talks from @guyvdb , @tkipf and @dlmcguinness on our new Youtube channel.
Topics include using symbolic reasoning for LLM, and object-centric representations
https://t.co/iIHdKTr432
I gave a keynote at @nesyconf on
"Symbolic Reasoning in the Age of Large Language Models"
Check out the recording if you are curious about neurosymbolic generative AI:
https://t.co/VUUJD4vdYB
What happens when we compress the KV cache of prompts with multiple instructions? 🤔
Existing compression methods can lead to some instructions being ignored. 🙀
We propose simple changes to KV cache eviction that fix this problem alongside other pitfalls to be aware of. 💯
"An hour of planning can save you 10 hours of doing."
✨📝 Planned Diffusion 📝 ✨ makes a plan before parallel dLLM generation.
Planned Diffusion runs 1.2-1.8× faster than autoregressive and an order of magnitude faster than diffusion, while staying within 0.9–5% AR quality.