Dhruvesh Patel ✈️ ICML 2026 @_dhruveshp - Twitter Profile

Pinned Tweet

Dhruvesh Patel ✈️ ICML 2026

7 days ago

Variable-length masked diffusion models (FlexMDM and friends) generate by inserting mask tokens into any gap and unmasking them. But the insertion/unmasking schedule is fixed and data-independent. So the model has to learn to produce every sequence in every possible order. For structured data that's a huge waste of capacity. How do you learn data-dependent insertion and unmasking orders without breaking tractable training? We propose LoFlexMDM, which does exactly that. 🧵👇


2

8

3

0

373

Dhruvesh Patel ✈️ ICML 2026

@_dhruveshp

4 days ago

@HuLillian39250 Great work! Thanks for sharing. Do you also evaluate on conditional generation tasks?

1

0

15

_dhruveshp retweeted

Haw-Shiuan Chang @Haw_Shiuan

8 days ago

Where does the data flywheel⚙️♻️ of LLM service providers come from? 🚨Our latest paper shows that it could come from your mouse🖱️ and eyes👀! With Jeffrey Gomez, @mehulpatwari_, Aryan Sajith, @HamedZamani [1/N]🧵 https://t.co/qcK0gIxctA

2

13

3

4

3K

Dhruvesh Patel ✈️ ICML 2026

@_dhruveshp

7 days ago

@MILIJOULE Haha. Thank you!

0

17



Dhruvesh Patel ✈️ ICML 2026

@_dhruveshp

7 days ago

This is joint work with @brozonoyer, Tahira Naseem, Gaurav Pandey, @RamonAstudill12, and @andrewmccallum. Happy to chat more about the paper at ICML 2026 in Seoul 🗓️ Wed, Jul 8, 2026 • 5:00 PM – 6:45 PM KST | 📍 HALL A #2805

1

2

0

153

Dhruvesh Patel ✈️ ICML 2026

@_dhruveshp

7 days ago

But why Kumaraswamy CDFs? With a shared `a`, the hazard simplifies so both events share the same shape function and only b_ins and b_unmask set the rate. Under the time-change τ = -log(1 - t^a), the whole thing becomes an exponential race between the per-position rates b_ins and b_unmask, with the precedence constraint that insertion fires before unmasking. That buys you closed-form per-position likelihoods and parallel inverse-CDF sampling of event times. No numerical integration.

1

3

1

174
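A short worked derivation may help make the time-change concrete. It covers only a single event with CDF F(t) = 1 - (1 - t^a)^b and follows from that formula alone; the names S, h, and g are notation introduced here for illustration, and how the paper handles the precedence constraint between the two events is not reproduced.

```latex
\begin{align*}
S(t) &= 1 - F(t) = (1 - t^a)^b, \\
h(t) &= \frac{F'(t)}{S(t)}
      = \frac{a\,b\,t^{a-1}(1 - t^a)^{b-1}}{(1 - t^a)^b}
      = b \cdot \underbrace{\frac{a\,t^{a-1}}{1 - t^a}}_{g(t)}, \\
\tau(t) &= \int_0^t g(s)\,ds = -\log(1 - t^a), \\
S(t) &= \exp\bigl(-b\,\tau(t)\bigr)
\quad\Longrightarrow\quad \tau(T) \sim \mathrm{Exp}(b).
\end{align*}
```

The shape function g(t) depends only on the shared `a`, so each event's hazard is g(t) scaled by its own b. In τ-time each event is therefore exponential with rate b, which is where the closed forms come from.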

Dhruvesh Patel ✈️ ICML 2026

@_dhruveshp

7 days ago

📈 On BracketSAFE molecule strings, LoFlexMDM significantly improves generation quality over FlexMDM for both de novo and fragment-constrained generation. The cost is a small dip in diversity, which is expected since a sharper order means less randomness. Furthermore, the learned order is interpretable: it commits structure first (ring closures, fragment separators), then fills chemistry (atoms, bonds, branches), and decides *where* fragments attach before *which* fragments attach.


1

0

75

Dhruvesh Patel ✈️ ICML 2026

@_dhruveshp

7 days ago

But how do we keep the training tractable? We parameterize each position's insertion and unmasking CDFs as Kumaraswamy CDFs, F(t) = 1 - (1 - t^a)^b. Fix the shape `a` to a shared constant, and let the auxiliary network predict the per-token rate parameters b_ins(x), b_unmask(x).

1

2

1

72
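As a rough illustration of why this parameterization is convenient, the sketch below implements the Kumaraswamy CDF from the tweet and its closed-form inverse, then draws event times by inverse-CDF sampling. The function names and the example `b` values are hypothetical; in the thread the `b` values come from an auxiliary network, which is not modeled here.

```python
import random
from typing import List


def kumaraswamy_cdf(t: float, a: float, b: float) -> float:
    """F(t) = 1 - (1 - t^a)^b for t in [0, 1], with a > 0 and b > 0."""
    return 1.0 - (1.0 - t ** a) ** b


def kumaraswamy_inverse_cdf(u: float, a: float, b: float) -> float:
    """Solve F(t) = u for t, giving t = (1 - (1 - u)^(1/b))^(1/a)."""
    return (1.0 - (1.0 - u) ** (1.0 / b)) ** (1.0 / a)


def sample_event_times(a: float, b_values: List[float],
                       rng: random.Random) -> List[float]:
    """Draw one event time per position via inverse-CDF sampling.

    Each position has its own rate b and all share the shape a.
    The draws are independent, so they could be computed in parallel.
    """
    return [kumaraswamy_inverse_cdf(rng.random(), a, b) for b in b_values]


# Hypothetical per-position rates: a larger b pushes the event earlier.
rng = random.Random(0)
times = sample_event_times(a=2.0, b_values=[0.5, 1.0, 8.0], rng=rng)
```

Because the inverse has a closed form, no root-finding or numerical integration is needed to turn uniform draws into event times.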

Dhruvesh Patel ✈️ ICML 2026

@_dhruveshp

7 days ago

The trick: separate the order from the content and learn the order purely through per-position target *hazard rates* produced by an auxiliary network. The generator is trained to match the target rates and therefore the order without changing the terminal distribution. https://t.co/1PPf8Haguq

1

0

55

Dhruvesh Patel ✈️ ICML 2026

@_dhruveshp

7 days ago

⏱️ Think of generation as a per-position CTMC with two events: insertion (∅ → mask) at time T_ins, then unmasking (mask → token) at time T_unmask. The unmasking times define the generation order. In FlexMDM these event rates are constant, so probability gets spread across tons of suboptimal orders. We wanted those rates to be learned and data-dependent without breaking tractable training.

1

0

83
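The two-event picture can be sketched in a few lines. This toy takes already-sampled event times as given (the numbers are made up), reports each position's state at a time t, and reads off the generation order from the unmasking times. The function and state names are hypothetical, and nothing here models how the rates are learned.

```python
from typing import List


def position_state(t: float, t_ins: float, t_unmask: float) -> str:
    """State of one position at time t: absent, then mask, then token."""
    if t_unmask < t_ins:
        raise ValueError("insertion must fire before unmasking")
    if t < t_ins:
        return "absent"  # not inserted yet
    if t < t_unmask:
        return "mask"    # inserted but still masked
    return "token"       # unmasked


def generation_order(t_unmask: List[float]) -> List[int]:
    """Positions sorted by unmasking time, earliest first."""
    return sorted(range(len(t_unmask)), key=lambda i: t_unmask[i])


# Made-up event times for three positions.
t_ins = [0.1, 0.4, 0.2]
t_unmask = [0.9, 0.5, 0.3]

states = [position_state(0.35, i, u) for i, u in zip(t_ins, t_unmask)]
order = generation_order(t_unmask)
# states is ["mask", "absent", "token"] and order is [2, 1, 0]
```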

Dhruvesh Patel ✈️ ICML 2026

@_dhruveshp

21 days ago

Do you find autoregressive language models like @AnthropicAI's @claudeai Mythos too slow? Diffusion models are catching up fast! But denoising alone is not sufficient to realize the promise of fast text generation. We (and the models😉) need to think ahead! Check out our preprint👇

Tim G. J. Rudner

@timrudner

21 days ago

What if diffusion models could think ahead instead of being greedy at every step?🤔 We introduce: Learned Relay Representations for Forward-Thinking Discrete Diffusion Models


1

39

9

21

4K

0

2

1

231

Dhruvesh Patel ✈️ ICML 2026

@_dhruveshp

7 months ago

This is a great opportunity! Tim is an amazing mentor and advisor. I highly recommend applying.

Tim G. J. Rudner

@timrudner

7 months ago

I'm so happy to share that I’ll be joining @UofT as an Assistant Professor of Statistical Sciences and Computer Science, with an appointment at the @VectorInst, in 2026! I'm recruiting postdocs and PhD students: https://t.co/FWBh0BiDqP! Please help me spread the word! 🧵(1/5)


26

372

73

120

40K

0

3

0

130

_dhruveshp retweeted

Chuang Gan

@gan_chuang

7 months ago

ICLR has placed OpenReview in a difficult position, so I want to offer a few words about the OpenReview team working behind the scenes. OpenReview has long been operated at UMass Amherst as a non-profit organization founded by Andrew McCallum. Each year, Andrew must raise more than $2 million to support a 20-person team that provides essential infrastructure for most major conferences. I once asked Andrew what might have been a naïve question: whether he had considered developing a business model for OpenReview, given its prominence and the seemingly obvious opportunities. He pushed back, explaining that everything he has done for OpenReview is driven by a commitment to serve and strengthen the academic community. He is willing to devote significant personal effort to ensure the platform remains freely accessible to all. We should not blame such a brilliant and dedicated team for an accidental issue. Otherwise, fewer people would be willing to shoulder this kind of responsibility in the future. Deep respect to the OpenReview team! I’m grateful for their work and happy to support in any way!

27

987

135

85

178K

_dhruveshp retweeted

Benjamin Rozonoyer @brozonoyer

over 1 year ago

Excited to present our NeurIPS paper "Learning Representations for Hierarchies with Minimal Support" at the morning poster session on December 12! Stop by at poster #3500 in the East Building! https://t.co/PbooCgRA8v

6

4

3

1

352

Dhruvesh Patel ✈️ ICML 2026

@_dhruveshp

over 1 year ago

@su_lin_liu Great work! The formulation makes perfect sense. In the case of text generation, how would you compare the inference time cost of your approach, which has two forward passes per step, to the vanilla mask denoiser?

1

0

16
