Daisuke OBA @dai0NLP - Twitter Profile

Pinned Tweet

5 months ago

Two papers accepted to #ICLR2026 🇧🇷 (1 first, 1 second author)
Huge thanks to my co-authors and collaborators! @Bollegala @MasahiroKaneko_ @chokkanorg @junpeikomiyama @stillpedant
More details soon!

1

47

8

2

6K

dai0NLP retweeted

Sundar Pichai

@sundarpichai

1 day ago

DiffusionGemma is an open, experimental model that brings our text diffusion research to Gemma 4. It’s a racehorse 🏇achieving up to 4x faster inference by generating entire blocks of text simultaneously vs predicting token-by-token (word-by-word) output!

165

3K

372

527

254K

Daisuke OBA

@dai0NLP

2 days ago

7/ Takeaway: drifting can refine discrete diffusion LMs when feature-space drift is connected to categorical logits through a soft-token interface. Paper: https://t.co/3lKdvrsVMd w/ @frt03_ @chokkanorg

0

2

1

140

Daisuke OBA

@dai0NLP

2 days ago

1/ New preprint: Drifting Objectives for Refining Discrete Diffusion Language Models

Can drifting be used beyond continuous generators?

We study this in the setting of refining pretrained discrete diffusion language models (DDLMs). Our method, TokenDrift, provides a differentiable soft-token interface that lets feature-space drifting signals update categorical token logits.

Main observation: Gen.-PPL improves throughout drifting training at fixed denoising budgets.
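Gen.-PPL (generative perplexity) is usually obtained by scoring generated samples with a separate evaluator language model and exponentiating the mean negative log-likelihood per token. The sketch below assumes the per-token log-probabilities are already available; the function name and the numbers are hypothetical, and the thread does not say which evaluator model is used.

```python
import math

def generative_perplexity(token_log_probs):
    """Perplexity over all tokens of all samples.

    token_log_probs: list of samples, each a list of natural-log
    probabilities assigned by an evaluator LM (assumed, not from the thread).
    """
    flat = [lp for sample in token_log_probs for lp in sample]
    if not flat:
        raise ValueError("need at least one scored token")
    # exp of the mean negative log-likelihood per token
    return math.exp(-sum(flat) / len(flat))

# Hypothetical scores for two generated samples of two tokens each
samples = [
    [math.log(0.5), math.log(0.25)],
    [math.log(0.5), math.log(0.25)],
]
ppl = generative_perplexity(samples)
print(f"Gen.-PPL = {ppl:.3f}")
```

Lower values mean the evaluator finds the samples more predictable, so "improves" in the tweet corresponds to this number going down.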

2

23

6

14

2K

Daisuke OBA

@dai0NLP

2 days ago

6/ The soft-token part matters. A straight-through hard-token variant still has a surrogate gradient path, but performs much worse and suffers severe entropy collapse. So differentiability alone is not enough: the feature encoder needs to see the model's uncertainty through probability-weighted embeddings (pE).
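The contrast between the soft-token and hard-token inputs can be illustrated with a toy example. This is a minimal sketch, not the TokenDrift implementation: the vocabulary, embedding table `E`, and logits are hypothetical, and only the forward values are shown (a straight-through variant would additionally route gradients through the softmax).

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def soft_token(probs, E):
    """Probability-weighted embedding pE: sum over v of p[v] * E[v]."""
    dim = len(E[0])
    return [sum(p * row[d] for p, row in zip(probs, E)) for d in range(dim)]

def hard_token(probs, E):
    """Embedding of the argmax token only (what a hard-token encoder input sees)."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return list(E[best])

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

# Toy 3-token vocabulary with 2-dimensional embeddings (hypothetical values)
E = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
confident = softmax([5.0, 0.0, 0.0])
uncertain = softmax([0.2, 0.0, 0.0])

# Both distributions share the same argmax, so the hard input is identical,
# while the soft input moves with the model's confidence.
print("hard:", hard_token(confident, E), hard_token(uncertain, E))
print("soft:", soft_token(confident, E), soft_token(uncertain, E))
print("entropy:", entropy(confident), entropy(uncertain))
```

In this toy setting the feature encoder cannot distinguish a confident prediction from a near-uniform one when it is fed hard tokens, which is consistent with the tweet's point that it needs to see uncertainty through pE.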

1

2

1

0

131

dai0NLP retweeted

Yukito Tajima @TitaniumJely

14 days ago

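We have released an MXFP4 version of GPT-OSS-Swallow v0.1. This is an additional release that lets GPT-OSS-Swallow run with less memory, making it easier to use in cases where hardware constraints previously made it hard to try. https://t.co/qwy0nTUB2q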

2

25

11

5

2K

dai0NLP retweeted

Masanari Oi @stjohn2007

about 1 month ago

We propose HATCH🐣, a human-inspired training framework for multi-image spatial reasoning in VLMs 🐤 HATCH improves multi-image spatial reasoning ability while preserving single-image reasoning capabilities 🐓 📚️https://t.co/02Ry5iGmn3

0

23

6

2

2K

dai0NLP retweeted

Masanari Oi @stjohn2007

about 1 month ago

Two first-author papers accepted to #ICML2026 🇰🇷!
- Human-like multi-image spatial reasoning in multimodal LLMs (@silviasetitech @sponddd @dai0NLP Prof. Inoue @chokkanorg)
- Autoregressive direct preference optimization (Mahiro Ukai @MasahiroKaneko_ @chokkanorg Prof. Inoue)

1

95

20

9

22K

dai0NLP retweeted

Sora Miyamoto @SoraMiyamo0831

about 1 month ago

Our paper was accepted to #ICML2026 🇰🇷 (first author)!
This paper is on budget-aligned test-time scaling of LLMs.
It is my first ML conference paper!
Huge thanks to my co-authors! @dai0NLP @chokkanorg
Preprint: https://t.co/qPvJFHjxMC
More details soon!

0

74

11

7

6K

Daisuke OBA

@dai0NLP

about 2 months ago

Also at #ICLR2026 🇧🇷: Presenting Best-of-∞ on behalf of lead author @jkomiyama_ — principled Bayesian stopping that approximates the N→∞ majority-voting limit, plus optimal LLM-ensemble weights via MILP! 🕓25th April, 10:30 AM 📍Pavilion 4, #4710 w/ @jkomiyama_ @stillpedant
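The idea of stopping sampling once the majority answer is settled can be sketched as follows. The tweet does not give the paper's stopping rule, so the test used here (a Beta posterior, under a uniform prior, on whether the leading answer beats the runner-up) is a simplified stand-in and an assumption; `adaptive_majority_vote`, `threshold`, and `max_samples` are hypothetical names, and the MILP ensemble weighting is not covered.

```python
from collections import Counter
from math import comb

def leader_posterior(a, b):
    """P(p > 0.5) when p ~ Beta(a + 1, b + 1).

    a, b: vote counts for the leading answer and the runner-up.
    Uses the closed form of the Beta CDF for integer parameters.
    """
    n = a + b + 1
    return sum(comb(n, j) for j in range(a + 1)) / 2 ** n

def adaptive_majority_vote(sample_answer, threshold=0.95, max_samples=64):
    """Draw answers until the leader is credibly ahead or the budget runs out.

    sample_answer: zero-argument callable returning one answer (e.g. one LLM call).
    Returns (answer, number_of_samples_used).
    """
    if max_samples < 1:
        raise ValueError("max_samples must be at least 1")
    counts = Counter()
    for used in range(1, max_samples + 1):
        counts[sample_answer()] += 1
        ranked = counts.most_common(2)
        leader, a = ranked[0]
        b = ranked[1][1] if len(ranked) > 1 else 0
        if leader_posterior(a, b) >= threshold:
            return leader, used
    return counts.most_common(1)[0][0], max_samples

# Scripted stand-in for an LLM that always returns the same answer
scripted = iter(["42"] * 10)
answer, used = adaptive_majority_vote(lambda: next(scripted))
print(answer, used)
```

Easy questions, where samples agree, stop after a handful of draws, while contested ones keep sampling up to the budget.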

0

21

5

2K

Daisuke OBA

@dai0NLP

about 2 months ago

Excited to present SureLock at #ICLR2026 🇧🇷 — a principled decoding method that locks converged tokens in Masked Diffusion Language Models, cutting 30–50% FLOPs at same quality! w/ @Bollegala @MasahiroKaneko_ @chokkanorg 🕙 Friday, 24th April, 10:30 AM 📍Pavilion 3 (#826)

0

35

11

7

4K

dai0NLP retweeted

Prof. Danushka Bollegala

@Bollegala

about 2 months ago

🇧🇷 Excited to present our paper "Stopping Computation for Converged Tokens in Masked Diffusion-LM Decoding" at #ICLR2026 in Rio de Janeiro in just two days! 🏖️ https://t.co/eXQb1DWTBU (Friday 24th 10:30-13:00 poster session)

Masked Diffusion LMs generate sequences via iterative sampling, but they waste significant compute by repeatedly re-evaluating tokens that have already converged.

To fix this, we introduce SureLock 🔒: a method that permanently locks stable tokens during decoding. By caching their attention keys/values and skipping their query projection and feed-forward sublayers, we drastically cut down on redundant computation.

🚀 The result? We achieve a 30–50% reduction in algorithmic FLOPs on LLaDA-8B with virtually no loss in generation quality!

If you are attending ICLR, come stop by our presentation! w/ @dai0NLP @MasahiroKaneko_ @chokkanorg @LivUni @AmazonScience

code/paper: https://t.co/u5lcUCfI5R
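The locking step described above can be sketched as simple bookkeeping over per-position token distributions. This is not the SureLock code: the convergence test (KL divergence between consecutive decoding steps below a tolerance `tol`) and all names are assumptions for illustration, and the actual savings come from reusing cached keys/values and skipping the query projection and feed-forward sublayers for locked positions, which this toy does not model.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two categorical distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def update_locks(prev_probs, curr_probs, locked, tol=1e-3):
    """Return the locked set extended with positions that barely moved this step.

    Locks are permanent: a position already in `locked` is never released.
    """
    newly_locked = {
        i
        for i, (prev, curr) in enumerate(zip(prev_probs, curr_probs))
        if i not in locked and kl_divergence(curr, prev) < tol
    }
    return locked | newly_locked

def active_positions(seq_len, locked):
    """Positions that still need full per-token computation in the next step."""
    return [i for i in range(seq_len) if i not in locked]

# Hypothetical distributions for a 3-token sequence at two consecutive steps
prev_step = [[0.5, 0.5], [0.9, 0.1], [0.6, 0.4]]
curr_step = [[0.9, 0.1], [0.9, 0.1], [0.6001, 0.3999]]

locked = update_locks(prev_step, curr_step, locked=set())
todo = active_positions(len(curr_step), locked)
print("locked:", sorted(locked), "still active:", todo)
```

Here two of three positions stop changing and are locked, so only one position would need its query and feed-forward computation in the next step.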

0

33

7

10

3K

dai0NLP retweeted

Taishi Nakamura

@taishinakamura_

4 months ago

We have released the Qwen3-Swallow and GPT-OSS-Swallow models. I was in charge of RL training. Performance on Japanese tasks also improves during the reinforcement learning stage.

1

155

29

33

21K

dai0NLP retweeted

Koshiro Saito @koshiro_sa110

4 months ago

We are thrilled to announce the release of GPT-OSS Swallow and Qwen3 Swallow 🎉 I was involved in evaluation, framework development, and mentoring as a student leader. Leaderboard: https://t.co/CxhlRA2EIO Swallow-Evaluation-Instruct: https://t.co/OI75Q40ro8

0

20

8

0

7K

dai0NLP retweeted

Naoaki Okazaki @chokkanorg

4 months ago

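📢 We have released GPT-OSS Swallow and Qwen3 Swallow. With a fully overhauled pipeline of continual pre-training, SFT, and reinforcement learning, these open LLMs combine Japanese-language performance with reasoning ability and are available under the Apache 2.0 license. Qwen3 Swallow: https://t.co/tTRVGHnF4M GPT-OSS Swallow: https://t.co/L6a2zCjc7i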

13

1K

341

741

238K

dai0NLP retweeted

Prof. Danushka Bollegala

@Bollegala

5 months ago

Two papers accepted to @ICLR 2026 🎉 Congrats and kudos to my amazing collaborators: @dai0NLP @MasahiroKaneko_ @chokkanorg T. Yamamoto, R. Kumon, @verypluming. One paper is on how to make diffusion models efficient, and the other is on proving the existence of culture-specific neurones.

0

28

4

3

3K
