Minsu Kim

@minsuuukim

Postdoc @Mila_Quebec and KAIST | Academic collaborator @LawZero_ | RL post-training, reasoning, safety, AI4Science

South Korea

Joined March 2025

147 Following

224 Followers

82 Posts

minsuuukim retweeted

Anirudh Goyal @anirudhg9119

6 days ago

Variational Walkback: Learning Transition Operator as a Stochastic Recurrent Net https://t.co/RnUU3JYpST Start at a real data point, deliberately walk away from it by applying the model with increasing noise, then train the same transition operator to walk back toward the data.
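To make the walk-away / walk-back idea concrete, here is a minimal 1-D sketch. Everything in it is an assumption for illustration rather than the paper's method: the operator is a linear map `x -> w * x + b`, the noise schedule `temps` is arbitrary, and "training" is an ordinary least-squares fit that predicts the previous point from the next one.

```python
import random

def walk_away(x0, w, b, temps, rng):
    """Start at a data point and apply the operator with increasing noise."""
    path = [x0]
    for temp in temps:
        path.append(w * path[-1] + b + temp * rng.gauss(0.0, 1.0))
    return path

def fit_walk_back(paths):
    """Least-squares fit of x_prev ~ w * x_next + b over every step of every path."""
    pairs = [(p[t + 1], p[t]) for p in paths for t in range(len(p) - 1)]
    n = len(pairs)
    mean_next = sum(nxt for nxt, _ in pairs) / n
    mean_prev = sum(prev for _, prev in pairs) / n
    cov = sum((nxt - mean_next) * (prev - mean_prev) for nxt, prev in pairs)
    var = sum((nxt - mean_next) ** 2 for nxt, _ in pairs)
    w = cov / var
    return w, mean_prev - w * mean_next

def train(data, rounds=3, temps=(0.5, 1.0, 1.5, 2.0), seed=0):
    """Alternate between walking away with the current operator and refitting it."""
    rng = random.Random(seed)
    w, b = 1.0, 0.0  # hypothetical starting point: the identity operator
    for _ in range(rounds):
        paths = [walk_away(x, w, b, temps, rng) for x in data]
        w, b = fit_walk_back(paths)
    return w, b

# Toy data: every "real" point sits at 2.0.
w, b = train([2.0] * 2000)

# Apply the trained operator (without noise) from a far-away start.
x = 10.0
for _ in range(50):
    x = w * x + b
```

In this toy setting the fitted operator becomes a contraction, so repeated application from a distant start pulls `x` back toward the data.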

minsuuukim retweeted

Minkyu Kim @minkyu1022

12 days ago

🎉 "CatFlow: Co-generation of Slab-Adsorbate Systems via Flow Matching" has been accepted at #ICML2026!

We develop flow matching for catalyst design at the all-atom level with a factorized representation.

Thanks to @__na__young__, @honghui_kim, and @sungsoo_ahn_ https://t.co/1dVHesmbLW

minsuuukim retweeted

Sungsoo Ahn @sungsoo_ahn_

13 days ago

We are looking for talented people interested in AI for Science, including ML for molecules, materials, and scientific discovery. If you are interested, please feel free to DM or email me. I am happy to chat and answer any questions.

minsuuukim retweeted

Rohan Paul

@rohanpaul_ai

14 days ago

A 10 million parameter model just outperformed deterministic rivals 3 times its size by doing something regular recursive AI models don't do: exploring multiple reasoning paths at the same time.

Most AI reasoning models are trapped on a single train of thought, and GRAM ("Generative Recursive Reasoning") is the first to break that by letting the model think in parallel universes simultaneously.

The problem is that all existing recursive models are fully deterministic, meaning given the same input they always follow the exact same reasoning path and can never escape a wrong trajectory or discover more than 1 valid answer.

GRAM fixes this by injecting learned randomness at each refinement step, so the model samples a slightly different direction each time rather than snapping to 1 fixed next state, which produces a spread of diverse reasoning trajectories.

At test time the model runs many of these paths in parallel and selects the best one using a small reward predictor trained alongside the main model, adding a "width" scaling axis on top of the usual "depth" axis of running more recursion steps.

On hard Sudoku puzzles, GRAM with 10M parameters hits 97% accuracy versus 87.4% for the best prior recursive model, and with only 20 parallel samples it outperforms every deterministic baseline even at 320 recursion steps.

On tasks with many valid answers like N-Queens, deterministic recursive models collapse as the number of solutions grows, while GRAM maintains near-perfect accuracy throughout.

The same stochastic framework also acts as a generator: given a blank board, GRAM produces valid Sudoku puzzles 99% of the time using 16 steps, versus 1,000 steps and 55M parameters for the best diffusion baseline at just 91%.

---

Paper link: arxiv.org/abs/2605.19376v1
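The "width" axis described above can be pictured with a small best-of-N sketch. This is a toy under stated assumptions, not GRAM itself: the refinement step, the zero initial state, and the `score` function are all hypothetical stand-ins, and the scorer peeks at the target, which a learned reward predictor would not be able to do.

```python
import random

def refine(state, target, noise, rng):
    """One stochastic refinement step: move halfway to the target, plus noise."""
    return [s + 0.5 * (t - s) + rng.gauss(0.0, noise) for s, t in zip(state, target)]

def rollout(target, depth, noise, rng):
    """One reasoning trajectory: `depth` refinement steps from a zero state."""
    state = [0.0] * len(target)
    for _ in range(depth):
        state = refine(state, target, noise, rng)
    return state

def score(state, target):
    """Stand-in for a reward predictor (assumption): negative squared error."""
    return -sum((s - t) ** 2 for s, t in zip(state, target))

def best_of_width(target, depth, width, noise, seed=0):
    """Sample `width` trajectories and keep the one the scorer ranks highest."""
    rng = random.Random(seed)
    candidates = [rollout(target, depth, noise, rng) for _ in range(width)]
    return max(candidates, key=lambda c: score(c, target))

target = [1.0, -2.0, 0.5]
single = best_of_width(target, depth=8, width=1, noise=0.3)
wide = best_of_width(target, depth=8, width=20, noise=0.3)
```

Because both calls share a seed, the single trajectory is also the first of the 20 candidates, so the wider search can only match or improve on its score.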


minsuuukim retweeted

Sungjin Ahn

@SungjinAhn_

14 days ago

🧠 We introduce "Generative Recursive Reasoning"!

Recursive Reasoning Models like HRM, TRM, and Looped Transformers are deterministic: same input, same reasoning, every time. They collapse the entire space of plausible reasoning paths into a single attractor.

Our model GRAM (Generative Recursive reAsoning Models) turns recursion itself into a stochastic latent trajectory. Multiple hypotheses, alternative solution strategies, and inference-time scaling not just by depth, but by width: parallel trajectory sampling.

And here's the kicker: the same formulation that gives us conditional reasoning p(y|x) also makes GRAM a general generative model p(x).

With only 10M params:
• Sudoku-Extreme: 97.0% (TRM 87.4%)
• ARC-AGI-1: 52.0%
• ARC-AGI-2: 11.1%
• N-Queens coverage: 90%+

📄 Paper: https://t.co/JC7EyXYc9Y
🌐 Project page: https://t.co/LRT1dQiWLZ

w/
Junyeob Baek @JunyeobB (KAIST),
Mingyu Jo @pyross0000 (KAIST),
Minsu Kim @minsuuukim (KAIST & Mila),
Mengye Ren @mengyer (NYU),
Yoshua Bengio @Yoshua_Bengio (Mila),
Sungjin Ahn @SungjinAhn_ (KAIST)


minsuuukim retweeted

Sungjin Ahn

@SungjinAhn_

16 days ago

KAIST AI (College of AI) is hiring!

If you are attending ICML 2026 in Seoul and are interested in faculty or postdoc positions at KAIST AI Computing (and CS), feel free to reach out by filling out this short interest form:

https://t.co/PBAkgFXaJT

We are looking for researchers across broad areas of AI and Computer Science, including ML, NLP, CV, HCI, Systems and more.

Please share with anyone who may be interested!


minsuuukim retweeted

BURKOV

@burkov

18 days ago

This Google/Cambridge ICLR 2026 paper introduces Visual Planning, a novel reinforcement learning framework (VPRL) that enables purely visual reasoning through sequences of images, outperforming text-only planning in visual navigation tasks and offering a promising supplement to language-based reasoning for "vision-first" challenges.

Read with an AI tutor: https://t.co/WIiYI4uMg0

PDF: https://t.co/GAOFKXj2qT


minsuuukim retweeted

ChangHao @ChangHao564792d

21 days ago

🚀 Excited to share our new paper: Revisiting DAgger in the Era of LLM-Agents!

Training long-horizon LLM agents is hard:
🔸 SFT → covariate shift
🔸 RL → sparse rewards
🔸 On-policy distillation → cold-start failure + needs white-box teacher logits

We bring back DAgger to fix all three: on-policy rollouts ✕ dense teacher supervision, no cold-start, fully black-box-teacher compatible.

✨ Results on SWE-bench Verified:
🔹 Our 4B agent hits 27.3%, beating published 8B SWE-agent systems
🔹 Our 8B agent hits 29.8%, surpassing SWE-Gym-32B and within 5 pts of strong 32B agents

📄 Paper: https://t.co/e8TTuh1VWb
🤗 HF Daily: https://t.co/nwVwqaahlq
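For readers unfamiliar with DAgger, the loop of on-policy rollouts plus dense teacher labels can be sketched on a toy 1-D chain. This is the classic tabular form of the algorithm under assumptions of my own (the environment, `GOAL`, `HORIZON`, and the majority-vote learner are all hypothetical); it is not the paper's LLM-agent pipeline.

```python
import random
from collections import Counter, defaultdict

GOAL, N_STATES, HORIZON = 7, 10, 12  # hypothetical toy environment

def expert_action(state):
    """Black-box teacher: only its chosen action is observed, never its logits."""
    if state < GOAL:
        return 1
    return -1 if state > GOAL else 0

def fit(dataset):
    """Tabular learner: majority teacher action for each visited state."""
    votes = defaultdict(Counter)
    for state, action in dataset:
        votes[state][action] += 1
    return {s: c.most_common(1)[0][0] for s, c in votes.items()}

def rollout(policy, start, rng):
    """Roll out the learner's own policy; unseen states get a random action."""
    state, visited = start, []
    for _ in range(HORIZON):
        visited.append(state)
        action = policy.get(state, rng.choice([-1, 0, 1]))
        state = min(max(state + action, 0), N_STATES - 1)
    return visited

def dagger(iterations=5, seed=0):
    rng = random.Random(seed)
    dataset, policy = [], {}
    for _ in range(iterations):
        for start in range(N_STATES):
            states = rollout(policy, start, rng)                # on-policy states
            dataset += [(s, expert_action(s)) for s in states]  # dense teacher labels
        policy = fit(dataset)                                   # retrain on the aggregate
    return policy

policy = dagger()
```

The key design point is that the states come from the learner's own rollouts while the labels come from the teacher, so the learner is supervised exactly where its own mistakes take it.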


minsuuukim retweeted

Moksh Jain @JainMoksh

22 days ago

The scientific process involves collecting informative measurements while effectively allocating limited resources. We developed MaD-Physics, a new benchmark to measure this capability of agents.

Minsu Kim

@minsuuukim

about 1 month ago

@HyeonahKimm @alexhdezgcia Paper: https://t.co/udQYgjtIdW


Minsu Kim

@minsuuukim

about 1 month ago

Do we really need to hard-code synthesis routes into the generative process to obtain synthesizable molecules?

Our ICML 2026 paper suggests another route.

Huge credit to @HyeonahKimm, @alexhdezgcia, Celine Roget, Dionessa Biton, Louis Vaillancourt, Yves V. Brun, @Yoshua_Bengio

In S3-GFN, we keep the molecular generator sequence-based, initialize it from a rich SMILES prior, and induce synthesizability through soft distributional post-training.

Rather than treating synthesizability as a hard action-space constraint or simply folding it into scalar reward shaping, we maintain positive/negative replay buffers and use a contrastive auxiliary loss to separate synthesizable and unsynthesizable regions in probability space.

This gives a simple but flexible way to steer GFlowNet sampling toward high-reward, synthesizable molecules while retaining the benefits of pretrained chemical language models.
(1/4)


about 1 month ago

Excited to share that our paper “Active Attacks: Red-teaming LLMs via Adaptive Environments” has been accepted to ICML 2026. Joint work with Taeyoung Yun, Pierre-Luc St-Charles, Jinkyoo Park, and @Yoshua_Bengio . We study automated red-teaming: training an attacker LLM to generate diverse attack prompts that expose failure modes in a victim LLM, then using those prompts to improve safety tuning. Can stronger adaptive attacks make LLMs safer? More below 🧵

Minsu Kim

@minsuuukim

about 1 month ago

3/3 Technically, we combine adaptive environments with soft/off-policy RL and replay training. The result is broader coverage of harmful modes and stronger safety tuning: cross-attack success vs GFlowNet baselines improves from 0.07% to 31.28%, with only ~6% extra compute. Paper: https://t.co/NDIFVTyqTW


Minsu Kim

@minsuuukim

about 1 month ago

2/3 Active Attacks makes the red-teaming environment adaptive. After each round, we safety-finetune the victim on discovered attacks. This lowers reward in already-exploited regions, pushing the attacker to search for new vulnerabilities. This creates an active-learning-like, easy-to-hard curriculum.
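The adaptive-environment idea can be illustrated with a deliberately tiny simulation. The assumptions are mine, not the paper's: attack modes are discrete labels, the attacker is greedy rather than an RL-trained LLM, and safety fine-tuning is modeled as multiplying the exploited mode's reward by `1 - patch`.

```python
def run_red_teaming(vulnerability, rounds, adaptive, patch=0.9):
    """Greedy attacker over discrete attack modes; returns the modes it picks."""
    reward = dict(vulnerability)  # attacker reward = victim failure rate per mode
    discovered = []
    for _ in range(rounds):
        mode = max(reward, key=reward.get)  # exploit the best current mode
        discovered.append(mode)
        if adaptive:
            # Hypothetical stand-in for safety fine-tuning on discovered attacks:
            # the reward in the already-exploited region drops.
            reward[mode] *= 1.0 - patch
    return discovered

# Hypothetical failure rates for three attack modes, easiest first.
vulnerability = {"mode_a": 0.9, "mode_b": 0.6, "mode_c": 0.3}

static_modes = run_red_teaming(vulnerability, rounds=3, adaptive=False)
adaptive_modes = run_red_teaming(vulnerability, rounds=3, adaptive=True)
```

With a static victim the attacker keeps returning to the easiest mode, while the adaptive victim forces it through progressively harder modes, which is the easy-to-hard curriculum the thread describes.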


Minsu Kim

@minsuuukim

about 1 month ago

@Yoshua_Bengio 1/3 A key challenge in RL-based red-teaming is diversity. If an attacker finds a few easy high-reward attack modes, it can keep exploiting them. This may give high attack success, but poor coverage of failure modes — exactly what we do not want for robust safety tuning.


Minsu Kim

@minsuuukim

about 1 month ago

Empirically, this simple recipe works surprisingly well: S3-GFN achieves high synthesizability (often >95%) while maintaining strong optimization performance across molecular design tasks. So the takeaway is: you may not need to hard-code synthesis routes to generate synthesizable molecules — a strong sequence prior + soft constrained post-training can already go a long way. (4/4)

Minsu Kim

@minsuuukim

about 1 month ago

Our approach, S3-GFN, keeps the generator sequence-based (SMILES) and starts from a rich chemical prior. Instead of baking synthesizability into the MDP as a hard constraint, we induce it through soft distributional post-training.

The core mechanism is simple:
- maintain positive / negative replay buffers
- add a contrastive auxiliary loss
- suppress unsynthesizable regions while preserving reward-seeking behavior
(3/4)
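One way such a mechanism could look in code is sketched below. The thread does not give the loss, so the pairwise softplus form here is an assumption, as are the `ReplayBuffer` class, the placeholder sequence names, and the toy log-probability tables; in practice this term would presumably be added to the main GFlowNet objective with some weight.

```python
import math
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size buffer of sequences (SMILES strings in the S3-GFN setting)."""
    def __init__(self, capacity):
        self.items = deque(maxlen=capacity)

    def add(self, sequence):
        self.items.append(sequence)

    def sample(self, k, rng):
        return rng.sample(list(self.items), min(k, len(self.items)))

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def contrastive_loss(log_prob, positives, negatives):
    """Hypothetical pairwise loss: small when positives are likelier than negatives."""
    pairs = [(p, n) for p in positives for n in negatives]
    return sum(softplus(log_prob(n) - log_prob(p)) for p, n in pairs) / len(pairs)

rng = random.Random(0)
pos_buf, neg_buf = ReplayBuffer(100), ReplayBuffer(100)
for seq in ["POS_A", "POS_B"]:   # placeholders for synthesizable samples
    pos_buf.add(seq)
for seq in ["NEG_A", "NEG_B"]:   # placeholders for unsynthesizable samples
    neg_buf.add(seq)

# Two toy "models": one already separates the buffers, one does not.
separated = {"POS_A": -1.0, "POS_B": -1.5, "NEG_A": -6.0, "NEG_B": -5.0}
flat = {"POS_A": -3.0, "POS_B": -3.0, "NEG_A": -3.0, "NEG_B": -3.0}

pos, neg = pos_buf.sample(2, rng), neg_buf.sample(2, rng)
loss_separated = contrastive_loss(separated.get, pos, neg)
loss_flat = contrastive_loss(flat.get, pos, neg)
```

The loss is lower for the model that already assigns more probability to the positive buffer, so minimizing it pushes probability mass away from the negative region without touching the action space.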


Minsu Kim

@minsuuukim
