Jack Lin @jacklin_64 - Twitter Profile

2 months ago

Thank you to everyone in the community who is testing and using Nemotron models. It's great to see Nemotron-Cascade-2, Nemotron-3-Super and Nemotron-3-Nano trending on HF.

The Nemotron team is working hard to incorporate all your feedback into Nemotron 4.

And yes, Nemotron 3 Ultra is still on track for release.

https://t.co/lkEwmlUng9

[Photo from ctnzr's tweet]

jacklin_64 retweeted

AK

@_akhaliq

3 months ago

Nvidia just released Nemotron-Cascade 2 on Hugging Face

paper: https://t.co/ofx6zOlNic
model: https://t.co/pgJht4jNE0

jacklin_64 retweeted

DailyPapers

@HuggingPapers

3 months ago

NVIDIA just released Nemotron-Cascade 2 on Hugging Face

A 30B MoE model with 3B activated parameters that achieves gold medal performance at IMO and IOI 2025. https://t.co/kNkYlhpKv8

jacklin_64 retweeted

Wei Ping

@_weiping

3 months ago

🚀 Introducing Nemotron-Cascade 2 🚀

Just 3 months after Nemotron-Cascade 1, we’re releasing Nemotron-Cascade 2: an open 30B MoE with 3B active parameters, delivering best-in-class reasoning and strong agentic capabilities.

🥇 Gold Medal-level performance on IMO 2025, IOI 2025, and ICPC World Finals 2025:
• Capabilities once thought achievable only by frontier proprietary models (e.g. Gemini Deep Think) or frontier-scale open models (i.e. DeepSeek-V3.2-Speciale-671B-A37B).
• Remarkably high intelligence density with 20× fewer parameters.

🏆 Best-in-class across math, code reasoning, alignment, and instruction following:
• Outperforms the latest Qwen3.5-35B-A3B (2026-02-24) and even larger Qwen3.5-122B-A10B (2026-03-11).

🧠 Powered by Cascade RL + multi-domain on-policy distillation:
• Significantly expand Cascade RL across a much broader range of reasoning and agentic domains than Nemotron-Cascade 1, while distilling from the strongest intermediate teacher models throughout training to recover regressions and sustain gains.

🤗 Model + SFT + RL data:
👉 https://t.co/4QJqfTOt6I

📄 Technical report:
👉 https://t.co/dFC00m6RZU

jacklin_64 retweeted

Yangyi Chen

@YangyiChen6666

6 months ago

Super proud to introduce my first work at NVIDIA!! Nemotron-Cascade, our RL scaling efforts to build fully open-source general-purpose reasoning models that achieve SoTA performance on math, coding, and SWE.

I am extremely honored to join this small but closely-connected team led by the wonderful @_weiping!

Jack Lin @jacklin_64

6 months ago

Check out the first comprehensive study on cascade RL to build general-purpose reasoning models. We also release the training data and the strong 8B and 14B general-purpose reasoning models.

Wei Ping

@_weiping

6 months ago

🚀 Introducing Nemotron-Cascade! 🚀

We’re thrilled to release Nemotron-Cascade, a family of general-purpose reasoning models trained with cascaded, domain-wise reinforcement learning (Cascade RL), delivering best-in-class performance across a wide range of benchmarks.

💻 Coding powerhouse
After RL, our 14B model:
• Surpasses DeepSeek-R1-0528 (671B) on LiveCodeBench v5/v6/Pro.
• Achieves silver-medal performance at IOI 2025 🥈.
• Reaches a 43.1% pass@1 on SWE-Bench Verified, and 53.8% with test-time scaling.

🧠 What is Cascade RL?
Instead of mixing heterogeneous prompts across domains, Cascade RL trains sequentially, domain by domain, which reduces engineering complexity, mitigates heterogeneous verification latencies, and enables domain-specific curricula and tailored hyperparameter tuning.

✨ Key insight
Using RLHF for alignment as a pre-step dramatically boosts complex reasoning—far beyond preference optimization. Subsequent domain-wise RLVR stages rarely hurt the benchmark performance attained in earlier domains and may even improve it, as illustrated in the following figure.

🤗 Models & training data 🔥
👉 https://t.co/wfVcAaMocA

📄 Technical report with detailed training and data recipes
👉 https://t.co/FdMINvB4yM

jacklin_64 retweeted

Jimmy Lin

@lintool

12 months ago

@yupp_ai @UWaterloo Today marks the beginning of this journey for me, and I’m happy to share more details in the coming months! Until then, I hope you’ll try out https://t.co/61cOJryF5O and share your feedback. (9/9)

jacklin_64 retweeted

Xueguang Ma

@xueguang_ma

over 1 year ago

Introducing DRAMA🎭: Diverse Augmentation from Large Language Models to Smaller Dense Retrievers.

We propose to train a smaller dense retriever using a pruned LLM as the backbone, fine-tuned with diverse LLM data augmentations.

With single-stage training, DRAMA achieves strong performance on both English and multilingual retrieval tasks—enabling smaller retrievers to benefit from ongoing LLM advancements.

jacklin_64 retweeted

Xueguang Ma

@xueguang_ma

over 1 year ago

In this work led by @ShengyaoZhuang, we explore various settings to attack recent document screenshot retrievers like DSE and ColPali. 🚨 What you see might not be what you searched for.

jacklin_64 retweeted

Victoria X Lin

@VictoriaLinML

over 1 year ago

#NeurIPS2024 I will present "Nearest Neighbor Speculative Decoding for LLM Generation and Attribution" led by @alexlimh23 at the poster session today.
⏰ Thu Dec 12 at 4:30-7:30 PM PST
🏛️ East Exhibit Hall A-C, #2201
🔗 https://t.co/a3Zfuvhfib

Please drop by if you would like to chat about semi-parametric language modeling, beyond token-level decoding and generation attribution!

Jack Lin @jacklin_64

over 1 year ago

I will present our paper FLAME on factuality alignment for LLMs with @luyu_gao at #NeurIPS2024! 🎉 Join us at East Exhibit Hall A-C, Booth #3501 for a chat on Wed (Dec 11, 4:30-7:30 pm). Looking forward to connecting! More detail: https://t.co/EGuJrexLYq

Xilun Chen @ccsasuke

about 2 years ago

Introducing FLAME🔥: Factuality-Aware Alignment for LLMs

We found that the standard alignment process **encourages** hallucination. We hence propose factuality-aware alignment while maintaining the LLM's general instruction-following capability.
https://t.co/3ieQDq7wA2 https://t.co/KSiIy59cje

jacklin_64 retweeted

Jimmy Lin

@lintool

over 1 year ago

Congratulations to Dr. @jacklin_64 for successfully defending his Ph.D. thesis "Building a Robust Retrieval System with Dense Retrieval Models"! 🎉 https://t.co/ldy2xmsuds

jacklin_64 retweeted

Nan Wang

@nanwang_t

over 1 year ago

Crucial work in the field of multimodal embeddings! It’s impressive that multimodal embeddings are reaching SOTA-level performance comparable to text-only embeddings in the retrieval tasks.

Jack Lin @jacklin_64

over 1 year ago

This project was done while interning at NVIDIA this summer. Big thanks to all the amazing co-authors, @chankyul77 @MohammadShoeybi @lintool @ctnzr and @_weiping

Jack Lin @jacklin_64

over 1 year ago

Introducing MM-Embed, the first multimodal retriever achieving SOTA results on the multimodal M-BEIR benchmark and compelling results (among top-5 retrievers) on the text-only MTEB retrieval benchmark.

Paper: https://t.co/i4bSsDLlLA
🤗 Model: https://t.co/nSb6fFre08

Jack Lin @jacklin_64

over 1 year ago

Finally, for challenging multimodal queries, a free performance boost is possible: prompt multimodal LLMs as zero-shot rerankers. https://t.co/IydBwLJTq9

Jack Lin @jacklin_64

over 1 year ago

The sky last night was insane! Thanks to Waterloo for this epic aurora show.

jacklin_64 retweeted

Raphael Tang

@ralph_tang

over 1 year ago

Our paper on understanding variability in text-to-image models was accepted at #EMNLP2024 main track! Lots of thanks to my collaborators @crystina_z @yaolu_nlp @Wenyan62 @Ulienida and mentors @lintool Pontus @ferhanture. Check out https://t.co/Ul1XpeWdO9
