Guilin Liu

21 days ago

LocateAnything is the #1 trending model on HF. Great work from our interns and collaborators. https://t.co/sosBBe6clY

clem 🤗

@ClementDelangue

22 days ago

So much great work lately from Nvidia, the "King of American Open-source AI"!

- Crossed 1,000 total public repositories on @huggingface (820 models, 249 datasets & 57 spaces) & almost 60,000 followers
- Current #1 trending model on HF with LocateAnything and #5 trending with PiD
- Announced that they're adopting the @linuxfoundation OpenMDW framework
- Released Cosmos 3, Omnimodal World Models for Physical AI & Alpamayo 2 Super, an open model for autonomous driving
- Announced the upcoming release of Nemotron 3 & work on Nemotron 4

Thank you @nvidia for all the work you're doing for the ecosystem and open-source AI. Can't wait for the next few months!

GuilinL retweeted

Bryan Catanzaro

@ctnzr

22 days ago

Nemotron 3 Ultra is now the best open weight model on https://t.co/EJXiSfWv2O 💚

Research Scientist @NVIDIA | Formerly @Google, @Cornell | Views are my own

24 days ago

We presented Parallel Box Decoding, which improves both decoding efficiency and localization accuracy for vision-language grounding. Check out more examples and the demo on the project page: https://t.co/okB0s8uSvm

NVIDIA AI

@NVIDIAAI

26 days ago

This #CVPR2026 paper from our research team is trending #1 on @HuggingFace 🤗 Meet LocateAnything: a vision-language detection model that rethinks bounding box prediction. For AI agents and robots, “seeing” is only useful if a model can pinpoint where something is fast enough to act. Trained on 138M high-quality samples, LocateAnything decodes bounding boxes in parallel instead of one coordinate at a time, improving localization accuracy while dramatically increasing throughput for visual grounding and detection. Project page: https://t.co/O7JMe8tzFM
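Why decoding a box "in parallel instead of one coordinate at a time" helps throughput can be illustrated with a toy count of sequential decoder steps. This is not LocateAnything's Parallel Box Decoding; it is a minimal sketch assuming a decoder that either emits one coordinate per step or, hypothetically, all four coordinates of a box in one step. The `Box` alias and the predictor callables are illustrative stand-ins for a real model.

```python
from typing import Callable

# Illustrative alias: (x1, y1, x2, y2) in whatever units the decoder uses.
Box = tuple[float, float, float, float]


def decode_per_coordinate(
    num_boxes: int, predict_next: Callable[[list[float]], float]
) -> tuple[list[Box], int]:
    """Toy autoregressive decoding: one coordinate per sequential step."""
    tokens: list[float] = []
    steps = 0
    for _ in range(num_boxes * 4):
        # Each step conditions on every coordinate emitted so far,
        # so the steps cannot overlap in time.
        tokens.append(predict_next(tokens))
        steps += 1
    boxes = [tuple(tokens[i:i + 4]) for i in range(0, len(tokens), 4)]
    return boxes, steps


def decode_per_box(
    num_boxes: int, predict_box: Callable[[list[Box]], Box]
) -> tuple[list[Box], int]:
    """Toy decoding that emits all four coordinates of a box in one step."""
    boxes: list[Box] = []
    steps = 0
    for _ in range(num_boxes):
        boxes.append(predict_box(boxes))
        steps += 1
    return boxes, steps


if __name__ == "__main__":
    # Dummy predictors stand in for a real model.
    _, seq_steps = decode_per_coordinate(3, lambda toks: float(len(toks)))
    _, box_steps = decode_per_box(3, lambda done: (0.0, 0.0, 1.0, 1.0))
    print(seq_steps, box_steps)  # 12 3
```

For three boxes the per-coordinate loop needs 12 sequential steps and the per-box loop needs 3. Any accuracy effect depends on the real model and is not modeled by this toy.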

GuilinL retweeted

NVIDIA AI

@NVIDIAAI

about 2 months ago

Meet Nemotron 3 Nano Omni 👋 Our latest addition to the Nemotron family is the highest-efficiency open multimodal model with leading accuracy. 30B parameters. 256K context length. 🧵👇

11 months ago

One of our roles in LLM/VLM research at NVIDIA is to explore effective data recipes for training large-scale models and share them with the public, an area where transparency has been limited, as with models such as Gemini, GPT-4o, and the Qwen-VL series. The Eagle2 project aligns closely with this mission. In this work, we openly detailed our findings on curating the datasets used to develop a frontier VLM, and we’re glad to see that the community is finding these contributions valuable.

Andi Marafioti

@andimarafioti

about 1 year ago

The Eagle 2 paper from Nvidia is such a goldmine.

GuilinL retweeted

11 months ago

I did not notice this until just now. Thank you @andimarafioti for the recommendation! Very glad that even though Eagle 2 is not our latest work, people still find it very useful.

GuilinL retweeted

Andi Marafioti

@andimarafioti

about 1 year ago

The Eagle 2 paper from Nvidia is such a goldmine.

GuilinL retweeted

NVIDIA AI Developer

@NVIDIAAIDev

about 1 year ago

🥇Our NVIDIA Llama Nemotron Nano VL model is #1 on the OCRBench V2 leaderboard.

Designed for advanced intelligent document processing and understanding, this model extracts diverse information from complex documents with precision, all on a single GPU.

📗 Get the technical details on the newest Nemotron model ➡️ https://t.co/C03WxEBoNG

📝 Try out the NVIDIA NIM ➡️ https://t.co/DOrJdYDbMG

GuilinL retweeted

Rohan Paul

@rohanpaul_ai

about 1 year ago

Cool paper from @nvidia

Prior methods for training LLMs for tool use rely on imitation or distilled reasoning, limiting generalization.

Nemotron-Research-Tool-N1 uses rule-based reinforcement learning.

It trains models with binary rewards evaluating only tool call structure and correctness, enabling autonomous reasoning.

📌 A binary reward on format and tool-call correctness teaches autonomous reasoning rather than imitation.

📌 The binary rule-based reward prevents reward hacking, boosting real-world generalization (80.38 percent on Live BFCL).

📌 Because the binary rewards check only structure and tool calls, the method can use SFT data that lacks detailed reasoning steps.

----------

Methods Explored in this Paper 🔧:

→ The model uses a structured reasoning and action output format.

→ A binary reward checks adherence to this format and exact match of parsed tool calls to ground truth.

→ Training uses the Group Relative Policy Optimization (GRPO) algorithm on processed datasets.

→ Nemotron-Research-Tool-N1-7B achieved 84.82 percent accuracy on BFCL and 81.28 percent on API-Bank, outperforming GPT-4o.
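GRPO scores each sampled response relative to the other responses drawn for the same prompt, which pairs naturally with a 0/1 reward. The function below is a minimal sketch of that group-relative advantage, not Tool-N1's training code: it assumes every reward in the list belongs to one group, and it uses the population standard deviation (implementations differ on this detail). The name `group_relative_advantages` and the `eps` default are hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each reward by the mean and std of its own group.

    All rewards are assumed to come from responses sampled for the same
    prompt, i.e. one GRPO group.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Two correct and two incorrect rollouts under a binary reward.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

In a mixed group, correct rollouts get a positive advantage and incorrect ones a negative advantage. A group in which every rollout earns the same reward yields all-zero advantages, so it adds nothing to the reward-driven part of the update.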

------------

Paper - arxiv.org/abs/2505.00024v1

Paper Title: "Nemotron-Research-Tool-N1: Tool-Using Language Models with Reinforced Reasoning"

GuilinL retweeted

Shaokun Zhang

@ShaokunZhang1

about 1 year ago

Tool-using LLMs can learn to reason without reasoning traces.

🔥 We present Nemotron-Research-Tool-N1, a family of tool-using reasoning LLMs trained entirely via rule-based reinforcement learning—no reasoning supervision, no distillation.

📄 Paper: https://t.co/HCeMBaIE7f
💻 Code: https://t.co/4ql0gn71qK

(Please consider giving us a ⭐️ to stay updated on the upcoming code release!)

🧠 Why this matters:

Existing tool-call models rely heavily on supervised reasoning traces from stronger models—costly, brittle, and often imitative. We ask:
Can LLMs learn to reason directly from tool success signals?

📦 What we did:

– Train Qwen2.5-7B/14B with a simple binary reward on tool-call correctness + R1-style reasoning format
– No reasoning traces needed
– Evaluate on BFCL, API-Bank, and ACEBench
– Also study the role of SFT, RL, and the widely adopted SFT-then-RL recipe in training tool-calling models.
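A reward like the one described here can be sketched as one function that returns 1 only when the output follows the expected format and its parsed tool calls exactly match the ground truth. The tag layout (`<think>...</think>` followed by a JSON list inside `<tool_call>...</tool_call>`), the JSON call schema, and treating call order as irrelevant are all assumptions made for illustration, not details confirmed by the paper; `binary_tool_reward` is a hypothetical name.

```python
import json
import re

# Assumed (hypothetical) R1-style layout: reasoning inside <think> tags,
# then a JSON list of tool calls inside <tool_call> tags.
FORMAT_RE = re.compile(r"^<think>.+?</think>\s*<tool_call>(.+?)</tool_call>$", re.DOTALL)


def _canonical(calls: list) -> list[str]:
    # Serialize each call with sorted keys so argument order does not matter,
    # then sort the calls so call order does not matter either (an assumption).
    return sorted(json.dumps(call, sort_keys=True) for call in calls)


def binary_tool_reward(output: str, ground_truth: list[dict]) -> int:
    """Return 1 if the output is well formed and its tool calls match exactly, else 0."""
    match = FORMAT_RE.match(output.strip())
    if match is None:
        return 0  # format violation: missing or misordered tags
    try:
        calls = json.loads(match.group(1))
    except json.JSONDecodeError:
        return 0  # tool-call block is not valid JSON
    if not isinstance(calls, list):
        return 0  # expected a list of calls
    return int(_canonical(calls) == _canonical(ground_truth))
```

Because the reward is all-or-nothing, a call with the right function but a wrong argument scores the same as a malformed response.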

📈 Key findings:

– Tool-N1-7B/14B outperform GPT-4o and open baselines on all benchmarks
– The widely adopted SFT+RL paradigm doesn’t necessarily lead to better performance than pure RL.
– Binary reward > fine-grained reward, esp. for real-world queries
– Scaling works: bigger = better gains under our RL setup

🌟 Takeaway:

Reasoning doesn’t have to be taught. With just a binary signal, LLMs can learn to reason and act.
Tool-N1 sets a new direction for scalable, supervision-light training of tool-calling models.

about 1 year ago

Project Page: https://t.co/JTjIdrsjt5 Arxiv: https://t.co/VUxKlz0d1P

about 1 year ago

Eagle2.5 natively supports long context without using any compression module. Eagle2.5-8B:
• achieves SOTA on 6 out of 10 long video benchmarks
• beats GPT-4o (0806) on 3/5 video tasks
• beats Gemini 1.5 Pro on 4/6 video tasks
• achieves a SOTA result on the Hour-long video benchmark.

AK

@_akhaliq

about 1 year ago

Nvidia just dropped Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models

GuilinL retweeted

about 1 year ago

Thank you AK! Excited to introduce Eagle 2.5, NVIDIA’s latest vision-language model, which brings strong long-context capabilities across both image and video understanding, all with just 8B parameters.

Most existing VLMs struggle with high-res inputs and long video contexts. Eagle 2.5 is designed to tackle both: it supports up to 512 video frames and is trained jointly on image + video data.

We introduce a new benchmark-scale dataset, Eagle-Video-110K, with over 110K annotated samples, including QA, localization, and summarization. Videos range from a few minutes to 3 hours, pushing the limits of long-form visual reasoning.

Key techniques:
• Information-First Sampling: spatially aware, quality-preserving frame selection
• Mixed image-video training for generalization
• Progressive long-context recipes up to 128K tokens
• Optimized decoding and inference for efficient deployment

Strong results across the board:
• 6 out of 10 SOTA on long video benchmarks
• Outperforms GPT-4o (0806) on 3/5 video tasks
• Outperforms Gemini 1.5 Pro on 4/6 video tasks
• Matches or beats Qwen2.5-VL-72B on multiple key datasets
• Strong image understanding with consistent improvement over Eagle 2, matching Qwen2.5-VL

Evaluated on:
• Video-MME
• MVBench
• Charades-STA
• 1-Hour Video QA
• EgoSchema
• MLVU, LVBench, and more…

These tasks stress-test long-form visual understanding with dense supervision and temporal reasoning.

Model, demo, and dataset will be released soon.
Explore the project here: https://t.co/084U086jR0
Code: https://t.co/jGIQU45YBT
Tech Report: https://t.co/w1hGgJMwAw

We're excited to contribute toward long-context, general-purpose VLMs and would love to hear your feedback or ideas for collaboration.

GuilinL retweeted

Jim Fan

@DrJimFan

over 1 year ago

Excited to announce GR00T N1, the world’s first open foundation model for humanoid robots! We are on a mission to democratize Physical AI.

The power of a general robot brain, in the palm of your hand. With only 2B parameters, N1 learns from the most diverse physical action dataset ever compiled and punches above its weight:
- Real humanoid teleoperation data.
- Large-scale simulation data: we are open-sourcing 300K+ trajectories!
- Neural trajectories: we apply SOTA video generation models to “hallucinate” new synthetic data that features accurate physics in pixels. In Jensen’s words, “systematically infinite data”!
- Latent actions: we develop novel algorithms to extract action tokens from in-the-wild human videos and neurally generated videos.

GR00T N1 is a single end-to-end neural net, from photons to actions:
- Vision-Language Model (System 2) that interprets the physical world through vision and language instructions, enabling robots to reason about their environment and instructions, and plan the right actions.
- Diffusion Transformer (System 1) that “renders” smooth and precise motor actions at 120 Hz, executing the latent plan made by System 2.

We deploy N1 on the GR1 robot, the 1X Neo robot, and a large collection of simulation benchmarks. N1 achieves up to a +30% boost in diverse manipulation tasks for household and industrial settings.

While humanoid robots are the main focus of N1, our model also supports cross-embodiment. We fine-tune it to work on the $110 HuggingFace LeRobot SO100 robot arm! An open robot brain running on open hardware. Sounds just right.

Let’s solve robotics, together, one token at a time. Links to our Whitepaper, Github repo, HuggingFace model, and open dataset page in the thread: 🧵

GuilinL retweeted

over 1 year ago

Mr. @pmddomingos, this is a country whose leader blatantly says "We lied, we cheated, we stole… we had entire training courses." And then there's a conceited clown like you spreading China hate everywhere. Your self-imagined star-spangled awesomeness doesn't change the fact that Chinese researchers have become a major force in the AI community. China is also leading in industry general autonomy, robotics and AI applications. Your words can't change this fact, and these successes don't come from fraud. If you think there’s a problem with this, there’s a problem with you.

GuilinL retweeted

almost 2 years ago

Thank you AK! @_akhaliq

This is just the beginning of a long journey; in this release we focused on the model design space with multi-encoders and on fair comparisons under controlled settings. More will come in future versions! 🧵[1/n]

Try our model & demo:
GitHub: https://t.co/ps3KCQGPZD
HuggingFace: https://t.co/VfMpC7cSLB
Report: https://t.co/KYbHnSxh0D

Rafael Valle @RafaelValleArt

over 2 years ago

We have also worked on transformer-based diffusion (link below) and video diffusion (https://t.co/NDmTTjzS5E), but as two separate projects. :) Congrats to OpenAI for proving that scaling up still works for video synthesis.

GuilinL retweeted

over 2 years ago

My co-authors are presenting P-Flow at NeurIPS on Thursday at 5pm! We'd love to chat about generative models and audio synthesis and understanding! We are also hiring researchers with expertise in multimodal LLMs, including for internships!

over 2 years ago

@tariqafridi16 NVIDIA used to have a PhD Residency Program, but it is no longer active. We also have an internship program. We sometimes extend an internship if the project is interesting but not yet finished, and for some long-term projects the extension can be longer.
