Matt Jones @DrewFustin - Twitter Profile

over 1 year ago

@_philschmid

Mini-R1: Reproduce the @deepseek_ai R1 "aha moment", an RL tutorial! Recreate an RL "aha moment" using Group Relative Policy Optimization (GRPO) and train an open model with reinforcement learning so that it learns self-verification and search abilities on its own to solve the Countdown Game.

TL;DR:
🤯 DeepSeek R1's "aha moment" demonstrates RL's potential for self-improvement in LLMs.
2️⃣ Uses 2 reward functions: one for format (<think>, <answer>) and one for correctness
🤖 Qwen2.5-3B-Instruct model learns self-verification and search abilities.
⚙️ Uses @MSFTDeepSpeed and @vllm_project for efficient, distributed online RL training with @huggingface TRL
🤟 Includes training observations and hyperparameter improvements
🧮 Uses Countdown Game (arithmetic puzzles) to teach models self-correction via <think> and <answer> tags
📊 Achieves 50% success rate after 450 training steps on 4x H100 GPUs
⚡ Training takes ~6 hours on 4x H100 GPUs for 450 steps
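The two rewards and GRPO's group-relative scoring can be sketched in plain Python. This is a minimal sketch, not the tutorial's code: it assumes a completion puts its reasoning in `<think>...</think>` followed by the final equation in `<answer>...</answer>`, and that a Countdown answer must use every given number exactly once to reach the target. The names `format_reward`, `equation_reward`, and `group_advantages` are hypothetical.

```python
import math
import re

# Hypothetical layout rule: a <think> block, then an <answer> block, nothing else.
FORMAT_RE = re.compile(r"^<think>.*?</think>\s*<answer>.*?</answer>$", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)
# Integers, + - * /, parentheses and whitespace only (no names, no calls).
ALLOWED_RE = re.compile(r"^[\d+\-*/()\s]+$")


def format_reward(completion: str) -> float:
    """1.0 if the completion follows the <think>/<answer> layout, else 0.0."""
    return 1.0 if FORMAT_RE.match(completion.strip()) else 0.0


def equation_reward(completion: str, numbers: list[int], target: int) -> float:
    """1.0 if the <answer> equation uses each number exactly once and hits target."""
    match = ANSWER_RE.search(completion)
    if match is None:
        return 0.0
    equation = match.group(1).strip()
    if not ALLOWED_RE.match(equation):
        return 0.0
    if sorted(int(n) for n in re.findall(r"\d+", equation)) != sorted(numbers):
        return 0.0
    try:
        # The character whitelist above means eval cannot reach names or calls.
        value = eval(equation, {"__builtins__": {}}, {})
    except (SyntaxError, TypeError, ZeroDivisionError):
        return 0.0
    if not isinstance(value, (int, float)):
        return 0.0
    return 1.0 if abs(value - target) < 1e-5 else 0.0


def group_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """GRPO-style advantages: standardize each reward within its sampled group."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]


# One prompt, a group of three sampled completions.
numbers, target = [19, 36, 55, 7], 65
completions = [
    "<think>55 + 36 = 91, 91 - 19 = 72, 72 - 7 = 65</think>\n<answer>55 + 36 - 19 - 7</answer>",
    "<think>55 + 19 = 74, 74 - 7 = 67, close enough</think>\n<answer>55 + 19 - 7</answer>",
    "The answer is 65.",
]
rewards = [format_reward(c) + equation_reward(c, numbers, target) for c in completions]
print(rewards)  # [2.0, 1.0, 0.0]
print(group_advantages(rewards))
```

The group mean acts as the baseline, so GRPO needs no separate value model; implementations differ in details such as the standard-deviation estimator and the epsilon.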

drewfustin retweeted

Rohan Paul

@rohanpaul_ai

over 1 year ago

NetworkX is one of the most popular Python graph analytics libraries, with ~15K GitHub stars and 80M monthly downloads.

This library is for working with networks and graphs. It helps analyze connections between things - like social networks, computer networks, or any system where objects are connected to each other.

And now NetworkX just got massively accelerated after its backend integration with NVIDIA's cuGraph.

✨ Up to 500x speedups on large graph workloads in NetworkX, with zero code changes.

📌 cuGraph is NVIDIA's GPU-accelerated graph analytics library within the RAPIDS ecosystem. It provides fast graph algorithms on GPUs, supports property graphs, remote operations, and graph neural networks (GNNs), works with GPU DataFrames (cuDF), and exposes a NetworkX-like API.

--------

📌 The traditional bottleneck of NetworkX's pure Python implementation becomes apparent when processing graphs larger than 100K nodes and 1M edges.

📌 cuGraph now solves this by offloading supported algorithms to the GPU. PageRank, Louvain community detection, betweenness centrality, and about 60 other algorithms get instant acceleration.

📌 This acceleration enables previously impractical use cases. Fraud detection systems can now process massive transaction networks in real time. Recommendation engines can handle millions of user-item interactions efficiently. Social network analysis can scale to an entire platform's worth of data on a single machine.

@NVIDIAAIDev
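PageRank, the first algorithm listed above, shows what gets offloaded. With a recent NetworkX release and the `nx-cugraph` package installed, the GPU path can be requested explicitly with `nx.pagerank(G, backend="cugraph")` (check the NetworkX backend documentation for your version, including how to make dispatch automatic). The sketch below is a plain-Python power iteration written only to show the computation; it is not NetworkX's or cuGraph's implementation, and the stand-alone `pagerank` function and sample graph are hypothetical.

```python
def pagerank(graph: dict[str, list[str]], alpha: float = 0.85,
             tol: float = 1e-12, max_iter: int = 500) -> dict[str, float]:
    """Power-iteration PageRank on a directed graph given as adjacency lists.

    Every node that appears as a link target must also be a key of `graph`.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by dangling nodes (no out-links) is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new_rank = {v: (1.0 - alpha + alpha * dangling) / n for v in nodes}
        for v in nodes:
            for w in graph[v]:
                new_rank[w] += alpha * rank[v] / len(graph[v])
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# a -> b -> c -> a is a cycle; d only links into it.
graph = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["c"]}
for node, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
    print(node, round(score, 4))
```

Each iteration touches every edge, which is the per-step work a GPU backend parallelizes on graphs with millions of edges.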

drewfustin retweeted

Heng Li @lh3lh3

almost 2 years ago

Preprint on "BWT construction and search at the terabase scale". We can compress 100 human genomes to 11GB in 21 hours, find SMEMs with it, do affine-gap alignment and retrieve similar local haplotypes. 7.3Tb commonly sequenced bacterial genomes ⇒ 30GB https://t.co/DiRwZNHVVa
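For readers new to the BWT, the two primitives the preprint scales up are building the Burrows-Wheeler transform and searching it as an FM-index. The toy sketch below builds the BWT by sorting rotations and counts pattern occurrences with backward search. It assumes a `$` sentinel that sorts before every other symbol, uses quadratic construction and linear-time rank queries, and does not implement SMEM finding or the preprint's construction algorithm; the function names are hypothetical.

```python
def bwt(text: str) -> str:
    """Naive BWT: sort all rotations of text + '$' and take the last column."""
    s = text + "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def count_occurrences(bwt_str: str, pattern: str) -> int:
    """FM-index backward search: number of (possibly overlapping) matches."""
    # C[ch] = number of symbols in the text that sort before ch.
    counts: dict[str, int] = {}
    for ch in bwt_str:
        counts[ch] = counts.get(ch, 0) + 1
    C, total = {}, 0
    for ch in sorted(counts):
        C[ch] = total
        total += counts[ch]

    def occ(ch: str, i: int) -> int:
        # Occurrences of ch in bwt_str[:i]; real FM-indexes answer this in O(1).
        return bwt_str[:i].count(ch)

    lo, hi = 0, len(bwt_str)  # suffix-array interval [lo, hi)
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + occ(ch, lo)
        hi = C[ch] + occ(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo


genome = "GATTACAGATTACA"
index = bwt(genome)
print(index)
for pattern in ["ATTA", "TACA", "CAG", "TTT"]:
    print(pattern, count_occurrences(index, pattern))
```

Backward search extends the match one symbol at a time from the right, which is the same step that SMEM-finding algorithms build on.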

drewfustin retweeted

elvis

@omarsar0

over 1 year ago

o3-mini-high (left) vs. deepseek-r1 (right), results from the first try. deepseek-r1 is cracked... wtf!

drewfustin retweeted

Xiang Yue @xiangyue96

over 1 year ago

Introducing Critique Fine-Tuning (CFT): a more effective SFT method for enhancing LLMs' reasoning abilities.
📄 Paper: https://t.co/oK4vCIMP7z
CFT is simple: instead of training models to directly answer questions, we train them to critique noisy answers.

What's fascinating is that while most approaches focus on using generative critique or reward models to provide feedback for policy models, these critique models can themselves serve as policy models: directly answering questions with stronger reasoning.

Interestingly, we also found that CFT saturates quickly: overtraining on critiques can even degrade problem-solving performance.

Work led by @YuboWang726 in collaboration with @WenhuChen.
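The difference between the two training setups comes down to what the model reads and what it is trained to produce. The sketch below is hypothetical: the `Example` record, the prompt wording, and the helper names are illustrative assumptions rather than the paper's templates, and it says nothing about how the critiques themselves are obtained.

```python
from dataclasses import dataclass


@dataclass
class Example:
    prompt: str  # what the model reads
    target: str  # what the model is trained to produce


def build_sft_example(question: str, reference_answer: str) -> Example:
    """SFT: learn to reproduce a reference answer directly."""
    return Example(prompt=f"Question: {question}\nAnswer:", target=reference_answer)


def build_cft_example(question: str, noisy_answer: str, critique: str) -> Example:
    """CFT: read a possibly wrong answer and learn to critique it."""
    prompt = (
        f"Question: {question}\n"
        f"Candidate answer: {noisy_answer}\n"
        "Critique the candidate answer step by step and say whether it is correct."
    )
    return Example(prompt=prompt, target=critique)


question = "What is 17 * 24?"
sft = build_sft_example(question, "17 * 24 = 408")
cft = build_cft_example(
    question,
    "17 * 24 = 418",
    "17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408, so the candidate answer 418 is incorrect.",
)
print(sft.prompt, "->", sft.target)
print(cft.prompt, "->", cft.target)
```

Assuming the usual setup where the loss covers only `target`, the change is that a CFT model is trained to evaluate an answer rather than to imitate one.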

drewfustin retweeted

Unsloth AI

@UnslothAI

over 1 year ago

Run DeepSeek-R1 (671B) locally on @OpenWebUI - Full Guide

No GPU required.
Using our 1.58-bit Dynamic GGUF and llama.cpp.

Tutorial: https://t.co/xaR9KpJzcj
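A quick back-of-envelope calculation shows why the bit width matters at this scale. The sketch assumes every weight is stored at one uniform bit width and counts weights only (no metadata, KV cache, or activations); a *dynamic* quantization like the one described keeps some layers at higher precision, so real file sizes will differ somewhat. The 1.58 figure matches log2(3), the bits needed to encode a ternary weight in {-1, 0, 1}.

```python
import math

PARAMS = 671e9  # DeepSeek-R1 parameter count, from the tweet


def weights_size_gb(params: float, bits_per_weight: float) -> float:
    """Storage for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4), ("1.58-bit", 1.58)]:
    print(f"{label:>8}: {weights_size_gb(PARAMS, bits):8.1f} GB")

# A ternary weight carries log2(3) bits of information.
print(f"log2(3) = {math.log2(3):.3f}")
```

Even at 1.58 bits the weights alone come to roughly 130 GB, so "no GPU required" still means a machine with substantial RAM or fast storage.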

drewfustin retweeted

Jürgen Schmidhuber

@SchmidhuberAI

over 1 year ago

DeepSeek [1] uses elements of the 2015 reinforcement learning prompt engineer [2] and its 2018 refinement [3], which collapses the RL machine and world model of [2] into a single net through the neural net distillation procedure of 1991 [4]: a distilled chain of thought system.

REFERENCES (easy to find on the web):

[1] #DeepSeekR1 (2025): Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. arXiv 2501.12948

[2] J. Schmidhuber (JS, 2015). On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models. arXiv 1511.09249. Sec. 5.3 describes the reinforcement learning (RL) prompt engineer which learns to actively and iteratively query its model for abstract reasoning and planning and decision making.

[3] JS (2018). One Big Net For Everything. arXiv 1802.08864. See also US11853886B2. This paper collapses the reinforcement learner and the world model of [2] (e.g., a foundation model) into a single network, using the neural network distillation procedure of 1991 [4]. Essentially what's now called an RL "Chain of Thought" system, where subsequent improvements are continually distilled into a single net. See also [5].

[4] JS (1991). Learning complex, extended sequences using the principle of history compression. Neural Computation, 4(2):234-242, 1992. Based on TR FKI-148-91, TUM, 1991. First working deep learner based on a deep recurrent neural net hierarchy (with different self-organising time scales), overcoming the vanishing gradient problem through unsupervised pre-training (the P in ChatGPT) and predictive coding. Also: compressing or distilling a teacher net (the chunker) into a student net (the automatizer) that does not forget its old skills - such approaches are now widely used. See also [6].

[5] JS (AI Blog, 2020). 30-year anniversary of planning & reinforcement learning with recurrent world models and artificial curiosity (1990, introducing high-dimensional reward signals and the GAN principle). Contains summaries of [2][3] above.

[6] JS (AI Blog, 2021). 30-year anniversary: First very deep learning with unsupervised pre-training (1991) [4]. Unsupervised hierarchical predictive coding finds compact internal representations of sequential data to facilitate downstream learning. The hierarchy can be distilled [4] into a single deep neural network. 1993: solving problems of depth >1000.

drewfustin retweeted

ILIAS ISM

@illyism

over 1 year ago

You don't need a reasoning model like R1 or o3, just use this .cursorrules with Claude Sonnet to add a thinking step, works 100x better.

drewfustin retweeted

Ivan Fioravanti ᯅ

@ivanfioravanti

over 1 year ago

🔥 o3-mini-high beats deepseek r1 and o1-pro in a p5.js challenge! The o3-mini result is so good that it deserves a video of its own. deepseek r1 (bad result) and o1-pro (better) are in the comments below. Prompt in the last comment. 1/4

drewfustin retweeted

Flavio Adamo

@flavioAd

over 1 year ago

🚨 o3-mini crushed DeepSeek R1 🚨 "write a Python program that shows a ball bouncing inside a spinning hexagon. The ball should be affected by gravity and friction, and it must bounce off the rotating walls realistically"

drewfustin retweeted

Dimitris Papailiopoulos

@DimitrisPapail

over 1 year ago

Transformers can overcome easy-to-hard and length generalization challenges through recursive self-improvement.

Paper on arXiv coming on Monday.
Link to a talk I gave on this below 👇

Super excited about this work!

drewfustin retweeted

Sam Altman

@sama

over 1 year ago

o3-mini is out! smart, fast model. available in ChatGPT and API. it can search the web, and it shows its thinking. available to free-tier users! click the "reason" button. with ChatGPT plus, you can select "o3-mini-high", which thinks harder and gives better answers.

drewfustin retweeted

Seunghyun Seo @SeunghyunSEO7

over 1 year ago

what up guys, I made a one-page comparison of MHA and MLA from @deepseek_ai for those who skipped the DS-V2 paper. pls correct me if I'm wrong.
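The headline difference in such a comparison is KV-cache size. The sketch below counts cached elements per token per layer. The shapes are assumed from the DeepSeek-V2 paper's description (128 heads of dimension 128, a compressed KV latent of dimension d_c = 4·d_h = 512, and a decoupled RoPE key of dimension d_h/2 = 64) and are worth checking against the paper: MHA caches a key and a value for every head, while MLA caches one latent vector plus one RoPE key shared across heads.

```python
def mha_cache_per_token(n_heads: int, head_dim: int) -> int:
    """Multi-head attention caches one key and one value vector per head."""
    return 2 * n_heads * head_dim


def mla_cache_per_token(latent_dim: int, rope_dim: int) -> int:
    """MLA caches one compressed KV latent plus one decoupled RoPE key."""
    return latent_dim + rope_dim


n_h, d_h = 128, 128                # assumed: number of heads, per-head dimension
d_c, d_h_rope = 4 * d_h, d_h // 2  # assumed: latent dim 512, RoPE key dim 64

mha = mha_cache_per_token(n_h, d_h)
mla = mla_cache_per_token(d_c, d_h_rope)
print(f"MHA: {mha} cached elements per token per layer")
print(f"MLA: {mla} cached elements per token per layer ({mha / mla:.1f}x fewer)")
```

Under these assumptions MLA caches d_c + d_h^R = 4.5·d_h elements per token per layer instead of 2·n_h·d_h.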

drewfustin retweeted

Breeze

@BreezeChai

over 1 year ago

Ascending to the Divine

drewfustin retweeted

LangChain

@LangChain

over 1 year ago

📚🤖 Advanced RAG + Agents Cookbook

A comprehensive open-source guide delivering production-ready implementations of cutting-edge RAG techniques with AI agents. Built with LangChain and LangGraph, it features advanced implementations like Hybrid, Self, and ReAct RAG.

Learn more: https://t.co/pXkXMFFSYt
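Hybrid RAG usually means merging results from keyword (sparse) retrieval and vector (dense) retrieval. A common, simple way to merge two ranked lists is reciprocal rank fusion (RRF), sketched below. This illustrates the general technique under that assumption and is not taken from the cookbook; `k = 60` is the constant suggested in the original RRF paper, and the document IDs are made up.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of doc IDs: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


keyword_hits = ["doc3", "doc1", "doc7"]  # hypothetical keyword (e.g. BM25) results
vector_hits = ["doc1", "doc5", "doc3"]   # hypothetical embedding-search results
for doc_id, score in reciprocal_rank_fusion([keyword_hits, vector_hits]):
    print(doc_id, round(score, 4))
```

RRF only uses ranks, so it needs no calibration between the keyword and vector scores; documents found by both retrievers rise to the top.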

drewfustin retweeted

Andi Marafioti

@andimarafioti

over 1 year ago

Fuck it, today we're open-sourcing the codebase used to train SmolVLM from scratch on 256 H100s🔥
Inspired by our team's effort to open-source DeepSeek's R1 training, we are releasing the training and evaluation code on top of the weights 🫡
Now you can train any of our SmolVLMs—or create your own custom VLMs!

drewfustin retweeted

AK

@_akhaliq

over 1 year ago

OpenAI o3-mini System Card

drewfustin retweeted

Han Xiao

@hxiao

over 1 year ago

Letter-dropping physics comparison: o3-mini vs. deepseek-r1 vs. claude-3.5 in one-shot - which is the best?

Prompt: Create a JavaScript animation of falling letters with realistic physics. The letters should:
* Appear randomly at the top of the screen with varying sizes
* Fall under Earth's gravity (9.8 m/s²)
* Have collision detection based on their actual letter shapes
* Interact with other letters, ground, and screen boundaries
* Have density properties similar to water
* Dynamically adapt to screen size changes
* Display on a dark background

drewfustin retweeted

elvis

@omarsar0

over 1 year ago

AI Agents for Computer Use

This report provides a comprehensive overview of the emerging field of instruction-based computer control, examining available agents – their taxonomy, development, and resources. https://t.co/pNFyewjee6

drewfustin retweeted

Gabriel Massadas

@G4brym

over 1 year ago

Gemini 2.0 doesn’t get nearly enough credit. I just dumped all my workers-qb source code into it, hit it with a simple, humble prompt, and boom => it one-shotted the docs. Not just good docs, way better than what I had before, packed with examples. Kinda insane.
