rogo @roG00d - Twitter Profile

Pinned Tweet

rogo @roG00d

over 1 year ago

feelin' it rn

0

4

2

0

428

roG00d retweeted

Rhyush

@Resorcinolworks

12 days ago

@karpathy Now we wait for a Chinese model which has an equivalent amount of benchmarks at 99% reduced costs.

10

769

14

13

21K

roG00d retweeted

elie

@eliebakouch

18 days ago

microsoft MAI tech report is a gold mine, one of the most transparent for a model at this scale.

this model uses zero synthetic data or distillation from previous models. this means reasoning, agentic behavior, tool use are all learned fully during post-training with no cold start. bold choice that makes it harder and requires more iterations to reach sota, but you get FULL control over your model series and it proves they are serious about being a frontier lab.

the tech report is insanely detailed and precise about numbers. to give an example, they give the exact MFU across all the iterations of the model, with the exact changes etc. they also share the full scaling ladder recipe, to my knowledge this is the first time i've seen this in a tech report at this scale

let's look at all of this in this likely very long thread 🧵

42

2K

268

2K

286K

roG00d retweeted

Luke J. Huang

@whatthelukh

20 days ago

New blog! Is frontier asynchronous RL solved?

The blog covers Async RL theory and infrastructure, surveying 8 open-weight frontier labs for the algorithmic techniques and systems fixes to handle train-inference mismatch. Also answered: why do current methods still fail at high policy lag? Which methods scale with horizon and compute?

16

1K

134

2K

239K

roG00d retweeted

elie

@eliebakouch

25 days ago

wow, amazing tech report. lots of details on every part of the pipeline, especially on data. love that they share the system design of how they train models and do research with their "model factory", and also the negative results from M1 and how they fixed them in XS.2

one of the best tech reports to get up to speed on model training

12

462

38

340

43K

roG00d retweeted

Poolside

@poolsideai

26 days ago

2/ Model Factory

The central idea in the report is the Model Factory. It is the internal stack we use to make model development compound across runs: versioned data, reusable training, inference, and eval components, experiments as code, and lineage across runs, checkpoints, evals, and deployments. That is what lets us take lessons from M.1 and apply them to XS.2 quickly, from the start of training to release in about five weeks.

0

27

1

7

3K

roG00d retweeted

Robert Lauko

@robert_lauko

about 2 months ago

See the top ranked papers in AI, ML, Robotics, Quantum Physics, and more on @kurateorg. Hundreds of arXiv preprints ranked daily by scientific impact through pairwise tournaments judged by Claude, GPT, and Gemini.

663

27K

3K

13K

48M

roG00d retweeted

Sebastian Raschka

@rasbt

29 days ago

Added a DeepSeek Sparse Attention (DSA) from-scratch implementation to my LLMs-from-scratch repo thanks to an awesome new reader contrib. With motivation, overview, and GPT-style model reference implementation as standalone example code: https://t.co/o2PMhjF0TN

44

2K

242

1K

75K

roG00d retweeted

zhyncs

@zhyncs42

27 days ago

Correctness is critical for LLM inference engines. Recently, I found TRT-LLM’s work on Hypothesis Testing Methodology to be extremely professional. https://t.co/Qr1CLCIQ06

4

236

22

182

14K

roG00d retweeted

Jia-Bin Huang

@jbhuang0604

about 1 month ago

Modern Transformer - Complete Guide

Interested in learning the recent advances in transformers?

After 13 videos, I've finally completed this series! 🥳🥳🥳

Check out the course here: https://t.co/CsujxlWigC

11

1K

160

951

47K

roG00d retweeted

Nous Research

@NousResearch

about 1 month ago

Today we release Token Superposition Training (TST), a modification to the standard LLM pretraining loop that produces a 2-3× wall-clock speedup at matched FLOPs without changing the model architecture, optimizer, tokenizer, or training data.

During the first third of training, the model reads and predicts contiguous bags of tokens, averaging their embeddings on the input side and predicting the next bag with a modified cross-entropy on the output side. For the remainder of the run, it trains normally on next-token prediction. The inference-time model is identical to one produced by conventional pretraining.

Validated at 270M, 600M, and 3B dense scales, and at 10B-A1B MoE.

The work on TST was led by @bloc97_, @gigant_theo, and @theemozilla.

150

4K

414

2K

449K
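
To make the bag mechanics in the TST announcement above concrete, here is a minimal pure-Python sketch. It is not Nous Research's implementation. The announcement does not spell out the "modified cross-entropy", so the target used here (probability mass spread evenly over the tokens of the next bag) is an assumption, and the names `bag_inputs` and `bag_cross_entropy` are hypothetical.

```python
import math


def bag_inputs(token_ids, embedding, bag_size):
    """Split a token sequence into contiguous bags and average each bag's embeddings.

    `embedding` is a list of vectors indexed by token id. Trailing tokens that
    do not fill a whole bag are dropped (an assumption made for simplicity).
    """
    bags = [
        token_ids[i:i + bag_size]
        for i in range(0, len(token_ids) - bag_size + 1, bag_size)
    ]
    dim = len(embedding[0])
    averaged = [
        [sum(embedding[t][d] for t in bag) / bag_size for d in range(dim)]
        for bag in bags
    ]
    return bags, averaged


def bag_cross_entropy(logits, next_bag):
    """Cross-entropy of vocabulary logits against the tokens of the next bag.

    ASSUMPTION: the target distribution is uniform over the next bag's tokens,
    so the loss is the mean negative log-probability of those tokens.
    """
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return -sum(logits[t] - log_z for t in next_bag) / len(next_bag)
```

With `bag_size=1` the first function returns the ordinary per-token embeddings and the loss reduces to standard next-token cross-entropy, which matches the announcement's point that the later part of the run is plain next-token training.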

roG00d retweeted

OGAWA, Tadashi

@ogawa_tter

about 1 month ago

Making inference more efficient,
"Insights From NVIDIA Research", Bill Dally, GTC 2026, Mar 19 https://t.co/uFfnDFKUSk

Stacked Memory https://t.co/mddrV4cqxs
On-Chip NW https://t.co/alk1g3z1sA
FG-DRAM https://t.co/eTWpdRennN

Origins of GPU Comp, Apr 13 https://t.co/CjLE8m6HDS

1

190

27

229

52K

roG00d retweeted

Dirhousssi Amine

@DirhousssiAmine

about 1 month ago

"Python is a simple language" 🤡🤡 I hate this language. Spent 3 days debugging this hellish NCCL. Turns out a single Python regex was holding the GIL hostage while 15 H100s waited for rank 0 to show up. It was https://t.co/0CdQwq23Ag(). It's always https://t.co/0CdQwq23Ag().

19

429

14

175

63K

roG00d retweeted

LMSYS Org

@lmsysorg

about 2 months ago

Fastokens is officially merged into SGLang. This is an open-source Rust BPE tokenizer from @CrusoeAI, built with @nvidia Dynamo.

→ Up to 50% faster TTFT on agentic workloads (real production traffic)
→ 10x+ average speedup over HuggingFace tokenizers
→ Works across DeepSeek, Qwen, Kimi, MiniMax, Nemotron, and more

Huge thanks to the @CrusoeAI team for the collab.

Related SGLang PR: https://t.co/FDRLtSDwgb

2

98

11

36

11K

roG00d retweeted

Gabriele Berton

@gabriberton

about 2 months ago

Really cool

When they trained GPT3 they had loss spikes because they scraped from a subreddit of microwave noises

That training batch was literally text like "mmmmmmmmmmmmmmm"

38

9K

350

1K

686K

roG00d retweeted

Perplexity

@perplexity_ai

about 2 months ago

We’ve developed our own inference engine Runtime-Optimized Serving Engine (ROSE) to serve models ranging from embeddings to trillion-parameter LLMs. With CuTeDSL integrated into our inference engine, Perplexity can build the specialized GPU kernels faster to bring models up to peak performance on NVIDIA Hopper and Blackwell GPUs.

74

1K

118

352

161K

roG00d retweeted

Gauri Gupta

@gauri__gupta

about 2 months ago

https://t.co/cPJILNMC2l

10

662

81

2K

200K

roG00d retweeted

Google for Developers

@googledevs

about 2 months ago

Get more in the blog: https://t.co/JdUGFRejtX

5

243

23

155

36K

roG00d retweeted

SemiAnalysis

@SemiAnalysis_

about 2 months ago

For the past 12 years, cuDNN has been completely closed source (besides the .h files), until this week! OVER 20 MoE kernels & NSA sparse attention kernels from cuDNN have been open sourced! Great work to @manicely6005 & the rest of the team on seeing that parts of NVIDIA are moving towards open kernels! open source kernels drive innovation! (1/3) 🧵

7

555

64

360

47K

roG00d retweeted

Turing Post

@TheTuringPost

about 2 months ago

There’s a serious gap in multimodal models – they work with images, but still reason in language, which isn’t that precise for visual stuff.

@deepseek_ai just dropped an idea to solve this: let the model literally point to exact locations in the image while it thinks. They call it "Thinking with Visual Primitives."

These visual primitives are:
- points (specific locations)
- bounding boxes (areas in the image)

Using them, the model knows what exactly it’s referring to and achieves ~77% accuracy on average (vs. Gemini 3 Flash's 76.5% and 71.1% for GPT-5.4)

Plus, only ~80–90 visual tokens are kept in memory after compression thanks to the efficient architecture

Here is how it works: