sunny madra @Sundeep - Twitter Profile

Pinned Tweet

26 days ago

A little over 5 months at @NVIDIA, and the GTC Taipei keynote captured just a glimpse of the incredible engineering happening here:

AI factories: Vera Rubin, Vera CPU, Groq 3 LPX, BlueField, Spectrum-X, DSX

Agentic AI: Agent Toolkit, OpenShell, Nemotron

Personal AI: RTX Spark, DGX Spark, DGX Station

Physical AI: Cosmos, GR00T, DRIVE Hyperion, Alpamayo

https://t.co/pWUtEvarQL

4

45

3

5K

sunny madra

@sundeep

36 minutes ago

This is the major key...

David Senra

@davidsenra

about 3 hours ago

.@danawhite says one of the keys to longevity is to block out all negativity: “It never even crosses my mind that something's not going to work. I just keep going until it does work.” “There's this Bruce Lee quote where he says, ‘Never say negative things about yourself or what you're working on even if you're joking, because your body doesn't know the difference.’” “I never take in any negativity.”

7

631

62

358

37K

0

2

0

650

sunny madra

@sundeep

37 minutes ago

🎯🎯🎯

David Senra

@davidsenra

about 3 hours ago

.@danawhite says one of the keys to longevity is to block out all negativity: “It never even crosses my mind that something's not going to work. I just keep going until it does work.” “There's this Bruce Lee quote where he says, ‘Never say negative things about yourself or what you're working on even if you're joking, because your body doesn't know the difference.’” “I never take in any negativity.”

7

631

62

358

37K

0

1

0

1

575

sunny madra

@sundeep

41 minutes ago

♥️ DGX Spark

Tech2Wild

@Tech2Wild

about 11 hours ago

With 2 x DGX Sparks you can run Deepseek v4 Flash at 1M context at 40-46 Tok/s. That's the Tweet. https://t.co/sRQmryns1M

9

192

15

127

9K

0

1

0

568

sunny madra

@sundeep

about 12 hours ago

👀👀👀

Mikhail Parakhin

@MParakhin

about 14 hours ago

Not as relevant now :-(: I had an opportunity to deeply test both Fable 5 and GPT-5.6 Max. 5.6 is clearly better than Opus 4.8 at everything (slightly faster, too, though that depends on the load). Vis-à-vis Fable, it is clearly worse on coding, but better on agentic workloads. I had Fable write code, 5.6 run experiments - dreamy…

61

2K

34

347

195K

2

15

0

1

6K

sunny madra

@sundeep

about 14 hours ago

Vera Rubin NVL72

2

80

5

14

5K

sunny madra

@sundeep

about 21 hours ago

/1

a16z @a16z

1 day ago

Solopreneurs are making it big Charts of the Week: https://t.co/qmjdkZzisp

76

1K

152

628

109K

0

3

0

1

917

sunny madra

@sundeep

2 days ago

Truly empowering entrepreneurship, AI democratizes access to technology and will lead to the emergence of more sustainable small businesses.

davidlee

@davidlee

2 days ago

This is the narrative: AI is the most American technology - it allows anyone to start a business, provide their family a better life. --- New Business Formation is Surging https://t.co/sIBxqVsK8A

3

13

4

5

6K

5

17

1

5K

sunny madra

@sundeep

about 22 hours ago

X always with the bangers 🤣

3

24

1

2K

sunny madra

@sundeep

1 day ago

https://t.co/6TzHB4ujWb

Barathwaj Anandan

@BarathAnandan7

1 day ago

Good news. We cooked! @NVIDIAAI GLM 5.2 NVFP4 is out for anyone who's been waiting on a quality quant. Size ~465GB. Link below. Blackwell go brrr

41

1K

52

316

123K

0

28

2

8K

sunny madra

@sundeep

2 days ago

insightful paper: https://t.co/VfETuNMK83

Rohan Paul

@rohanpaul_ai

2 days ago

Great Stanford + MIT + Harvard + Anthropic paper.

Gives a clear training-based reason for why larger models learn abilities smaller models miss.

Says bigger AI models learn rare skills because they forget them less during training, their extra space protects weak learning signals.

The authors say the issue is not just whether a small model could represent the task, but whether training lets it keep that task while many common tasks keep pushing on the same limited parts.

Their core idea is that common tasks take up the model’s neurons first, so rare tasks get overwritten before they appear often enough to build into stable knowledge.

In a crowded data mixture, common patterns get first claim on the model’s internal machinery.

Small models may briefly pick up a rare signal, but the next wave of common-task updates overwrites it before the signal appears again.

They tested this first with controlled toy tasks where they could change how rare and complex each task was, then with OLMo language models from 4M to 4B parameters.

The main result is that bigger models learned low-frequency tasks much better, kept more task features inside their representations, and showed less gradient interference, which means common-task updates disturbed rare-task learning less.

Larger models can remember weak rare signals long enough to turn them into real learned skills.

----

Link – arxiv.org/abs/2605.29548

Title: "Why Larger Models Learn More: Effects of Capacity, Interference, and Rare-Task Retention"

14

558

106

437

54K

1

54

9

45

10K

sunny madra

@sundeep

2 days ago

Mark is right!

Squawk Box

@SquawkCNBC

2 days ago

"AI maximalist" @markpinc is optimistic about the future of the technology because he thinks the true killer apps haven't been invented yet. https://t.co/LdVTGHxx14

15

69

6

25

74K

5

22

1

3

13K

sunny madra

@sundeep

2 days ago

wow.

alphaXiv

@askalphaxiv

2 days ago

Here’s a fun comparison between GLM 5.2 and Opus 4.8 on a one-shot reproduction of the SDPO paper

This is a hard task: the model must resolve messy verl issues and then run ablations to completion and confirm the paper’s claims.

- GLM 5.2 costs $6.21 while Opus 4.8 cost us $46.35

- Both models spent a bulk of their tokens resolving initial verl issues. GLM 5.2 attempted 14 failed runs before first success while Opus 4.8 attempted 9 runs.

- GLM 5.2 surprisingly took 2.65M tokens (excl re-reads) compared to 4.53M tokens for Opus 4.8

44

1K

204

723

211K

1

4

0

3

2K

sunny madra

@sundeep

2 days ago

If you’re against datacenters, you should definitely use your phone in airplane mode 100% of the time to fully align yourself with the cause… 🤣

5

46

2

1

3K

sunny madra

@sundeep

2 days ago

👀👀👀

Lunens @Lunens__

2 days ago

this is like the Mount Rushmore for replyguys

151

8K

149

1K

787K

0

9

0

2

2K

sunny madra

@sundeep

2 days ago

Truly fitting!

Golden State Warriors

@warriors

2 days ago

Welcome to Dub Nation, @IREN_Ltd 👏 Golden State and IREN announced today a landmark multi-year global partnership that will include the IREN badge on all Golden State Warriors jerseys beginning with the 2026-27 season. https://t.co/9dMeQihIaf

231

3K

220

104

1M

1

6

0

3K

sunny madra

@sundeep

2 days ago

Goose > Goat

Ed Zitron

@edzitron

3 days ago

SoftBank’s investor presentation is one of the greatest things ever made. I’ve been thinking about it all day. These are the real slides shown in a speech where Masayoshi Son said he wouldn’t retire for at least another decade. The goose stuff is perfect. https://t.co/sk9cDhdWIE https://t.co/UbRX6kWvim

218

6K

671

3K

2M

0

14

0

7

4K

sunny madra

@sundeep

3 days ago

Truth

Altimeter Capital

@AltimeterCap

3 days ago

“Inference is going to be one of the largest, if not the largest markets, not in AI, in the world.” Altimeter's Apoorv Agarwal (@apoorv03) joined Bloomberg Tech @EdLudlow with @baseten CEO @tuhinone Srivastava to discuss the company's $1.5B financing and why he believes scalable inference infrastructure, open-source models, and enterprise control will be critical to the next phase of AI adoption. Watch the conversation below: https://t.co/pripKJcr8G

3

128

14

85

74K

0

42

1

13

7K

sunny madra

@sundeep

3 days ago

3

46

4

6

3K

sunny madra

@sundeep

4 days ago

The digital worker we’ve always dreamed of!

Andrej Karpathy

@karpathy

4 days ago

This is a new paradigm for interacting with Claude that is significantly more "inline" with all the other human activity org-wide. Once you do all of the under the hood engineering work to make this "just work" (e.g. across tools, integrations, compute environments, memory, security, etc.), Claude basically joins the team in a seamless way - you can talk to it as you would talk to a person and it can help with a very large variety of workloads. Imo this is the 3rd major redesign of LLM UIUX. The first paradigm was that the LLM is a website you go to, the second was that it is an app you download to your computer. This third one is that it is a self-contained, persistent, asynchronous entity with org-wide tools and context, working alongside teams of humans. It really takes a while to wrap your head around it, but it works and it is awesome.

1K

22K

2K

13K

7M

6

55

2

19

15K
