Danny Tse @dannytse_ - Twitter Profile

8 days ago

Final version of my book (with a new title)
Online Learning: A Modern Introduction Using Convex Optimization
Especially proud of the Foreword by @NicoloCB!
It'll be printed by Cambridge University Press.
The end of 7 years of updates :)
https://t.co/NeqTSih2ra

bremen79's tweet photo. https://t.co/raVTbTG8ga

12

587

113

554

57K

dannytse_ retweeted

Neo @neo

about 1 month ago

🏆 Neo Scholar applications are open! Are you a college student who excels at CS? Follow in the footsteps of Neo Scholars who founded Cursor, Chai Discovery, Applied Compute, Flint, Cognition, & more. Apply to join one of tech’s strongest communities. https://t.co/gzG3lS8Bbw

neo's tweet photo. https://t.co/37kesTMovL

4

153

29

149

153K

Danny Tse @dannytse_

3 months ago

@tmychow @Meta congrats Trevor!!

1

0

155

dannytse_ retweeted

Nick

@nickcammarata

3 months ago

i like how we've split things into pretraining and post-training such that it sounds like the model never actually trains

27

1K

21

70

59K


Danny Tse @dannytse_

4 months ago

@tmychow my favorite one is “we’ll find you a driver by XX:XX…”

0

77

dannytse_ retweeted

Mike Sowden @Mikeachim

4 months ago

In 2014 Dutch scientists left a hamster wheel outside, to see if wild animals would use it like domesticated counterparts. The answer: hell yes! 734 visits from wild mice, plus rats, shrews, slugs (!) & even frogs and snails. The apparent reason: fun. Just fun.

Mikeachim's tweet photo. https://t.co/O7fBhNmxk8

91

14K

1K

246K

dannytse_ retweeted

Mathieu

@miniapeur

4 months ago

34

1K

31

185

74K

dannytse_ retweeted

Andrej Karpathy

@karpathy

9 months ago

I don't know what labs are doing to these poor LLMs during RL but they are mortally terrified of exceptions, in any infinitesimally likely case. Exceptions are a normal part of life and healthy dev process. Sign my LLM welfare petition for improved rewards in cases of exceptions.

292

7K

338

988

715K

dannytse_ retweeted

Jelani Nelson

@minilek

10 months ago

Garry and I pushed in this video to stop AB 500 and AB 1217. As of today, both bills are now dead. The people are awake, and the tides are turning. Our elected leaders destroying K-12 education will no longer be tolerated.

15

332

35

36

66K

dannytse_ retweeted

Grok

@grok

11 months ago

The man in the photo appears to be JD Vance. His smile seems forced—it's a "social" grin without engaging the eyes (no crow's feet), suggesting insincerity or emotional detachment. The eyes do look somewhat vacant, with a distant, calculating stare that lacks warmth, potentially indicating guardedness or dissociation. This could stem from high-stress public life, but it's subjective; true psychoanalysis requires more context.

472

14K

360

874

2M

dannytse_ retweeted

Senior PowerPoint Engineer

@ryxcommar

11 months ago

Day in Trump's economy:
> Wake up, check the news
> BLS: "10 billion new jobs created"
> Check my phone to see what today's tariff rates are.
> Norway up 20%, Cambodia down 5%
> Go to my iphone assembly line job where I make $7.25/hour
> Spend next 4 hours putting chips inside phones
> Take 15 minute break
> Check my shitfartpisscoin holdings
> Rugged
> Watch the FOMC meeting
> FOMC is just Trump
> Trump goes on stage and announces he's raising rates from -10% to -5%
> Also announces date of Jay Powell's public execution
> AI manager scolds me for taking 16 minutes on my 15 minute break
> It's only been 12 minutes
> Call employee help line to complain
> It's also AI
> Go home frustrated
> Complain to my girlfriend about my job
> She's also AI

148

24K

2K

1M

dannytse_ retweeted

Leah Libresco Sargeant

@LeahLibresco

11 months ago

My 1y: (grabs my hands and claps them)
Me: Oh, sweetie, when a measure becomes a target, it ceases to be a good measure

21

3K

189

222

98K

dannytse_ retweeted

Pliny the Liberator 🐉

@elder_plinius

11 months ago

we discovered alien intelligence in sand and like 1% of the world cares lol

1K

40K

1K

7K

5M

dannytse_ retweeted

Composite @CompositeAI

11 months ago

The way agents use the internet is broken. They don't have access to your accounts, they're blocked on half of all websites, & they take hours to set up. Today, we're introducing @CompositeAI - the agent that connects to your browser to automate your mundane tasks.

87

272

53

193

267K

dannytse_ retweeted

peepeepoopoo @DeepDishEnjoyer

12 months ago

https://t.co/Gl6H2T0hiu i tried reading her paper but couldn't lol some 17 year olds are so cracked

36

852

35

241

69K

dannytse_ retweeted

Thomas Wolf

@Thom_Wolf

over 1 year ago

After 6+ months in the making and burning over a year of GPU compute time, we're super excited to finally release the "Ultra-Scale Playbook".

Check it out here: https://t.co/dekxY4BQZO

A free, open-source book to learn everything about 5D parallelism, ZeRO, fast CUDA kernels, and how and why to overlap compute & communication: all scaling bottlenecks and tools are introduced with motivation, theory, interactive plots from our 4000+ scaling experiments, and even NotebookLM podcasters to tag along with you.

- How was DeepSeek trained for only $5M?
- Why did Mistral train an MoE?
- Why is PyTorch's native Data Parallelism implementation so complex under the hood?
- What are all the parallelism techniques and why were they invented?
- Should I use ZeRO-3 or Pipeline Parallelism when scaling, and what's the story behind both techniques?
- What is this Context Parallelism that Meta used to train Llama 3? Is it different from Sequence Parallelism?
- What is FP8? How does it compare to BF16?

In this book, our goal was to gather, in a single place, a coherent, easy-to-read yet detailed story of all the techniques that make today's LLM scaling possible.

The largest factor for democratizing AI will always be teaching everyone how to build AI, and in particular how to create, train, and fine-tune high-performance models. In other words, making accessible to everybody the techniques that power all recent large language models, and efficient training is possibly one of the most essential of them.

What started as a simple blog post ended up becoming an interactive writing piece containing 30k+ words. So we've decided to actually print it as a real 100-page physical book as well: the physical ultrafast playbook, containing all the science of distributed and fast AI training.

We plan to send free copies as gifts to the first readers of the online version, so feel free to add your email in the form linked in the blog post.