Nolan Chai @nolanatlas - Twitter Profile

5 days ago

genuinely feels like im just doing fun research with friends; everyone is so talented, excited, hopeful, kind, and supportive of each other despite the challenges, it’s so fun to be able to be surrounded by people like this

0

19

Nolan Chai @nolanatlas

5 days ago

probably the most fun 2 weeks of work ive ever had anywhere :)

1

2

0

40

Nolan Chai @nolanatlas

29 days ago

will talk about this later this summer when im able, but im joining a neolab as an mts :)

0

3

0

72

nolanatlas retweeted

Muyu He

@HeMuyu0327

about 1 month ago

I am a big fan of Jianlin Su's blog because it always starts from first principles in mathematics, rather than "ML tricks", to approach a typical ML problem (e.g. training-free MoE load balancing).

Here is me trying to "reinvent" one such blog post, which provides an elegant alternative way to compute Muon, by filling in, for a less math-savvy audience, all the derivations that the post skips (it is also written entirely in Mandarin).

The goal of the post is to find a way to compute an essential component of Muon, i.e. the left and right singular-vector matrices U and V of the gradient G, **individually**. In its standard form, Muon really just needs their product UV^T, hence the standard approach of repeatedly applying a low-degree polynomial to G ("Newton-Schulz"). But more variants of Muon, which control the properties of model updates, become possible if we can get U and V individually, hence the post's proposal to revisit some fundamental linear algebra techniques for the computation.

The methodological takeaway from the post's thought process is that there are three components to breaking down an ML problem: (1) how to compute something at all (power iteration), (2) how to compute it fast (Cholesky decomposition), and (3) how to compute it accurately in finite floating-point precision (repeated orthogonalization). The goal of reading inspiring blogs like this is, in Feynman's terms, to be able to "reinvent" them at any time, so as to grasp the fundamental approach to doing similar work.

Original blog: https://t.co/5ksKPICpMW
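To make the three ingredients named in the tweet concrete, here is a minimal NumPy sketch of how they could fit together. It is not taken from the blog or the tweet: the function names `cholesky_qr` and `svd_by_power_iteration`, the iteration count, and the choice to run the iteration on G^T G are all assumptions made for illustration. It uses a block power iteration to find V, a Cholesky factor to orthonormalize cheaply, and a second orthonormalization pass to recover accuracy lost to rounding. It also assumes G has full column rank and distinct singular values.

```python
import numpy as np

def cholesky_qr(X):
    """Orthonormalize the columns of X via a Cholesky factor of X^T X.

    If X^T X = L L^T, then Q = X L^{-T} has orthonormal columns.
    """
    L = np.linalg.cholesky(X.T @ X)
    # Solve L Y = X^T (so Y = L^{-1} X^T) instead of forming an inverse.
    return np.linalg.solve(L, X.T).T

def svd_by_power_iteration(G, iters=200, seed=0):
    """Hypothetical helper: recover U, sigma, V of G separately.

    Assumes G (n x m, n >= m) has full column rank and distinct
    singular values, so the block power iteration converges.
    """
    m = G.shape[1]
    rng = np.random.default_rng(seed)
    V = rng.standard_normal((m, m))   # random starting block
    A = G.T @ G                       # eigenvectors of A are the columns of V
    for _ in range(iters):
        V = A @ V                     # (1) power iteration step
        # (2) cheap orthonormalization, (3) applied twice for accuracy
        V = cholesky_qr(cholesky_qr(V))
    GV = G @ V                        # columns are sigma_i * u_i
    sigma = np.linalg.norm(GV, axis=0)
    U = GV / sigma
    return U, sigma, V
```

With U and V in hand, the product `U @ V.T` is the quantity standard Muon needs, while keeping the factors separate leaves room for the variants the tweet mentions. The Cholesky step squares the condition number of the block it orthonormalizes, which is one reason a repeated pass is useful in low precision.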

10

2K

143

2K

77K

Nolan Chai @nolanatlas

about 1 month ago

also, i will be back in sd for a bit in early July - maybe i’ll see some of you around for ACL 2026 :), presenting a bit of work on information theory in RLHF & SFT LM outputs

0

48

Nolan Chai @nolanatlas

about 1 month ago

will be heading out of sd soon to the bay… going to miss this area a lot but excited to work on some fun stuff :) will be in stealth for a few months, hoping to bring some updates then! gotta figure out how to bring these two goofballs with me

https://t.co/eTQ2r1j3Ci

1

0

62

nolanatlas retweeted

Jeremy Howard

@jeremyphoward

about 1 month ago

@tokumin I feel that the trend towards training models to autonomously go off and try to do everything themselves is anti-human. We should, IMO, be training LLMs to support humans in their learning, creativity, and iterative experimentation.

17

225

35

29

17K

nolanatlas retweeted

Rachel Thomas

@math_rachel

4 months ago

5. Our current AI systems are doing a fuzzy interpolation between existing data points. This is valuable, but won’t give us something truly outside the scope of the training data. We still need research where new paradigms or different causal mechanisms are required.

2

28

5

1

2K

Nolan Chai @nolanatlas

about 2 months ago

one of the best parts about my suitemates being a bunch of researchers / phd students is just randomly bouncing ideas off each other in the hallway / living room

0

2

0

92

Nolan Chai @nolanatlas

2 months ago

anyone else going to ACL this summer?

0

2

0

108

Nolan Chai @nolanatlas

4 months ago

confirmed will be in sf in july/august :) will be working on research and infra stuff

1

4

0

147

nolanatlas retweeted

Davis Blalock

@davisblalock

4 months ago

🚀 Today we’re releasing FlashOptim: better implementations of Adam, SGD, etc, that compute the same updates but save tons of memory. You can use it right now via `pip install flashoptim`. 🚀 https://t.co/nRrLSpjnwV A bunch of cool ideas make this possible: [1/n]

https://t.co/xeaMyWztpv

31

2K

227

1K

220K

nolanatlas retweeted

Lucy Li @lucy3_li

4 months ago

Models are now expert math solvers, and so AI for math education is receiving increasing attention. Our new preprint evaluates 11 VLMs on our QA benchmark, DrawEduMath. We highlight a startling gap: models perform less well on inputs from K-12 students who need more help. 🧵

https://t.co/SR2dC5Jb9h

2

64

16

17

8K

nolanatlas retweeted

tom @ ICML 🇰🇷 @tvergarabrowne

4 months ago

first paper of the phd 🥳 the Superficial Alignment Hypothesis (SAH) argues that pre-training adds most of the knowledge to a model, and post-training merely surfaces it. however, this hypothesis has lacked a precise definition. we fix this.

https://t.co/uOiduEXjbn

9

241

46

162

36K

nolanatlas retweeted

Bun @bunjavascript

4 months ago

am i a supply chain risk now???

175

9K

430

354

395K

Nolan Chai @nolanatlas

4 months ago

love seeing the progress on small and more efficient models rn ^

N8 Programs

@N8Programs

4 months ago

Beat it by having Codex hand-craft weights: https://t.co/g0T6rklaAY 100% accuracy on 10 million random test cases w/ only 343 parameters. As a bonus, it uses the vanilla Qwen3 architecture, just with the right weights.

63

2K

112

1K

753K

0

36

nolanatlas retweeted

Jett 🜲

@iky_fwjett

5 months ago

People with ADHD be like "I know a spot" and then start googling who the leader of Uruguay was in 1978

33

13K

985

715

424K

Nolan Chai @nolanatlas

5 months ago

@peakidiot so down, ill lyk! ^^

0

1

0

8

Nolan Chai @nolanatlas

5 months ago

might be heading to sf this year not sure yet though

1

2

0

121

Nolan Chai @nolanatlas

5 months ago

@chatterchip @OpenAI @the_IAS @VanderbiltU @Cambridge_Uni @Harvard he’s an assistant physics prof at vanderbilt affiliated with openai as a research scientist

1

8

0

2K
