Hiranmay Darshane @hdarshane - Twitter Profile

Pinned Tweet

3 months ago

In Sept 2024, o1 surprised many purists who thought inference-time scaling for LLMs was through MCTS. What if a connection exists, just implicit? What does it imply? New post: "Squint enough and RLing CoT reasoners is approximable as Monte Carlo Tree Search policy learning." 🧵 https://t.co/eath99O3xD

1

17

1

11

5K

Hiranmay Darshane @hdarshane

about 7 hours ago

truth nuke

Unnat Jain @unnatjain2010

1 day ago

Alyosha has a humbling lesson that hits us hard 🫣 @ Four seasons ballroom 4, CVPR 2026

0

65

6

17

40K

0

92

hdarshane retweeted

Samip

@industriaalist

about 8 hours ago

7/ Looking beyond this paper: scaling compute against a fixed, limited pool of data will need new primitives. Searching over a population of models is a different problem than standard gradient descent training and we've barely scratched the surface. We hope q0 pushes people toward crazy ideas in multi-epoch training and scaling compute in general!!

1

17

4

2

496

Hiranmay Darshane @hdarshane

about 8 hours ago

yay

Samip

@industriaalist

about 8 hours ago

1/ Now that we're running out of data, how do you optimally scale multi-epoch pretraining to hundreds of epochs? Our first paper from Q! q0 trains a population of models, instead of a single model that saturates fast, reaching a dramatically lower loss at *every* epoch budget. w/ @bishmdl76 @akshayvegesna @ShmuelBerman

11

163

37

143

15K

0

6

0

533

hdarshane retweeted

stochasm

@stochasticchasm

2 days ago

@soldni regularization is BACK i suppose. dropout 0.15 is quite large and i don't think anyone else uses dropout in the big 26. also rather high std for init these days but you can't go wrong with a good old 0.02. also why depth scale output proj when you have sandwich norm?? https://t.co/erAz4jpcYQ

4

65

3

14

13K

Hiranmay Darshane @hdarshane

2 days ago

@teortaxesTex @jacobrintamaki

0

1

0

195

Hiranmay Darshane @hdarshane

2 days ago

🔥

Jasper Gilley

@0xjasper

3 days ago

This paper empirically ~verifies the section of my first Zipfian grokking blog post where I hypothesize about how capacity competition dynamics extrapolate from the grokking to language pretraining case. Cool work from the authors! :) https://t.co/EyytNGaPfi

1

13

1

16

3K

0

2

0

120

hdarshane retweeted

kalomaze

@kalomaze

3 days ago

q: "why don't Sora-like models learn compositional physics understanding or do ICL like how language models learn compositional semantics?"
a: every attempt to date heavily leaks information from the future. some even bake it into the bottleneck design without realizing (!!!) https://t.co/yePSrNsNfy

5

99

8

50

5K

Hiranmay Darshane @hdarshane

3 days ago

^I mean why is it not

0

127

Hiranmay Darshane @hdarshane

3 days ago

is this not regulated by SEC?

Hedgeye

@Hedgeye

6 days ago

Rule changes for the SpaceX $SPCX IPO: Index providers waived the profitability requirement and cut the seasoning window from 90 days to 5. This forces over $30 trillion in passive 401k and retirement money to buy SpaceX at IPO valuations. Bloomberg Intelligence estimates S&P 500 funds must absorb 19% of SpaceX's float within 6 months. Russell 1000 and Nasdaq 100 funds will absorb 24%.
The rules built to protect passive investors:
1. S&P 500 has required 12 months of trading and 4 quarters of GAAP profitability since 2002. Both waived.
2. Nasdaq cut its inclusion window from 90 trading days to 15.
3. FTSE Russell cut its to 5.
All three benchmarks are now structured to buy SpaceX at IPO pricing.

549

10K

2K

4K

12M

1

0

253

hdarshane retweeted

Christopher Potts

@ChrisGPotts

3 days ago

The following animation conveys the intuition: when a 1-neuron model tries to learn two tasks, the frequent task updates suppress the infrequent task updates. The 2-neuron model can dedicate a neuron to the infrequent task once the frequent one is fully learned.

2

77

5

19

4K

Hiranmay Darshane @hdarshane

3 days ago

a quick way to force oneself into thinking about a thing is maintaining a list of words about that thing and just staring at it something something required circuits activate from high cosine similarity

1

6

1

0

249

hdarshane retweeted

Max Weinbach

@mweinbach

4 days ago

I was thinking about it again recently, Google Allo was really ahead on the idea of chatting with Google Assistant or @'ing in conversations to build out this Agent/AI UX we have now https://t.co/0Tbd7JEMfH

26

502

19

55

32K

Hiranmay Darshane @hdarshane

5 days ago

most things arrive unrecognizable to the ideas that summoned them

0

2

0

70

hdarshane retweeted

Rohan Pandey

@khoomeik

6 days ago

my favorite interp researcher can identify neurons responsible for any behavior and provide steering vectors for them her name is backprop and her steering vectors are just gradients

8

278

8

54

24K

hdarshane retweeted

Geoffrey Litt

@geoffreylitt

7 days ago

@mschoening and I are starting a podcast where we nerd out about human-AI collaboration and malleable software. In this episode: is HTML actually better than Markdown? and an alternative to Software Factories... Watch on YT: https://t.co/O2DwUTWm4o

15

169

17

89

33K

Hiranmay Darshane @hdarshane

6 days ago

@leothecurious Olah

0

5

0

307

Hiranmay Darshane @hdarshane

6 days ago

does seem like all that time with colah did not alter his worldview at all

Pope Leo XIV

@Pontifex

6 days ago

Artificial intelligences do not undergo experiences, do not possess a body, do not feel joy or pain, do not mature through relationships, and do not know from within what love, work, friendship or responsibility mean. Nor do they have a moral conscience, since they do not judge good and evil, grasp the ultimate meaning of situations, or bear responsibility for consequences. They may imitate or even simulate, but they do not understand what they produce, for they lack the affective, relational, and spiritual perspective through which human beings grow in wisdom. #MagnificaHumanitas

4K

307K

60K

25K

14M

0

3

0

117

hdarshane retweeted

Noah Smith 🐇🇺🇸🇺🇦🇹🇼

@Noahpinion

7 days ago

Yes, AIs are going to do all or almost all of the pure theory, but tbh humans probably finished most of the pure theory that it's possible for humans to do by the end of the 20th century. Yes there has been some recent theory progress but let's be honest, most is of marginal economic value at best. There's probably lots of useful pure theory left to do in this universe, but it's probably not the kind of stuff that can be intuited by a single human, explained to a grad student, and written down in a textbook. AI will do all that stuff.

35

215

15

58

73K

hdarshane retweeted

jss @jsensarma

7 days ago

1. people underestimate how hard this problem is 2. universal issue. IGCSE billed ~Rs 40k for exams - still many papers leaked 3. change is much harder than running things as is. migration to OSM requires competence++++ 4. with privatization, public sector => competence----

7

23

3

5

5K

hdarshane retweeted

Akshay

@akshayvegesna

7 days ago

Cool presenting on why generalization in neural nets is less of a mystery than many make it out to be:

1

28

5

15

14K
