Jeremy Cohen @deepcohen - Twitter Profile

Pinned Tweet

8 months ago

Part 1: How does gradient descent work? https://t.co/avsScLLuDF
Part 2: A simple adaptive optimizer https://t.co/KehSb1Wu20
Part 3: How does RMSProp work? https://t.co/t2Cqe67f1M

Jeremy Cohen @deepcohen

9 days ago

@jeankaddour This graphic is doing a lot of work

Jeremy Cohen @deepcohen

11 days ago

@nsaphra I think part of it was that he wanted his chores done for him. Bro would’ve loved AGI

Jeremy Cohen @deepcohen

28 days ago

@roydanroy Could the average person get (somewhat diluted) equity in OpenAI/Anthropic by buying MSFT/Google stock? Genuine question - I’m not a personal finance expert

deepcohen retweeted

Stat.ML Papers @StatMLPapers

about 1 month ago

There Will Be a Scientific Theory of Deep Learning https://t.co/x2JAZvhAQI

deepcohen retweeted

Jamie Simon @learning_mech

about 1 month ago

1/ Deep learning is going to have a scientific theory. We can see the pieces starting to come together, and it's looking a lot like physics! We're releasing a paper pulling together these emerging threads and giving them a name: learning mechanics. 🔨 https://t.co/92nSIHameW 🔧

Jeremy Cohen @deepcohen

about 1 month ago

Looking forward to attending ICLR and giving a talk on Sunday at 9am at the Science of Deep Learning workshop: https://t.co/yTSelQxRzb. Message me if you want to chat about deep learning optimizer dynamics at the conference!

Jeremy Cohen @deepcohen

about 1 month ago

@kalomaze found out i needed a visa from this tweet. applied last night, and went to the Brazil consulate in NYC today (on advice of @marikgoldstein), even though the internet says they don't help with this. it worked - the person at the desk approved my visa. YMMV, but hope this helps someone!

deepcohen retweeted

Sunny Sanyal

@SunnySanyal9

3 months ago

I have spent 4 years making LLMs generalize better without more data or compute. I'm looking for a Research role in industry. Here's what I've built:

1/ Early Weight Averaging → First paper (2023) to apply weight averaging during LM pre-training. Now widely used in many pre-training pipelines. https://t.co/tjVfBIlBHg

2/ Attention Collapse → Diagnosed attention collapse in LLMs and proposed a training fix. https://t.co/eTcmMzMYfd

3/ Curriculum Finetuning → Upweight easy samples and downweight hard ones during finetuning to reduce forgetting. https://t.co/tLhqzUh7nY

I am a PhD student at UT Austin. I have interned at DeepMind, LightningAI, and Amazon Alexa. If you're hiring or know someone who is, please DM or email ([email protected]).

Web: https://t.co/S6UKjulcyW

#MachineLearning #LLM #NLP #PhD #AIJobs #OpenToWork

Jeremy Cohen @deepcohen

4 months ago

Isn’t it a little ironic that this argument for why LLMs aren’t truly intelligent is based on … pattern matching?

Big Brain AI

@realBigBrainAI

4 months ago

AMI Labs founder Yann LeCun on why LLMs are fooling us the same way AI has for decades:

He argues that every generation of AI scientists has made the same mistake: confusing task performance with real intelligence.

LeCun's core challenge to the current hype: "We're fooled into thinking those machines are intelligent because they can manipulate language. And we're used to the fact that people who can manipulate language very well are implicitly smart."

He's clear that LLMs are useful, but being a useful tool and being intelligent are two very different things.

The real insight is the historical pattern he's lived through. Since the 1950s, wave after wave of AI researchers have claimed their breakthrough was the path to human-level intelligence.

Marvin Minsky. Herbert Simon. Frank Rosenblatt (who invented the perceptron, the first learning machine, in the 1950s). All predicted machines as smart as humans within a decade. "They were all wrong."

LeCun has personally witnessed three of these cycles of hype and disappointment. And his verdict on the current one is blunt: "This generation with LLMs is also wrong. It's just another example of being fooled."

The pattern: A new technique emerges → machines get good at specific tasks → we assume general intelligence

The question worth asking: are we impressed by these tools because they're intelligent, or because they sound like they are?

deepcohen retweeted

Samip

@industriaalist

4 months ago

Introducing Q Labs, a research lab focused on solving generalization. Alongside others (SSI, Flapping Airplanes), we see data efficiency as the key problem, but we're taking an unconventional approach to solve it: a new learning algorithm approximating Solomonoff induction.

deepcohen retweeted

Andy @prompt_Tunes

5 months ago

https://t.co/uAFUIh3ayi

deepcohen retweeted

Nikhil Ghosh @nikhilghosh101

5 months ago

Sharing our recent work on understanding the mechanisms underlying the empirical success of hyperparameter transfer using μP! (1/11) with Denny Wu and @albertobietti

deepcohen retweeted

Marc Finzi

@m_finzi

5 months ago

1/🧵 We are very excited to release our new paper! From Entropy to Epiplexity: Rethinking Information for Computationally Bounded Intelligence https://t.co/M8ETQk9gHz with amazing team @ShikaiQiu @yidingjiang @Pavel_Izmailov @zicokolter @andrewgwils

Jeremy Cohen @deepcohen

6 months ago

@cjmaddison @tylerfarghly I agree that theory will probably never give us a closed-form expression for the test error of ResNet-50 on ImageNet, or eliminate all hyperparameters from deep learning, if that’s what is meant by “the big things”

Jeremy Cohen @deepcohen

6 months ago

The goal of deep learning theory/science is to guide practice. But most practical questions are >1 paper away from being legitimately answered by theory. How, then, can we make progress, without access to the ideal reward signal of “does this theory give us a SOTA algorithm?” …

Jeremy Cohen @deepcohen

6 months ago

@cjmaddison @tylerfarghly IMO, theory could give us a *language for reasoning* about deep learning. Even with good theory, you’d probably still have to run some experiments, but much fewer than we do now, since you’d learn much more from each one.

Jeremy Cohen @deepcohen

6 months ago

@mj_theory We aspired to meet this criterion in the research that we wrote up here: https://t.co/hlTxjchEEf.

Jeremy Cohen @deepcohen

6 months ago

So, we should focus on theories that can reliably predict “the small things” about deep learning, and gradually broaden the scope of what we can predict, until we have theory that can reliably predict “the big things” about deep learning too.

Jeremy Cohen @deepcohen

6 months ago

A lot of DL theory work gets rightfully criticized for being “postdictive” — always giving an elegant retroactive explanation for SOTA, while somehow never anticipating it. But the real issue isn’t that such theories can’t predict SOTA, it’s that they can’t predict anything.
