Sho Yaida @Shoyaida - Twitter Profile

ShoYaida retweeted

26 days ago

Why AI Can Now Make Discoveries - my conversation with @danintheory, Lead of the Foundations of Reinforcement Learning team at @OpenAI 00:00 Intro: AI's wild week in mathematics 01:21 What OpenAI's Foundations of RL team does 03:08 Dan's journey: from black holes and quantum gravity to frontier AI 07:04 Are AI systems becoming useful for real science 08:21 The AI math moment: Erdős, OpenAI, DeepMind, and Anthropic 08:52 Why the OpenAI result was an act of exploration 10:25 OpenAI vs. DeepMind: informal reasoning vs. formal proof 12:13 RL 101: learning by doing, not just watching 15:10 Why reinforcement learning works 15:58 How RL breaks: sparse feedback and long-horizon tasks 17:03 RLHF: how human feedback shaped early language models 18:48 Move 37, self-play, and the search for novel strategies 22:16 Explore vs. exploit in scientific discovery 24:49 Why RL may now be "the cake," not the cherry on top 25:46 Why RL started working with large language models 27:29 Is RL "sucking supervision through a straw"? 28:47 Why language may be the grounding layer for intelligence 31:46 A contrarian take on the Bitter Lesson 32:41 What test-time compute actually is 34:50 How RL gives models the ability to think 35:40 Verifiable rewards, math, coding, and the messy real world 38:00 What physics can teach us about AI 42:08 Is there a thermodynamics of AI? 43:08 From Erdős problems to Einstein-level AI 45:16 Is AI already doing original science? 45:51 How far are we from AI automating AI research 47:41 Why Dan is excited about the future of science

9

85

7

106

65K

ShoYaida retweeted

Ahmad Al-Dahle @Ahmad_Al_Dahle

about 1 year ago

Introducing our first set of Llama 4 models! We’ve been hard at work doing a complete re-design of the Llama series. I’m so excited to share it with the world today and mark another major milestone for the Llama herd as we release the *first* open source models in the Llama 4 collection 🦙. Here are some highlights: 📌 The Llama series have been re-designed to use state of the art mixture-of-experts (MoE) architecture and natively trained with multimodality. We’re dropping Llama 4 Scout & Llama 4 Maverick, and previewing Llama 4 Behemoth. 📌 Llama 4 Scout is highest performing small model with 17B activated parameters with 16 experts. It’s crazy fast, natively multimodal, and very smart. It achieves an industry leading 10M+ token context window and can also run on a single GPU! 📌 Llama 4 Maverick is the best multimodal model in its class, beating GPT-4o and Gemini 2.0 Flash across a broad range of widely reported benchmarks, while achieving comparable results to the new DeepSeek v3 on reasoning and coding – at less than half the active parameters. It offers a best-in-class performance to cost ratio with an experimental chat version scoring ELO of 1417 on LMArena. It can also run on a single host! 📌 Previewing Llama 4 Behemoth, our most powerful model yet and among the world’s smartest LLMs. Llama 4 Behemoth outperforms GPT4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on several STEM benchmarks. Llama 4 Behemoth is still training, and we’re excited to share more details about it even while it’s still in flight. A big thanks to all of our launch partners (full list in blog) for helping us bring Llama 4 to developers everywhere including @huggingface, @togethercompute, @SnowflakeDB, @ollama, @databricks and many others👏 This is just the start, we have more models coming and the team is really cooking – look out for Llama 4 Reasoning 😉 A few weeks ago, we celebrated Llama being downloaded over 1 billion times. Llama 4 demonstrates our long-term commitment to open source AI, the entire open source AI community, and our unwavering belief that open systems will produce the best small, mid-size and soon frontier models. Llama would be nothing without the global open source AI community & we are so ready to begin this next chapter with you. 🦙 Read more about the release here: https://t.co/7mbK3uggjO, and try it in our products today.

Ahmad_Al_Dahle's tweet photo. Introducing our first set of Llama 4 models!

We’ve been hard at work doing a complete re-design of the Llama series. I’m so excited to share it with the world today and mark another major milestone for the Llama herd as we release the *first* open source models in the Llama 4 collection 🦙. Here are some highlights:

📌 The Llama series have been re-designed to use state of the art mixture-of-experts (MoE) architecture and natively trained with multimodality. We’re dropping Llama 4 Scout & Llama 4 Maverick, and previewing Llama 4 Behemoth.

📌 Llama 4 Scout is highest performing small model with 17B activated parameters with 16 experts. It’s crazy fast, natively multimodal, and very smart. It achieves an industry leading 10M+ token context window and can also run on a single GPU!

📌 Llama 4 Maverick is the best multimodal model in its class, beating GPT-4o and Gemini 2.0 Flash across a broad range of widely reported benchmarks, while achieving comparable results to the new DeepSeek v3 on reasoning and coding – at less than half the active parameters. It offers a best-in-class performance to cost ratio with an experimental chat version scoring ELO of 1417 on LMArena. It can also run on a single host!

📌 Previewing Llama 4 Behemoth, our most powerful model yet and among the world’s smartest LLMs. Llama 4 Behemoth outperforms GPT4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on several STEM benchmarks. Llama 4 Behemoth is still training, and we’re excited to share more details about it even while it’s still in flight.

A big thanks to all of our launch partners (full list in blog) for helping us bring Llama 4 to developers everywhere including @huggingface, @togethercompute, @SnowflakeDB, @ollama, @databricks and many others👏 This is just the start, we have more models coming and the team is really cooking – look out for Llama 4 Reasoning 😉

A few weeks ago, we celebrated Llama being downloaded over 1 billion times. Llama 4 demonstrates our long-term commitment to open source AI, the entire open source AI community, and our unwavering belief that open systems will produce the best small, mid-size and soon frontier models. Llama would be nothing without the global open source AI community & we are so ready to begin this next chapter with you. 🦙

Read more about the release here: https://t.co/7mbK3uggjO, and try it in our products today.

314

6K

878

1K

1M

ShoYaida retweeted

Boris Hanin

@BorisHanin

about 3 years ago

I’m excited to read this!

0

32

7

9

9K

ShoYaida retweeted

Dan Roberts

@danintheory

about 3 years ago

Congrats to @em_dinan, @ShoYaida, and @suchenzang on their new work https://t.co/ZT61Hg7Awl, applying the effective theory blueprint of PDLT to the transformer architecture!

0

51

11

10

7K

Who to follow

Boris Hanin

@BorisHanin

Associate Professor at Princeton ORFE

Greg Yang

@TheGregYang

xai cofounder. fighting lyme

Jascha Sohl-Dickstein

@jaschasd

Member of the technical staff @ Anthropic. Most (in)famous for inventing diffusion models. AI + physics + neuroscience + dynamics.

ShoYaida retweeted

Emily Dinan @em_dinan

about 3 years ago

Had a blast working with the amazing @ShoYaida and @suchenzang to bring theory and practice closer together for Transformers …

em_dinan's tweet photo. Had a blast working with the amazing @ShoYaida and @suchenzang to bring theory and practice closer together for Transformers … https://t.co/MdaIfHSuoK

3

132

24

44

21K

ShoYaida retweeted

Susan Zhang

@suchenzang

about 3 years ago

From theory to practice, here's our latest attempt to bring the two sides closer together: https://t.co/xN8zKiNX2L This was also one of the most delightful collabs I've had the chance to be a part of. Thanks @ShoYaida & @em_dinan for being such wonderful humans to work with! 😍

4

263

58

136

66K

ShoYaida retweeted

American Institute of Physics @AIP_HQ

over 3 years ago

What is deep learning theory? Join @PhysicsToday at 1 p.m. EDT on Thursday, March 30th for the free webinar "The Principles of Deep Learning Theory" presented by @DanInTheory and @ShoYaida. Register here: https://t.co/5nMPt4hU8H

0

3

0

1K

ShoYaida retweeted

IAIFI @iaifi_news

over 3 years ago

Registration is now open for our 2023 Summer Workshop! Featuring plenary talks, posters, and networking, the #Workshop will bring together researchers across #AI and #physics disciplines. August 14–18, 2023 at @Northeastern. https://t.co/Jb1stzm3gM @AIVOInfo

iaifi_news's tweet photo. Registration is now open for our 2023 Summer Workshop! Featuring plenary talks, posters, and networking, the #Workshop will bring together researchers across #AI and #physics disciplines. August 14–18, 2023 at @Northeastern. https://t.co/Jb1stzm3gM @AIVOInfo https://t.co/ROFuO7jEqe

0

27

10

3

11K

ShoYaida retweeted

Dan Roberts

@danintheory

over 3 years ago

If you like both physics and AI -- and in particular if you are interested in using physics tools to study AI -- please apply to the "Theoretical Physics for Deep Learning" workshop at the Aspen Center for Physics this summer (May 28 to June 18): https://t.co/t0tbr6I0Ui

danintheory's tweet photo. If you like both physics and AI -- and in particular if you are interested in using physics tools to study AI -- please apply to the "Theoretical Physics for Deep Learning" workshop at the Aspen Center for Physics this summer (May 28 to June 18):

https://t.co/t0tbr6I0Ui https://t.co/lOBoQFgqRM

5

139

37

43

21K

ShoYaida retweeted

Dan Roberts

@danintheory

over 3 years ago

New work on the origin of @OpenAI's neural scaling laws w/ Alex Maloney and @jamiesully2: we solve a simplified model of scaling laws to gain insight into how scaling behavior arises and to probe its behavior in regimes where scaling laws break down. https://t.co/uzAztBKJ4p 1/

2

138

17

41

0

ShoYaida retweeted

Dan Roberts

@danintheory

over 3 years ago

New exciting work from @ShoYaida! Using the effective theory framework of PDLT, he finds a 1-parameter family of interesting hyperparameter scaling strategies that interpolate between the NTK and maximal-update parameterizations. https://t.co/sKqMr63BBg

0

5

3

2

0

ShoYaida retweeted

Dan Roberts

@danintheory

over 3 years ago

Check out this really nice post on PDLT by Jennifer Lin from FHI on LessWrong. In the post, she includes a summary of key results and then focuses on potential implications for AGI forecasting and safety in terms of interpretability. https://t.co/ju2xKCpx75 1/2

1

11

4

3

0

ShoYaida retweeted

Jaehoon Lee @hoonkp

about 4 years ago

Physical book by @ShoYaida, @danintheory, @BorisHanin available now! Ordering my copy now..

0

21

3

1

0

ShoYaida retweeted

Dan Roberts

@danintheory

about 4 years ago

The Principles of Deep Learning Theory, by @ShoYaida, @BorisHanin, and myself, is available in print today from @CambridgeUP! You can find links to purchase (as well as a link to a free arxiv draft) here: https://t.co/SJMjqHPihz 1/4

danintheory's tweet photo. The Principles of Deep Learning Theory, by @ShoYaida, @BorisHanin, and myself, is available in print today from @CambridgeUP! You can find links to purchase (as well as a link to a free arxiv draft) here: https://t.co/SJMjqHPihz

1/4 https://t.co/vsTb36b09O

9

858

145

308

0

ShoYaida retweeted

Dan Roberts

@danintheory

about 4 years ago

Thanks to @ylecun, @witten271, Scott Aaronson, @wbialek, and Gilbert Strang for your wonderful endorsements! 2/4

2

16

3

1

0

ShoYaida retweeted

Dan Roberts

@danintheory

about 4 years ago

Also, thank you @rafaela31416 for making such an amazing drawing for our cover! 3/4

2

9

1

0

ShoYaida retweeted

Dan Roberts

@danintheory

about 4 years ago

For more (summary) info, check out this thread from when we posted our arxiv draft: https://t.co/VucddUzPvZ 4/4

1

8

1

0

ShoYaida retweeted

AI at Meta

@AIatMeta

about 4 years ago

Read about new developments in deep learning with authors and researchers Daniel A. Roberts (@danintheory), Sho Yaida (@Shoyaida) and Boris Hanin (@BorisHanin) in their book The Principles of Deep Learning Theory: An Effective Theory Approach to Understanding Neural Networks. 👇

AIatMeta's tweet photo. Read about new developments in deep learning with authors and researchers Daniel A. Roberts (@danintheory), Sho Yaida (@Shoyaida) and Boris Hanin (@BorisHanin) in their book The Principles of Deep Learning Theory: An Effective Theory Approach to Understanding Neural Networks. 👇 https://t.co/ohEo5GMHeg

19

1K

271

485

0

ShoYaida retweeted

Cambridge University Press - Science & Engineering

@CUP_SciEng

about 4 years ago

The Principles of Deep Learning Theory by @danintheory @ShoYaida @BorisHanin This volume develops an effective theory approach to understanding deep neural networks of practical relevance #StatPhys #NetworkSci https://t.co/6mYhBzdyUR

CUP_SciEng's tweet photo. The Principles of Deep Learning Theory by @danintheory @ShoYaida @BorisHanin

This volume develops an effective theory approach to understanding deep neural networks of practical relevance

#StatPhys #NetworkSci

https://t.co/6mYhBzdyUR https://t.co/VhlTdYFyps

0

8

3

0

ShoYaida retweeted

AI Coffee Break with Letitia @AICoffeeBreak

about 4 years ago

Special AI☕ break edition about exciting developments in mathematical deep learning – inspired by physics. 🔥 📺 https://t.co/GBdd1PMAOX Make sure to check it out if you want a quick summary over parts of the https://t.co/v2QTpwDxul book by @danintheory, @ShoYaida, @BorisHanin.

AICoffeeBreak's tweet photo. Special AI☕ break edition about exciting developments in mathematical deep learning – inspired by physics. 🔥
📺 https://t.co/GBdd1PMAOX

Make sure to check it out if you want a quick summary over parts of the https://t.co/v2QTpwDxul book by @danintheory, @ShoYaida, @BorisHanin. https://t.co/00OyVxmTdA

2

37

14

10

0

Sho Yaida

@ShoYaida

Who to follow

Last Seen Users on Sotwe

Trends for you

Most Popular Users