David Stephens @NumberByColors - Twitter Profile

30 days ago

There are only two honest metrics when it comes to benchmarking intelligence: novelty and efficiency. You don't need intelligence to solve a known problem (only memory). And you don't need intelligence to solve a problem via brute force. But to solve a novel problem efficiently, intelligence is the only way.

NumberByColors retweeted

Ethan Mollick

@emollick

about 2 months ago

One thing about AI, for better and worse, is that "everything around me is somebody's life work" is no longer a true assumption going forward.

NumberByColors retweeted

Colin @colinarmis

8 months ago

"do you like using twitter?" does sisyphus like his boulder

NumberByColors retweeted

Gergely Orosz

@GergelyOrosz

8 months ago

Product launches usually paint an ambitious vision of what they want to achieve. The launch of Vibes paints the vision of people (and kids!) glued to their phones, scrolling thru AI slop (infused with ads eventually, obviously). What a terrible future. I hope it never happens.

NumberByColors retweeted

Derek Thompson

@DKThomp

11 months ago

Yes. Writing is not a second thing that happens after thinking. The act of writing is an act of thinking. Writing *is* thinking. Students, academics, and anyone else who outsources their writing to LLMs will find their screens full of words and their minds emptied of thought.

NumberByColors retweeted

Earth Is A Sales Funnel For SATAN

@GENIC0N

11 months ago

it's been 40,000 years since the singularity happened for dogs and they seem fine

NumberByColors retweeted

Tom Warren

@tomwarren

12 months ago

what's old is new again

NumberByColors retweeted

AI Digest

@aidigest_

about 1 year ago

We just added @OpenAI's powerful new o3 and o4-mini agents to this graph. The results are striking. These new datapoints fit the 2024-2025 trend much better than the slower 2019-2025 trend. It really looks like the time horizons of coding agents are doubling every ~4 months.
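To put the "doubling every ~4 months" claim in concrete terms, here is a minimal arithmetic sketch. It assumes clean exponential growth with a fixed doubling period, which is a simplification of the trend the tweet describes; the function name `horizon_multiplier` is hypothetical and not taken from the AI Digest post.

```python
def horizon_multiplier(months: float, doubling_months: float = 4.0) -> float:
    """Growth factor after `months`, assuming the quantity doubles
    every `doubling_months` (idealized exponential trend)."""
    return 2.0 ** (months / doubling_months)


if __name__ == "__main__":
    # Under the assumed 4-month doubling, one year is three doublings.
    for months in (4, 8, 12, 24):
        print(months, "months ->", horizon_multiplier(months), "x")
```

Under this assumption a task horizon grows eightfold in a year, while a slower doubling period passed as `doubling_months` gives a correspondingly smaller factor.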

NumberByColors retweeted

Ethan Mollick

@emollick

over 1 year ago

I wish more AI lab leaders would spell out a vision for the world, one that is clear about what they think life will actually be like for humans living in a world of AGI. Faster science & productivity, good - but what is the experience of a day in the life in the world they want?

David Stephens @NumberByColors

over 1 year ago

Downloaded this app last night. And was not prepared for how jaw-dropping it is. You genuinely feel like you're flying. Anywhere in the world. And if you go low enough, you can view immersive Street View. AR/VR apps done well are like downloading superpowers.

Justin Ryan ᯅ

@justinryanio

over 1 year ago

I just downloaded FLY – Explore the Earth on Vision Pro, and I can’t stop using it! You’re in a little aircraft, soaring over any location. Simply lean in the direction you want to go. It’s immersive, even with Google Maps graphics. Highly recommend!

NumberByColors retweeted

Michael Thomas

@curious_founder

over 1 year ago

Vaccines are good actually.

NumberByColors retweeted

Gabriel

@gabelbling

over 1 year ago

Amazing graphical representation of a neural net, never seen anything like it.

NumberByColors retweeted

Ben Gilbert

@gilbert

over 1 year ago

It's totally insane that humans have figured out physics and chemistry to the point where you can do a bunch of math, then launch a giant hunk of metal and fuel to space, then have it come back and land (...get caught!) exactly where your math said it would. That, plus the hardware / software / real-time algorithm systems to perfectly nestle up to the tower, autonomously. Just an astonishing level of precision. Incredible work, @SpaceX.

NumberByColors retweeted

wave (returning to the bay arc)

@0xWave

over 1 year ago

The 3D artists at the weather channel deserve a raise for this insane visual. Now watch this, and then realize forecasts are now predicting up to 15 ft of storm surge in certain areas on the western coast of Florida.

David Stephens @NumberByColors

over 1 year ago

It’s that meme where the chicken is recognizing its former self in a picture of a dinosaur

Smoke-away @SmokeAwayyy

over 1 year ago

Multiplication using o1-mini vs GPT-4o

David Stephens @NumberByColors

over 1 year ago

Create Loop pages, right from chat with Copilot! 💜

Satya Nadella

@satyanadella

over 1 year ago

With Web + Work + Pages, you can now ideate with AI and collaborate with other people. It’s just magical. You can learn more here: https://t.co/N0iE3Rv21I

NumberByColors retweeted

Jim Fan

@DrJimFan

over 1 year ago

OpenAI Strawberry (o1) is out! We are finally seeing the paradigm of inference-time scaling popularized and deployed in production. As Sutton said in the Bitter Lesson, there're only 2 techniques that scale indefinitely with compute: learning & search. It's time to shift focus to the latter.

1. You don't need a huge model to perform reasoning. Lots of parameters are dedicated to memorizing facts, in order to perform well in benchmarks like trivia QA. It is possible to factor out reasoning from knowledge, i.e. a small "reasoning core" that knows how to call tools like browser and code verifier. Pre-training compute may be decreased.

2. A huge amount of compute is shifted to serving inference instead of pre/post-training. LLMs are text-based simulators. By rolling out many possible strategies and scenarios in the simulator, the model will eventually converge to good solutions. The process is a well-studied problem like AlphaGo's monte carlo tree search (MCTS).

3. OpenAI must have figured out the inference scaling law a long time ago, which academia is just recently discovering. Two papers came out on Arxiv a week apart last month:

- Large Language Monkeys: Scaling Inference Compute with Repeated Sampling. Brown et al. finds that DeepSeek-Coder increases from 15.9% with one sample to 56% with 250 samples on SWE-Bench, beating Sonnet-3.5.
- Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters. Snell et al. finds that PaLM 2-S beats a 14x larger model on MATH with test-time search.

4. Productionizing o1 is much harder than nailing the academic benchmarks. For reasoning problems in the wild, how to decide when to stop searching? What's the reward function? Success criterion? When to call tools like code interpreter in the loop? How to factor in the compute cost of those CPU processes? Their research post didn't share much.

5. Strawberry easily becomes a data flywheel. If the answer is correct, the entire search trace becomes a mini dataset of training examples, which contain both positive and negative rewards. This in turn improves the reasoning core for future versions of GPT, similar to how AlphaGo’s value network, used to evaluate quality of each board position, improves as MCTS generates more and more refined training data.
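The repeated-sampling idea cited in point 3 can be illustrated with a small toy sketch. This is not the method from either paper or from OpenAI. It assumes every sample succeeds independently with the same probability `p`, and that a perfect verifier is available. The names `coverage` and `best_of_k` and the digit-guessing task are hypothetical.

```python
import random


def coverage(p: float, k: int) -> float:
    """Chance that at least one of k independent samples is correct,
    assuming every sample succeeds with the same probability p."""
    return 1.0 - (1.0 - p) ** k


def best_of_k(sample, verify, k: int):
    """Draw up to k candidates and return the first one the verifier accepts."""
    for _ in range(k):
        candidate = sample()
        if verify(candidate):
            return candidate
    return None  # no candidate passed within the sampling budget


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical task: guess a digit; the verifier knows the answer is 7.
    guess = lambda: rng.randrange(10)
    is_correct = lambda x: x == 7
    for k in (1, 10, 50):
        print(k, round(coverage(0.1, k), 3), best_of_k(guess, is_correct, k))
```

Real benchmarks will not follow this idealized curve, because per-problem success rates vary widely and some problems are never solved by any sample. The sketch only shows why spending more inference compute on extra samples can raise the solve rate when a verifier exists.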

NumberByColors retweeted

Bret Victor

@worrydream

almost 2 years ago

★ Dynamicland's new website documents ten years of progress toward a humane dynamic medium. https://t.co/uAg40FbcFM

David Stephens @NumberByColors

almost 2 years ago

This app is *amazing.* Just being able to teleport to any location with Street View is enough, but having all your photos layered on top is so emotional. Exciting to see more mixed reality apps getting built!

Justin Ryan ᯅ

@justinryanio

almost 2 years ago

this is officially one of my favorite new apps: Sceno on Apple Vision Pro! i couldn’t stop smiling while reliving memories exactly where they happened. you can browse past photos in immersive panoramas and explore new locations. i highly recommend it!

NumberByColors retweeted

Liv @Liv_Agar

almost 2 years ago

Last one is a really good punchline in the form of data visualization
