David Chase @lazyeval - Twitter Profile

I was wrong. I've been saying for months that open source AI models are 6 months behind the frontier. They caught up. GLM 5.2 is as good as Opus 4.8. This changes everything. If you run GLM 5.2 locally, no government can take it away. You become sovereign. And even if you run it through APIs, it's a fraction of the cost. The battlefield is different now. If open source is as good as frontier, and people have cheaper alternatives, governments can't be as quick to regulate. It will destroy the frontier AI labs. All of this is such a massive win for the people. If you are not paying attention to local models yet, you are making a tremendous mistake.

David Chase

@lazyeval

2 days ago

@RobinhoodApp When are multi-leg options coming?

David Chase

@lazyeval

3 days ago

This and AI models are going to drain my bank accounts really quickly 🤣

Josh Pigford

@Shpigford

3 days ago

just found out about https://t.co/uf4s7ArWK8 and i'm absolutely livid that i now have a new hobby that i DO NOT HAVE TIME FOR.

David Chase

@lazyeval

3 days ago

@Shpigford Pretty fun, I just started to play.

David Chase

@lazyeval

3 days ago

This would be awesome 🤩

Elon Musk

@elonmusk

3 days ago

AI will achieve Stockfish-level coding and generalized computer use

David Chase

@lazyeval

3 days ago

@Shpigford I haven't seen it on OpenRouter, so where?

David Chase

@lazyeval

3 days ago

@Shpigford Maybe we can get Cursor models outside of Cursor now?

David Chase

@lazyeval

4 days ago

@morganlinton @smallharness Wow, that would be amazing. I didn't expect things to move so quickly.

David Chase

@lazyeval

5 days ago

@steipete Zellij, this is the way 🫡

David Chase

@lazyeval

5 days ago

@skirano I'm guessing this is the Codex 20x, for that many possible agents pursuing goals?

David Chase

@lazyeval

6 days ago

🤣

Jay

@jayair

6 days ago

Introducing Opus 4.9 — 10% smarter, 2x more expensive Definitely NOT Fable

David Chase

@lazyeval

6 days ago

@ScriptedAlchemy

David Chase

@lazyeval

6 days ago

@NetworkChuck Can you say more? Local models need more focus to do what? Provide frontier-level reasoning, or be lighter to run on various hardware, or what?

David Chase

@lazyeval

7 days ago

@AlexFinn Can you say more about which model(s) are Opus-level and can run on home GPUs?

lazyeval retweeted

Artificial Analysis

@ArtificialAnlys

8 days ago

We've updated the Artificial Analysis Coding Agent Index, replacing SWE-Bench Pro with Datacurve's DeepSWE benchmark: the swap lifts Codex with GPT-5.5 (xhigh) above Claude Code with Opus 4.8 (max), while the newly released Claude Fable 5 (max) in Claude Code debuts at the top.

DeepSWE, built by @datacurve, writes its tasks from scratch rather than adapting them from public GitHub issues or pull requests, so no model has seen the solutions during training. That matters because SWE-Bench Pro, the benchmark it replaces in our Coding Agent Index, had grown gameable, with some models recovering the fix from the repository's commit history instead of solving the task.

The swap reorders the index: Codex with GPT-5.5 (xhigh) rises from 65 to 76, overtaking Claude Code with Opus 4.8 (max) at 73. Claude Code with Fable 5 (max), which enters directly on the refreshed index, leads at 77. SWE-Bench Pro had been flattering some combinations and penalizing others.

More below.
