Ben Davis @davis7 - Twitter Profile

Pinned Tweet

Ben Davis

@davis7

about 17 hours ago

The Opus 4.8/ultracode + DeepSWE bench podcast ep. is live:

Nerd Snipe

@NerdSnipePod

about 17 hours ago

We (mostly) like Opus 4.8...
00:00 - Shower Thoughts
02:44 - Deep SWE Benchmark
10:45 - Opus vs GPT-5.5
19:57 - Anthropic’s Huge Raise
25:39 - Token Maxing
40:02 - AI Slot Machine
43:49 - Claude Code Friction
50:01 - Opus, Mythos, and Safety

Ben Davis

@davis7

about 20 hours ago

@maria_rcks @theo Promoted, unfortunately for Theo

Ben Davis

@davis7

1 day ago

@jetpackjoe_ 30-40%. Somewhere between sonnet 4.6 and opus 4.7

Ben Davis

@davis7

1 day ago

Been using this a bunch today and it's awesome
Grok build is a good TUI, composer 2.5 is an excellent model
Good alternative to GPT-5.5 low reasoning with pi if that's not ur thing. I will probably end up sticking with that, 5.5 & Pi are still each better than composer & grok build respectively
But it's a step in the right direction, I want xAI + Cursor to keep getting better
we desperately need the competition rn

xAI

@xai

2 days ago

Composer 2.5 is now available inside Grok Build. Composer 2.5 is a fast, highly intelligent model that excels on long-running tasks and following complex instructions.

Ben Davis

@davis7

1 day ago

Sun eater is my favorite series of all time
Book 1 is solid but u kinda just have to get through it, books 2-7 are incredible all the way through
And the series ends perfectly, books 7 and 6 are arguably the strongest in the series which is impressive
U will know if it’s for u after reading the prologue lol

davis7 retweeted

Rhys

@RhysSullivan

1 day ago

credit where credit is due, workflows in claude code are good
i've been particularly impressed with them for writing effect, generally works really well with finding strong patterns from other repos and writing it properly

Ben Davis

@davis7

2 days ago

@ryanvogel they should not have given u this feature

Ben Davis

@davis7

2 days ago

@colesmcintosh Is there official cursor support? I’ve heard this from multiple people

Ben Davis

@davis7

2 days ago

I re-subbed to Claude Code to test out Opus 4.8
It's both better and worse than I expected tbh. Claude models have some really weird behaviors/hallucinations and seem to be getting slower
That said, workflows are dope and the model is still very good...

Ben Davis

@davis7

4 days ago

@TheHunterBohm Hermes is useful for non code stuff. I'm not using it to build things, rather to run workflows for work stuff like emails, social media stats aggregations, reminders, slack, notion, etc.

Ben Davis

@davis7

4 days ago

Hermes is great and I highly recommend it
But also one of the first things u should do is go in and gut the skills. At least 40 of the ~100 should be instantly turned off

Ben Davis

@davis7

4 days ago

it's annoying, but also I can understand why they do it this way. If the end goal is for anyone to be able to use it, u shouldn't have to manually curate skills
eventually no one should have to, but we're not there yet

Ben Davis

@davis7

4 days ago

@DerekCorniello I did end up dropping out but that's mostly b/c college in covid sucked a ton, OSU itself was awesome

Ben Davis

@davis7

4 days ago

An unedited 1/1 quote from Opus 4.8 max reasoning in Claude Code: "instead of stopping when the reads came back empty, I described a homepage that doesn't exist: a custom form already wired to Attio via a server API (attio.js, submitBrandInquiry, ATTIO_API_KEY). None of that is real." ...

Ben Davis

@davis7

4 days ago

This benchmark is the first one I've seen that maps 1:1 to my experience
Almost to a degree where I'm scared to fully trust it since it so tightly maps to my existing opinions
I feel like I'm missing something