Ryan Pream @AIMachineDream - Twitter Profile

5 days ago

@ShanuMathew93 I give Claude a loop with /goal and the goal is to come back with a passing grade from Chat GPT 5.5. Claude calls Chat GPT over and over with his attempts until he passes, and the /goal makes sure that he can't cheat or stop early.

2

11

0

32

5K

Ryan Pream @AIMachineDream

10 days ago

@ThePrimeagen 100x on isolated tasks... 10x overall. The bottleneck is reviewing and understanding what the AI is generating.

0

12

Ryan Pream @AIMachineDream

13 days ago

@morganlinton It's especially helpful cross model but even cross session with same model the agent / critic is extremely underrated. The issue is models like their own work just like humans do. You need a fresh session that didn't do the work to evaluate.

0

17

Ryan Pream @AIMachineDream

13 days ago

@Rustavi Tesla can navigate far harder challenges than this, including incomplete construction marking and construction workers giving hand signals.

0

2

0

93

Ryan Pream @AIMachineDream

14 days ago

@morganlinton Would suggest Claude Code terminal and /goal which will have another agent assess it and loop. I often have Chat GPT 5.5 as a critic of Opus 4.8 work (Opus is smart enough to code this up), and then the 4.8 goalkeeper task is to keep going till Chat GPT approves.

1

0

202

Ryan Pream @AIMachineDream

16 days ago

@Gavriel_Cohen @swyx @Barazany Ya..I found it exploring the source too. They have put an enormous effort into maximizing context caching and there is a lot of editing going on to make it happen.

0

2

0

297

Ryan Pream @AIMachineDream

16 days ago

@samuelcook @sidbid Same. I accidentally triggered a workflow in my very first 4.8 prompt, not even knowing what it was.

1

2

0

29

Ryan Pream @AIMachineDream

17 days ago

@saurabh_shah2 How aggressively a harness works to minimize token consumption is going to be a major factor. The is an enormous amount of context editing going on. For example, the results of tool calls are generally not available to models on subsequent turns and can have a big impact.

0

75

Ryan Pream @AIMachineDream

about 1 month ago

@petergyang Who wants to be a rock climber, huh?

0

55

Ryan Pream @AIMachineDream

3 months ago

@noahzweben Perhaps some instructions? I can’t find where you turn the sync on.

0

19

Ryan Pream @AIMachineDream

3 months ago

@kimmonismus It is only that the release cycles have gotten so fast that very few people can keep up with it. AI is continuing to diffuse into the workplace, but the average person doesn't have the bandwidth to keep up with what the current state of art is.

0

43

Ryan Pream @AIMachineDream

3 months ago

@developedbyed I saw something very similar in my tests. GPT 5.4 is probably the better coder and smarter model, but it is lacking in taste and wants to over achieve on outputs.

0

601

Ryan Pream @AIMachineDream

4 months ago

@petergostev Bravo. This benchmark captures the real advantage Anthropic has.

0

2

0

362

Ryan Pream @AIMachineDream

4 months ago

@steipete @Cucho The are likely able to optimize cache across sessions ( if everyone is using the same Google harness ) that breaks down once everyone is bringing their own.

1

0

732

Ryan Pream @AIMachineDream

4 months ago

@MatthewBerman My guess is that it isn’t OpenClaw/OAuth that gets you banned but rather what OpenClaw does that could get you banned. This is why Anthropic don’t want to come out and say that OpenClaw is allowed. Anthropic has low trust in the guardrails.

1

0

134

Ryan Pream @AIMachineDream

4 months ago

@lucas_montano Gemini has been the strongest model for vision and UI design.

0

1

0

109

Ryan Pream @AIMachineDream

4 months ago

@lina_colucci @livekit @LemonSliceAI Great idea! There can't be much penetration of this, but it makes a lot of sense.

0

43

Ryan Pream @AIMachineDream

4 months ago

Somewhat humorous but I think OpenClaw is going to be seen as a marker for the start of the singularity. We had a language model breakthrough, followed by a reasoning mode breakthrough, and then a recursive AI breakthrough. Now they can self improve.

0

1

0

109

Ryan Pream @AIMachineDream

4 months ago

@danshipper @every Note, same exact end cost for ARC-AGI tasks, so it could still be cheaper to use Opus. You are trading more tokens to solve the problem vs more expensive tokens and the cost per token difference is modest.

0

105

Ryan Pream @AIMachineDream

4 months ago

@Scobleizer @sqs What you are going to need though for professionals is domain experts who think logically and can explain themselves well verbally. Probably different workers doing this.

0

15

Ryan Pream

@AIMachineDream

Last Seen Users on Sotwe

Trends for you

Most Popular Users