Ryan Pream @aimachinedream - Twitter Profile

about 3 hours ago

@TheZvi The fix is a classifier that stops Mythos from fixing any bugs in software code. That is a problem when your main business is software coding agents.

0

1

0

107

Ryan Pream @AIMachineDream

about 4 hours ago

@rikkarth Depends a lot on how hard the project is. I was working on a 3D renderer and I went from Fable getting 8/10 prompts right to Opus 1/10 right. I've switching to using ultra code with 40+ agents per prompt but Opus is still having a lot of trouble.....

0

1

0

60

Ryan Pream @AIMachineDream

about 5 hours ago

@spicey_lemonade There is just no way for a classifier approach to 100% tell if the user is asking for a bug fixed or asking to identify a bug for purposes of exploit.

0

51

Ryan Pream @AIMachineDream

about 6 hours ago

@Youssofal_ I've gone to doing a ultra code workflow at almost every prompt in the conversation to allow Opus 4.8 to continue work on the project I was working on with Fable. Where as Fable could go correction free, Opus "thinking" is generally wrong and requires a 30-50 agent workflow.

0

73

Ryan Pream @AIMachineDream

about 7 hours ago

@kimmonismus The jailbreak is essentially "find bugs in my codebase". It you disable that ability via classifier then you prevent the use of Fable for coding. Is a real tension because cyber vulnerabilities are bugs and fixing bugs is a huge part of what coding agents do.

0

10

0

361

Ryan Pream @AIMachineDream

6 days ago

@ShanuMathew93 I give Claude a loop with /goal and the goal is to come back with a passing grade from Chat GPT 5.5. Claude calls Chat GPT over and over with his attempts until he passes, and the /goal makes sure that he can't cheat or stop early.

2

11

0

32

5K

Ryan Pream @AIMachineDream

10 days ago

@ThePrimeagen 100x on isolated tasks... 10x overall. The bottleneck is reviewing and understanding what the AI is generating.

0

12

Ryan Pream @AIMachineDream

13 days ago

@morganlinton It's especially helpful cross model but even cross session with same model the agent / critic is extremely underrated. The issue is models like their own work just like humans do. You need a fresh session that didn't do the work to evaluate.

0

17

Ryan Pream @AIMachineDream

13 days ago

@Rustavi Tesla can navigate far harder challenges than this, including incomplete construction marking and construction workers giving hand signals.

0

2

0

93

Ryan Pream @AIMachineDream

14 days ago

@morganlinton Would suggest Claude Code terminal and /goal which will have another agent assess it and loop. I often have Chat GPT 5.5 as a critic of Opus 4.8 work (Opus is smart enough to code this up), and then the 4.8 goalkeeper task is to keep going till Chat GPT approves.

1

0

202

Ryan Pream @AIMachineDream

16 days ago

@Gavriel_Cohen @swyx @Barazany Ya..I found it exploring the source too. They have put an enormous effort into maximizing context caching and there is a lot of editing going on to make it happen.

0

2

0

297

Ryan Pream @AIMachineDream

16 days ago

@samuelcook @sidbid Same. I accidentally triggered a workflow in my very first 4.8 prompt, not even knowing what it was.

1

2

0

29

Ryan Pream @AIMachineDream

17 days ago

@saurabh_shah2 How aggressively a harness works to minimize token consumption is going to be a major factor. The is an enormous amount of context editing going on. For example, the results of tool calls are generally not available to models on subsequent turns and can have a big impact.

0

75

Ryan Pream @AIMachineDream

about 1 month ago

@petergyang Who wants to be a rock climber, huh?

0

55

Ryan Pream @AIMachineDream

3 months ago

@noahzweben Perhaps some instructions? I can’t find where you turn the sync on.

0

19

Ryan Pream @AIMachineDream

3 months ago

@kimmonismus It is only that the release cycles have gotten so fast that very few people can keep up with it. AI is continuing to diffuse into the workplace, but the average person doesn't have the bandwidth to keep up with what the current state of art is.

0

43

Ryan Pream @AIMachineDream

3 months ago

@developedbyed I saw something very similar in my tests. GPT 5.4 is probably the better coder and smarter model, but it is lacking in taste and wants to over achieve on outputs.

0

601

Ryan Pream @AIMachineDream

4 months ago

@petergostev Bravo. This benchmark captures the real advantage Anthropic has.

0

2

0

362

Ryan Pream @AIMachineDream

4 months ago

@steipete @Cucho The are likely able to optimize cache across sessions ( if everyone is using the same Google harness ) that breaks down once everyone is bringing their own.

1

0

732

Ryan Pream @AIMachineDream

4 months ago

@MatthewBerman My guess is that it isn’t OpenClaw/OAuth that gets you banned but rather what OpenClaw does that could get you banned. This is why Anthropic don’t want to come out and say that OpenClaw is allowed. Anthropic has low trust in the guardrails.

1

0

134

Ryan Pream

@AIMachineDream

Last Seen Users on Sotwe

Trends for you

Most Popular Users