Ryan Hanson @ryhanson - Twitter Profile

Pinned Tweet

almost 9 years ago

My Office RCE, CVE-2017-0199, won the Pwnie Award for Best Client-Side Bug! I'm speechless, thank you @PwnieAwards!

Photo Credit: @dalmoz_ https://t.co/R1nxv7gxep

Ryan Hanson

@ryHanson

2 days ago

Dynamic workflow orchestration works pretty well now that 4.8 in Claude Code is more stable. It feels similar to the planning and tmux+hook orchestration workflow plugin I've been using for development for the past 5 months.

My Claude Code planning flow is:
- ideation session with heavy use of AskUserQuestion to produce IDEAS.md
- IDEAS.md is used to produce a detailed PLAN.md grounded in source
- more AskUserQuestion grilling to refine the plan
- adversarial plan review by Codex to identify gaps and issues
- address issues in the plan until it is solid
- decompose the plan into separate PHASE_#.md docs
- additional Codex review of each phase doc for accuracy

Phase docs are strictly sized so that a fresh agent can implement each one before hitting 200k context; if a phase is too big, it is split into sub-phases. Might be a bit overkill, but the planning paper trail and the success criteria for each phase drastically reduce the amount of post-implementation cleanup.

After all planning docs are complete, I would use an orchestrator agent to spawn phase executor agents via tmux, with hooks that produce completion signal files so the orchestrator knows when to spawn the next agent. Essentially a fancy ralph-loop that allowed me to run agents overnight.

Using the same ideation/planning/phase-decomposition process with dynamic workflows is working great so far, without the need for tmux orchestration anymore. Tonight I'll push it further and see how well it does with a major frontend redesign consisting of roughly 150 phase plan docs.
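The tmux + completion-signal loop described in this tweet can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Ryan's plugin: it assumes phase docs are named PHASE_<n>.md, that a completion hook writes a hypothetical PHASE_<n>.done file into a signal directory, and that you supply executor_cmd, a placeholder returning the shell command that launches a fresh agent on one phase doc. All function names here are hypothetical.

```python
import re
import subprocess
import time
from pathlib import Path

# Assumed layout: phase docs named PHASE_<n>.md in plan_dir, and a completion
# hook that writes signal_dir/PHASE_<n>.done (the ".done" suffix is hypothetical).
PHASE_RE = re.compile(r"^PHASE_(\d+)\.md$")


def sort_phase_names(names):
    """Keep only PHASE_<n>.md names, ordered numerically (PHASE_2 before PHASE_10)."""
    numbered = [(int(m.group(1)), n) for n in names if (m := PHASE_RE.match(n))]
    return [n for _, n in sorted(numbered)]


def first_pending(phase_names, done_stems):
    """Return the first phase doc with no completion signal yet, or None."""
    for name in phase_names:
        if Path(name).stem not in done_stems:
            return name
    return None


def tmux_spawn_cmd(session, shell_cmd):
    # `tmux new-session -d -s NAME CMD` runs CMD in a new, detached tmux session.
    return ["tmux", "new-session", "-d", "-s", session, shell_cmd]


def orchestrate(plan_dir: Path, signal_dir: Path, executor_cmd, poll_seconds=30.0):
    """Run phases strictly in order, one fresh executor agent per phase.

    executor_cmd(doc_path) must return the shell command that starts a fresh
    agent on a single phase doc; it is a placeholder, not a real CLI.
    """
    while True:
        names = sort_phase_names(p.name for p in plan_dir.iterdir())
        done = {p.stem for p in signal_dir.glob("*.done")}
        name = first_pending(names, done)
        if name is None:
            return  # every phase has signalled completion
        doc = plan_dir / name
        subprocess.run(tmux_spawn_cmd(f"exec-{doc.stem}", executor_cmd(doc)), check=True)
        # Poll until the hook writes this phase's signal file (no timeout in this sketch).
        while not (signal_dir / f"{doc.stem}.done").exists():
            time.sleep(poll_seconds)
```

Numeric sorting matters once there are dozens of phases, since a plain string sort would run PHASE_10 before PHASE_2. Sub-phases would need their own naming scheme, which the tweet does not specify and this regex ignores. A phase whose hook never fires will block the loop forever here, so a real orchestrator would likely want a timeout.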

Ryan Hanson

@ryHanson

4 days ago

Started using Codex for adversarial review of Claude's work in March, then 5.5 was released in late April, and now Codex handles almost everything.

Crazy to see I had a few 1B+ token days. https://t.co/S73t3Q1z8d

Ryan Hanson

@ryHanson

4 days ago

@theo Yeah it can’t be trusted right now. Sub-agents failing, parallel tool calls failing, and weird hallucination loops from the failed calls that burn through tokens fast. No idea how this release passed testing and dogfooding.

Ryan Hanson

@ryHanson

5 days ago

Wtf is happening with Claude... 2.1.156 fixed corrupted conversations and was working fine, then 2.1.157 introduced new bugs.

Now sub-agents are going blind: they don't see tool call results, start reality-checking with bash tests, and then end up in a weird loop. They repeatedly read files, then the bash test tool calls start again, followed by more repeated file reads.

500k tokens burned in 5 mins. I spawned another sub-agent to see if it happens again: same thing, another 400k tokens in a few minutes...

Instead of continually adding more slop feature bloat to Claude Code, how about you stabilize the core harness? Or better yet, just let us use our subscription with a harness that is reliable and actually works instead of forcing us to use this unstable harness... @ClaudeDevs @claudeai @bcherny @trq212

Now I completely understand why @badlogicgames built https://t.co/CWfBoFhhrv
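The loop in this tweet (repeated file reads and bash "reality checks" burning 500k tokens in 5 minutes) is the kind of failure a harness-side guard could cut short. A minimal sketch, assuming a harness that can observe each tool call's name and arguments plus the tokens each turn consumes; LoopGuard and TokenRateGuard are hypothetical names, and their thresholds are illustrative, not anything Claude Code exposes.

```python
import json
from collections import Counter, deque


class LoopGuard:
    """Flags a sub-agent that keeps issuing the same tool call (hypothetical guard)."""

    def __init__(self, window: int = 20, max_repeats: int = 4):
        self.recent = deque(maxlen=window)  # only the last `window` calls are kept
        self.max_repeats = max_repeats

    @staticmethod
    def _key(tool: str, args: dict) -> str:
        # Canonical JSON so {"a": 1, "b": 2} and {"b": 2, "a": 1} count as the same call.
        return tool + ":" + json.dumps(args, sort_keys=True)

    def record(self, tool: str, args: dict) -> bool:
        """Record a call; return True once it has repeated max_repeats times in the window."""
        key = self._key(tool, args)
        self.recent.append(key)
        return Counter(self.recent)[key] >= self.max_repeats


class TokenRateGuard:
    """Trips when tokens spent in the trailing `seconds` exceed `limit`."""

    def __init__(self, limit: int, seconds: float):
        self.limit, self.seconds = limit, seconds
        self.events = deque()  # (timestamp, tokens), oldest first
        self.total = 0

    def record(self, now: float, tokens: int) -> bool:
        self.events.append((now, tokens))
        self.total += tokens
        # Drop events that have fallen out of the trailing window.
        while self.events and self.events[0][0] <= now - self.seconds:
            self.total -= self.events.popleft()[1]
        return self.total > self.limit
```

For example, a guard built with TokenRateGuard(limit=400_000, seconds=300) would have tripped well before a sub-agent burned 500k tokens in five minutes. What to do on a trip (pause, kill the sub-agent, ask the user) is a policy choice this sketch leaves open.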

Ryan Hanson

@ryHanson

5 days ago

@_can1357 That’s brutal. I thought it was a harness or streaming protocol issue. Seems very odd to me that they don’t have test coverage around parallel tool calls to catch this kind of thing before releasing.
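The missing test coverage around parallel tool calls mentioned in this reply can be made concrete. A minimal sketch, assuming a harness that receives tool_use blocks in the Anthropic Messages API shape (id, name, input) and must send back one tool_result per id; run_parallel_tool_calls, the handler mapping, and the demo tools are hypothetical, not Claude Code internals. The property worth testing is that every call gets exactly one result carrying the right tool_use_id, even when calls finish out of order or fail.

```python
import asyncio


async def run_parallel_tool_calls(tool_uses, handlers):
    """Execute tool_use blocks concurrently and return one tool_result per call.

    tool_uses: list of {"type": "tool_use", "id", "name", "input"} dicts.
    handlers: hypothetical mapping of tool name -> async function(input) -> str.
    Results come back in request order, even if calls fail or finish out of order.
    """

    async def run_one(block):
        handler = handlers[block["name"]]  # an unknown tool becomes an error result
        return await handler(block["input"])

    # return_exceptions=True keeps one failing call from discarding the others.
    outcomes = await asyncio.gather(*(run_one(b) for b in tool_uses), return_exceptions=True)
    results = []
    for block, outcome in zip(tool_uses, outcomes):
        is_error = isinstance(outcome, BaseException)  # also covers CancelledError
        results.append({
            "type": "tool_result",
            "tool_use_id": block["id"],
            "content": f"error: {outcome!r}" if is_error else outcome,
            "is_error": is_error,
        })
    return results


async def demo():
    # Illustrative tools: "slow" finishes after a delay, "boom" always crashes.
    async def slow(inp):
        await asyncio.sleep(inp["delay"])
        return f"done {inp['delay']}"

    async def boom(inp):
        raise RuntimeError("tool crashed")

    calls = [
        {"type": "tool_use", "id": "toolu_a", "name": "slow", "input": {"delay": 0.02}},
        {"type": "tool_use", "id": "toolu_b", "name": "boom", "input": {}},
        {"type": "tool_use", "id": "toolu_c", "name": "slow", "input": {"delay": 0.0}},
    ]
    return await run_parallel_tool_calls(calls, {"slow": slow, "boom": boom})
```

Running asyncio.run(demo()) should yield three results in the order toolu_a, toolu_b, toolu_c even though toolu_c finishes first, with toolu_b marked is_error. A regression test over that invariant is the kind of check that could catch tool results going missing or landing on the wrong call.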

ryHanson retweeted

Dan

@dcarps14

6 days ago

Just had to switch back to Opus 4.7 after 4.8 kept getting locked in "liveliness checks" after failing to understand empty Bash outputs and burned 100k tokens on cancelled parallel tool calls.

Super bummed.... 😮‍💨

Tbh I'm close to switching back to Cursor and off my CC sub. https://t.co/1VkwBtbnfk

ryHanson retweeted

Xander Steenbrugge

@xsteenbrugge

5 days ago

Is it just me, or is Opus 4.8 in CC sometimes just absolutely retarded? In this session it just got stuck in a loop calling "echo" and checking the date 20x in a row... This has been happening very regularly since the 4.7 --> 4.8 update. WTF? @claudeai @bcherny

ryHanson retweeted

Ejaz Karim

@ejazkarimhunzai

5 days ago

@AnthropicAI @ClaudeDevs what is happening with 4.8? It is going crazy in a loop.

ryHanson retweeted

dwrx @dxrxdy

5 days ago

gave claude 4.8 another try at ultra xhigh. It failed to one-shot a relatively simple task - read a lot of existing code and create a new script from it. It spent 14% of my limits hallucinating - trying to read files that didn't exist, before admitting it should've run "ls" first https://t.co/DZbh3GS3jM

ryHanson retweeted

Raduan Al-Shedivat

@0xRaduan

5 days ago

What is happening inside of Opus 4.8 RL? Started being even more crazy.

ryHanson retweeted

shane @ssshaney

5 days ago

.@AnthropicAI uhh what is he doing

ryHanson retweeted

JonasKs

@KS_Jonas

5 days ago

Claude Code completely broken after the last two patches?

From Claude: What I actually observed: several times this session, when I ran a tool call (a Bash command or an Edit), the result block I got back appeared to belong to a previous call, or showed stale file contents, or an Edit reported "success"/"file not found" in a way that didn't match what the file actually contained a moment later. So my model of the file's state drifted from its real state.

Ryan Hanson

@ryHanson

5 days ago

@0xRaduan Definitely some regression in the latest version. Careful, it can get into a loop and rip through tokens quickly https://t.co/TxTVCW5miu

Ryan Hanson

@ryHanson

5 days ago

@cryptodavidw Same here, can't be trusted right now... https://t.co/TxTVCW5miu

ryHanson retweeted

_ZN4DionC1Ev @justdionysus

6 days ago

Given access to recent LLMs, the number of things you could be doing explodes. Choosing how to spend your time has never been more important both technically and personally. Building because you can is seductive and it compounds with inexperience. It looks like addiction to me.

Ryan Hanson

@ryHanson

6 days ago

@Javi Improved reliability with loading long history threads. I get errors loading messages: CodexAppServer.CodexClientError error 11. Other than that, it works great, thanks for all the hard work!

Ryan Hanson

@ryHanson

6 days ago

@steventseeley @HackingDave Ended up being a harness/protocol issue and not the model. It's fixed in the latest 2.1.156 update, and 4.8 has been running great so far. Still seeing intermittent cyber-use blocks with 4.8 for CVP-approved teams, though.

Ryan Hanson

@ryHanson

7 days ago

@theo “echo hello world” and a corrupted conversation are headed your way

Ryan Hanson

@ryHanson

7 days ago

Before this happened, Claude was re-running commands like "echo hello world" and claiming plan docs were corrupt. Then it eventually started failing with this 400 error when I stopped it and asked what was going on.

Analyzing the session transcript, it looks like the agent loop failed to end the assistant turn between tool calls. One message accumulated 40 tool_use + 19 signed thinking blocks instead of ~20 separate turns. Tool results stopped feeding back, so the model went blind, started reality-checking with "echo test123", and claimed files were corrupt.

Since the single malformed message had signed, immutable thinking blocks, reconstructing the request modifies a thinking block and leads to 400 API errors on every subsequent turn. The whole conversation is permanently bricked, not just one reply.

You can manually recover by having a fresh Claude parse the session transcript and pick up where the broken session left off, but it sucks having to waste tokens to recover a session because the harness is buggy.

@bcherny @ClaudeDevs I'm sure you all are already looking into this, but it looks like the root cause is a streaming/turn-boundary bug.
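The diagnosis in this tweet can be checked mechanically. A minimal sketch, assuming the transcript has already been converted into Anthropic Messages API shape (a list of {"role", "content"} dicts containing tool_use, tool_result, and thinking blocks); the on-disk format of Claude Code session transcripts is not described in the tweet and is not assumed here. audit_transcript is a hypothetical helper, and treating more than one thinking block per message as suspicious is only a heuristic for spotting merged turns. The unanswered-tool_use check mirrors the API rule that every tool_use must be answered by a matching tool_result in the following user message.

```python
def audit_transcript(messages, max_thinking_per_message=1):
    """Return human-readable problems found in a Messages-API-shaped transcript."""
    problems = []
    for i, msg in enumerate(messages):
        if msg["role"] != "assistant" or isinstance(msg["content"], str):
            continue
        blocks = msg["content"]
        tool_ids = [b["id"] for b in blocks if b["type"] == "tool_use"]
        thinking = sum(1 for b in blocks if b["type"] == "thinking")

        # Heuristic: many thinking blocks interleaved with tool calls in ONE
        # assistant message suggests several turns were merged together.
        if tool_ids and thinking > max_thinking_per_message:
            problems.append(
                f"message {i}: {len(tool_ids)} tool_use + {thinking} thinking blocks "
                "in one turn (possible merged turns)"
            )
        if not tool_ids:
            continue

        # Every tool_use id should be answered by a tool_result in the next user message.
        nxt = messages[i + 1] if i + 1 < len(messages) else None
        answered = set()
        if nxt and nxt["role"] == "user" and not isinstance(nxt["content"], str):
            answered = {b["tool_use_id"] for b in nxt["content"] if b["type"] == "tool_result"}
        missing = [t for t in tool_ids if t not in answered]
        if missing:
            problems.append(
                f"message {i}: {len(missing)} tool_use block(s) with no tool_result "
                "in the next message"
            )
    return problems
```

On a transcript like the one described (40 tool_use and 19 thinking blocks in one message, with tool results no longer fed back), both checks would fire on the same message index, which is also a reasonable place to cut the transcript before handing it to a fresh session for recovery. A trailing assistant message whose tools are still running will also be reported, so in-progress sessions need that last message ignored.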

Ryan Hanson

@ryHanson

7 days ago

@kcosr Seems to be a corrupted conversation from multiple turns accumulating in a single message, including signed thinking blocks.

Ryan Hanson

@ryHanson

7 days ago

The latest Claude Code update with Opus 4.8 seems to be working well...

How do they not catch major issues like this before release? https://t.co/Gve1ezU5b6
