azim

Verified account

@_azims

software engineer trying to automate myself out of a job

West Orange, NJ

Joined November 2016

161 Following

82 Followers

600 Posts

5 days ago

@mitchellh I do something similar with context windows. Don’t use the 1M models, stick to the 256k variants. If it hits a compaction, decompose and try again. I think this is better because a small LOC change could still require a lot of background work / verification / thinking.

0

1

0

0

97

_azims retweeted

Mitchell Hashimoto

6 days ago

My heuristic is that any diff an agent generates over ~1500 lines is too big and is indicative that the problem needs to be decomposed. This is my general pattern now for feature work:

1. Try to implement the whole feature, loosely guided. I call this the "draw the owl" prompt in reference to the meme. Expect garbage, you're going to get garbage.
2. If the diff is less than 1500 lines, review it and iterate normally. If the diff is more than 1500 lines, prompt the agent to decompose the problem into atomic, incremental, reviewable tasks. Simultaneously, do this yourself.
3. Agents will very often make these tasks way too specific to the shape they solved. You need to massage it into the right general shape. Do that.
4. Kick off new agents to work on those incremental things (as parallelized as possible). Apply the same rules.
5. At a certain point, repeat the "draw the owl" prompt. At some point, you will get beneath your review-ability threshold.

This has been producing consistently high quality, maintainable, reviewable chunks of code that have a good handoff to either merge as-is or human refinement. And with the latest frontier models at xhigh thinking, these are all slow enough that you can usually have multiple going concurrently while you are actively reviewing others or working on your own tasks.

HITL (human-in-the-loop) agents are still super important, especially for feature work. Features touch the human boundary in terms of UI, API, etc. And net new stuff can introduce pathologies in the architecture that violate desired invariants (these should be represented in specs or tests but we aren't perfect!).

I know a lot of the leading edge agentic discourse is about "loops" and agents driving agents continuously. I do some of that (will report on that later). But, in terms of raw daily get-shit-done type of work, this is my most rewarding pattern at the moment.

97

4K

232

3K

206K
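A rough sketch of the size check in step 2 of the post above, assuming the "diff" is a unified diff (for example `git diff` output) and that "lines" means added plus removed lines. The names `changed_lines`, `next_step`, and `REVIEW_THRESHOLD` are made up for illustration, and treating exactly 1500 lines as reviewable is an arbitrary choice the post does not specify.

```python
REVIEW_THRESHOLD = 1500  # the "~1500 lines" figure from the post


def changed_lines(unified_diff: str) -> int:
    """Count added and removed lines in a unified diff.

    Approximation: any line starting with '+++' or '---' is treated as a
    file header and skipped, so a changed line whose own content starts
    with '--' or '++' would be missed.
    """
    count = 0
    for line in unified_diff.splitlines():
        if line.startswith(("+++", "---")):
            continue  # file header, not a content change
        if line.startswith(("+", "-")):
            count += 1
    return count


def next_step(unified_diff: str, threshold: int = REVIEW_THRESHOLD) -> str:
    """Return 'review' for a diff within the threshold, else 'decompose'."""
    if changed_lines(unified_diff) <= threshold:
        return "review"
    return "decompose"
```

In practice the input could come from something like `git diff main...HEAD` on the agent's branch; the check only decides which branch of the workflow to take, and the review or decomposition itself stays with the human and the agent.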

11 days ago

I can't wait for the vibe coding hangover on June 23rd

0

0

0

0

14

12 days ago

seems pretty smart to me

https://t.co/070CydPXTU

12 days ago

Introducing Claude Fable 5: a Mythos-class model that we’ve made safe for general use. Its capabilities exceed those of any model we’ve ever made generally available.

5K

105K

15K

22K

56M

0

0

0

0

10

_azims retweeted

2 months ago

Imagine the alternate reality where we named GPT-5.4-Pro something like Fable.

299

4K

181

243

1M

13 days ago

@jturntdev Dude have you tried 5.4-mini lately? Even less quota drain, and faster.

0

0

0

0

112

13 days ago

Of the over-used LLM phrases, "you're right, the honest answer is..." is the most infuriating. I hate it so much more than "you're absolutely right"

0

0

0

0

15

14 days ago

@niccruzpatane The craziest part is how much better they're getting each generation.

0

0

0

0

1K

16 days ago

@theo This exactly. Even keep separate versions of most skills.

0

0

0

0

272

17 days ago

@zebassembly I bet it's a skill :)

0

1

0

0

11

17 days ago

@petergyang hot take: keep separate copies. they behave so differently anyways it's worth specializing the prompts.

1

1

0

0

412

19 days ago

@CBSNews The pass rate for the disability insurance question was so low, I wonder if there’s something else going on here.

1

0

0

1

15K

21 days ago

@thsottiaux the excessive gpu usage gah daym

1

3

0

0

529

22 days ago

estimating based on my usage rates today, you can run gpt-5.5 in codex continuously for ~40 hours per week on the $100 plan (once the 2x bonus usage ends). that's pretty wild.

0

0

0

0

32

_azims retweeted

Garrett Scott 🕳

@thegarrettscott

27 days ago

Ya, if it was the Apple car and priced for scale this thing would go triple platinum.

https://t.co/EbJ18tQkc0

172

6K

248

695

976K

26 days ago

The backlash to the new electric Ferrari is fascinating because it proves how much tacit design literacy ordinary people have.

0

0

0

0

13

_azims retweeted

30 days ago

https://t.co/AedF8Knlqo

19

535

52

1K

182K

about 1 month ago

one wrinkle....

0

0

0

0

8

about 2 months ago

the diff line counts still don't seem to work but there's pets now

OpenAI Developers

about 2 months ago

Pets. Now in Codex. Use /pet to wake your pet.

804

9K

798

3K

3M

0

0

0

0

23

about 2 months ago

I see a lot of people claiming agent workflow wrappers are useless, that modern claude / gpt can handle long running tasks on their own across compactions. Today I tried again with 5.5 and had to throw out the branch.

The task I gave it isn't even that uncommon - remove an abstraction layer in a 3 tier system. Requires some changes at the DB layer, service / API layer, and frontend. Should be under 5k LOC change, probably closer to 3k, mostly removals.

I went into planning mode with 5.5 and defined what I wanted precisely. When it went into implementation it immediately reduced the scope by putting in compat shims on the frontend and DB (in the same context window as the planning!).

I suspect this tendency is rooted in the RL method. These models are so heavily reinforced to solve the problem in a single context window that they start making bad judgements as they approach the end of the context just to get to some state they can classify as a win.

This is where workflow wrappers come in handy - only delegate tasks to agents that can actually be accomplished in a single context window. In my homegrown implementation I actually completely throw out the work if it hits compaction and scope it down further.

0

0

0

0

35
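A minimal sketch of the "throw out the work on compaction and scope it down" idea from the post above. This is not the author's implementation, which the post does not show. `AgentResult`, `run_with_scope_down`, `run_agent`, `split_task`, and `max_depth` are all hypothetical names; how an agent is launched and how compaction is detected depend on the tool, so both are passed in as plain callables.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AgentResult:
    diff: str             # whatever the agent produced for the task
    hit_compaction: bool  # True if the run overflowed its context window


def run_with_scope_down(
    task: str,
    run_agent: Callable[[str], AgentResult],
    split_task: Callable[[str], List[str]],
    max_depth: int = 3,
) -> List[AgentResult]:
    """Run a task; if the run hit compaction, discard it and retry on subtasks."""
    result = run_agent(task)
    if not result.hit_compaction:
        return [result]
    if max_depth == 0:
        raise RuntimeError(f"task still too large after splitting: {task!r}")
    # The compacted result is deliberately dropped here; only work that
    # fit in a single context window is kept.
    results: List[AgentResult] = []
    for subtask in split_task(task):
        results.extend(
            run_with_scope_down(subtask, run_agent, split_task, max_depth - 1)
        )
    return results
```

The depth limit is there so that a task which cannot be split any further fails loudly instead of recursing forever. Deciding how to split is left to `split_task`, which could be a human, another agent, or both.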
