@NathanWilbanks_ Tool output optimization was def a huge win but I tried doing some fancy continuous compaction stuff with the conversation history, and while it helped extend a session's effective lifespan I probably fucked the cache with that. New project to tackle!
Spent a while iterating on a version of this process in my Pi harness, and I gotta say he was right.
My old setup was heavily human-in-the-loop, because nothing I did nailed the balance of guardrails & autonomy to prevent a multi-hour run from turning into slop.
Now I spend an hour hashing out requirements in extreme detail, let the agent loop for 8-12 hours, and out pops beautiful, clean new features. Slop minimal.
This feels like a turning point, for me at least. I may be late to the party, but the party isn't over.
nobody wants to hear this but the classical NASA systems engineering is the perfect model for developing code with LLMs. people try to approximate this with planning modes, but if you're explicit in your docs it's never been easier to build, test, and verify complex codebases.
The future of game engines is models that train models to generate the target experience that are small enough to run on your nephew's shitty best buy laptop.
@NathanWilbanks_ Agree re: engineering challenge at this point. Better context management was a big improvement for me before this new workflow, curious how you're handling caching though? Is that with subs or usage billing?
@redtachyon I think 10 is even conservative, given how much overhead increases as team sizes grow. Those 10 redditors need 1-2 managers, and they're sitting through hours of meetings every week.
@NathanWilbanks_ I've got a long run going today to build the evals for the project it started last night. I haven't managed to get parallelism ramped up without blowing through my usage limits though, so that's next in the queue.
@CursiveCrow I've assumed this whole discourse was about their applications in specific workflows that weren't as effective with previous models/harnesses, not their existence. Am I missing something?
@profleonn Only time I ssh at this point is from phone to desktop to check in on agent sessions. Agents ssh to the servers for me & do whatever I need. Haven't hopped onto a server myself in months.
At <dayjob> I regularly argue with Claude about implementation details, because it just refuses to believe we could've built something so STUPID before AI.
The biggest QoL improvement with LLM use is I don't bother figuring out how to install a new tool. curl | bash, snap, deb, I DON'T CARE I JUST WANT TO OPEN A GODDAMN DOCX