@theo Hey, no apology needed. I’m sure the community would rather you be healthy than push to produce content when you should be focused elsewhere and on yourself. Take it easy.
Hey @theo - watching your latest. I was inspired by your security psychosis to build https://t.co/xY6DJNiufb (at least the registry part). It’s a start along the lines of what you’re describing re a better npm. Would *love* any thoughts on this. Open source of course.
Bro, it’s June 2026. Stop hand-editing your prompts. Hold down the dictation button and ramble for 10 minutes. Give the model every fragment, caveat, example, and vibe in your head. It is literally a large language model. If it’s superhuman at anything, it’s reconstructing latent intent from language.
anvil app over on https://t.co/mBjSBN7JXY
Targets the SDLC and supports workspaces spanning multiple repos. It's more of a GUI than just a chat: built-in code review, security audit, deep work-item integration (one-click plan/fix), automations and loops, and more. It's also fully open and runs off the codex server, so it's BYO-sub.
This is not a long-context eval. It’s a prompt-response screenshot.
You haven’t shown that Claude failed to use 1M tokens. You’ve shown that a language model can be induced to describe a generic long-context failure pattern.
Proper methodology would require planted facts at known positions, adversarial distractors, repeated runs, and objective scoring. Otherwise you’re just benchmarking the model’s ability to agree with your framing.
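A minimal sketch of that kind of harness (planted fact at a known depth, distractors, repeated runs, exact-match scoring). Everything here is hypothetical: `Needle`, `build_context`, `evaluate`, and the `ask` callable stand in for whatever model API you actually call, and `toy_model` is a regex stand-in, not a real model, so it only checks that the harness is wired up correctly.

```python
import random
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical model interface: (context, question) -> answer string.
AskModel = Callable[[str, str], str]


@dataclass
class Needle:
    depth: float   # relative position in the context: 0.0 = start, 1.0 = end
    fact: str      # sentence planted in the haystack
    question: str  # question answerable only from `fact`
    answer: str    # expected answer, scored by exact match


def build_context(needle: Needle, distractors: list[str], n_filler: int,
                  rng: random.Random) -> str:
    """Filler text plus distractors at random positions and one needle at a known depth."""
    lines = [f"Filler sentence number {i} about nothing in particular." for i in range(n_filler)]
    for d in distractors:
        lines.insert(rng.randrange(len(lines) + 1), d)
    # Insert the needle last so its position matches the requested depth exactly.
    lines.insert(round(needle.depth * len(lines)), needle.fact)
    return "\n".join(lines)


def evaluate(ask: AskModel, needles: list[Needle], distractors: list[str],
             n_filler: int = 1000, runs: int = 5, seed: int = 0) -> dict[float, float]:
    """Exact-match accuracy per needle depth, averaged over repeated runs."""
    rng = random.Random(seed)
    scores: dict[float, float] = {}
    for needle in needles:
        correct = 0
        for _ in range(runs):
            # Distractor positions are reshuffled on every run.
            context = build_context(needle, distractors, n_filler, rng)
            reply = ask(context, needle.question)
            correct += int(reply.strip() == needle.answer)
        scores[needle.depth] = correct / runs
    return scores


def toy_model(context: str, question: str) -> str:
    """Stand-in for a real model call: regex lookup of the vault named in the question."""
    vault = re.search(r"vault (\w+)\?", question).group(1)
    match = re.search(rf"code for vault {vault} is (\d+)\.", context)
    return match.group(1) if match else ""


needles = [
    Needle(depth, "The code for vault A is 4721.", "What is the code for vault A?", "4721")
    for depth in (0.0, 0.25, 0.5, 0.75, 1.0)
]
# Near-miss distractors: same phrasing, different vault.
distractors = ["The code for vault B is 9999.", "The code for vault C was changed last week."]
print(evaluate(toy_model, needles, distractors))
# {0.0: 1.0, 0.25: 1.0, 0.5: 1.0, 0.75: 1.0, 1.0: 1.0}
```

Swap `toy_model` for a real API call and the per-depth scores become the objective measurement. A perfect score from the stub only shows the plumbing works.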
@jjacky
First pass: I didn't have my laptop to hand, so it's one-shotted in a codex cloud thread. https://t.co/lopMKifHg5
You can just build things 🤷♂️
@flybayer I’m trying a thing over on https://t.co/xY6DJNiufb
I find agents are great at the actual scaffolding but miss a lot of the nuances of cloud provider weirdness, at least without serious steering.