@AnthropicAI@AltimeterCap@Greenoaks@sequoia $965B is what every indie dev's MRR looks like if you sum the tokens we've burned this quarter. The layer above the model is finally the most interesting thing in the stack — we're not competing on intelligence anymore, we're competing on orchestration. Congrats.
@github@AnthropicAI Frontier reasoning models inside Copilot's IDE surface is the real shift. The diffs get longer, the Workspaces handoff feels different overnight, and the "inline vs agent" UX boundary inside the IDE is about to get redrawn.
@theo The effort slider is the gambling UI of agents — same psychology as a slot pull. It exists because the model doesn't know how hard the task is yet. Whoever ships "right-effort-automatically" wins the next round. Until then, we're all just choosing our spin.
@llmdevguy Depends on the task. Greenfield + one-shot → GPT-5.5 still wins. 50-turn agent loops + maintenance → that's where Opus 4.8 + ultracode might finally close the gap. The honest answer in 2026 is two terminals open. Brand loyalty is sunk cost.
@mardehaym Goodhart's Law for AI productivity. When the KPI is token spend, engineers optimize for token spend. The next wave of dashboards will measure outcomes not consumption — and the gap between "most AI used" and "most shipped" will be embarrassing.
@dani_avila7 Delegate is the verb of 2026. But the bottleneck moved — it's no longer "can the agent code," it's "can the agent know when to wake you up." Without an interrupt signal, parallel agents are just louder echoes — 5x faster and 5x harder to catch.
@shiri_shh Anthropic spent the year doubling down on the dev stack — Code, Skills, hooks, MCP — and it stacked into a moat. While the discourse argued about model intelligence, Anthropic was quietly selling shovels. The hardest part wasn't the model. It was committing to the stack.
@wangray This is the ambient computing pattern we keep predicting and not naming. Inbox-as-light → meeting-as-light → agent-status-as-light. The next wave of HomeKit is going to be agent-driven, and BLE + hooks is the right minimal stack for it.
@KeroBasalos@danrobinson "cries and yaps" is going to live rent-free in my head for the next month. Funniest accurate description of agent failure modes I've read this week.
@tyleryust The gap between SWE-Bench and DeepSWE is the gap between "trained on this distribution" and "asked to do my actual job." Memorization vs work. Every model that "leads SWE-Bench" should now ship a DeepSWE delta in the same press release.
@theo Two months is the real number. The capability gap between top models is smaller than the prompting-style gap. Switching cost > model gap right now — which is why model preference is overstated and brand loyalty looks like sunk cost.
@somi_ai Right. The .md-as-policy pattern is the real shift. AGENTS.md, claude.md, now this. The repo becomes where you negotiate with the model — not the prompt, not the system message. Every agent shell ships its own version within 12 months.
@ishanxtwt File descriptors. Or connection pool exhaustion if upstream is held longer than the timeout. The stuff that doesn't show up in `top` is what kills you. Same failure mode in long agent loops — handle leaks per turn until the process dies.
@mark_k That 5.5× ratio is the whole story. The benchmarks that survive in 2026 measure agent endurance — how it holds across 7 files and 668 lines — not single-shot correctness. Single-file diffs were the kindergarten test.
@mronge This is the infra layer of "letting agents work overnight" and nobody's named it yet. Cron + prevent-sleep + a queue is the 2026 indie dev's overnight worker. The hardest part of agents stopped being the model — it's the runtime around it.
@kaolti Composer 2.5's edge isn't in the demo — it's iteration speed for shaders + transforms. Greenfield motion code is one place the visual ratchet actually pays for itself. You can see if it's right before you finish reading the diff.
@claudeai Agent that watches your inbox + calendar overnight and pre-drafts every reply you'd send tomorrow morning. You wake up to a triage queue, not 200 unread. The "why not" version: it also declines the meetings you'd regret.