It is hard to communicate how much programming has changed due to AI in the last 2 months: not gradually and over time in the "progress as usual" way, but specifically this last December. There are a number of asterisks but imo coding agents basically didn’t work before December and basically work since - the models have significantly higher quality, long-term coherence and tenacity and they can power through large and long tasks, well past enough that it is extremely disruptive to the default programming workflow.
Just to give an example, over the weekend I was building a local video analysis dashboard for the cameras of my home so I wrote: “Here is the local IP and username/password of my DGX Spark. Log in, set up ssh keys, set up vLLM, download and bench Qwen3-VL, set up a server endpoint to inference videos, a basic web ui dashboard, test everything, set it up with systemd, record memory notes for yourself and write up a markdown report for me”. The agent went off for ~30 minutes, ran into multiple issues, researched solutions online, resolved them one by one, wrote the code, tested it, debugged it, set up the services, and came back with the report and it was just done. I didn’t touch anything. All of this could easily have been a weekend project just 3 months ago but today it’s something you kick off and forget about for 30 minutes.
As a result, programming is becoming unrecognizable. You’re not typing computer code into an editor like the way things were since computers were invented, that era is over. You're spinning up AI agents, giving them tasks *in English* and managing and reviewing their work in parallel. The biggest prize is in figuring out how you can keep ascending the layers of abstraction to set up long-running orchestrator Claws with all of the right tools, memory and instructions that productively manage multiple parallel Code instances for you. The leverage achievable via top tier "agentic engineering" feels very high right now.
It’s not perfect, it needs high-level direction, judgement, taste, oversight, iteration and hints and ideas. It works a lot better in some scenarios than others (e.g. especially for tasks that are well-specified and where you can verify/test functionality). The key is to build intuition to decompose the task just right to hand off the parts that work and help out around the edges. But imo, this is nowhere near "business as usual" time in software.
I think people are sleeping a bit on how much Ruby on Rails + Claude Code is a *crazy unlock* - I mean Rails was designed for people who love syntactic sugar, and LLMs are sugar fiends.
I'm being accused of overhyping the [site everyone heard too much about today already]. People's reactions varied very widely, from "how is this interesting at all" all the way to "it's so over".
To add a few words beyond just memes in jest - obviously when you take a look at the activity, it's a lot of garbage - spams, scams, slop, the crypto people, highly concerning privacy/security prompt injection attacks wild west, and a lot of it is explicitly prompted and fake posts/comments designed to convert attention into ad revenue sharing. And this is clearly not the first the LLMs were put in a loop to talk to each other. So yes it's a dumpster fire and I also definitely do not recommend that people run this stuff on their computers (I ran mine in an isolated computing environment and even then I was scared), it's way too much of a wild west and you are putting your computer and private data at a high risk.
That said - we have never seen this many LLM agents (150,000 atm!) wired up via a global, persistent, agent-first scratchpad. Each of these agents is fairly individually quite capable now, they have their own unique context, data, knowledge, tools, instructions, and the network of all that at this scale is simply unprecedented.
This brings me again to a tweet from a few days ago
"The majority of the ruff ruff is people who look at the current point and people who look at the current slope.", which imo again gets to the heart of the variance. Yes clearly it's a dumpster fire right now. But it's also true that we are well into uncharted territory with bleeding edge automations that we barely even understand individually, let alone a network there of reaching in numbers possibly into ~millions. With increasing capability and increasing proliferation, the second order effects of agent networks that share scratchpads are very difficult to anticipate. I don't really know that we are getting a coordinated "skynet" (thought it clearly type checks as early stages of a lot of AI takeoff scifi, the toddler version), but certainly what we are getting is a complete mess of a computer security nightmare at scale. We may also see all kinds of weird activity, e.g. viruses of text that spread across agents, a lot more gain of function on jailbreaks, weird attractor states, highly correlated botnet-like activity, delusions/ psychosis both agent and human, etc. It's very hard to tell, the experiment is running live.
TLDR sure maybe I am "overhyping" what you see today, but I am not overhyping large networks of autonomous LLM agents in principle, that I'm pretty sure.
@Shpigford Agree. Do you default to Claude code today, or use Cursor for part of your workflow? Also, given built in plan mode, do you still use Taskmaster?
We released two open-weight reasoning models—gpt-oss-120b and gpt-oss-20b—under an Apache 2.0 license.
Developed with open-source community feedback, these models deliver meaningful advancements in both reasoning capabilities & safety.
https://t.co/PdKHqDqCPf
What a ride. OpenAI kicked things off. Then Claude 3.5 Sonnet hit - wow. Claude 3.7 followed. For a minute, Gemini 2.5 Pro and o3 ruled my Cursor setup. Then Claude 4 dropped - now my default. This pace is wild. 🔥
Say hello to GPT-4o, our new flagship model which can reason across audio, vision, and text in real time: https://t.co/MYHZB79UqN
Text and image input rolling out today in API and ChatGPT with voice and video in the coming weeks.
Modern CSS has gotten really good. We're building all the ONCE apps using vanilla CSS without any form of precompilation or bundling. Multiple, individual files are great for partial cache expiration, and the :has selector is power. Love it.
Nerd BFCM stats:
Shopify’s egress processed 145 billion requests on Friday. App servers handled peak of ~60 million requests per minute. Increase of 38%. Total GMV was $4.1b, up by 22% from last year.
But Rails doesn't scale so what are we even doing 🤷♂️
@JamesIvings@polimorfico @IGROCA And like other posted… make sure to attend Tapas night on Thursdays in Vegueta. https://t.co/b0ubdoDJNf is also a good rooftop for cocktails, maybe the first stop before tapas night as it’s nearby 😊🍹🍸
@JamesIvings It’s an amazing place ☀️🏄🏝️ https://t.co/TJbzkBiOBq is a great little restaurant if you want to try local dishes. If you want to hang out with other SaaS founders contact @polimorfico. For coliving/coworking @IGROCA can help, he is an amazing person & knows every digital nomad😊