“CLAUDE I’VE EXPLAINED THIS BUG TO YOU 10 TIMES AND YOU STILL KEEP BREAKING IT, THINK LIKE A SOFTWARE ENGINEER WITH 15 YEARS OF EXPERIENCE AND FIX IT PROPERLY MAKE NO MISTAKES”
The brain is designed to learn through constant repetition and active, hands-on involvement. Through such practice and persistence, any skill can be mastered.
If Anthropic has a trillion-dollar valuation while Google has a $4.5 trillion valuation, there are only two conclusions to be drawn:
Either Anthropic is incredibly overpriced or Google is incredibly underpriced. The trick is to figure out which one is right.
Using AI agents without a formal specification of behavior is vibecoding. Using AI agents with a formal specification of behavior is software engineering. Or at least a significant component of software engineering.
I like to nail down the required behavior using Gherkin. I have the agents create a parser that interprets the Gherkin into an intermediate representation, and then I have the agents create a generator which converts that IR into executable tests.
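To make that pipeline concrete, here is a minimal sketch in Python. It is not the actual setup described above. It assumes a tiny Gherkin subset (Scenario, Given/When/Then, And/But), and the names `Step`, `Scenario`, `parse_gherkin`, `generate_test`, and the step registry are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Step:
    keyword: str  # "Given", "When", or "Then"
    text: str


@dataclass
class Scenario:
    name: str
    steps: List[Step] = field(default_factory=list)


def parse_gherkin(source: str) -> List[Scenario]:
    """Parse a tiny Gherkin subset into the IR: a list of Scenario objects."""
    scenarios: List[Scenario] = []
    last_keyword = ""
    for raw in source.splitlines():
        line = raw.strip()
        if line.startswith("Scenario:"):
            scenarios.append(Scenario(name=line[len("Scenario:"):].strip()))
            last_keyword = ""
            continue
        keyword, _, text = line.partition(" ")
        if keyword in ("Given", "When", "Then") and scenarios:
            last_keyword = keyword
        elif keyword in ("And", "But") and scenarios and last_keyword:
            keyword = last_keyword  # And/But inherit the previous keyword
        else:
            continue  # blank lines, Feature headers, anything unsupported
        scenarios[-1].steps.append(Step(keyword, text.strip()))
    return scenarios


StepFn = Callable[[dict], None]


def generate_test(scenario: Scenario, registry: Dict[str, StepFn]) -> Callable[[], None]:
    """Turn one Scenario into a zero-argument test function."""
    def test() -> None:
        context: dict = {}
        for step in scenario.steps:
            registry[step.text](context)  # KeyError means an unimplemented step
    test.__name__ = "test_" + scenario.name.lower().replace(" ", "_")
    return test


FEATURE = """
Feature: Cart
  Scenario: Add two items
    Given an empty cart
    When I add 2 items
    Then the cart holds 2 items
"""


def _empty_cart(ctx: dict) -> None:
    ctx["cart"] = 0


def _add_two(ctx: dict) -> None:
    ctx["cart"] += 2


def _holds_two(ctx: dict) -> None:
    assert ctx["cart"] == 2


# Hypothetical step registry: exact step text -> implementation.
REGISTRY: Dict[str, StepFn] = {
    "an empty cart": _empty_cart,
    "I add 2 items": _add_two,
    "the cart holds 2 items": _holds_two,
}

TESTS = [generate_test(s, REGISTRY) for s in parse_gherkin(FEATURE)]
for t in TESTS:
    t()  # raises AssertionError if the behavior drifts from the spec
```

The point of the IR is that the spec, not the agent, decides what gets tested: a step with no registry entry fails loudly instead of being silently skipped.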
"Nobody reviews compiler output, why review AI code?"
Wrong. We do review compiler output. Godbolt exists. Disassemblers exist. Anyone doing serious performance work reads what the compiler produced. The premise is false.
But the analogy itself is flawed. It compares two things that aren't comparable.
A compiler takes a formal language as input. Languages with grammars and semantics defined precisely enough that "what does this code mean" has only one answer.
An LLM takes natural language as input. Natural languages are ambiguous. "Write me a function that handles user input safely" has a thousand valid interpretations and a thousand more invalid ones. The LLM picks one. You don't know which. Unless you look at the code.
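As a small illustration of that ambiguity, here are two hypothetical functions that could each be defended as "handling user input safely". Both names and both readings of "safely" are assumptions made up for this sketch. They disagree on the same input.

```python
import html


def handle_input_escape(text: str) -> str:
    """Interpretation A: 'safe' means safe to embed in an HTML page."""
    return html.escape(text)


def handle_input_strip(text: str, max_len: int = 64) -> str:
    """Interpretation B: 'safe' means printable and bounded in length."""
    printable = "".join(ch for ch in text if ch.isprintable())
    return printable[:max_len]


sample = "<b>hi</b>\x00"
escaped = handle_input_escape(sample)   # markup neutralised, NUL byte kept
stripped = handle_input_strip(sample)   # NUL byte dropped, markup kept
print(escaped == stripped)  # False
```

Neither function is wrong on its own terms. Which one you got is only visible in the code.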
Compilers are built from specifications and designed to meet them. The output is the result of a defined translation. When the output violates the spec, it's a bug.
LLMs are built from whatever was in their training data. There is no spec. There can't be one: natural languages have no defined semantics that map to code.
Compilers are semantically deterministic. The same input produces output with the same behaviour, every time. LLMs are not. Partly by design, and partly due to hardware variance, batch size, inference order, and floating-point arithmetic (and no, setting temperature to zero does not address those). Any of these can push the same prompt to produce different code.
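The floating-point part is easy to see in isolation. A minimal sketch, not a model of any real inference stack: floating-point addition is not associative, so summing the same numbers in a different order can give a different result.

```python
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # one accumulation order
right = a + (b + c)  # same numbers, different order

print(left == right)  # False
print(left, right)    # 0.6000000000000001 0.6
```

When two candidate tokens score almost the same, a difference this small in an accumulated sum can be enough to change which one comes out on top.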
Compilers complain loudly when the input is nonsensical. LLMs fail silently, producing plausible-looking but wrong code.
We trust compiler output because the trust was earned across decades of use, with millions of engineers using the same tools. Early compilers were reviewed heavily. Hand-written assembly was the default because trust hadn't been earned yet.
We're at the hand-written assembly stage with AI. We may never get to the trust-the-output stage for the reasons explained above.
If you’re a software developer, you should own what goes to production. The compiler analogy is a way of skipping that responsibility.
Claude for Excel, PowerPoint, and Word are now generally available, and Claude for Outlook is in public beta.
As Claude moves between your Microsoft apps, it carries the full context of your conversation.
@OpenAI I've built lots of applications using the Realtime API, and one major problem I ran into repeatedly was context degradation, along with occasional wrong language detection and incorrect transcripts.
Curious to know how this will perform in those aspects.