Principal AI/ML Engineer · agentic systems that don't fall over in prod · ex-VFX (DNEG, Cinesite) · building @ Ayudh · typed contracts over vibes · writer
AWS put $1B behind forward-deployed engineering this week: pods of 5-6 engineers embedded per customer. Headcount doesn't scale the actual skill. Most FDE work I've watched up close is getting someone's legacy export format to survive contact with production, not touching the model.
Most coding-agent permission systems fail the same way: they're too literal. Allow-list exact commands and the agent stalls the moment it composes two allowed commands into a new one. Not dangerous, just unlisted. In #Wrap the approval boundary is what a producer signs off on, not which functions ran.
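A minimal sketch of the failure mode, assuming a hypothetical exact-string allow-list (`ALLOWED`) and a naive split of a shell pipeline on `|`: the literal check rejects a pipeline built entirely from allowed commands. The component-wise check is there only for contrast. It is not a safe policy on its own, because two harmless commands (read a file, post to a URL) can compose into an exfiltration, which is one argument for putting approval on the output instead.

```python
# Hypothetical exact-command allow-list.
ALLOWED = {"git status", "grep modified"}

def literal_check(cmd: str) -> bool:
    # Exact string match: the too-literal policy described above.
    return cmd.strip() in ALLOWED

def pipeline_segments(cmd: str) -> list[str]:
    # Naive split on '|': ignores quoting, for illustration only.
    return [seg.strip() for seg in cmd.split("|")]

def component_check(cmd: str) -> bool:
    # Every segment is individually allowed. Shown for contrast, NOT a safe policy.
    return all(seg in ALLOWED for seg in pipeline_segments(cmd))

cmd = "git status | grep modified"
print(literal_check(cmd), component_check(cmd))  # False True
```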
Exabeam's Observra (open-sourced yesterday) normalizes agent tool-call telemetry into one event format for your SIEM. The real issue: once an agent touches a studio's asset tracker, every tool call is a log line, and that's a new export path for client IP nobody approved. #Wrap
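A minimal sketch of normalizing one agent tool call into a flat event before it reaches a SIEM, assuming a hypothetical schema (`ts`, `agent`, `tool`, `args`, `outcome`) that is not Observra's actual format. Argument values are replaced with a short SHA-256 fingerprint unless their key is on a hypothetical clear-text list, so log lines stay correlatable without becoming a copy of client data.

```python
import hashlib
import json
from datetime import datetime, timezone

# Hypothetical: argument keys considered safe to log in clear text.
CLEAR_KEYS = {"limit", "page"}

def fingerprint(value) -> str:
    # Stable short hash: identical values correlate across log lines, content never leaves.
    raw = json.dumps(value, sort_keys=True).encode("utf-8")
    return "sha256:" + hashlib.sha256(raw).hexdigest()[:12]

def normalize_tool_call(agent: str, tool: str, args: dict, ok: bool) -> dict:
    """Map one tool call to a flat event dict (hypothetical schema)."""
    return {
        "ts": datetime.now(timezone.utc).isoformat(),  # time of normalization
        "agent": agent,
        "tool": tool,
        "args": {k: (v if k in CLEAR_KEYS else fingerprint(v)) for k, v in args.items()},
        "outcome": "ok" if ok else "error",
    }

event = normalize_tool_call("wrap-agent", "asset_tracker.search",
                            {"query": "client hero asset", "limit": 20}, ok=True)
print(json.dumps(event))
```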
Built Beat's actuals API on a shape borrowed from an existing tool. Fine on day one, a liability by month three, when your naming and routes still wear someone else's product.
Renamed to a source-neutral API ahead of a wider release. Abstract borrowed scaffolding early, not later.
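A minimal sketch of what "source-neutral" can mean in code, assuming hypothetical field names: `Actual` is the neutral record and `LEGACY_FIELDS` maps it from the borrowed tool's naming (`taskRef`, `hrsLogged`, `artistName` are invented for illustration). Only the adapter knows the borrowed names, so retiring the old source means editing one map rather than every route.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Actual:
    """Source-neutral record of time actually spent on a task (hypothetical fields)."""
    task_id: str
    hours: float
    person: str

# Hypothetical field names standing in for the borrowed tool's shape.
LEGACY_FIELDS = {"task_id": "taskRef", "hours": "hrsLogged", "person": "artistName"}

def from_legacy(payload: dict) -> Actual:
    # The only code that knows the borrowed naming.
    return Actual(**{field: payload[src] for field, src in LEGACY_FIELDS.items()})

print(from_legacy({"taskRef": "sh010_comp", "hrsLogged": 6.5, "artistName": "A. Lee"}))
```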
TOM LEE: AI AGENTS MIGHT BECOME WEALTHIER THAN HUMANS
AI agents will soon earn income and accumulate wealth as our delegated entities:
- We'll start to question whether we work for the AI agent or they work for us
- A centralized version of this ends in Skynet
- That's why he sees a real focus on decentralized systems to protect humans against AI becoming too powerful
this is f*cking gold
Andrej Karpathy joined Anthropic five weeks ago.
A friend on his team just showed me the exact LOOPS.md file he actually uses.
I dropped it into my setup. The very first response was different.
Not slightly different. Completely different.
Claude stopped giving generic answers and started working exactly the way I think.
You don't talk to the model anymore. You build the system that talks to the model for you.
Bookmark it before it gets lost in your feed.
Read it now, then check the article below.
@simonw Recording the demo as a storyboard YAML is the clever bit: it's a spec you can regenerate against later, not just proof it worked once. Does shot-scraper flag when a later run's video drifts from the storyboard, or is that still a human eyeball job?
Pipeline TD to applied AI engineer isn't the stretch people assume. Root-causing why a track drifts across 200 frames and root-causing why an agent's tool call breaks on real customer data draw on the same instinct. Eighteen years in the first taught me the second fast.
AI extraction demos lead with accuracy. What decides adoption is time-to-verify: the minutes a human spends checking each value against its source. I sat in VFX dailies for 18 years. Nothing shipped on the artist's word. Build the review loop first, the model second. #AIVFX
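A minimal sketch of putting the review loop first, assuming each extracted value carries the character span it was read from; `Extracted`, `verify`, and `review_rows` are hypothetical names. Every value still gets a human look, but checking becomes a glance at the quoted span, and values that don't match their span sort to the top.

```python
from dataclasses import dataclass

@dataclass
class Extracted:
    field: str
    value: str
    start: int  # character offset into the source document
    end: int

def verify(source: str, item: Extracted) -> bool:
    # Does the cited span say exactly this value?
    return source[item.start:item.end] == item.value

def review_rows(source: str, items: list[Extracted]) -> list[tuple]:
    # Every value is shown with its cited span; mismatches (False) sort first.
    rows = [(it.field, it.value, source[it.start:it.end], verify(source, it)) for it in items]
    return sorted(rows, key=lambda row: row[3])
```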
"Bounded autonomy" is the phrase I keep coming back to.
Everyone wants an agent they point at a goal and walk away from. Everyone shipping high-stakes work lands in the same place: autonomy is a dial you earn one notch at a time.
Constrain first. Earn it with evidence.
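A minimal sketch of autonomy as a dial, assuming three hypothetical levels and a hypothetical promotion threshold. The agent starts at the most constrained level and moves up one notch only after a run of outputs approved without edits. Dropping a notch on any miss is a design choice made for this sketch, not something stated above.

```python
from dataclasses import dataclass

# Hypothetical autonomy levels, most constrained first.
LEVELS = ["suggest", "act_with_approval", "act_and_report"]

@dataclass
class AutonomyDial:
    level: int = 0          # start constrained
    promote_after: int = 20 # hypothetical: clean runs needed per notch
    clean_runs: int = 0

    def record(self, approved_without_edits: bool) -> None:
        if approved_without_edits:
            self.clean_runs += 1
            if self.clean_runs >= self.promote_after and self.level < len(LEVELS) - 1:
                self.level += 1      # earn one notch
                self.clean_runs = 0  # evidence is re-earned at each notch
        else:
            self.level = max(0, self.level - 1)  # a miss costs a notch
            self.clean_runs = 0

    @property
    def mode(self) -> str:
        return LEVELS[self.level]
```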
Meituan's LongCat-2.0: 1.6T model trained AND served on 50,000 domestic chips. Not just inference.
Whatever you think of the chip fight, the compute substrate is going plural. Bake in one vendor's GPU and runtime and you own a single point of failure that hasn't billed you yet.
A tool that quietly rewrites your payload isn't a feature. It's hidden state you didn't author and can't inspect. Ship client work and you should be able to point at every transformation between brief and output. #Wrap pins source-linked evidence per finding. #AIEngineering
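A minimal sketch of a pipeline where every transformation is on the record, assuming each step is a named function from string payload to string payload; `run_pipeline` and the log fields are hypothetical. Each entry stores short hashes of the step's input and output, so you can point at exactly which step changed the payload.

```python
import hashlib
from typing import Callable

def digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]

def run_pipeline(brief: str, steps: list[tuple[str, Callable[[str], str]]]) -> tuple[str, list[dict]]:
    """Apply named steps in order, logging one record per transformation."""
    log, payload = [], brief
    for name, fn in steps:
        out = fn(payload)
        log.append({"step": name, "in": digest(payload), "out": digest(out),
                    "changed": out != payload})
        payload = out
    return payload, log

result, log = run_pipeline("Hero shot, dusk", [("strip", str.strip), ("upper", str.upper)])
for entry in log:
    print(entry)
```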
Claude Sonnet 5: ~$2/M input, near-Opus for under half the Opus price. The price-war angle writes itself.
Read the footnote: new tokenizer, so the same input comes out as 1.0 to 1.35x as many tokens. The per-token cut isn't the per-task cut. Budget on cost per completed task, not the sticker price.
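A worked version of the footnote, counting input tokens only and assuming a hypothetical 100,000-token task measured with the old tokenizer. At ~$2/M the same task costs $0.20 at a 1.0x multiplier and $0.27 at 1.35x; a hypothetical 90% completion rate pushes the cost per completed task to $0.30, because failed attempts still bill.

```python
def cost_per_task(old_tokens: int, price_per_m: float, multiplier: float,
                  success_rate: float = 1.0) -> float:
    """Input-token cost in USD per *completed* task.

    old_tokens   : task size under the previous tokenizer (hypothetical figure)
    price_per_m  : sticker price per million input tokens
    multiplier   : new-tokenizer inflation, 1.0 to 1.35 per the footnote
    success_rate : fraction of attempts that finish; failed attempts still bill
    """
    tokens_billed = old_tokens * multiplier
    return tokens_billed * price_per_m / 1_000_000 / success_rate

for m in (1.0, 1.35):
    print(f"{m}x: ${cost_per_task(100_000, 2.0, m):.2f} per task")
print(f"1.35x at 90% completion: ${cost_per_task(100_000, 2.0, 1.35, 0.9):.2f}")
```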
Shipped a brand content engine: 65% faster reviews. Almost none of it was the AI.
SDXL makes the images. The 65% came from GPU orchestration, asset storage, and a review viewer wired into how people already signed off.
The moat is the system, not the model. #AIVFX
Everyone's arguing about which model to route to this week. Open-weight for the 90%, frontier for the 10%.
Best reliability call I made this year: route to no model at all.
Wrap's change-order math is deterministic. A number that's wrong 1 time in 20, in front of a client, is a liability.
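A minimal sketch of change-order math with no model in the loop, assuming a hypothetical structure of (quantity, unit rate) line items plus a percentage markup; this is not Wrap's actual formula. Inputs are decimal strings and rounding is explicit, so the same inputs produce the same number every time.

```python
from decimal import Decimal, ROUND_HALF_UP

def change_order_total(lines: list[tuple[str, str]], markup_pct: str) -> Decimal:
    """Sum (quantity, unit_rate) line items, apply a markup, round to cents.

    Strings in, Decimal throughout: no binary floating point touches the money.
    """
    subtotal = sum((Decimal(q) * Decimal(r) for q, r in lines), Decimal("0"))
    total = subtotal * (Decimal("1") + Decimal(markup_pct) / Decimal("100"))
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

print(change_order_total([("12.5", "85.00"), ("3", "120")], "10"))  # 1564.75
```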
'Frontier model' now means two things: a checkpoint, or a tools-and-routing system wearing a model's name.
The second benchmarks great and debugs terribly. When the harness hides inside the API, you can't see which step failed.
It's why I keep #Wrap's core deterministic.
Generative-AI VFX demos show the money shot: prompt in, clean plate out. Nobody shows the round trip.
Export from comp, drop to 8-bit, generate, bring back, relinearize, retrack, match 200 frames. Generation: 4 sec. Plumbing: the afternoon.
The model was never the bottleneck.
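A minimal sketch of the precision cost hidden in "drop to 8-bit ... relinearize", assuming a linear scene-referred float plate and a generator that takes 8-bit sRGB. It uses only the standard sRGB transfer functions (IEC 61966-2-1); a real pipeline would go through an OCIO config. Highlights above 1.0 clip, and deep shadows quantize coarsely: a linear value of 0.001 comes back about 9% low, while mid-grey 0.18 survives within 1%.

```python
def srgb_encode(linear: float) -> float:
    # sRGB transfer function: linear light -> display-encoded value.
    if linear <= 0.0031308:
        return 12.92 * linear
    return 1.055 * linear ** (1 / 2.4) - 0.055

def srgb_decode(encoded: float) -> float:
    # Inverse transfer function: display-encoded value -> linear light.
    if encoded <= 0.04045:
        return encoded / 12.92
    return ((encoded + 0.055) / 1.055) ** 2.4

def round_trip_8bit(linear: float) -> float:
    """Linear value -> sRGB -> 8-bit code -> back to linear."""
    clipped = min(max(linear, 0.0), 1.0)  # 8-bit display encoding can't hold values above 1.0
    code = round(srgb_encode(clipped) * 255)
    return srgb_decode(code / 255)

for v in (0.001, 0.18, 4.0):
    print(v, "->", round(round_trip_8bit(v), 6))
```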
The Unreal 6 panic misreads it. Blueprints aren't deprecated because they're bad. They're deprecated because an LLM writes text far better than a node graph. The target moved from the human reader to the machine writer. What you lose is inspectability. #UnrealEngine
A render farm and an LLM agent are the same problem wearing different clothes:
expensive failure-prone work + workers that die mid-job + a hard need to stay inspectable and recoverable.
The agent world is rediscovering checkpointing and spend caps under new names.
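A minimal sketch of both ideas in render-farm shape, assuming each job step has a name, a hypothetical estimated cost, and a JSON-serializable result. Completed steps are written to a checkpoint file after each step, so a restarted worker skips them, and a step that would push spend past the cap is refused before it runs.

```python
import json
import os
from typing import Callable

class SpendCapExceeded(RuntimeError):
    pass

def run_job(steps: list[tuple[str, float, Callable[[], str]]], ckpt_path: str,
            cap_usd: float) -> dict:
    """Run each named step once, resuming from a JSON checkpoint, under a spend cap."""
    state = {"done": {}, "spent": 0.0}
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            state = json.load(f)  # a restarted worker picks up where the last one died
    for name, cost, work in steps:
        if name in state["done"]:
            continue  # finished before the crash: don't pay twice
        if state["spent"] + cost > cap_usd:
            raise SpendCapExceeded(f"{name} would bring spend to {state['spent'] + cost:.2f} > {cap_usd:.2f}")
        state["done"][name] = work()
        state["spent"] += cost
        with open(ckpt_path + ".tmp", "w") as f:
            json.dump(state, f)
        os.replace(ckpt_path + ".tmp", ckpt_path)  # atomic swap: never half a checkpoint
    return state
```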