I joined @Nominal_io almost a year ago and have been quietly helping accelerate the company internally with AI.
Now it’s a team-wide effort.
We’re looking for people who want to go deep across every function of the company and use AI to make the whole org move faster.
https://t.co/M5xt7jZa4o
DMs open
@tylerxdev Glad to have you in NYC!
If you're ever down to hang out for pizza on Thursdays (or sushi on the last Thursday of each month), let me know! I work a few streets from the Ramp office.
Nominal builds the data supply chain for hardware engineering: cars, rockets, energy systems.
Their builds on Depot:
Windows linking: 8 min → 30 sec
Docker builds: 12 min → 4 min
Merge to live: ~10 min
How @Nominal_io did it: https://t.co/7ff4jRjLPY
spotty internet but asked Fable 5 to create a wind particle simulation to find the optimal fan placement for an adobe-built house i’m staying at on a far, far ranch.
used a few images of the room, a simple hand drawing, and some constraints.
i’ll be sleeping cool tonight 🙏
Man goes to doctor. Says he's depressed about AI. He fears the permanent underclass.
Doctor says, "Treatment is simple. Read Gary Marcus. LLMs are stochastic parrots; they can't reason out of distribution."
Man bursts into tears. "But doctor..." he says, "I am in distribution!"
Strong Opinions, Loosely Held on Agent + Harness Engineering:
1. You can outperform any default harness+model (including Codex & Claude Code) on pretty much any Task by engineering the harness around it. Using the exact same model, curate prompts, tools, skills, and hooks for that Task. This harness optimization process is becoming much more agent driven, with humans reviewing and curating evals/rewards to hill climb on. “Just say what you want”.
2. A “general purpose” agent/harness doesn’t really exist; it’s a tradeoff between time spent customizing the agent and performance (cost, latency, accuracy) on a Task. I don’t exactly follow what “general purpose” means tbh. Who decides what’s general and what’s not?
3. But if the “general purpose” agent/harness existed, it would look like a good coding agent
4. Building a Task-specific harness will most likely converge to good prompt & tool design (probably packaged up as a Skill) as models become smarter and better at in-context learning
5. Evals are a moat and thus the data to produce evals is a moat. Especially true for vertical agent companies. This is because agents can fit to most Eval sets today. If your Evals measurably encode all the good behavior your agent needs to exhibit, then that signal can be hill climbed to improve your agent
6. Frontier closed models are far too expensive for the large majority of tasks the world needs to do. As teams start mapping costs to ROI, Open Model Harness Engineering will take off even more. It is almost always worth the investment to at least try to get a potential 20x+ cost reduction
7. A large chunk of design decisions around Task decomposition and context engineering exist solely because our usable context window is 50-100k tokens. Agents that become excellent at breaking down tasks, applying compaction appropriately, and orchestrating subagents as sub-task workers will be the most delightful products to do real work with.
8. We’re entering an Age of Unbundled (& Rebundled) Agents where Subagents exposed as Tools do a ton of domain specific work on behalf of an orchestrator agent. The Harness becomes a box that gets populated with the exact set of tools, skills, and subagents needed to solve that task or sub-task.
Examples include WarpGrep (search), Chroma Context-1 (search), Nemotron 3 Omni (small multimodal), etc. Bespoke agents that rock at narrow tasks orchestrated as tools.
This also applies to software like Remotion or Blender that agents use as tools via Skills. Different harnesses bundle together the tooling needed to complete that narrow task.
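A minimal sketch of opinion 8, with a lot assumed: the subagents are plain Python functions, the `Harness` class and `orchestrate` helper are hypothetical names, and the plan is hard-coded where a real orchestrator would let a model pick each step. It only shows the shape of a harness as a box holding the tools one task needs, and nothing else.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# A "tool" here is just a function from a text request to a text result.
Tool = Callable[[str], str]


def search_subagent(query: str) -> str:
    # Hypothetical stand-in for a narrow search agent exposed as a tool.
    return f"[search results for: {query}]"


def render_subagent(spec: str) -> str:
    # Hypothetical stand-in for software driven by an agent via a Skill.
    return f"[rendered: {spec}]"


@dataclass
class Harness:
    """A box populated with exactly the tools one task needs."""
    tools: Dict[str, Tool] = field(default_factory=dict)

    def call(self, name: str, request: str) -> str:
        if name not in self.tools:
            raise KeyError(f"tool {name!r} is not bundled in this harness")
        return self.tools[name](request)


def orchestrate(harness: Harness, plan: List[Tuple[str, str]]) -> List[str]:
    # Assumption: the plan is fixed; a real orchestrator agent would choose steps.
    return [harness.call(name, request) for name, request in plan]


# This harness bundles only search, so a render request is rejected.
research_harness = Harness(tools={"search": search_subagent})
results = orchestrate(research_harness, [("search", "fan placement airflow")])
print(results)
```

Swapping the `tools` dict is the whole "rebundling" step here: the same orchestrator loop runs against a different box.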
End of opinions. These may change by the time this tweet goes out, or I may double down and expand on them in an article.
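For opinions 1 and 5, the eval-driven hill climb can be sketched roughly like this. Everything here is an assumption for illustration: `toy_agent` stands in for a model plus harness, the two config knobs (`case`, `strip`) are made up, and the eval set is three toy cases. The point is only that once evals encode the behavior you want, a simple greedy loop can climb on them.

```python
from typing import Dict, List, Tuple

Config = Dict[str, str]
EvalCase = Tuple[str, str]  # (input, expected output)


def toy_agent(config: Config, task_input: str) -> str:
    # Hypothetical stand-in for model + harness; the config plays the harness.
    text = task_input
    if config["case"] == "upper":
        text = text.upper()
    if config["strip"] == "yes":
        text = text.strip()
    return text


def score(config: Config, evals: List[EvalCase]) -> float:
    # Fraction of eval cases where the agent output matches exactly.
    passed = sum(toy_agent(config, x) == y for x, y in evals)
    return passed / len(evals)


def hill_climb(start: Config, options: Dict[str, List[str]],
               evals: List[EvalCase]) -> Tuple[Config, float]:
    # Greedy search: change one knob at a time, keep strict improvements only.
    best, best_score = dict(start), score(start, evals)
    improved = True
    while improved:
        improved = False
        for key, values in options.items():
            for value in values:
                candidate = {**best, key: value}
                candidate_score = score(candidate, evals)
                if candidate_score > best_score:
                    best, best_score, improved = candidate, candidate_score, True
    return best, best_score


EVALS = [(" hi ", "HI"), ("ok", "OK"), (" a", "A")]
OPTIONS = {"case": ["as_is", "upper"], "strip": ["no", "yes"]}
START = {"case": "as_is", "strip": "no"}

best, best_score = hill_climb(START, OPTIONS, EVALS)
print(best, best_score)
```

Greedy one-knob search can stall when two changes only help together. A real setup would likely have an agent propose candidates and humans curate the evals.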