if you want to improve how you use AI or your agentic systems, the one piece of advice i'd give you is this:
invest in harness engineering.
it matters more than which model you pick.
i am constantly approached by people saying claude code is "dumb" or "verbose" or "loses context every session." then they spend one weekend doing harness work:
- a scoped CLAUDE.md
- skills for the workflows they repeat
- a reviewer subagent that runs against every diff
- deterministic checks (linter, type checker, structural tests) wired to fire in-session via hooks.
the same model suddenly ships clean. no new model, just clean steering, clean feedback, and the agent doing the thing it was supposed to do.
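a rough sketch of what "deterministic checks wired to fire in-session" can look like on the script side. this is a minimal runner, not any harness's official API: `CheckResult`, `run_checks`, and `format_feedback` are hypothetical names, and the commands you pass in are whatever linter or type checker your project already uses.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    output: str


def run_checks(checks: dict[str, list[str]]) -> list[CheckResult]:
    """Run each command once and record whether it exited with status 0."""
    results = []
    for name, cmd in checks.items():
        try:
            proc = subprocess.run(cmd, capture_output=True, text=True)
        except FileNotFoundError:
            # A missing tool counts as a failed check rather than crashing the run.
            results.append(CheckResult(name, False, f"command not found: {cmd[0]}"))
            continue
        output = (proc.stdout + proc.stderr).strip()
        results.append(CheckResult(name, proc.returncode == 0, output))
    return results


def format_feedback(results: list[CheckResult]) -> str:
    """Keep only the failures, so the agent sees what to fix and nothing else."""
    failed = [r for r in results if not r.passed]
    return "\n\n".join(f"[{r.name}] failed:\n{r.output}" for r in failed)
```

you could call it with something like `run_checks({"lint": ["ruff", "check", "."], "types": ["mypy", "."]})` (example tools, swap in your own) and hand the `format_feedback` text back to the agent. how a hook reports that text back depends on your inner harness, so check its docs for that part.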
harness engineering is the name of this practice.
the model generates code.
the inner harness (claude code, cursor) runs the loop.
the outer harness, yours, is where you have actual leverage.
two halves:
1. guides (CLAUDE.md, skills, rules, push channel)
2. sensors (linters, type checks, llm-as-judge, deterministic feedback the agent runs against itself).
most people stop at guides. that's why results plateau. once a rule is precise, make it deterministic. an encoded check beats another paragraph of markdown.
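to make "an encoded check beats another paragraph of markdown" concrete, here is a sketch under assumptions. say your CLAUDE.md has a rule like "domain code must never import from `infra`" (a made-up rule and package name, purely for illustration). the same rule as a structural check, using only python's standard `ast` module:

```python
import ast
from pathlib import Path

FORBIDDEN_PREFIX = "infra"  # hypothetical package name, not from the original post


def forbidden_imports(source: str, forbidden: str = FORBIDDEN_PREFIX) -> list[tuple[int, str]]:
    """Return sorted (line, module) pairs for absolute imports of `forbidden` or its submodules."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            continue  # not an import, or a relative import we choose to ignore
        for name in names:
            if name == forbidden or name.startswith(forbidden + "."):
                hits.append((node.lineno, name))
    return sorted(hits)


def check_tree(root: Path, forbidden: str = FORBIDDEN_PREFIX) -> list[str]:
    """Scan every .py file under `root` and format violations as path:line messages."""
    problems = []
    for path in sorted(root.rglob("*.py")):
        source = path.read_text(encoding="utf-8")
        for line, module in forbidden_imports(source, forbidden):
            problems.append(f"{path}:{line}: imports {module}")
    return problems
```

run `check_tree(Path("domain"))` from a test or a hook. an empty list means the rule holds, and anything else is a precise, repeatable message the agent can act on, which a paragraph of prose can't guarantee.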
if you tried claude code once or twice and gave up because it felt half-baked, the issue may not have been the model. it might have been the empty outer harness wrapping it.
harness engineering is the one practice i point every claude code or codex user at. especially anyone who almost gave up on it.
who else has seen their workflow completely level up after building a proper outer harness?
drop your before/after below 👇
Underrated, but setting up a skill just to prepare an import from one computer to another to migrate my Claude configuration is so helpful!
It basically reads a repo, finds all the skills and memory the repo uses, both global and project-based, and exports them either to the cloud or to your external storage
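For anyone curious what the export half of that could look like, here is a minimal sketch, not the actual skill. It assumes you tell it where your config lives (the paths in the usage note are assumptions about a typical Claude Code layout), it bundles everything it finds rather than working out what the repo really uses, and `collect_config` and `export_config` are hypothetical names.

```python
import tarfile
from pathlib import Path


def collect_config(base: Path, candidates: list[str]) -> list[Path]:
    """Return files found under `base` for each candidate path, relative to `base`."""
    found = []
    for rel in candidates:
        target = base / rel
        if target.is_file():
            found.append(Path(rel))
        elif target.is_dir():
            # Directories (e.g. a skills folder) are expanded to the files inside them.
            found.extend(
                p.relative_to(base) for p in sorted(target.rglob("*")) if p.is_file()
            )
    return found


def export_config(sources: dict[str, tuple[Path, list[str]]], archive: Path) -> list[str]:
    """Bundle every source into one gzipped tarball, prefixing entries with the source label."""
    names = []
    with tarfile.open(archive, "w:gz") as tar:
        for label, (base, candidates) in sources.items():
            for rel in collect_config(base, candidates):
                arcname = f"{label}/{rel.as_posix()}"
                tar.add(base / rel, arcname=arcname)
                names.append(arcname)
    return names
```

A call might look like `export_config({"project": (Path.cwd(), ["CLAUDE.md", ".claude/skills"]), "global": (Path.home() / ".claude", ["CLAUDE.md", "skills"])}, Path("claude-config.tar.gz"))`. Copy the resulting archive to the cloud or an external drive and unpack it on the new machine.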
@RKronen The main area of focus is context engineering: which pieces of information are vital for the agent and which aren’t! I think frontier models more or less understand what you want them to do from most prompts. It’s all about how you load the context with skills etc
@packdir This is basically Karpathy’s thesis as well right, speciation? The future won’t necessarily be a one-size-fits-all harness or model, it would be specialised harnesses that do a set of tasks very well
@plainionist Hooking these up to Grafana to view them visually is a great shout as well! You want to be able to view these metrics quickly, easily, and frequently
Yeah and I actually think finding ways to just make things easier on your brain is really the way to go. With ai, there’s no “fixed” process, you can build whatever you’d like to view information, it could be visual, auditory, whatever works for you! That’s why I think the html instead of markup idea took off so well
@nmamizerov This sounds super far fetched but I really would love to just be able to invest in crazy ideas and really try to make a difference and only look at profit as a way to cover operations and grow, not be in it for personal financial gain
Personal update: I've joined Anthropic. I think the next few years at the frontier of LLMs will be especially formative. I am very excited to join the team here and get back to R&D. I remain deeply passionate about education and plan to resume my work on it in time.
I feel like a big part of that product development process is actually understanding your market. Finding where your ideal user lives and investigating that space for ideas is one of the best ways to polish the product! And get new feature ideas!
No point solving a problem no one has
I mean, this is the struggle, isn’t it? I feel like junior devs right now might have it tough, as the new standards haven’t been discovered and the education system hasn’t caught up yet. I could say they should focus on jumping right into system design via repos designed to teach (purposeful breaks), but that’s more of a band-aid than a solution
@mattpocockuk I feel like I might run into issues trying to undo some of its recommendations that I fail to catch. Then I lose the alignment aspect, which is the best part of grill me, right?