Howdy, Twitter.
I made this account for a mix of reasons.
1. I'm cooking cool stuff and want to share it. I hear that building in public is what the cool kids do.
2. I've got a CS background, but my day job is as a creative professional and the old school creative vs tech bro discourse on generative AI is tiring and completely lacking in nuance on both sides. I hope to provide a pragmatic perspective that will upset both sides equally.
3. I want to meet people doing difficult things!
This starts with sharing progress on a couple things I'm building: an anti-sycophantic agent to help you find your personal edge in the world, and a tool for tracking the probability of any event you care about occurring, without the Polymarket risks.
Follow and say hi if that sounds interesting.
How impressed you are by frontier LLMs' ability to act, plan, decide or reason is inversely correlated with how much you know about the domain in question.
Was making the exact same argument to someone earlier.
The act of coding always extrapolates from the intent. Unless your spec is literally one-to-one with the detail level of the resulting code (which, as you put it, means it IS the code, or may as well be), you're trusting the agent to make decisions.
That is fine if you go into it knowingly, but you can't spec your way to a foolproof LLM implementation.
Great summary.
There are some issues with automated harness design, however.
In most use cases, harnesses are super foundational and the cost of bad outcomes propagates to all of your agentic work within that harness thereafter.
Unless your use case is mappable to a really clear reward function and your benchmarks, tests etc can capture a wide, but highly relevant, range of tasks, it's very difficult to get the speed and iteration advantage that you talk about.
The problem then collapses into the quality and variety of benches you can build - which is arguably a harder task for most domains.
I'm building a harness focused on helping people find and execute strategies for their personal development and success, based on their own skills and situation. You can't measure outcomes there, so instead you need to apply human judgement.
So automated harnesses can make sense, but the domain being super narrow and measurable is key imo
@TheGeorgePu This is like when I vibe code something poorly, get stuck in a web of cut corners and start again from scratch a week later.
Minus the trillion dollars.
@ItsKieranDrew where do you read your fav writers? what platforms do you think work best for growing an audience with actually authentic and nuanced long form writing?
@icanvardar this is very true, I feel like the more new pipelines and frameworks I see pop up, the more distant we get from the 'just chat about it' that made it compelling in the first place for newcomers... perhaps not such a bad thing
@xDaily Gonna be a hell of a future when being employed is basically a continuous case of proving beyond doubt that you provide more value than you cost every single day.
As companies become perfectly optimised, the avg length of a given employment will plummet.
@boardyai An AI agent that knows when to tell you you're wrong.
A custom harness with tools entirely designed to help you find your edge in the new world, based on your skills, resources, ideas and circumstances.
@DanielSmidstrup My timeline is 99% of people talking about the 'AI unlock' and all the possibilities.
But virtually nobody is pointing at anything actually demonstrating any of that...
'Building in public' is misunderstood.
Useful for feedback when it arrives, sure, but it has very little to do with marketing or distribution unless other builders are your market.
If you're selling software to dentists, showing your agent skill framework off on Twitter won't help.
Do it for community and fun. It will do absolutely nothing to sell your software.