Small Harness v1.0.3 is now live on GitHub; more details in the tweet below.
This is the first of many steps to help move the industry away from using LOC and token counts as success metrics.
Shipped a feature to @smallharness this week that I hope gets us further away from measuring lines of code, or tokens used, as a success metric for engineers.
Still ideating, but wanted to get my first idea out there. The goal is to judge the quality of a PR, and I'm doing this through a new command, /scorecard.
Here's how it works:
1. You work on a branch; successful turns accumulate under the current repo/branch (tokens, sessions)
2. You close the unit, usually via /ship pr, or manually with /scorecard close "My PR title"
3. Quality is scored at close from a local shipcheck snapshot:
- Ship readiness (ready / needs review / blocked)
- Whether tests passed, failed, or weren’t run
- Whether the PR was opened via gh (for /ship pr)
- Blockers and warnings from the ship flow
You get a score out of 100, a letter grade, and whether it counts as a quality PR (default: score ≥ 80, tests passed, PR command succeeded, not blocked).
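To make the default "quality PR" rule concrete, here's a minimal Python sketch of that check. It is not Small Harness's actual code: the `ShipcheckSnapshot` type, its field names, and `is_quality_pr` are made up for illustration, and the score is taken as an input because the scoring formula isn't described here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ShipcheckSnapshot:
    # Hypothetical field names; the real snapshot format is an assumption.
    readiness: str   # "ready", "needs review", or "blocked"
    tests: str       # "passed", "failed", or "not run"
    pr_opened: bool  # whether the PR command succeeded
    blockers: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)


def is_quality_pr(score: int, snap: ShipcheckSnapshot, min_score: int = 80) -> bool:
    """Apply the default rule: score >= 80, tests passed, PR opened, not blocked."""
    return (
        score >= min_score
        and snap.tests == "passed"
        and snap.pr_opened
        and snap.readiness != "blocked"
    )


if __name__ == "__main__":
    snap = ShipcheckSnapshot(readiness="ready", tests="passed", pr_opened=True)
    print(is_quality_pr(85, snap))
```

Every condition has to hold, so a high score alone doesn't count if tests were skipped or the ship flow was blocked.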
Updated in 1.0.2 and 1.0.3; here are details on the new commands and what's new in each version.
I don't think this is close to perfect yet, but it gets me, at least, mentally closer to a way of tracking progress that doesn't count lines of code or tokens. I think it's really important that we move far, far away from those.
Pushing engineering teams to "write more code!" or telling them to "use more tokens!" is ridiculous. We can, and should, do better.
Small Harness is a free, open-source harness that plays nicely with both local and frontier models. Link to the GitHub repo in the first comment below.
@leyten @Zai_org That is awesome to see, but for me, 30 tok/s is still too slow. Of course, it's amazing to run this locally at that speed, but I do think it needs to be faster to truly compete with frontier models.
Also, isn't this a $60,000+ build? 😳