We’re introducing a new model benchmark.
And it’s a different kind of benchmark. (Basemark? Vibench?)
It’s different because it’s a living benchmark, constantly updated by millions of builders, not a closed set of tasks.
For a while now, public benchmarks haven’t been very useful. Many models score high on them yet have very low real-world usability.
So we’re introducing to the world a new benchmark that we’ve been using internally and have found extremely useful.
Our benchmark is basically how satisfied millions of users are when using different models.
IMO it’s the closest measurement to how useful a model is in real world use cases.
This metric is also correlated with our own business metrics - conversion, retention, etc.
We call it the frustration meter.
It automatically analyzes millions of messages daily.
It detects bug loops, repeated requests, etc.
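To make "repeated requests" concrete, here is a minimal sketch of one way such a signal could be computed. It is not our production detector: the function name, the 0.8 similarity threshold, and the sample session are all made up for illustration.

```python
from difflib import SequenceMatcher


def count_repeated_requests(messages, threshold=0.8):
    """Count user messages that closely repeat an earlier message in the same session.

    Hypothetical heuristic: the threshold is an assumed value, not a tuned one.
    """
    repeats = 0
    seen = []
    for msg in messages:
        # Normalize case and whitespace so trivial edits still count as repeats.
        normalized = " ".join(msg.lower().split())
        if any(
            SequenceMatcher(None, normalized, prev).ratio() >= threshold
            for prev in seen
        ):
            repeats += 1
        seen.append(normalized)
    return repeats


# Made-up session: the builder asks for the same fix three times.
session = [
    "Add a login page",
    "The login button does nothing",
    "the login button does nothing!",
    "The login button STILL does nothing",
]
print(count_repeated_requests(session))  # 2
```

A count like this would be only one input to a frustration score. Bug loops and other signals would need their own detectors.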
We use this to benchmark every model we consider shipping. Not by asking "did it generate correct code." By asking "how did the builder feel after using it."
It’s also a good benchmark for measuring model degradation. In the past few weeks we haven’t found any.
Here's where the top models stand right now, ranked by average frustration score (scale 1 to 5, lower is better):
Opus 4.6 - 1.3
Sonnet 4.6 - 1.4
Opus 4.7 - 1.5
GPT 5.5 - 1.5
GPT 5.4 - 1.6
Gemini 3.1 - 2.2
For app building, Opus 4.6 seems better than 4.7 to a lot of builders. We ran Opus 4.7 50/50 against Opus 4.6 across over 10,000 apps. Frustration rose by 43%. Turns per request rose by 19%.
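For anyone curious how a 50/50 comparison like this can be set up, here is a rough sketch under assumptions: the hash-based assignment, the arm labels, and both function names are hypothetical, not a description of our actual pipeline.

```python
import hashlib


def assign_arm(app_id: str) -> str:
    """Deterministically place an app in one of two roughly equal arms.

    Hashing the ID (an assumed approach) keeps an app in the same arm on every request.
    """
    digest = hashlib.sha256(app_id.encode("utf-8")).digest()
    return "opus-4.7" if digest[0] % 2 else "opus-4.6"


def relative_change(baseline: float, treatment: float) -> float:
    """Percent change of the treatment arm's average relative to the baseline arm's."""
    return (treatment - baseline) / baseline * 100


# Illustrative values only, not our data.
print(assign_arm("app-123"))
print(relative_change(2.0, 3.0))  # 50.0
```

A figure like "rose by 43%" is this kind of relative change between the two arms' averages, not a 43-point jump on the 1 to 5 scale.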
Gemini 3.1 doesn’t perform well at the moment. I left it out of the graph because its rapid changes in this benchmark made the graph unclear.
Quick note - this is all aggregated data, and it does not involve reading individual or identifiable conversations.
We’ll keep tracking it and I’ll share it from time to time.
A lot of posts this week claiming small businesses are “garbage.” So I ran some numbers after owning my HVAC business for 7 months.
I bought small—$1.95M revenue, $343k SDE—in one of the most competitive industries and markets (HVAC in Central Florida). Translation: I bought a “job.” Oh no… I have to work hard?! No beach checks for me.
But I got into a highly sought-after market at 3.2x. Purchase price: $1.1M.
Fast forward 7 months:
Bottom line up 164% (forget the J curve).
Profitable every month since acquisition.
Without adjusting pre-acquisition numbers (which would only go UP), 2025 SDE is tracking to $568k on $2.37M revenue.
At the same 3.2x multiple? EV = $1.82M → +$720k EV growth.
I own 85%. You can do that math.
Now let’s pro forma it:
Revenue is up 34% since acquisition at a healthy 17% net margin. Apply that growth to the first 5 months and you get $2.63M revenue and $697k SDE.
At a modest bump to a 4x multiple:
EV = $2.78M → $1.68M in EV growth in 7 months.
Do the math again if you want. It’s fun.
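If you'd rather let a script do it, here's a quick sketch of both scenarios. It assumes EV is simply SDE times the multiple and that my 85% applies straight to the EV growth, ignoring debt and everything else. The function and variable names are just for illustration.

```python
def enterprise_value(sde: float, multiple: float) -> float:
    """EV as SDE times a valuation multiple (simplified, ignores debt and cash)."""
    return sde * multiple


purchase_price = 1_100_000  # rounded purchase price from the post
owner_share = 0.85

# (SDE, multiple) for the two scenarios in the post.
scenarios = {
    "2025 tracking at 3.2x": (568_000, 3.2),
    "pro forma at 4x": (697_000, 4.0),
}

for label, (sde, multiple) in scenarios.items():
    ev = enterprise_value(sde, multiple)
    growth = ev - purchase_price
    print(
        f"{label}: EV ${ev:,.0f}, EV growth ${growth:,.0f}, "
        f"85% share of growth ${growth * owner_share:,.0f}"
    )
```

The unrounded results land a few thousand dollars off the rounded figures above (for example, $1,817,600 versus $1.82M), which is just rounding.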
I’m NOT saying buying small is for everyone. You’d better be ready to make it your life’s work, get your hands dirty, build and implement processes, drive accountability, attract talent, inspire people, and lead from the front. This is a people game. 75% of the original team has been replaced. If you want to buy a home services business of this size, you MUST be hands-on.
What I am saying: there’s nothing wrong with buying small. The idea that a small business “isn’t a real business” assumes the only real businesses are the ones where the owner isn’t meaningfully involved.
That’s a lie people tell themselves to avoid the work.
Call it a "job" if you want to... We all work. I'd rather own my job.
@AizikZimerman The real edge isn’t age by itself. It’s the willingness to use that runway to outwork, out-learn, and out-innovate everyone who’s already set in their ways. That’s where the upside really is. Automate everything but the wrench!
@SMBController @AizikZimerman You’re right. Most home and building service owners resist change until the pain becomes unbearable. The few who lean into innovation early end up owning their market while everyone else plays catch-up. Credit to you for being on the right side of that divide.
When the federal government shuts down, the impact doesn’t stop in Washington. It hits the trades first.
“Every job is getting delayed, or they’re not getting paid on time. It creates a cash problem, you can’t buy equipment, you can’t pay your employees.”
@didiazaria, CEO @Workiz
Service pros deserve stability. That’s what we’re building for.
🔗 https://t.co/UMJcfea030
#Trades #SmallBusiness #Workiz #Leadership #WSJ
The right prompt turns AI into the ultimate sidekick.
And when HVAC techs use it well, they don’t just fix the system, they save the day.
Which of these 10 prompts would you try first in the field?
The difference between a good HVAC tech and a hero HVAC tech?
It’s not just knowing how to fix the unit. It’s knowing how to communicate, solve problems faster, and leave the customer thinking: “Wow, that was amazing.”
That’s where GPT comes in.
https://t.co/hBvYKmrLQR sourcing
🛠️ Prompt: “List possible replacement options for a [specific part]. Suggest compatible alternatives if the OEM part is unavailable.”