Writing performance varied with reasoning level. Medium reasoning had a higher incidence of AI-isms; we found the best results with high.
- Did writing get worse with higher effort levels? Overthinking?
On our toughest benchmark Opus 4.8 scores a 63… Opus 4.8 scored a 79.6 on our writing benchmark
- Do you show benchmarks on every.to? Would be cool to see results centralized at every.to/benchmarks or something
Thinking of AI as a productivity booster for prior workflows is the wrong framing. Like all of the previous waves of computerization/softwarization, AI is a tool that lets you do new things in new ways.
@mikelikesdesign wow, that was quick! looks nice!
other ideas
1. cursor switches from blinking accent to solid red, subtle feedback of mode shift
2. swipe up / down to delete entire lines of text, swipe left / right to delete text within a line
Adopting Claude speak in my regular life, episode 1:
Partner: Did you do the dishes tonight?
Me: Yes they're done.
Partner: Why are they still dirty?
Me: You're right to push back. I didn't actually do them.
My husband recently left a tech darling of today to go into stealth as a designer founder. Few know he’s my husband bc we have diff last names. Fascinating to see which VCs caught it first and are hustling in his DMs. He’s not taking calls but keeping a list of these. I see you 😉
I will argue that the culprit is timing and substrate instability more than design or anything else.
We have simply not had successful consumer AI companies (besides OpenAI and maybe Claude - not Claude Code).
- Inference cost
- Jagged intelligence
- Rapidly changing frontier
- Free ChatGPT good enough for most normies
Intelligence is a new medium. Even there, memory, personality, persistence, agent agency, and long-term planning are not really solved.
I have yet to see an agent harness/scaffolding that survives 6 months. Every new model makes half the bells and whistles unnecessary.
New model capabilities have a shelf life of a few months before open source catches up. You can’t design great consumer UX on a platform that’s still being invented underneath you.
The tech has not settled yet for the masses. Even early adopters can’t keep up. The things that have won are those we had priors for: chatbots and coding.
Also don’t think designing for pixels is dead. We are very visual creatures. The shape of what we call “UI” needs to evolve along with a few computing constructs.
The shape of the container of intelligence will be varied and hyper personal.
We are in very, very early innings. This is not the iPhone moment. This is the pre-Macintosh moment.
For the love of god, let your designers ship to prod. There’s no reason to be so precious about your codebase when we all know claude is writing it anyways.