The untrainable is the sexy name. The boring names: tacit knowledge, subjectivity, higher-order thinking (seems like even taste is out the door now).
We love to make it sound so easy to eval and verify. This might be true in coding but the reality is never just that.
1. Think about those AI outreach emails you've been getting. Grammatically correct, and a lot of them use this "fake lowercase style so it doesn't look like ai". All of them are bad. I know one when I see one but I cannot articulate clear criteria for why.
How good are the top models at generating them? Same goes for AI comments. Because style is the hardest to eval or even articulate.
2. A lot of ai-consulting work is really helping define or transfer that judgement of what is good, or good enough.
3. I recently talked to an ai software agency. Their entire pitch is about being functional and matching the spec. Nothing about whether it'll be built well.
I think we are still a year away from actually passing the benchmarks. Unknown what that looks like, but since it's all about the unarticulatable, it'll be a lot less visible.
1. I spent a lot of time at scale labeling data myself, and never thought it was beneath me. Instead it's how we developed quality criteria and instructions, and how we provided partnership to our customers. My co-founder @flubtitle and I built out new labeling products for LLMs in 2023 (well it's old now) because we did labeling and ran queues (meaning we were running real projects and needed to deliver data). Not because we got a prd from anyone.
2. One of the first things we built at Santori Labs is our voice-first eval/label flow and a roleplay system. I spent hours every week going through data, thinking about what is good vs not.
3. Imagine an engineer who thinks they are too good to do that, and is instead just here to execute a prd that is given to them.
4. I don't think data is all you should do, but it's still one of the most important things you can do. Labeling is one form, another one is looking at agent traces. If you don't see why that's important, you are stuck in the past.
5. It's painful looking at data. You think you just look at it and you just know if this is good. It's never that. It's always the messy middle of "meh". That's why the design principle for our own data flow is this: looking at data is a focused act, and the product needs to encourage focus.
Just learned:
Software engineers used to do manual data labeling at Scale AI while Alex Wang was CEO. After he left, new leadership joined, and were HORRIFIED to learn this. Stopped it ASAP
Now at Meta, software engineers are assigned manual data labeling... see the pattern?
@aiechrl @pangram I used it to push back on my thinking and challenge it from different perspectives. I found that quite good.
I know I slapped that question over but I actually agree. It was just something to think through.
"I'm not against AI, I'm against the easy"
@alexanderbenz Ya, I had a friend who set up an agent to wake up every half an hour to make sure the job didn't crash or error. That's legit but very few of them were actually like that.
@aiechrl @pangram What's the score for, say, one of David Foster Wallace's essays? Or if I decide to manually insert em-dashes?
I used to work on llm data, I know how human preferences work
@RyanMWexler The word "slop" makes my head spin.
Also it's sad that some of the styles are turning into slop not because they are, but because they have become the AI standard :rip