Enterprises are doing AI eval wrong - and it's causing wasted iteration cycles and wrong decision making.
I’m preparing my spring semester course on designing large-scale AI systems and I need feedback:
What’s the one change you’d make to eval practice + reporting to make it reliable?
What's the error you see made most often?
@karpathy We also need new programming abstractions. A powerful one is the notion of statistical assertions: properties that you weakly expect to hold over your flow, but not every time.
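To make "statistical assertion" concrete, here is a minimal sketch of one way it could look in Python. Everything in it is an assumption for illustration: the names `statistical_assert` and `wilson_lower_bound` are hypothetical, and using a Wilson score lower bound on the pass rate is just one reasonable choice, not an established API.

```python
import math
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def wilson_lower_bound(passes: int, total: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for a pass rate (z=1.96 is ~95%)."""
    if total == 0:
        return 0.0
    p = passes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    margin = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - margin


def statistical_assert(
    prop: Callable[[T], bool],
    outputs: Sequence[T],
    min_rate: float,
    z: float = 1.96,
) -> float:
    """Hypothetical helper: fail unless `prop` plausibly holds at least `min_rate` of the time.

    Individual failures are tolerated; only the aggregate rate is checked.
    """
    passes = sum(1 for out in outputs if prop(out))
    lower = wilson_lower_bound(passes, len(outputs), z)
    if lower < min_rate:
        raise AssertionError(
            f"property held in {passes}/{len(outputs)} runs; "
            f"lower bound {lower:.3f} is below required rate {min_rate:.3f}"
        )
    return lower


if __name__ == "__main__":
    # Stand-in for sampled outputs of an AI flow (made-up data for illustration).
    sampled = ["ok"] * 97 + ["refused"] * 3
    bound = statistical_assert(lambda s: s == "ok", sampled, min_rate=0.85)
    print(f"passed, lower bound on pass rate: {bound:.3f}")
```

The point of checking a confidence bound rather than the raw rate is that a small sample with a lucky streak should not pass; the threshold and sample size are knobs you would tune per flow.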
I keep seeing posts on AI and SaaS. My (biased) take is that AI agents need SaaS way more than humans do. And this is from experience.
Agents can consume knowledge like no human can. They can learn and iterate faster than humans. They can also go off track faster than humans—and at scale.
They need a platform that maximizes their potential while keeping controls.
@Lol19559014 @JenniferSey That is not what he said. That is a redacted selection. It’s ok if you are trying to help him get more supporters, because that is what posts like this do.
@lugaricano @mustafasuleyman @Microsoft You are asking for the moon. Let’s set a lower goal, like automating many routine elderly care tasks with robots.
Does anybody remember the 2025 NY humanoid robot show? Aside from dancing, I hope companies work on robotic care for older adults. I’d trust it more than nursing homes.
At the 2025 Spring Festival Gala, 16 humanoid robots joined a traditional folk dance known for its sweeping steps and vibrant handkerchiefs.
During the grand opening performance, humanoid robots demonstrated the "Thomas 360" stunt move. In a subsequent stage comedy, a group of AI-powered humanoid robots, along with a bionic robot, took on comedic roles. And in a separate martial arts performance, robots executed a series of high-difficulty movements.