This is a super exciting release - Claude Fable 5 is the same underlying model as Mythos but with added safeguards. The benchmarks are great and it's SOTA on everything by a margin but I'll add that *qualitatively* also, this is a major-version-bump-deserving step change forward (imo of the same order as Claude 4.5 was in November), peaking especially for long problem-solving sessions on very difficult problems. You can give it a lot more ambitious tasks than what you're used to, the model "gets it" and it will just go, and it's never felt this tempting to stop looking at the code at all (but don't do this in prod!). The model still has quirks that people will run into and the safeguards are configured to be a little too trigger happy for launch, which can hopefully be tuned over time.
I feel a lot of things changing as working software increasingly comes out on tap. Jevons paradox kicks in and I feel my own demand for software growing substantially. You can ask for anything - explainers, visualizers, dashboards, bespoke single-use apps (e.g. a full wandb that is hyper-specific just for your project), you can 10X your test suite, auto-optimize code, run giant research projects with custom HTML for the results, anything! "Free your mind" (Matrix ref). Really looking forward to all the things people build!
over the weekend i had another obvious thing to check, namely whether claude autonomously resolves the famed sum-product conjecture over the reals. answer: yes
Introducing Claude Opus 4.8: it builds on Opus 4.7 with sharper judgment, more honesty about its own progress, and the ability to work independently for longer than its predecessors.
Available today at the same price.
Personal update: I've joined Anthropic. I think the next few years at the frontier of LLMs will be especially formative. I am very excited to join the team here and get back to R&D. I remain deeply passionate about education and plan to resume my work on it in time.
@AndrewYNg I'm moving on to pursue new opportunities. LandingAI is in a great spot and ADE has real momentum. Excited to watch what the team does next.
I put together some learnings about how I've approached machine learning over the last 10 years:
https://t.co/ezBHcFD2rS
Almost 10 years ago, I asked @AndrewYNg whether I should do a PhD or join a startup. He told me to join a startup, and then asked if I wanted to join his. That startup was LandingAI.
@AndrewYNg As a founding engineer at LandingAI, I got to co-create data-centric AI with Andrew. Incredibly grateful for his mentorship, and for that of our CEO, Dan Maloney, who has been a great mentor too.
A few other top contenders: "The Information" by James Gleick, which covers information theory, and "Quantum" by Manjit Kumar, which follows the history of quantum mechanics. You can view my other book reviews on my website https://t.co/yvSABfBttx
I started putting together a list of short reviews for books I have read recently. My favorite has to be "A Brief History of Intelligence" by Max Bennett. I would recommend this book to anyone interested in AI and intelligence in general.
I think there's still so much more we can learn about how intelligence works from existing organisms, including us, and this book does a great job of summarizing that knowledge in an accessible way.
I always thought the decline in fundamental AI research funding would happen because AI didn’t generate enough value to be worth the cost.
But it seems like it’s happening because it generated too much value. And the race to capture that value is taking priority.
Just remembering that a lot of this started in curiosity-driven industry research labs.
We’ve introduced a new slimmed-down version of VisionAgent, designed to be faster, more reliable, and easier to use.
With this release, we’ve streamlined the agentic workflow to focus on the most effective tools.
In this new blog we introduce a new version of VisionAgent -- a modular, agentic AI framework that breaks visual reasoning problems into subtasks, chooses the right vision tools, and applies visual design patterns to solve them.
Check it out here: https://t.co/zLsnA2kPN7
A lot of VLMs like GPT-4o and Claude-3.5 are great with text but still struggle with vision. We tested them on a simple puzzle -- count the missing soda cans in a box of soda cans -- and they all struggled to answer correctly.