@nfergus @AnthropicAI I think an important related issue is that RSI makes the whole AI race much less predictable. State actors that are behind (or perceive themselves to be) might choose to act once RSI looks imminent, or once they perceive it to be imminent.
Can I just say…it’s really cool but weird to be at 35k feet, find a bug in your website, spin up Claude Code to fix it, and then redeploy the site. Five years ago, most flights didn’t have wifi…
This isn’t really true…Roman building techniques survived in the Eastern Empire (Byzantium…think Hagia Sophia) and in Western Europe late Roman techniques fed into medieval/Gothic architecture. True that Roman hydraulic/marine concrete fell into disuse. What really changed after the fall of the Western Empire was a decline in centralized authority/capacity, a regression in the scale of civic projects, and less uniformity of approach. Not to be too pedantic…:)
Even using the strictest criteria, meaning we include capitalization and punctuation changes (which are often ambiguous in historical docs), WER falls to 3.5% and CER to 1.25%. For context, professional human experts guarantee a 1% CER.
Using a new method of automated/human verification which uses multiple models to identify and highlight potential errors, we’ve reduced meaningful error rates on handwriting recognition to 0.33% WER and 0.23% CER. 18 months ago the gold standard was around 50x worse.
Those that remain (33 in a 10,000 word test corpus) are generally spelling modernizations, slightly different abbreviations, etc. No hallucinations, just typos.
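For readers unfamiliar with the metrics, here is a minimal sketch of how WER and CER are typically computed: edit distance divided by reference length, with an optional lenient pass that ignores capitalization and punctuation. The function names and the sample strings are hypothetical and this is not the verification pipeline described above, just the standard arithmetic (33 errors in 10,000 words is 33/10,000 = 0.33% WER).

```python
import string


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (word lists or strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def normalize(text):
    """Lenient scoring (an assumption): drop ASCII punctuation, lowercase."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(text.lower().translate(table).split())


def wer(ref, hyp):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


def cer(ref, hyp):
    """Character error rate: character edits divided by reference length."""
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical ground truth and model transcription, for illustration only.
reference = "Received of Mr. Smith, ten pounds."
hypothesis = "received of Mr Smith ten pound."

strict_wer = wer(reference, hypothesis)
lenient_wer = wer(normalize(reference), normalize(hypothesis))
lenient_cer = cer(normalize(reference), normalize(hypothesis))
```

On this toy pair the strict score counts four of six words as wrong, while the lenient score counts only the one real misreading ("pound" for "pounds"), which is why the strict and "meaningful" figures can differ so much on historical documents.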
LLMs doing new math should make it impossible to sustain stochastic parrot arguments. Yet I don’t think we’ll see media/popular narratives start to shift anytime soon. It just means that the knowledge overhang is going to keep growing exponentially.
https://t.co/hXK0rlW0uk
Today, we share a breakthrough on the planar unit distance problem, a famous open question first posed by Paul Erdős in 1946.
For nearly 80 years, mathematicians believed the best possible solutions looked roughly like square grids.
An OpenAI model has now disproved that belief, discovering an entirely new family of constructions that performs better.
This marks the first time AI has autonomously solved a prominent open problem central to a field of mathematics.
I had early access to DeepMind’s Antigravity Agent and used it to analyze a 150k record dataset. The result was a massive increase in the speed of research and a major productivity boost as I could test hypotheses in seconds. So much for stochastic parrots!
Gemini 3.5 Flash is excellent on historical handwriting recognition: a character error rate (CER) of 2.99% and a word error rate (WER) of 6.92%. When you exclude ambiguous capitalization and punctuation errors it scores 1.28% CER and 2.58% WER, better than Opus and just behind Gemini 3 Pro.
@tszzl And that is not what the majority of knowledge work is, nor does it align well with how AI systems need to operate to do most knowledge work (i.e., information management, synthesis, translation, etc.).
Excited to have our work on LLMs and historical handwriting recognition featured by @jackiesnow in IEEE Spectrum alongside the great @ylecun
https://t.co/JRATzEtDii
This makes it hard to ask LLMs about LLM capabilities because their self-knowledge is dated but they’re also really self-assured. They rarely search for the latest info. I wonder how this affects their ability to self-assess and plan, considering they have firm 2022-23 opinions on what LLMs can and cannot do.
Discussions about whether AI is going to replace people all too often turn on how we define “people”. White collar work will still need lots of people for some time to come, but that number will shrink gradually as tasks and later some jobs are automated. It’s not that AI will replace all people, but more and more people over time.
Your disagreement illustrates how siloed narratives are becoming. Outside of AI world, media/academia have been debating whether the Mythos delay was all hype because many assume it can’t actually do these things. Inside AI world, it’s assumed that it can so the debate is around whether it’s actually a safety issue or not (because models could already find exploits etc). Same conversation, entirely different assumptions and context. And this is going to start to pose real problems for communicating AI developments to a public which is still largely in denial (or just unaware) of what AI can do.
Tech layoffs are skyrocketing:
Tech companies announced 81,747 layoffs in Q1 2026, the highest quarterly total since at least Q1 2024.
Layoffs have more than DOUBLED from the previous quarter and have risen +580% since Q4 2025.
March alone saw 45,800 announced job cuts, the worst single month for tech layoffs in at least 2 years.
Tech layoffs are set to remain elevated with Meta's, $META, recent plans to cut ~8,000 employees.
Furthermore, Microsoft, $MSFT, is offering voluntary retirement to ~7% of its US workforce, which could transition into layoffs if participation is low.
This comes as tech giants shift spending toward AI chips and data centers, trimming staff to free up capital for infrastructure.
US tech employment is rapidly contracting.