I may have gotten @OpenAI’s ChatGPT-5.5 Pro to produce a candidate counterexample to an open problem from Don Knuth's The Art of Computer Programming.
Exercise 210 in Volume 4, Fascicle 8A (https://t.co/8NhO16kJIC, page 55) asks a question about generating-function denominators for Hamiltonian cycles and paths in knight tours on m x n chessboards. It’s rated HM46, in the range of 46-50 that Don Knuth uses for open problems.
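For readers who want a concrete handle on what is being counted, here is a minimal brute-force sketch in Python. It is not the transfer-matrix approach and not the code from the repo; the function names `knight_graph`, `count_hamiltonian_cycles`, and `count_closed_knight_tours` are hypothetical, chosen only for illustration. It counts undirected Hamiltonian cycles (closed knight's tours) by plain backtracking, so it is only practical for very small boards.

```python
def knight_graph(m, n):
    """Adjacency lists of the knight graph on an m x n board (cell = r * n + c)."""
    moves = [(1, 2), (2, 1), (-1, 2), (-2, 1),
             (1, -2), (2, -1), (-1, -2), (-2, -1)]
    adj = []
    for r in range(m):
        for c in range(n):
            adj.append([(r + dr) * n + (c + dc)
                        for dr, dc in moves
                        if 0 <= r + dr < m and 0 <= c + dc < n])
    return adj


def count_hamiltonian_cycles(adj):
    """Count undirected Hamiltonian cycles of a simple graph by backtracking."""
    total = len(adj)
    if total < 3:
        return 0
    visited = [False] * total
    visited[0] = True  # every cycle passes through vertex 0, so start there

    def dfs(v, depth):
        if depth == total:
            # All vertices used: the path closes only if v is adjacent to 0.
            return 1 if 0 in adj[v] else 0
        count = 0
        for w in adj[v]:
            if not visited[w]:
                visited[w] = True
                count += dfs(w, depth + 1)
                visited[w] = False
        return count

    # Each undirected cycle is found once per direction of travel.
    return dfs(0, 1) // 2


def count_closed_knight_tours(m, n):
    """Closed knight's tours on an m x n board, counted as undirected cycles."""
    return count_hamiltonian_cycles(knight_graph(m, n))


if __name__ == "__main__":
    for board in [(3, 3), (3, 4), (4, 4)]:
        print(board, count_closed_knight_tours(*board))
```

The exercise concerns how such counts behave as n grows for fixed m, which is where generating functions come in; exhaustive search like this only serves as a sanity check on tiny cases.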
The model produced a computer-assisted counterexample for m = 5. I then used Codex with GPT-5.5 on Extra High settings and @AnthropicAI's Claude Code with Opus 4.8 on Ultracode (on a 20x Claude Pro plan) to independently rebuild and verify the supplied programs and certificates. So far, the checks reproduce the findings and haven’t found any mistakes. That said, this absolutely needs human expert review.
In the interest of transparency, I’m sharing the conversation, writeup, and code below. I’d be grateful if any mathematicians with expertise in Hamiltonian paths, transfer matrices, or knight-tour enumeration could verify or refute the claim. Feel free to contact me via X or via email at [email protected].
I'm also incredibly grateful to Liam Price (@Liam06972452, solver of Erdős Problem 1196) for sharing his LLM-assisted math workflow. Regardless of the final outcome, this has been such a fun and intellectually stimulating experiment!
Links
conversation: https://t.co/JZH47QNDd0
writeup: https://t.co/wNEfIdNRtS
repo: https://t.co/FRjdWcYzUn
P.S. If anyone knows the appropriate way to send a concise note/package to Don Knuth or someone working on TAOCP Fascicle 8A, I’d appreciate guidance, since I know Don doesn't typically respond to emails and prefers correspondence by physical mail.
This is a new paradigm for interacting with Claude that is significantly more "inline" with all the other human activity org-wide. Once you do all of the under-the-hood engineering work to make this "just work" (e.g. across tools, integrations, compute environments, memory, security, etc.), Claude basically joins the team in a seamless way - you can talk to it as you would talk to a person, and it can help with a very large variety of workloads.
Imo this is the 3rd major redesign of LLM UI/UX. The first paradigm was that the LLM is a website you go to, the second was that it is an app you download to your computer. This third one is that it is a self-contained, persistent, asynchronous entity with org-wide tools and context, working alongside teams of humans. It really takes a while to wrap your head around it, but it works and it is awesome.
We taught a brand-new mini-series this year at @SCSatCMU on Modern GPU Programming for ML Systems, as part of the ML Systems course, touching on fun questions like what data layout swizzling is, how to use 3D TMA, and state-of-the-art Blackwell programming. We released a curated online book based on the materials: https://t.co/5ZJg2lySNO. Check it out!
We pulled in $117,000 in Chrome bug bounties with simple tricks; on Wednesday, Quang Luong will spill his secrets at the Stanford AI Security Conference:
https://t.co/Fhq0NH13jn
Fun fact: Quang is probably the only researcher in the known universe who still uses Gemini to find bugs.
Before the end of the year, Calif researchers will be presenting at Black Hat USA, DEF CON, and Hexacon. We're also hoping to make it to Unprompted AU, OffensiveCon, and Objective By The Sea.
At Black Hat USA, Dionysus Blazakis and the team will walk through the bugs and exploit chain used in the Apple MIE bypass discovered a few months ago.
https://t.co/dfeYJSzFVT
At DEF CON, we will tell the story of hacking software that helps run the Internet backbone.
At Hexacon in Paris, @brucedang and I will give the keynote. Apple announced MIE there last year, so it'll be a fun one. I suspect they only wanted Bruce, but keynotes require a certain amount of professional nonsense, and Bruce is far too honest for that, so I got invited too. My job is marketing, which is to lie without getting caught.
What's wild is that none of this existed at the beginning of the year.
We started with a simple realization: very few people have both deep security expertise and access to the best AI models.
So we went all in and never looked back.
Back in March, we called a company-wide all-hands on a Saturday. The title of the invite was: "AI Tsunami and Our Actions."
I don't want to romanticize overwork, but what we were seeing felt too urgent to wait until Monday.
Then everyone started cooking. The results have been spectacular.
Our research on defeating Apple MIE made it into The Wall Street Journal. We signed major contracts with Anthropic, OpenAI, Google DeepMind, and xAI. While others are celebrating access to the latest models, we've been using them to explore the frontiers of vulnerability research.
In the first half of 2026, we're already surpassing our entire 2025 bookings. Most importantly, we've assembled a top-tier team in record time.
I've read many strategy books, but this is the first time I've witnessed the power of the right strategy at the right time.
Focus is the name of the game. Strategy is deciding what to ignore. For a month and a half, we stopped starting new projects. I've personally shelved a lifelong passion in Vietnam, because it isn't a priority for the company. You can only move fast when you're light.
Several people were upset when we changed direction so abruptly. That's normal. If nobody complains, you probably didn't focus.
Of course, strategy isn't magic. You can make a focused bet and still be wrong. We were fortunate that this one worked out.
None of this would be possible without our partners and supporters across the frontier labs. Thank you.
Claude Code can now run an entire PhD-level research pipeline by itself.
it runs a 10-stage workflow from blank page to publication-ready PDF, replacing the work of a PhD advisor, three peer reviewers, and a copy editor in one repo.
→ Deep research with 13 agents (PRISMA + systematic review)
→ 12 agents write the paper section by section
→ 5-person peer review (Editor + 3 Reviewers + Devil's Advocate)
→ Integrity agent catches fabricated citations + stat errors
→ Final output: LaTeX → PDF, ready to submit
After the paper is finalized, it runs a Collaboration Quality Evaluation that scores YOU, across 6 dimensions, 1–100. Direction setting, intellectual contribution, quality gatekeeping.
It tells you exactly where you were the bottleneck.
Drop it into .claude/skills/ and the whole pipeline auto-loads. Works in Claude Code, Cowork, and as a Claude Project.
100% open source. CC-BY-NC 4.0.
🚨 Mini Shai-Hulud/Miasma has now spread to PyPI.
Socket found 37 malicious artifacts across 19 PyPI packages.
The packages abuse #Python .pth startup behavior to launch a Bun-powered credential stealer targeting developer, cloud, and CI/CD secrets.
https://t.co/tYhmMqvjyw
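For anyone wanting to check their own environment, here is a minimal defensive sketch in Python. It relies on one documented behavior of the standard `site` module: any line in a `.pth` file that starts with `import` followed by a space or tab is executed at interpreter startup. The helper names `find_executable_pth_lines` and `default_site_dirs` are hypothetical, and this is not Socket's detection logic. A hit is not proof of compromise, since some legitimate packages and editable installs also use executable `.pth` lines; it only narrows down what to read by hand.

```python
import site
from pathlib import Path


def find_executable_pth_lines(directories):
    """Return (path, line_number, line) for every .pth line Python would execute."""
    findings = []
    for directory in directories:
        root = Path(directory)
        if not root.is_dir():
            continue
        for pth in sorted(root.glob("*.pth")):
            try:
                text = pth.read_text(encoding="utf-8", errors="replace")
            except OSError:
                continue  # unreadable file: skip rather than crash the audit
            for lineno, line in enumerate(text.splitlines(), start=1):
                # The site module executes lines beginning with "import"
                # followed by a space or tab; other lines are path entries.
                if line.startswith(("import ", "import\t")):
                    findings.append((str(pth), lineno, line.strip()))
    return findings


def default_site_dirs():
    """Site-packages directories of the running interpreter (global and per-user)."""
    dirs = set()
    try:
        dirs.update(site.getsitepackages())
    except AttributeError:
        pass  # some older virtualenv setups do not provide this function
    dirs.add(site.getusersitepackages())
    return sorted(dirs)


if __name__ == "__main__":
    for path, lineno, line in find_executable_pth_lines(default_site_dirs()):
        # Truncate long one-liners so the report stays readable.
        print(f"{path}:{lineno}: {line[:120]}")
```

Running the script inside each virtual environment or CI image lists the executable `.pth` lines to review; anything unexpected there deserves a closer look before the interpreter is started again with secrets in scope.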
🚨 Cybercriminals are selling a fake French ID card generator for €800
The logical consequence of personal data leaks is now playing out before our eyes: the creation of ever more credible fake identities.
📂 Last names, first names, dates and places of birth, addresses, phone numbers, email addresses, sometimes even ID photos... Millions of records are already circulating among cybercriminals following numerous breaches.
The published demos show software that generates and customizes French national identity cards from photos and personal information, while displaying a purported compliance check.
With real data from leaks and this kind of tool, fraudsters can create particularly credible documents to:
👉 impersonate someone;
👉 open bank accounts;
👉 bypass KYC procedures;
👉 take out fraudulent loans or services;
👉 carry out administrative or financial scams;
👉 create accounts under false identities.
This combination of massive data leaks and ever more sophisticated forgery tools is now a major threat to individuals, businesses, and government agencies.