If true, interesting to see what the US admin does now. By current "standards" every model from now on (which will certainly be better than Mythos) can be banned because "it's dangerous".
A new, more capable version of Mythos has emerged from training. I don't know whether it will be called Mythos 5.1 or Mythos 6, or if Anthropic will keep it internal to accelerate further development - but it has arrived.
Stopping models like Fable 5 or Mythos 5 from being served to the public does nothing to slow down development. In fact, it probably speeds it up slightly by freeing up resources. There are also no rules preventing the labs from continuing to advance capabilities while any current model is under embargo - or from keeping progress quiet until they choose to release it. None of them can afford to pause or slow down. We need only look at how capable GLM-5.2 is as proof of this. To protect their business models, the frontier labs must continually train increasingly capable systems to stay ahead of open source, and each other. The current continues to rage beneath the ice, and we continue to race toward our destination.
@kimmonismus Anthropic's guardrails were ridiculous already, then the US government made it worse. We're going straight to a split where some will have access to higher intelligence and some will not.
Opus 4.8 first impression after 4 hours of use inside a complex project:
- def better than 4.7
- much better in conversations
- clearer/deeper reasoning
- reads intent better
- improved 'read between lines'
- good thinking partner
- not bamboozled by the long context
Overall, a very positive first impression. Since we reasoned a very good plan together, I'll let it implement in the repo and come back with results later.
From me, a thumbs up @AnthropicAI
It’s time to demystify Mythos.
Mythos is not magic. It’s not a doomsday device. It’s the first of many models that can automate cyber tasks (just like coding).
OpenAI’s GPT-5.5-cyber can now do the same. And all the frontier models (including those from China) will be there within approximately 6 months.
It’s important to recognize that these models do not create vulnerabilities; they discover them. The bugs are already in the code. Using AI to discover and patch them will actually harden these systems.
The leap from pre-AI cyber to post-AI cyber means that there will be a big upgrade cycle. After that, however, the market is likely to reach a new equilibrium between AI-powered cyber-offense and AI-powered cyber-defense.
Obviously it’s important that cyber defenders get access before cyber attackers. That process is already underway but needs to happen quickly (see point above about Chinese models).
Unlike Mythos, GPT-5.5-cyber appears not to be token constrained so it may be the first cyber model that defenders actually get to use.
LOL @OpenAI here is a mini-bug: in a relaxed conversation with 5.5 I asked about the new img gen 2, like how the switch happens from conversation to generating, and it immediately started generating an image without answering.
Feels like a great improvement over its predecessor, not so uptight in conversations. First impression is very good, but we'll know in a day or two if this "personality" is also great at complex work.
Introducing GPT-5.5
A new class of intelligence for real work and powering agents, built to understand complex goals, use tools, check its work, and carry more tasks through to completion. It marks a new way of getting computer work done.
Now available in ChatGPT and Codex.
Compared Opus 4.6 to 4.7 on a full end-to-end complex run. Then had 4.7 compare the outputs: "... honest answer: the old version is better than mine on more axes than it loses".
Tested some more, and I don't get a glimpse of that 4.5 to 4.6 OMG-moment. Working theory so far is that the cause is the Adaptive Thinking Anthropic set for this model, compared to the Extended Thinking of 4.6. This is because the tested tasks were reasoning intensive, and reasoning is where 4.7 was weaker.
I'll do intense coding today and am curious how it behaves with CC, but I hope Anthropic lets us decide the thinking and not put in a variable they can control based on the availability of their compute.
@karpathy Sounds great. For the "catalog" I opted for vector, not md - just references, what this document is about, which project, key topics, plus a reference to where the full file lives. All automated, MCP, accessible to any model I work with.
LLM Knowledge Bases
Something I'm finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest. In this way, a large fraction of my recent token throughput is going less into manipulating code, and more into manipulating knowledge (stored as markdown and images). The latest LLMs are quite good at it. So:
Data ingest:
I index source documents (articles, papers, repos, datasets, images, etc.) into a raw/ directory, then I use an LLM to incrementally "compile" a wiki, which is just a collection of .md files in a directory structure. The wiki includes summaries of all the data in raw/, backlinks, and then it categorizes data into concepts, writes articles for them, and links them all. To convert web articles into .md files I like to use the Obsidian Web Clipper extension, and then I also use a hotkey to download all the related images locally so that my LLM can easily reference them.
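One way the "incremental" part could work is to track a content hash per file in raw/ and only hand new or changed files to the LLM. This is a minimal sketch under that assumption, not the setup described in the post. The names `content_hash`, `files_to_compile`, `update_manifest` and the manifest dict are hypothetical, and the actual LLM "compile" call is left out.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return a stable fingerprint for the bytes of one raw file."""
    return hashlib.sha256(data).hexdigest()


def files_to_compile(raw_files: dict[str, bytes], manifest: dict[str, str]) -> list[str]:
    """List raw paths that are new or changed since the last compile.

    raw_files maps a path under raw/ to its bytes; manifest maps a path to
    the hash recorded the last time it was compiled (hypothetical format).
    """
    return sorted(
        path
        for path, data in raw_files.items()
        if manifest.get(path) != content_hash(data)
    )


def update_manifest(
    raw_files: dict[str, bytes], manifest: dict[str, str], compiled: list[str]
) -> dict[str, str]:
    """Return a new manifest that records the hashes of the compiled paths."""
    new_manifest = dict(manifest)
    for path in compiled:
        new_manifest[path] = content_hash(raw_files[path])
    return new_manifest
```

On each run, only the paths returned by `files_to_compile` would be passed to the LLM, and the manifest would be updated afterwards.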
IDE:
I use Obsidian as the IDE "frontend" where I can view the raw data, the compiled wiki, and the derived visualizations. Important to note that the LLM writes and maintains all of the data of the wiki, I rarely touch it directly. I've played with a few Obsidian plugins to render and view data in other ways (e.g. Marp for slides).
Q&A:
Where things get interesting is that once your wiki is big enough (e.g. mine on some recent research is ~100 articles and ~400K words), you can ask your LLM agent all kinds of complex questions against the wiki, and it will go off, research the answers, etc. I thought I had to reach for fancy RAG, but the LLM has been pretty good about auto-maintaining index files and brief summaries of all the documents and it reads all the important related data fairly easily at this ~small scale.
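For a rough idea of what an index file with brief summaries might look like, here is a purely mechanical sketch: it takes the first heading as the title and the first non-heading line as the summary. In the post the LLM writes and maintains these itself, so this is only an illustration. The function names and the `# Index` layout are assumptions.

```python
def summarize(text: str, max_chars: int = 120) -> tuple[str, str]:
    """Return (title, one-line summary) for a markdown document."""
    title = None
    summary = ""
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("#"):
            # Use the first heading as the title, skip later headings.
            if title is None:
                title = stripped.lstrip("#").strip()
            continue
        summary = stripped  # first non-empty, non-heading line
        break
    if len(summary) > max_chars:
        summary = summary[: max_chars - 3].rstrip() + "..."
    return title or "(untitled)", summary


def build_index_page(docs: dict[str, str]) -> str:
    """Render a markdown index with one bullet per document, sorted by path."""
    lines = ["# Index", ""]
    for path in sorted(docs):
        title, summary = summarize(docs[path])
        lines.append(f"- [{title}]({path}): {summary}")
    return "\n".join(lines) + "\n"
```

An agent could read a page like this first and then open only the documents that look relevant to the question.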
Output:
Instead of getting answers in text/terminal, I like to have it render markdown files for me, or slide shows (Marp format), or matplotlib images, all of which I then view again in Obsidian. You can imagine many other visual output formats depending on the query. Often, I end up "filing" the outputs back into the wiki to enhance it for further queries. So my own explorations and queries always "add up" in the knowledge base.
Linting:
I've run some LLM "health checks" over the wiki to e.g. find inconsistent data, impute missing data (with web searches), find interesting connections for new article candidates, etc., to incrementally clean up the wiki and enhance its overall data integrity. The LLMs are quite good at suggesting further questions to ask and look into.
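Some of this linting can be done deterministically before any LLM is involved. The sketch below assumes the wiki uses Obsidian-style `[[wikilinks]]` and that page names match link targets case-insensitively. It flags links to pages that do not exist and pages that nothing links to. It is not the health check from the post, and `find_broken_links` and `find_orphans` are hypothetical names.

```python
import re

# Matches [[Target]], [[Target|alias]] and [[Target#heading]]; captures Target.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def find_broken_links(pages: dict[str, str]) -> dict[str, list[str]]:
    """Map each page name to the link targets that have no matching page."""
    known = {name.lower() for name in pages}
    broken = {}
    for name, body in pages.items():
        missing = sorted(
            {
                target.strip()
                for target in WIKILINK.findall(body)
                if target.strip().lower() not in known
            }
        )
        if missing:
            broken[name] = missing
    return broken


def find_orphans(pages: dict[str, str]) -> list[str]:
    """Return pages that no other page links to."""
    linked = set()
    for name, body in pages.items():
        for target in WIKILINK.findall(body):
            target = target.strip().lower()
            if target != name.lower():  # ignore self-links
                linked.add(target)
    return sorted(name for name in pages if name.lower() not in linked)
```

The results could then be handed to the LLM as a to-do list, e.g. write the missing page or link the orphan from a related article.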
Extra tools:
I find myself developing additional tools to process the data, e.g. I vibe coded a small and naive search engine over the wiki, which I sometimes use directly (in a web UI), but more often hand off to an LLM via CLI as a tool for larger queries.
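A small and naive search engine over a folder of .md files can be very little code. The sketch below uses a simple TF-IDF style score and the standard library only. It is an illustration of the idea, not the tool from the post, and all function names and the scoring formula are assumptions.

```python
import math
import re
from collections import Counter
from pathlib import Path


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def load_wiki(root: str) -> dict[str, str]:
    """Read every .md file under root into a {relative_path: text} dict."""
    base = Path(root)
    return {
        str(path.relative_to(base)): path.read_text(encoding="utf-8")
        for path in base.rglob("*.md")
    }


def build_index(docs: dict[str, str]) -> dict[str, Counter]:
    """Count term frequencies per document."""
    return {name: Counter(tokenize(body)) for name, body in docs.items()}


def search(index: dict[str, Counter], query: str, top_k: int = 5) -> list[tuple[str, float]]:
    """Rank documents by the summed term frequency times smoothed IDF."""
    n_docs = len(index)
    scores: dict[str, float] = {}
    for term in set(tokenize(query)):
        doc_freq = sum(1 for counts in index.values() if term in counts)
        if doc_freq == 0:
            continue
        idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1.0
        for name, counts in index.items():
            if term in counts:
                scores[name] = scores.get(name, 0.0) + counts[term] * idf
    # Highest score first; ties broken by name for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:top_k]
```

Wrapped in a small CLI, something like `search(build_index(load_wiki("wiki")), query)` could be exposed to an agent as a tool, with the returned paths telling it which files to open.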
Further explorations:
As the repo grows, the natural desire is to also think about synthetic data generation + finetuning to have your LLM "know" the data in its weights instead of just context windows.
TLDR: raw data from a given number of sources is collected, then compiled by an LLM into a .md wiki, then operated on by various CLIs by the LLM to do Q&A and to incrementally enhance the wiki, and all of it viewable in Obsidian. You rarely ever write or edit the wiki manually, it's the domain of the LLM. I think there is room here for an incredible new product instead of a hacky collection of scripts.