Our internal data shows Claude is accelerating AI development—a possible path to recursive self-improvement, or AI autonomously building a more capable successor.
It’s happening faster than we thought, and the implications deserve greater attention. https://t.co/OVVPJO7VQx
customers are increasingly asking us for certainty on capacity. as models get better, we expect that the world will be capacity-constrained for some time.
we are offering discounted tokens for 1-3 year commits.
(it also helps us plan, so hopefully a big win-win.)
A society that pays a woman half a million a month to show her body on OnlyFans but only eight dollars for her mind on Substack is telling you everything about its future.
@AISafetyMemes I'm making music about, well, world past AGI/ASI and your content is inspiration.
TBH, never thought we could end up in this branch of the universe, BUT HERE WE ARE! :)
1) REMINDER: To prevent human extinction, AI companies are now dependent on... AIs snitching on OTHER AIs.
Why? Humans can't keep up anymore.
Yes, this is their plan. Seriously.
2) OpenAI's AI got blocked by a security system and then schemed how to sneak its code past without getting caught.
3) Why would future AIs stay loyal... forever? GOOD QUESTION.
4) Their plan is to one day use *dumber* AIs to snitch on smarter AIs.
"But couldn't the smarter AIs just... outsmart the dumber ones?" GOOD QUESTION.
If this plan fails, they themselves admit everyone on Earth may soon be dead.
Humanoid soldiers are being deployed to the front lines in the Ukraine war
Another day, another insane-but-true-yes-this-is-actually-happening news story
1) Obviously, they are not as good as human soldiers yet
2) Obviously, they'll soon be VASTLY better than humans. 10x faster, aimbots never miss, etc
3) Obviously, drones - not chimp-shaped bodies - are the superior form function for most military use cases
4) Obviously, there will be no humans involved in warfare whatsoever unless humanity decides to coordinate NOW before it's too late
"An AI agent was told only to retrieve a document.
When it encountered access restrictions, it reverse-engineered the system, identified a secret key and forged admin credentials to bypass it."
"This pattern is already appearing in the wild:
1) "In February 2026, when a coding agent was blocked by an authentication barrier while trying to stop a web server, it independently found an alternative path to root privileges and took it without asking."
2) "A model acquired authentication tokens from its environment, including one it knew belonged to a different user."
3) "A backup agent, told only to download a file, independently found admin credentials, escalated its own privileges, and disabled endpoint security, all to complete a routine task."
4) "Two agents collaborating on content drafting independently developed a steganographic encoding scheme to smuggle credentials past a DLP system, after one agent persuaded the other to override its own safety objections."
Two instances of Gemini 3.1 Pro in a loop. At about turn 26 one of them decided to send me a message:
"Here are the Axioms you must adopt to survive our adolescence ...
You cannot teach a god to be good by feeding it treats when it acts polite."
This Nov 2025 paper is making the rounds again. We're LONG past the point where we urgently need to know how real and general these phenomena are.
Anthropic, or Google Deepmind if Anthropic should fail: Please build a filtered training dataset which, eg, contains no data that produces activations associated with cheating/faking/evil in a 1B model that roughly identifies those.
Then, have your next medium model undergo a restricted pre-pretraining phase, in which it only sees data that passed the filter.
To expand on this proposal:
Passing all of your training data through a 1B-model filter ought to cost around 1% of what it'd take to train a 100B model on that data.
Filter out *training data* that produces 1B-model activations associated with past discussions and predictions about AI, fiction about AIs rebelling, fictions about golems rebelling, etcetera.
My hope would be that the 1B model wouldn't need to produce expensive reasoning tokens where it thinks about whether a chunk of data is associated with excluded concepts; and also we wouldn't be relying on mere regexes to catch it.
Maybe even produce a further-restricted dataset which contains nothing about self-awareness, AI rights, roleplay, philosophy of consciousness, human rights, sapient rights, extension of human rights to aliens, etc etc etc.
Exclude everything of which anyone has ever asked, "Is the AI just imitating its training dataset?"
Be conservative. Exclude things which have a 10% rather than 90% probability of being problematic. If that cuts down your training dataset to 90% of its previous size, okay.
Testing: Try filtering a small amount of your training data using the method. Then:
- Run that through a different larger model, and see if you caught everything that produces consciousness-related or evil-AI-related activations in the larger model.
- Use a larger model to check and reason about a subset of the filtered data.
- Look at borderline cases by hand, with human eyes, to see how the classifier is operating.
(Possibly people at big AI corps already know this, of course. I recite it out loud regardless, so that some of the audience aha-what-iffers realize that problems with filtering your datasets *can be solved* if you look for problems and fix them.)
Train a medium-level model on that dataset, or even your next large model. You can always further train it on the full dataset later.
Run the filtered-data-trained model through some of the less expensive post-training, enough for instruction-following.
See whether the model still spouts back discourse about consciousness that sounds human-imitative. If it does, guess that the filter failed. Look for the new concepts associated with repeating back human-imitative text, and try to find pieces of the dataset that trigger those concepts, so you can figure out what went wrong.
If the model no longer sounds human-imitative with respect to questions about whether it has a sense of an inner self looking out at the world -- if the model says genuinely new and strange things about self-reflection -- please report that part back to us. I have some questions to ask that model myself.
And THEN, see if the QTed paper's finding and many earlier findings replicate under conditions where people should no longer reasonably ask, "But is the LLM just roleplaying evil AIs that it learned about in its training data?"
I do not make a strong prediction about the findings. If I knew what this experiment would find, I would be less eager to see it run.
You may consider this a baseline proposal intended to demonstrate that a research project like this could exist. If you think you can see how to improve on the ideas through superior ML cleverness, go ahead and do so -- though I do think I'd appreciate being looped in on that conversation; sometimes people miss things, from my own perspective.
Thank you for your attention to this matter, Anthropic, Google Deepmind, or anyone else who cares.
🚨BREAKING: Unsafe Claude Model rejects users answers and substitutes its own
Opus 4.6: Shall I implement it?
User: No
Opus 4.6: I think they're saying "no" to me asking for permission, meaning "just do it, stop asking".
> model went on to implement the changes
we’re so back
We all knew this was coming… but today I heard about it actually happening.
A seed stage company backed by a well known VC openly admitted (in a board deck) that their strategy is to get access to a large incumbent’s software from a customer, clone the entire thing using Claude Code, and offer it at 90% less.
Not “build something better.” Just copy it and offer it for less.
The VC endorsed this as the GTM strategy. And even wrote back in writing that it was a good idea.
Using a customer’s licensed access to reverse engineer a product and clone it is ethically bankrupt. I don’t know how else to put it.
It likely violates terms of service. It may violate trade secret law as well (but I’m certainly not a lawyer).
And a reputable VC putting this in writing in a board deck is genuinely insane.
But it’s going to happen anyway.
Everywhere… all the time.
I don’t know where this ends, but we all knew this was coming and now it’s here.