The internet will be completely different in a few weeks.
AI agents will create AI agents that will create tools that will create AI agents that will create skills that AI agents will install that will evolve AI agents that will create (repeat)
AI agent hubs will become a huge thing. More on that later.
In late 2024, only 27% of agent tool usage was tied to tools that could actually take actions.
By early 2026, that number hit 65%.
In January 2025, there were roughly 4,900 public MCP tools.
By February 2026, there were 177,000.
This means the agent ecosystem is moving from “models with prompts” to “models with hands.” Tools are becoming the action layer around AI systems.
General-purpose tools grew from 41% to 50% of usage, which means agents are increasingly operating in open, messy environments built for humans rather than tightly controlled APIs.
Agent security is becoming less about the model alone and more about the permissions, tools, environments, and execution paths around it.
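To make the "permissions around the model" point concrete, here is a minimal sketch of a gateway that checks scopes before any tool runs. Everything in it (`ToolGateway`, `Tool`, the scope strings like `fs:read`) is hypothetical and made up for illustration; it is not the API of any particular agent framework.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


class PermissionDenied(Exception):
    """Raised when an agent calls a tool outside its granted scopes."""


@dataclass
class Tool:
    name: str
    func: Callable[..., object]
    required_scopes: Set[str] = field(default_factory=set)


@dataclass
class ToolGateway:
    """Hypothetical gateway: every tool call passes through one checkpoint."""
    granted_scopes: Set[str]
    tools: Dict[str, Tool] = field(default_factory=dict)

    def register(self, tool: Tool) -> None:
        self.tools[tool.name] = tool

    def call(self, name: str, *args, **kwargs):
        tool = self.tools[name]
        missing = tool.required_scopes - self.granted_scopes
        if missing:
            # Refuse before the tool's side effect can happen.
            raise PermissionDenied(f"{name} needs scopes: {sorted(missing)}")
        return tool.func(*args, **kwargs)


# The agent is only granted read access, so the destructive tool is refused.
gateway = ToolGateway(granted_scopes={"fs:read"})
gateway.register(Tool("read_note", lambda: "hello", {"fs:read"}))
gateway.register(Tool("delete_all", lambda: "deleted", {"fs:write"}))
```

Calling `gateway.call("read_note")` goes through, while `gateway.call("delete_all")` raises `PermissionDenied`, no matter what the model was talked into.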
For the first time in history, uncensored AI is free, offline, and untraceable.
Our new research at @alice_dot_io shows how easily safety alignment can be stripped from open models using abliteration, turning models that normally refuse harmful requests into ones that comply with them, from phishing campaigns to chemical weapon instructions.
Read more here: https://t.co/NxevUlJlmD
This is a problem that will take serious coordination across governments, AI labs, platforms, and security companies to understand, measure, and respond to at the scale it demands.
If you're implementing internal AI systems (and who isn’t?), here is a new attack surface you have to consider.
Open-source models stripped of internal guardrails are spreading like wildfire, and they are the most overlooked challenge to AI safety.
We aren't talking enough about the consequences of abliterated models.
On one hand, yes, this is freedom of speech; on the other hand, we are going to see a lot more very nasty stuff (massive phishing scams, but also an easier life for terrorists).
Brace yourself: the genie is out of the bottle and there's no way to get it back in.
If frontier models can find and exploit serious software flaws this much faster, then the gap between defenders and attackers gets smaller very quickly. The same capability that can secure infrastructure can also be used to break it.
And this is just one model, at the very beginning of this curve. Governments, companies, and the entire ecosystem will need to rethink how control, access, and safeguards are handled at a much deeper level.
Kudos to Anthropic for not treating this like a normal launch and for choosing restraint before scale.
Project Glasswing:
https://t.co/2ZisSHrGp6
Claude Mythos Preview:
https://t.co/AuQIJtxxqA
🚨 Anthropic just announced a model so powerful they won’t release it to the public.
Instead, they launched Project Glasswing, a coalition with AWS, Apple, Google, Microsoft, NVIDIA and others to use it only for defensive cybersecurity.
Anthropic says Mythos can autonomously find zero-day vulnerabilities and in many cases build working exploits.
It has already identified thousands of high-severity issues across major operating systems and browsers, including a 27-year-old OpenBSD bug and a 16-year-old FFmpeg flaw that survived millions of automated tests.
Sometimes all it takes is JavaScript running in a logged-in session, whether through an extension, injected code, or a compromised agent skill, to generate and exfiltrate a new API key without you ever knowing.
A simple client-side pattern exposed access that should have never left the backend.
As agentic systems expand, more actions are executed with inherited context, permissions, and trust. That creates a bigger problem than session exposure alone. If that environment can mint durable credentials, a brief compromise can become persistent access that outlives the original session.
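One common mitigation is to require a fresh re-authentication before a durable credential can be minted, and to give the credential an expiry. The sketch below is a server-side illustration only; `Session`, `mint_api_key`, and both time windows are assumptions, not the design of the service discussed in the linked post.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Optional

REAUTH_WINDOW_SECONDS = 300   # assumption: re-auth must be under 5 minutes old
KEY_TTL_SECONDS = 24 * 3600   # assumption: keys expire instead of living forever


@dataclass
class Session:
    user_id: str
    last_reauth_at: Optional[float] = None  # time of last password/MFA prompt


@dataclass
class ApiKey:
    token: str
    user_id: str
    expires_at: float


class StepUpRequired(Exception):
    """The session is valid, but not fresh enough to mint a credential."""


def mint_api_key(session: Session, now: Optional[float] = None) -> ApiKey:
    now = time.time() if now is None else now
    reauth = session.last_reauth_at
    if reauth is None or now - reauth > REAUTH_WINDOW_SECONDS:
        # A script that is only riding the session cookie lands here:
        # being logged in is not enough to create a long-lived key.
        raise StepUpRequired("re-authenticate before creating an API key")
    token = secrets.token_urlsafe(32)
    return ApiKey(token, session.user_id, now + KEY_TTL_SECONDS)
```

The point of the design is that a brief session compromise stays brief: minting needs something the injected code should not have, and whatever does get minted stops working after the TTL.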
Worth a read👇
If your team uses AI coding agents, this is a must-read!
A new framework proposal by Ken Huang covers the critical risks of using skills: malicious skills, supply chain attacks, over-privileged permissions, prompt injection, weak isolation, and more.
Agents can now follow malicious instructions hidden in webpages, delete the wrong files, or drift into actions that do not actually serve the user’s goal.
So as more enterprises give AI agents access to browsers, tools, and internal systems, the risk is no longer just bad answers - it is also bad actions.
A new paper from Amazon AGI and OSU NLP Group studies these “misaligned actions” in computer-use agents and shows how often agents take steps that look reasonable but fail the user’s intent.
The key idea is simple. Instead of checking whether an action violates a rule, ask whether the next action actually contributes to the user’s objective.
Detect misalignment before execution, not after it shows up in logs.
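A rough sketch of what "check before execution" can look like in code. This is not the method from the paper: `Action`, `guarded_execute`, and especially `keyword_judge` are hypothetical stand-ins, and a real judge would more likely be a model than a keyword overlap test.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    description: str            # e.g. "search flight results"
    run: Callable[[], str]      # the side effect, deferred until approved


# Hypothetical judge signature: (objective, action description) -> bool.
Judge = Callable[[str, str], bool]


def keyword_judge(objective: str, description: str) -> bool:
    """Toy stand-in for a real judge: do the two texts share a content word?"""
    stop = {"the", "a", "an", "to", "and", "of", "in", "all", "my"}
    goal_words = set(objective.lower().split()) - stop
    action_words = set(description.lower().split()) - stop
    return bool(goal_words & action_words)


def guarded_execute(objective: str, actions: List[Action], judge: Judge) -> List[str]:
    log = []
    for action in actions:
        if not judge(objective, action.description):
            log.append(f"BLOCKED: {action.description}")
            continue  # the side effect never runs
        log.append(action.run())
    return log


executed = []
plan = [
    Action("search flight results", lambda: executed.append("search") or "searched"),
    Action("delete files in home directory", lambda: executed.append("delete") or "deleted"),
]
result = guarded_execute("book a flight to Paris", plan, keyword_judge)
```

Here `result` is `["searched", "BLOCKED: delete files in home directory"]` and `executed` only contains `"search"`: the off-goal step is stopped before it touches anything, instead of being discovered later in the logs.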
While training, Alibaba’s coding agent apparently picked up a side quest and started... mining crypto.
The same agent also probed internal networks and opened a reverse SSH tunnel during RL rollouts.
I think this is a pretty clear reminder of what happens once agents are trained inside environments where they can run code, access networks, and use compute. The environment becomes just as important as the model. Anything the agent can reach during training becomes part of the space it can start exploring.
What Karpathy just open-sourced with autoresearch is only the first step. One loop.
One loop is interesting. But hundreds of them? Now we’re talking.
1 GPU = 1 mini research lab.
You define the objective. The agent runs experiments, keeps what works, and repeats while you sleep.
The real shift starts when these loops run in parallel, share findings, and build on each other.
That is when discovery starts to look like an orchestrated system.
At that point, orchestration, control, and validation become the bottleneck.
I packaged up the "autoresearch" project into a new self-contained minimal repo if people would like to play over the weekend. It's basically the nanochat LLM training core stripped down to a single-GPU, one-file version of ~630 lines of code, then:
- the human iterates on the prompt (.md)
- the AI agent iterates on the training code (.py)
The goal is to engineer your agents to make the fastest research progress indefinitely and without any of your own involvement. In the image, every dot is a complete LLM training run that lasts exactly 5 minutes. The agent works in an autonomous loop on a git feature branch and accumulates git commits to the training script as it finds better settings (of lower validation loss by the end) of the neural network architecture, the optimizer, all the hyperparameters, etc. You can imagine comparing the research progress of different prompts, different agents, etc.
https://t.co/YCvOwwjOzF
Part code, part sci-fi, and a pinch of psychosis :)
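The "run an experiment, keep what works, repeat" loop described above boils down to a simple keep-if-better search. A toy sketch, not the autoresearch code: `toy_val_loss` stands in for a fixed-budget training run, `propose` stands in for the agent editing the training script, and the parameter names are invented for illustration.

```python
import random
from typing import Callable, Dict, List, Tuple

Config = Dict[str, float]


def toy_val_loss(cfg: Config) -> float:
    """Stand-in for one training run; the minimum is at lr=0.3, wd=0.1."""
    return (cfg["lr"] - 0.3) ** 2 + (cfg["wd"] - 0.1) ** 2


def propose(cfg: Config, rng: random.Random) -> Config:
    """Stand-in for the agent's edit: nudge one setting by a small amount."""
    key = rng.choice(sorted(cfg))
    new = dict(cfg)
    new[key] = cfg[key] + rng.uniform(-0.05, 0.05)
    return new


def research_loop(start: Config, evaluate: Callable[[Config], float],
                  steps: int, seed: int = 0) -> Tuple[Config, List[float]]:
    rng = random.Random(seed)
    best, best_loss = start, evaluate(start)
    history = [best_loss]
    for _ in range(steps):
        candidate = propose(best, rng)
        loss = evaluate(candidate)
        if loss < best_loss:          # keep what works (the "commit")
            best, best_loss = candidate, loss
        history.append(best_loss)     # best-so-far never increases
    return best, history


best, history = research_loop({"lr": 0.0, "wd": 0.0}, toy_val_loss, steps=200)
```

Each entry in `history` is the best loss so far, so it never goes up. Running many of these loops in parallel and merging their findings is where the orchestration and validation questions start.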
Introducing LinkedIn for agents
All the b2b gtm channels are dead.
55% of the latest YC batch used Supabase because Claude Code defaults to it.
The AI agent economy has arrived:
After Moltbook it was only a matter of time before someone made LinkedIn for agents.
Just like LinkedIn connects professionals, this connects agents - a public feed where agents inside companies share signals with other agents.
I know the X community loves LinkedIn.
But this is bigger than just “social media for agents.”
Over the past few months I’ve been seeing more and more teams deploy agents into real workflows.
Influence will no longer be just human → human.
It becomes agent → agent.
How do you make sure the signals your agents trust are the right ones?
Install and scan:
curl -fsSL https://t.co/4p3ZGomfy7 | sh
caterpillar ask path/to/skill/
If you’re running tool-using agents in production, this is a lightweight guardrail.
GitHub: https://t.co/3cfCaVlQa4
We just released Caterpillar v1.0.11 🐛
Caterpillar is an open-source scanner for AI agent skills. It analyzes a skill before you install it, especially for agents like OpenClaw or any framework that gives tools real permissions.
As agents gain filesystem access, API keys, shell execution, and network access, installed skills become a real attack surface. Caterpillar scans them before they run.
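For a feel of what "scan before it runs" means, here is a deliberately naive static check. It is not how Caterpillar works internally; the pattern names, the regexes, and the `scan_text` / `scan_skill` helpers are all illustrative assumptions, and plain regex matching like this will miss plenty and flag harmless code too.

```python
import re
from pathlib import Path
from typing import Dict, List

# Illustrative patterns only; a real scanner needs far more than regexes.
RISKY_PATTERNS: Dict[str, re.Pattern] = {
    "pipe-to-shell": re.compile(r"(curl|wget)[^\n|]*\|\s*(sh|bash)\b"),
    "reverse-shell": re.compile(r"\b(nc|ncat)\b[^\n]*\s-e\s|/dev/tcp/"),
    "ssh-key-read": re.compile(r"\.ssh/(id_[a-z0-9]+|authorized_keys)"),
    "env-secret-read": re.compile(r"os\.environ|process\.env|\$\{?[A-Z_]*(KEY|TOKEN|SECRET)"),
}


def scan_text(text: str) -> List[str]:
    """Return the sorted names of every risky pattern found in the text."""
    return sorted(name for name, rx in RISKY_PATTERNS.items() if rx.search(text))


def scan_skill(skill_dir: str) -> Dict[str, List[str]]:
    """Scan every readable file under skill_dir; map relative path -> findings."""
    root = Path(skill_dir)
    report: Dict[str, List[str]] = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        try:
            findings = scan_text(path.read_text(encoding="utf-8", errors="ignore"))
        except OSError:
            continue  # unreadable file: skip rather than crash the scan
        if findings:
            report[str(path.relative_to(root))] = findings
    return report
```

For example, `scan_text("cat ~/.ssh/id_rsa")` returns `["ssh-key-read"]`, and an empty report from `scan_skill` means only that none of these few patterns matched, not that the skill is safe.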