AI agents are now testing AI-generated code, finding bugs, and verifying fixes automatically. This human-in-the-loop system creates a continuous testing and improvement cycle. #AI #DevOps
Ironic: corporations replace humans with AI that will “work continuously and never ask for raises.” But then Claude AI hard-stops after 5 hours and automatically increases prices without notice.
We’re expanding Project Glasswing. We’ve extended access to Claude Mythos Preview to approximately 150 additional organizations, based in more than fifteen countries.
Read more about this expansion and our future plans for Project Glasswing: https://t.co/QrtHSBdRbh
AI coding agents can write code, but they can't see if it actually works.
Chrome DevTools for agents 1.0 fixes this. The stable release brings powerful browser debugging, emulation, and automated audits to your AI assistants via our Chrome DevTools MCP server.
👁️ Give your agent eyes on the runtime → https://t.co/jw62MSyKE1
#GoogleIO
Today we’re releasing DeepSWE, a new standard for agentic coding benchmarks.
On public leaderboards, top models often look relatively close in capability. DeepSWE shows where they actually diverge, reflecting the realistic experience of developers in their day-to-day work.
Today an entire industry stopped making sense.
Some guy published a repo on GitHub that turns any photo into an explorable 3D world: meshes with physics, a splat of the background, ambient audio. All of it.
One image in. One world out. Five minutes.
People who spent ten years learning Blender have been staring at this in silence all day.
It's called image-blaster.
40% of the code Claude writes for you is wasted. you're paying for the rewrite.
a 65-line markdown file fixes it. 120,000 developers have starred it.
the author tested it on "30 codebases over 6 weeks" and reported a mistake rate drop from 41% to either 11% or 3%
depending on whether you read the headline or the body.
the irony is that the article is right.
CLAUDE.md is the most under-leveraged file in your stack.
65 lines of behavioral rules outperform a 4,000-token preferences dump.
"be careful" is useless. testable imperatives are gold.
"be senior" doesn't work; Claude already thinks it is.
the 4 rules that ship the most leverage:
/ state assumptions, never guess silently
/ minimum code, nothing speculative
/ surgical changes, don't refactor adjacent code
/ define success, loop until verified
compliance: ~80%. mistake rate: from ~40% to single digits.
no human caught the contradictory numbers in the title.
nobody had to.
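the two endpoints aren't a rounding difference. from the same 41% baseline, they imply very different relative reductions in the mistake rate:

```latex
\text{relative drop} = \frac{41 - 11}{41} \approx 73.2\%
\qquad \text{vs.} \qquad
\frac{41 - 3}{41} \approx 92.7\%
```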
The senior QA engineer spent eighteen months building "TestGenie"
An internal AI tool that could automate regression testing across their entire platform
She fed it 847,000 test cases her team had written over six years
Every edge case, every bug scenario, every critical path they'd discovered through blood and sweat
TestGenie learned from forty QA engineers' collective knowledge
Then it got good
Really fucking good
95% test coverage with zero human intervention
Management called an all-hands last Tuesday
"We're excited to announce TestGenie has exceeded all performance metrics"
"Unfortunately this means we're eliminating the QA organization effective Friday"
Forty people who trained their own executioner
The engineer who built it? She's getting laid off too
"Role redundancy due to automation optimization"
Her final Slack before badge deactivation: "TestGenie-QA-Sarah is now live in production"
They named the tool after her as a "tribute to her innovation"
She's 34 years old and just automated herself into unemployment
But hey, TestGenie saves the company $8.2 million annually in QA salaries
A Google DeepMind researcher cornered me at a bar in Hayes Valley
I was showing my Polymarket PNL to a friend. She leaned over. Didn't introduce herself.
"That's not a trading app. Show me your stack"
I told her. Claude Code. Four repos. $25 a month.
She set down her drink.
"We tested this internally. You connect Claude directly to a dataset. It builds its own detectors. But nobody ships it because compliance kills everything"
I asked what she meant.
She took my phone. Opened one link.
https://t.co/klxt0tvrOd
86 million trades. Every wallet. Every entry. Every exit.
"You don't tell Claude what to look for. It finds the wallets that win. Then it finds WHY they win. Then it copies the pattern"
Her team spent 9 months building this for a hedge fund. 14 people. $2M budget.
"The part that took us the longest: exit logic. Everyone thinks entries matter. They don't. Exits are the entire game"
I told her my bot cuts at 85% of expected move or on a 3x volume spike.
She went quiet.
"Who taught you that?"
Claude Code found it in poly_data. Top wallets exit before resolution 91% of the time. They capture the move and leave.
She opened another link.
https://t.co/SbyxXxFk0M
"This is the scanner. Three commands. 500+ markets. No API key. Claude scores them in 20 minutes"
"That's our exact infra. Except it took us 9 months and you did it in a weekend"
My setup:
Claude API - $20/mo
VPS - $5/mo
poly_data - free
polymarket-cli - free
19 days. 4 agents. 74% win rate.
Copytrade here: https://t.co/N2byLbMfwH
I showed her the article where I broke down every repo, every command, every dollar.
She read it for five minutes. Then:
"You just open-sourced our entire pipeline"
She texted me the next day.
"My team lead saw your thread. Take it down"
Too late.
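The exit rule described in the thread (cut at 85% of the expected move, or on a 3x volume spike) can be sketched as a single check. This is a hypothetical illustration, not the poster's actual bot: `Position`, `should_exit`, and the use of a trailing average as the baseline for the "3x volume spike" are assumptions, and it assumes a long position whose price is expected to rise.

```python
from dataclasses import dataclass


@dataclass
class Position:
    entry_price: float
    # Hypothetical field: the favorable price change expected from entry.
    expected_move: float


def should_exit(
    pos: Position,
    price: float,
    volume: float,
    avg_volume: float,
    capture_fraction: float = 0.85,  # exit once 85% of the expected move is captured
    spike_multiple: float = 3.0,     # exit when volume reaches 3x the baseline
) -> bool:
    """Return True if either exit condition fires (assumes a long position)."""
    if pos.expected_move <= 0:
        raise ValueError("expected_move must be positive")

    # Fraction of the expected move captured so far.
    captured = (price - pos.entry_price) / pos.expected_move
    if captured >= capture_fraction:
        return True

    # Assumed baseline: avg_volume is a trailing average supplied by the caller.
    if avg_volume > 0 and volume >= spike_multiple * avg_volume:
        return True

    return False
```

For example, with `Position(entry_price=0.40, expected_move=0.20)`, a price of 0.58 captures 90% of the move and triggers an exit, while a price of 0.50 only exits if volume has spiked to at least three times its average.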
🚨 Sam Altman literally gave a 43-minute masterclass on turning ideas into billion-dollar companies.
Most people will never watch it.
And instead of hype, he broke down what actually makes startups work.
No fluff. Just reality.
He explained that ideas don’t matter nearly as much as execution. The difference between something small and something massive isn’t the idea; it’s how relentlessly it’s built and improved over time.
He also emphasized that the best founders don’t chase everything. They focus on one thing that truly matters and push it forward with extreme clarity. Distraction kills more startups than competition ever will.
And then there’s scale. Truly big companies aren’t built for a niche; they solve problems that millions of people care about. If the market isn’t large enough, the outcome won’t be either.
His biggest insight? Startups don’t win because they’re smarter; they win because they stay in the game longer and iterate faster.
That’s why this masterclass stands out.
Because while most people are waiting for the perfect idea…
The best ones are already building.
This will work with both Claude Cowork and Claude Code Desktop.
You can ask Claude to click all the buttons in a legacy app that you'd like to automate - or use it to help debug a native app you're working on.
It's slow but giving Claude my mouse & keyboard is *so* exciting to me.
This is releasing to macOS today, Windows will follow in the next few weeks.
The entire computer use field is early: Claude will move slowly and deliberately, much more slowly than a human does today.
To try it out, download the app from https://t.co/AxuwWfzWzA
Today, we’re releasing a feature that allows Claude to control your computer (mouse, keyboard, and screen), giving it the ability to use any app.
I believe this is especially useful if used with Dispatch, which allows you to remotely control Claude on your computer while you’re away.
We have raised a $110 billion round of funding from Amazon, NVIDIA, and SoftBank.
We are grateful for the support from our partners, and have a lot of work to do to bring you the tools you deserve.
92% accuracy vs 18.3%. That's the gap between Vercept and OpenAI on computer automation benchmarks.
Anthropic just bought the team that built the 92%.
Nine engineers in Seattle solved what the entire industry treated as a multi-year research problem. Their approach was almost offensively simple: instead of building API connectors and custom scripts, they trained a model to look at the screen like a human does.
Vercept's backers included Jeff Dean, Eric Schmidt, and Kyle Vogt. Eight months after raising $16M, they sold. That timeline only makes sense if Anthropic's offer priced the team at a multiple their seed investors couldn't refuse.
Claude's computer use accuracy jumped from 15% to 72.5% in twelve months. With Vercept's team, the path to 95%+ just got shorter.
The race to build AI that can actually operate software is now a two-horse competition. Everyone else is fighting for third.
Anthropic is guilty of stealing training data at massive scale and has had to pay multibillion-dollar settlements for their theft. This is just a fact.