Jonathon Wright @jonathon_wright - Twitter Profile

Pinned Tweet

4 months ago

AI agents are now testing AI-generated code, finding bugs, and verifying fixes automatically. This human-in-the-loop system creates a continuous testing and improvement cycle. #AI #DevOps

0

1

0

132

Jonathon_Wright retweeted

Wilson Mar @wilsonmar

5 days ago

Ironic: corporations replace humans with AI that “work continuously and never ask for raises.” But then Claude AI hard stops after 5 hours and automatically increases prices without notice.

0

1

2

0

80

Jonathon_Wright retweeted

Polymarket

@Polymarket

6 days ago

JUST IN: Claude Opus 4.8 reportedly found a critical Zcash bug that could have allowed attackers to create unlimited coins.

250

5K

468

525

913K

Jonathon_Wright retweeted

Lisan al Gaib

@scaling01

9 days ago

Opus 4.8 just broke ARC-AGI-3. It tripled GPT-5.5's score. We are now at a breathtaking 1.5% human efficiency.

96

2K

74

263

175K

Jonathon_Wright retweeted

Anthropic

@AnthropicAI

8 days ago

We’re expanding Project Glasswing. We’ve extended access to Claude Mythos Preview to approximately 150 additional organizations, based in more than fifteen countries. Read more about this expansion and our future plans for Project Glasswing: https://t.co/QrtHSBdRbh

340

4K

426

612

648K

Jonathon_Wright retweeted

Thos Major

@ThosMajor

11 days ago

In a post apocalyptic Bangkok bar after Musk loses control of his AI driven robots.

15

114

19

39

22K

Jonathon_Wright retweeted

Chrome for Developers

@ChromiumDev

13 days ago

AI coding agents can write code, but they can't see if it actually works. Chrome DevTools for agents 1.0 fixes this. The stable release brings powerful browser debugging, emulation, and automated audits to your AI assistants via our Chrome DevTools MCP server. 👁️ Give your agent eyes on the runtime → https://t.co/jw62MSyKE1 #GoogleIO

37

978

88

858

66K

Jonathon_Wright retweeted

Serena Ge (Datacurve)

@serenaa_ge

15 days ago

Today we’re releasing DeepSWE, a new standard for agentic coding benchmarks. On public leaderboards, top models often look relatively close in capability. DeepSWE shows where they actually diverge, reflecting the realistic experience of developers in their day-to-day work. https://t.co/HCDcjNuTFK

511

6K

744

3K

2M

Jonathon_Wright retweeted

CopyRebeldia

@CopyRebeldia

27 days ago

Today an entire industry stopped making sense. Some guy published a repo on GitHub that turns any photo into an explorable 3D world: meshes with physics, a splat of the background, ambient audio. Everything. One image goes in. One world comes out. Five minutes. The people who spent ten years learning Blender have been staring at this in silence all day. It's called image-blaster.

229

15K

2K

15K

1M

Jonathon_Wright retweeted

dunik

@dunik_7

about 1 month ago

40% of the code Claude writes for you is wasted. you're paying for the rewrite.
a 65-line markdown file fixes it. 120,000 developers have starred it.
the author tested it on "30 codebases over 6 weeks" and reported a mistake rate drop from 41% to either 11% or 3% depending on whether you read the headline or the body.
the irony is that the article is right. CLAUDE.md is the most under-leveraged file in your stack. 65 lines of behavioral rules outperform a 4,000-token preferences dump.
"be careful" is useless. testable imperatives are gold. "be senior" doesn't work: Claude already thinks it is.
the 4 rules that ship the most leverage:
/ state assumptions, never guess silently
/ minimum code, nothing speculative
/ surgical changes, don't refactor adjacent code
/ define success, loop until verified
compliance: ~80%. mistake rate: from ~40% to single digits.
no human caught the contradicting numbers in the title. nobody had to.

70

6K

382

18K

2M

Jonathon_Wright retweeted

Kalshi Finance

@Kalshi_Finance

about 1 month ago

The senior QA engineer spent eighteen months building "TestGenie"
An internal AI tool that could automate regression testing across their entire platform
She fed it 847,000 test cases her team had written over six years
Every edge case, every bug scenario, every critical path they'd discovered through blood and sweat
TestGenie learned from forty QA engineers' collective knowledge
Then it got good
Really fucking good
95% test coverage with zero human intervention
Management called an all-hands last Tuesday
"We're excited to announce TestGenie has exceeded all performance metrics"
"Unfortunately this means we're eliminating the QA organization effective Friday"
Forty people who trained their own executioner
The engineer who built it? She's getting laid off too
"Role redundancy due to automation optimization"
Her final Slack before badge deactivation: "TestGenie-QA-Sarah is now live in production"
They named the tool after her as a "tribute to her innovation"
She's 34 years old and just automated herself into unemployment
But hey, TestGenie saves the company $8.2 million annually in QA salaries

6

31

7

9

3K

Jonathon_Wright retweeted

Lunar

@LunarResearcher

about 2 months ago

A Google DeepMind researcher cornered me at a bar in Hayes Valley
I was showing my Polymarket PNL to a friend. She leaned over. Didn't introduce herself.
"That's not a trading app. Show me your stack"
I told her. Claude Code. Four repos. $25 a month.
She set down her drink.
"We tested this internally. You connect Claude directly to a dataset. It builds its own detectors. But nobody ships it because compliance kills everything"
I asked what she meant. She took my phone. Opened one link.
https://t.co/klxt0tvrOd
86 million trades. Every wallet. Every entry. Every exit.
"You don't tell Claude what to look for. It finds the wallets that win. Then it finds WHY they win. Then it copies the pattern"
Her team spent 9 months building this for a hedge fund. 14 people. $2M budget.
"The part that took us the longest - exit logic. Everyone thinks entries matter. They don't. Exits are the entire game"
I told her my bot cuts at 85% of expected move or on a 3x volume spike.
She went quiet.
"Who taught you that"
Claude Code found it in poly_data. Top wallets exit before resolution 91% of the time. They capture the move and leave.
She opened another link.
https://t.co/SbyxXxFk0M
"This is the scanner. Three commands. 500+ markets. No API key. Claude scores them in 20 minutes"
"That's our exact infra. Except it took us 9 months and you did it in a weekend"
My setup:
Claude API - $20/mo
VPS - $5/mo
poly_data - free
polymarket-cli - free
19 days. 4 agents. 74% win rate.
Copytrade here: https://t.co/N2byLbMfwH
I showed her the article where I broke down every repo, every command, every dollar.
She read it for five minutes. Then:
"You just open-sourced our entire pipeline"
She texted me the next day.
"My team lead saw your thread. Take it down"
Too late.

23

316

48

904

124K

Jonathon_Wright retweeted

Neyazuddin Ansari

@riyazz_ai

2 months ago

🚨 Sam Altman literally gave a 43-minute masterclass on turning ideas into billion-dollar companies. Most people will never watch it.
And instead of hype, he broke down what actually makes startups work. No fluff. Just reality.
He explained that ideas don’t matter nearly as much as execution. The difference between something small and something massive isn’t the idea; it’s how relentlessly it’s built and improved over time.
He also emphasized that the best founders don’t chase everything. They focus on one thing that truly matters and push it forward with extreme clarity. Distraction kills more startups than competition ever will.
And then there’s scale. Truly big companies aren’t built for a niche; they solve problems that millions of people care about. If the market isn’t large enough, the outcome won’t be either.
His biggest insight? Startups don’t win because they’re smarter; they win because they stay in the game longer and iterate faster.
That’s why this masterclass stands out. Because while most people are waiting for the perfect idea… The best ones are already building.

79

3K

685

7K

576K

Jonathon_Wright retweeted

Felix Rieseberg

@felixrieseberg

3 months ago

This will work with both Claude Cowork and Claude Code Desktop. You can ask Claude to click all the buttons in a legacy app that you'd like to automate - or use it to help debug a native app you're working on. It's slow but giving Claude my mouse & keyboard is *so* exciting to me.

53

579

18

105

116K

Jonathon_Wright retweeted

Felix Rieseberg

@felixrieseberg

3 months ago

This is releasing to macOS today, Windows will follow in the next few weeks. The entire computer use field is early - Claude will move slowly and deliberately, much slower than a human does today. To try it out, download the app from https://t.co/AxuwWfzWzA

44

936

43

275

163K

Jonathon_Wright retweeted

Felix Rieseberg

@felixrieseberg

3 months ago

Today, we’re releasing a feature that allows Claude to control your computer: Mouse, keyboard, and screen, giving it the ability to use any app. I believe this is especially useful if used with Dispatch, which allows you to remotely control Claude on your computer while you’re away.

901

19K

1K

11K

5M

Jonathon_Wright retweeted

Possum Reviews

@ReviewsPossum

3 months ago

This AI text detector says Abraham Lincoln's Gettysburg Address was written by AI.

428

29K

3K

3M

Jonathon_Wright retweeted

Sam Altman

@sama

3 months ago

We have raised a $110 billion round of funding from Amazon, NVIDIA, and SoftBank. We are grateful for the support from our partners, and have a lot of work to do to bring you the tools you deserve.

4K

39K

3K

9M

Jonathon_Wright retweeted

Mark Gadala-Maria

@markgadala

3 months ago

92% accuracy vs 18.3%. That's the gap between Vercept and OpenAI on computer automation benchmarks. Anthropic just bought the team that built the 92%. Nine engineers in Seattle solved what the entire industry treated as a multi-year research problem. Their approach was almost offensively simple: instead of building API connectors and custom scripts, they trained a model to look at the screen like a human does. Vercept's backers included Jeff Dean, Eric Schmidt, and Kyle Vogt. Eight months after raising $16M, they sold. That timeline only makes sense if Anthropic's offer priced the team at a multiple their seed investors couldn't refuse. Claude's computer use accuracy jumped from 15% to 72.5% in twelve months. With Vercept's team, the path to 95%+ just got shorter. The race to build AI that can actually operate software is now a two-horse competition. Everyone else is fighting for third.

9

204

15

97

31K

Jonathon_Wright retweeted

Elon Musk

@elonmusk

4 months ago

Anthropic is guilty of stealing training data at massive scale and has had to pay multi-billion dollar settlements for their theft. This is just a fact.

5K

109K

13K

9K

24M

Jonathon_Wright retweeted