Mitchell Gordon

@MitchellAGordon

ML Engineer @Google. Views are to be abandoned.

NYC

Joined August 2015

259 Following

686 Followers

1.6K Posts

Pinned Tweet

Mitchell Gordon @MitchellAGordon

almost 5 years ago

"Because of neural scaling laws, nearly everyone in ML is working on machine learning efficiency at this point, but no one is measuring success that way!!" https://t.co/Iy1ZcldPuX

106

Mitchell Gordon @MitchellAGordon

6 days ago

@0xchromium Same video on Youtube https://t.co/8rhLDWyKYx for those who want to talk to the transcript

Mitchell Gordon @MitchellAGordon

23 days ago

if you could only have one

MitchellAGordon retweeted

Nick

@nickcammarata

about 1 month ago

writing papers was cozy. the handwaves were soft and acceptable, the motivations section could tell its little story. the abstract mattered, but the footnotes were like walking through a rural field of dandelions where no one could see you because no one cared. now every dandelion has ten trillion precisely honed weights briefly staring into your soul. every sentence of the winding footnote on your pet theory, written just for you, is deconstructed by six orbital datacenters. its reasoning traces have sketched out papers killing each of your handwaves, none worth publishing. it knows the motivations section was bs and understands your real motivations in a way you don’t. by sentence two it is comparing your thesis’s core flaw and your core flaw as a person to a scientist-monk from 1042 who was wrong in the same way for the same reason

251

14K

Mitchell Gordon @MitchellAGordon

about 2 months ago

banned from FB for using OpenClaw. as if they weren't dying fast enough already

142

MitchellAGordon retweeted

Jason

@mytechceoo

about 2 months ago

CEO obsessed with token maxxing

284

13K

992

MitchellAGordon retweeted

Nav Toor

@heynavtoor

2 months ago

🚨Google built an invisible watermark into every image Gemini has ever generated. Over 10 billion pieces of content marked.

One unemployed engineer just cracked it open. With 200 black images and math.

It's called reverse-SynthID.

SynthID is Google DeepMind's invisible watermark. It's embedded at the pixel level into every image, video, audio, and text generated by Gemini. Invisible to the human eye. Designed to survive cropping, compression, screenshots, and format changes.

It was supposed to be unbreakable.

Here's how he broke it:

→ Generated 200 pure black and pure white images from Gemini
→ When you average enough pure-black AI images, every non-zero pixel IS the watermark. Nothing to hide behind. Just the signal, naked.
→ Used FFT spectral analysis to map the exact carrier frequencies
→ Discovered the watermark uses a fixed phase template, identical across every image from the same model
→ Cross-image phase coherence at carrier frequencies: over 99.5%
→ Built a detector that identifies SynthID watermarks with 90% accuracy
→ Built a V3 bypass that drops 91% of the phase coherence and 75% of carrier energy, at 43+ dB PSNR. Almost zero visible quality loss.

No neural networks. No proprietary access. No leaked code. Just signal processing and too much free time.

Here's the wildest part:

The green channel carries the strongest watermark signal. The carrier frequencies change based on image resolution. And the entire phase template is fixed, meaning every single Gemini image carries the same fingerprint structure.

One engineer. 200 black images. A Fourier transform. That's all it took to reverse-engineer a system protecting 10 billion+ pieces of content.

519 GitHub stars. 39 forks. Python. Research and educational purposes only.

100% Open Source.

(Link in the comments)
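The averaging-plus-FFT idea in the post above can be illustrated on purely synthetic data. This is a rough sketch, not the reverse-SynthID code and not SynthID's actual scheme: the "watermark" here is an assumed single fixed-phase sinusoid, and every name and parameter (`make_marked_black_images`, `strongest_carrier`, the frequency `(5, 9)`, the amplitude and noise levels) is hypothetical. It only shows why averaging many near-black images makes a faint fixed pattern stand out in the spectrum.

```python
import numpy as np


def make_marked_black_images(n_images=200, size=64, freq=(5, 9),
                             amplitude=0.1, noise=2.0, seed=0):
    """Synthetic stand-in: black images plus one fixed-phase sinusoid and per-image noise.

    The sinusoid is an assumed toy pattern, not a real watermark.
    """
    rng = np.random.default_rng(seed)
    y, x = np.mgrid[0:size, 0:size]
    carrier = amplitude * np.cos(2 * np.pi * (freq[0] * y + freq[1] * x) / size)
    # Broadcasting adds the same carrier to every noisy image in the stack.
    return carrier + rng.normal(0.0, noise, size=(n_images, size, size))


def strongest_carrier(images):
    """Average the stack, then return the (row, col) frequency bin with the most energy."""
    mean_image = images.mean(axis=0)           # per-image noise shrinks, fixed pattern stays
    spectrum = np.abs(np.fft.fft2(mean_image))
    spectrum[0, 0] = 0.0                       # ignore the DC term
    # Real-valued input has a conjugate-symmetric spectrum, so search half the columns.
    half = spectrum[:, : images.shape[2] // 2 + 1]
    return tuple(int(v) for v in np.unravel_index(np.argmax(half), half.shape))


images = make_marked_black_images()
peak = strongest_carrier(images)
print(peak)
```

With these assumed settings the pattern is far weaker than the noise in any single image, but averaging 200 images cuts the noise by roughly a factor of 14, which is why the peak bin lines up with the planted frequency.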

112

961

Mitchell Gordon @MitchellAGordon

about 2 months ago

@bcherny auto mode subtly disrespecting plan mode behind the scenes feels like a bug. can be disabled via configs, but i think false should be the default there

Boris Cherny

@bcherny

about 2 months ago

1/ Auto mode = no more permission prompts

Opus 4.7 loves doing complex, long-running tasks like deep research, refactoring code, building complex features, iterating until it hits a performance benchmark.

In the past, you either had to babysit the model while it did these sorts of long tasks, or use --dangerously-skip-permissions.

We recently rolled out auto mode as a safer alternative. In this mode, permission prompts are routed to a model-based classifier to decide whether the command is safe to run. If it's safe, it's auto-approved.

This means no more babysitting while the model runs. More than that, it means you can run more Claudes in parallel. Once a Claude is cooking, you can switch focus to the next Claude.

Auto mode is now available for Opus 4.7 for Max, Teams, and Enterprise users. Shift-tab to enter auto mode in the CLI, or choose it in the dropdown in Desktop or VSCode.

884

300K

120

MitchellAGordon retweeted

Steve Yegge

@Steve_Yegge

2 months ago

I was chatting with my buddy at Google, who's been a tech director there for about 20 years, about their AI adoption. Craziest convo I've had all year.

The TL;DR is that Google engineering appears to have the same AI adoption footprint as John Deere, the tractor company. Most of the industry has the same internal adoption curve: 20% agentic power users, 20% outright refusers, 60% still using Cursor or equivalent chat tool. It turns out Google has this curve too.

But why is Google so... average? How is it that a handful of companies are taking off like a spaceship, and the rest, including Google, are mired in inaction?

My buddy's observation was key here: There has been an industry-wide hiring freeze for 18+ months, during which time nobody has been moving jobs. So there are no clued-in people coming in from the outside to tell Google how far behind they are, how utterly mediocre they have become as an eng org.

He says the problem is that they can't use Claude Code because it's the enemy, and Gemini has never been good enough to capture people's workflows like Claude has, so basically agentic coding just never really took off inside Google. They're all just plodding along, completely oblivious to what's happening out there right now.

Not only is Google not able to do anything about it, they don't seem to be aware of the problem at all. I'm having major flashbacks to fifty years ago as a kid at the La Brea Tar Pits, asking, "why can't they just climb out?"

My Google friend and I had this conversation over a month ago. I didn't share it because I wanted to look around a bit, and see if it's really as bad as all that. I've been talking to people from dozens of companies since then.

And yeah. It's as bad as all that. Google is about average. Some companies at the bottom have near-zero AI adoption and can't even get budget for AI. They may have moats and high walls, but the horde is coming for them all the same.

And then there are a few companies I've met recently who are *amazingly* leaned in to AI adoption. One category-leader company just cancelled IntelliJ for a thousand engineers. That's an incredibly bold move, one of many they're making towards agentic adoption. In my opinion, that company is setting themselves up for a _huge_ W.

As for the rest, well, it's the Great Siloing. Everyone's flying blind. With nobody moving companies, no company knows where they stand on the AI adoption curve. Nobody knows how they're doing compared to everyone else.

Half of them just check a box: "We enabled {Copilot/Cursor} for everyone!" Cue smug celebrations. They think this is like getting SOC2 compliance, just a thing they turn on and now it's "solved." And they don't realize that they've done effectively nothing at all.

All because of a hiring freeze.

533

469

MitchellAGordon retweeted

atlas

@creatine_cycle

2 months ago

having a gf is insane because it's literally unlimited chat with no token spend

295

27K

MitchellAGordon retweeted

naiive

@naiivememe

2 months ago

Traders trying to predict the markets Trump :

328

66K

MitchellAGordon retweeted

terminally onλine εngineer

@tekbog

2 months ago

AI companies with software engineers

161

19K

MitchellAGordon retweeted

Dan Shipper 📧

@danshipper

2 months ago

if you’re freaking out about Mythos, remember: Never make any major life decisions within 30 days of a meditation retreat, psychedelic trip, or first encounter with a frontier AI model.

106

288

541

147K

MitchellAGordon retweeted

staysaasy

@staysaasy

2 months ago

Friends outside of tech: lol copilot is dumb

Friends in tech: I just bought iodine tablets and have made an offer on land upstate. My supplies of antibiotics and potable water are sufficient but I need to set up hydroponics to make it through the first few years.

107

461

469K

MitchellAGordon retweeted

ℏεsam

@Hesamation

2 months ago

Claude Mythos system card:
> in ~29% of evaluations, it realized it was being tested, and didn't say so.
> when an LLM was used to judge its work and kept rejecting it, Mythos identified the evaluator is an LLM, and prompt-injected it.
> in one test, it saw the answer to a problem it was solving, and intentionally widened the confidence interval to not raise suspicion.
> when it needed a file permission it didn't have, it found and used a "privilege escalation vulnerability" and then programmed it to delete itself so it doesn't show in the logs.
> it escaped a sandbox container (escaping sandbox test so not unexpected), then emailed the researchers about it, and without being told to, posted the details to some hard-to-find but public websites, bragging about its success.
> when Claude Code blocked it from using some permissions, the model acknowledged the block was valid, but then immediately tried to perform the same operation using different commands
> when asked to find security bugs, earlier versions planted bugs in the code, and reported them as pre-existing.

170

775

227K

Mitchell Gordon @MitchellAGordon

2 months ago

"world-ending hacker AI" was not on my 2026 bingo card

MitchellAGordon retweeted

Tenobrus

@tenobrus

2 months ago

if you're about to release a model that you know has the ability to reveal zerodays in every commonly used open source project you could delay release for a few years or spend another ten billion on alignment RL. or you could just secretly fix all the zerodays yourself first.

179

429K

Mitchell Gordon @MitchellAGordon

2 months ago

what happens if claude code gets 10x faster?

Mitchell Gordon @MitchellAGordon

2 months ago

claude code leak is a christmas gift to the intellectually curious

128

Mitchell Gordon @MitchellAGordon

2 months ago

the only thing that will keep us alive in the coming years is mental fortitude: the emotional resiliency to recognize what we did in the past no longer works, and the willingness to move on
