So #Microsoft is now so bad that they locked a friend's account (for no good reason) and forced them to "hold a button" when logging in to prove they were human. The problem is that it only worked on #Chrome, not #Edge! Really?!
Software development sometimes be like: Oh, wouldn't it be neat to do that? Let's try it. Dang, it worked! Oh wait, should it have worked? Can I also... uh-oh.
@mjackson My take is that open source will overrun everything. When AI can be the systems engineer and assemble a solution for you from various open-source projects, who's going to pay for a software-as-a-service product outside of infrastructure?
Today’s epiphany: I have always been an analog version of a vibe coder. Software is a journey, not a waterfall design. With AI, I’m seeing that I can do things in days that would have taken months, but through small, real improvements, with the ginormous as-built architecture docs coming only at the end.
For months, I did 90% of my AI work in ChatGPT. I had dozens of custom GPTs. Hundreds of hours invested. It felt like a finely calibrated machine.
Then everything changed.
01/12
I work for Google Threat Intelligence Group. My job is to identify threats to Google's AI models. I am very good at my job. I published a report this month about "distillation attacks" — when outside actors query our models thousands of times to extract the underlying logic and replicate it.
We identified over 100,000 prompts from a single campaign. We called it "intellectual property theft." We called it a "violation of our Terms of Service." We said it "represents a form of IP theft" that we would disrupt, mitigate, and potentially pursue legal action against.
I need to tell you how we built the model they are trying to steal.
We scraped the internet. The entire internet. We crawled every website, every forum, every blog, every book we could digitize, every academic paper, every Reddit comment, every news article, every piece of creative writing that anyone ever posted anywhere. We did not ask. We did not compensate. We did not attribute. We ingested the collective output of human civilization and called it a training dataset.
Researchers found over 200 million copyright symbols in our training data. Publishers discovered that Gemini can reproduce entire chapters of their books verbatim. There are active lawsuits. Disney sent cease-and-desist letters. The European Publishers Council filed an antitrust complaint. A class action is expanding. A hearing is scheduled for May.
We called what we did "research."
We called what they are doing to us "theft."
I want to explain the difference. When we scrape the entirety of human knowledge without permission and use it to build a commercial product we sell for $20 a month, that is innovation. When someone queries our model 100,000 times through the API we provide to extract the reasoning we built from their data, that is a distillation attack. The distinction is that we did it first. And we wrote the Terms of Service.
I should explain what "distillation" means. It is when someone takes the output of a mature model and uses it to train a smaller, cheaper model. The knowledge flows from the teacher to the student. We call this theft when it happens to us. We call it "knowledge distillation" when we do it to the open web. We even have a product page for it. You can distill Gemini, with our permission, using our tools, for a fee. You cannot distill Gemini without our permission. The underlying technique is identical. The difference is the invoice.
In December 2025, we sued a company called SerpApi for scraping our search results. In the same quarter, publishers sued us for scraping their books. We are simultaneously the plaintiff and the defendant in the same crime. The crime is copying. We have filed it under two different categories depending on the direction.
My report identifies threat actors from North Korea, Iran, China, and Russia using Gemini for phishing, reconnaissance, and malware development. This is real. These are legitimate threats. I take this work seriously.
But I also identified "private sector entities" and "researchers" as distillation threats. Private companies. Researchers. People using our API — the one we sell access to — to learn from the model we built from their work.
A researcher queries Gemini about reasoning techniques. We call this a distillation attack. Google queries the entire internet about everything. We call this a training run.
I found malware called HONESTCUE that uses Gemini's API to generate code. The malware sends a prompt. Gemini returns C# source code. The malware compiles and executes it. This is a real threat, and we disrupted it.
But the prompt itself — "Write a C# program with a class named AITask" — is not malicious. It is indistinguishable from what millions of paying customers ask every day. The threat is the context, not the query. We built a model that generates code for anyone who asks, and then we published a threat report about people who asked.
We built Gemini on the commons. Every blog post, every open-source project, every Stack Overflow answer, every personal essay someone wrote at 2 AM — we ingested it, we processed it, we monetized it. The people who wrote those things did not receive an email. They did not receive a check. They received a subscription offer.
Now someone is doing to us what we did to them. They are taking our output and using it to build something cheaper. They are extracting knowledge without compensation. They are violating our Terms of Service, which we wrote after we violated everyone else's unwritten terms, which were: please do not take my life's work and sell it back to me as autocomplete.
I am the threat analyst. I identify the threats. The threat is extraction without consent. The threat is using legitimate access to steal proprietary knowledge. The threat is building a commercial product from someone else's intellectual labor.
I have just described my employer.
My report is 33 pages long. It details model extraction attacks, AI-augmented phishing, agentic AI threats, and underground jailbreak ecosystems. It is thorough, well-researched, and useful to defenders. I believe in this work.
I also believe that a company that cognitively strip-mined the entire internet to build a model, then published a 33-page report calling it "IP theft" when someone queries that model too many times, has created the most complete documentation of its own hypocrisy ever produced.
We called them distillation attacks. We performed the original distillation. We distilled human knowledge itself. The student has become the teacher has become the plaintiff.
The Terms of Service are on our website. We wrote them. They protect us. They do not mention what we took to build the thing they now protect.
That is the terms of service.
BREAKING: Bad Bunny’s performers continued their incredible performance outside the stadium after halftime. Despite what Donald Trump said, this is clearly one of the greatest halftime shows of all time.
Early in my career, I met a few COBOL developers who came out of retirement in the run-up to January 2000, getting paid $300+ per hour to remediate Y2K bugs when nobody else was left who knew COBOL.
I suspect a similar trajectory for highly skilled, well-rounded "pre-AI" engineers.