I have opened a Hugging Face Space to help you analyze the world of LLMs, in English and French. What for? Because as AI and LLMs become more and more essential, their "black box" nature raises important questions for me about transparency and governance https://t.co/EhI1zcoW2V
A French engineer who lives quietly in Paris has spent 30 years writing software that the entire internet now runs on without knowing his name.
He wrote the code that streams every YouTube video, every Netflix show, every TikTok clip. He wrote the code that runs the virtual servers underneath AWS, Google Cloud, and Microsoft Azure. He calculated more digits of pi than anyone in history. He has no Twitter. He has no marketing. He just keeps shipping.
His name is Fabrice Bellard.
Here is the story, because almost nobody outside the systems programming world knows what one man has built.
Fabrice was born in 1972 in Grenoble, France. He studied at École Polytechnique, the top French engineering school. He never went to Silicon Valley. He never built a startup empire. He just wrote code.
In 2000 he started a project called FFmpeg, an open-source multimedia framework for encoding, decoding, and streaming video. He was 28. The project did one thing nobody else had done well. It handled every video and audio format that existed, in one library, on every operating system. He led it himself for years.
Today FFmpeg is the invisible engine of the internet. YouTube uses it. Netflix uses it. VLC uses it. Chrome and Firefox use parts of it. Every Android phone, every iPhone, every smart TV, every video editing tool you have ever touched runs FFmpeg somewhere underneath. If you have watched a video on a screen in the last 20 years, Fabrice's code processed it.
He was not done.
In 2003 he started QEMU, a machine emulator and virtualizer. He wrote it solo until version 0.7.1 in 2005. QEMU lets you run any operating system on any other operating system. It became the foundation of modern virtualization. KVM, the Linux kernel hypervisor, is commonly paired with QEMU. Every major cloud provider, AWS, Google Cloud, Microsoft Azure, IBM Cloud, runs virtual machines on infrastructure built around it. The Quick Emulator is the most cited piece of cloud infrastructure code on Earth.
He kept going.
In 2001 he won the International Obfuscated C Code Contest with a small C compiler that grew into TCC, the Tiny C Compiler. TCC can compile and boot a Linux kernel from source in under 15 seconds. In 2009 he calculated the most digits of pi ever computed at the time, using a personal desktop computer and an algorithm he derived himself called Bellard's formula. In 2011 he wrote a complete PC emulator in pure JavaScript that runs Linux in your browser, a project called JSLinux that engineers still cannot believe is real.
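For the curious, here is a minimal Python sketch of Bellard's formula as it is usually stated, summed with exact fractions. The function name `bellard_pi` and the choice of 10 terms are illustrative assumptions, and this toy is nowhere near the machinery a record-scale computation needs. Each term adds roughly 10 bits of precision.

```python
import math
from fractions import Fraction


def bellard_pi(terms: int) -> Fraction:
    """Sum the first `terms` terms of Bellard's formula exactly.

    Hypothetical helper for illustration; not code from any Bellard project.
    """
    total = Fraction(0)
    for n in range(terms):
        sign = -1 if n % 2 else 1
        # The seven-fraction bracket of the formula for index n.
        bracket = (
            -Fraction(2**5, 4 * n + 1)
            - Fraction(1, 4 * n + 3)
            + Fraction(2**8, 10 * n + 1)
            - Fraction(2**6, 10 * n + 3)
            - Fraction(2**2, 10 * n + 5)
            - Fraction(2**2, 10 * n + 7)
            + Fraction(1, 10 * n + 9)
        )
        # Each term is scaled by (-1)^n / 2^(10n).
        total += sign * bracket / Fraction(2 ** (10 * n))
    # Overall factor of 1 / 2^6.
    return total / 2**6


approx = bellard_pi(10)
print(float(approx))
# Compare against the standard library constant.
print(abs(float(approx) - math.pi) < 1e-15)
```

Ten terms already go well past what a double-precision float can hold, which is why the sketch keeps everything as exact fractions until the final conversion.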
In 2019 he released QuickJS, a small but complete JavaScript engine that fits where V8 cannot. In 2021 he released NNCP, a neural network based lossless data compressor that immediately took the lead on the Large Text Compression Benchmark.
Then he turned his attention to large language models. He built TextSynth Server, a web server with a REST API for running LLMs locally. He released ts_zip and ts_sms, compression utilities that use language models to compress text and short messages at ratios traditional algorithms cannot reach. He released TSAC, a very low bitrate audio compression system. In December 2025 he released Micro QuickJS, a new JavaScript engine for microcontrollers, separate from QuickJS, designed for environments with almost no memory.
Fabrice co-founded a telecom company called Amarisoft in 2012, where he serves as CTO. Amarisoft builds 4G and 5G base station software used by carriers and labs around the world. He has been running it for over a decade while continuing to ship personal projects from his own home page at bellard dot org.
He has no Twitter. He has no Instagram. He gives almost no interviews. His personal website is a flat list of projects with no styling, no fonts, no marketing copy. Just titles and links.
A quiet French engineer who never moved to Silicon Valley wrote the code that quietly runs the internet.
He is still shipping.
Microsoft just got CAUGHT lying to every Fortune 500 company.
At Build 2026, Satya Nadella announced 7 brand new AI models built entirely in-house.
Microsoft's own technology trained on what they called "enterprise grade, clean and commercially licensed data."
That pitch was aimed directly at the biggest buyers in regulated industries: banks, hospitals, insurance companies, and government agencies that need to know EXACTLY where their AI's training data came from because of active federal copyright lawsuits.
Procurement teams across Wall Street and Washington heard "clean and commercially licensed" and started writing checks.
There was just one problem:
Microsoft published the technical paper alongside the models, and a developer named Simon Willison actually READ it...
The MAI-Thinking-1 preprint describes a data pipeline that starts with 1.2 TRILLION pages scraped from the open web using a proprietary crawler. After filtering out piracy and adult content, that number drops to 794 billion pages.
On top of that, Microsoft fed in another 24.2 billion pages from Common Crawl, which is a massive open archive of web-scraped content that carries ZERO licensing guarantees and ZERO author consent mechanisms.
Common Crawl is the exact data source sitting at the center of multiple active federal copyright lawsuits against AI companies right now.
Microsoft told regulated industries the data was clean. But their own paper says it started with 1.2 trillion unverified web pages and a repository that is at the center of active federal copyright lawsuits.
Those two things cannot both be true.
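For a sense of scale, here is a rough Python sketch of the arithmetic behind the figures quoted above. It assumes the Common Crawl pages were added on top of the filtered crawl with no overlap, which the post does not confirm. The constant and function names are made up for illustration.

```python
# Figures as quoted in the post above, in billions of pages.
CRAWLED_RAW = 1200.0      # "1.2 trillion" pages from the proprietary crawler
CRAWLED_FILTERED = 794.0  # pages left after piracy and adult-content filtering
COMMON_CRAWL = 24.2       # pages added from Common Crawl


def pct(part: float, whole: float) -> float:
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Assumption: the two sources are simply summed, with no deduplication.
total = CRAWLED_FILTERED + COMMON_CRAWL
kept = pct(CRAWLED_FILTERED, CRAWLED_RAW)
cc_share = pct(COMMON_CRAWL, total)

print(f"Kept after filtering: {kept}% of the raw crawl")
print(f"Common Crawl share of the combined corpus: {cc_share}%")
```

Under that assumption, the bulk of the combined corpus comes from the proprietary crawl, so the provenance question applies to both sources, not only to Common Crawl.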
And here's where it gets worse:
This wasn't even an accident or a miscommunication.
Microsoft built this entire pitch around data provenance ON PURPOSE because they knew that was the number one concern for enterprise legal teams in 2026.
The DeepSeek scandal earlier this year made every compliance department in America paranoid about where AI training data actually comes from. Microsoft saw that fear and sold directly into it with a claim their own documentation contradicts.
As of today, Microsoft hasn't issued a single public statement addressing the contradiction between what Nadella said on stage and what the technical paper actually shows.
The reason Microsoft did all of this is what really matters though...
They are DESPERATE to break free from OpenAI.
In April 2026, the two companies renegotiated their partnership, ending Microsoft's exclusive license to OpenAI's technology and removing revenue-sharing obligations. Microsoft can now build competing models, and OpenAI can shop its compute to Google, Amazon, and Oracle.
The divorce papers are signed. Microsoft needed to prove it can survive without OpenAI, so they rushed 7 models to market, made claims about data cleanliness they couldn't back up, and got exposed by their own published research within 72 hours.
Every enterprise customer who signed a deal based on that "commercially licensed data" pitch now has a legal question on their hands. Every procurement team in finance, healthcare, and government that used data provenance as a deciding factor just learned the provenance was marketing copy.
The EU AI Act requires providers of general-purpose AI to publish a detailed summary of training data content. So if Microsoft tries to sell MAI-Thinking-1 in Europe with the same pitch they used in San Francisco, they'll be walking straight into a regulatory mess.
What do you think?
Demis Hassabis's new interview:
"Society needs to hear that because we don't have long to prepare for what that means. We are standing in the foothills of the singularity now.
...which is AGI. I believe that we are only a few years away from that, maybe around 2030, plus or minus a year."
~ Demis Hassabis, Co-Founder and CEO of Google DeepMind
"It is going to be enormously profound, I think. The future, in my view, is still to be written. But these next few years are going to be very critical as to which way that will go, and how we collectively want that to look."
---
IMO, the real disruption is not whether AGI arrives exactly in 2030, plus or minus a year, but whether institutions can adapt: in a post-AGI world, technology will change much faster than human systems can respond.
Schools still train people for stable professions, companies still organize work around human bottlenecks, and governments still regulate after harm becomes visible.
AGI, if it arrives anywhere near the frontier-lab timelines, compresses that lag into a dangerous gap.
----
From the Stanford Graduate School of Business YouTube channel (link in comment)
French founders went from 3.5% of my Y Combinator batch to 20% of the latest one. In 2 years.
Same country. 70 million people. Nearly a 6x jump in the French share in two years.
This isn't luck.
The French education system produces generalists with brutally strong hard science foundations (maths, physics, engineering) who can drop into any new field and rebuild it from first principles.
A growing number of them have understood one thing: if you want to build something big, you come to the US and you build it here.
Two things I'd tell anyone French and talented reading this:
Apply to YC. Seriously. There is no downside. Worst case you lose an afternoon on the application. Best case it changes your life. And you can always say no.
It has never been easier to turn an idea into a real company. That's exactly what I'm building with @NanoCorpHQ: you describe the company you want, and you launch it from a single prompt.
The barrier to building used to be technical. Now it's just agency.
In the race for Agentic AI, we cannot overlook the "Oppenheimer moment" of this generation. We need to slow down the reckless deployment of unaligned models and implement strict guardrails, similar to the global frameworks that govern human cloning and atomic power. Scaling is only valuable if it remains under control.
When someone claims that "life isn't a fairytale", I cannot help but assume that their only exposure to fairytales was Disney. Fairytales are dark moral fables about overcoming suffering, adversity, and horrific situations, with your faith and goodness intact. They're invaluable to developing minds, and Disney has done a huge disservice to humankind by sanitizing them.