GLM-5.2 can now be run locally!🔥
The 2-bit model retains ~82% accuracy after we shrunk it from 1.51TB to 238GB (-84% size).
Run it on a 256GB Mac or on a combined RAM/VRAM setup.
GLM-5.2 is the strongest open model to date.
Guide: https://t.co/bI7FeeKHDd
GGUF: https://t.co/BMkxswdj5N
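A quick sanity check of the size figures quoted above, as a minimal Python sketch. It assumes decimal units (1 TB = 1000 GB); the function and constant names are illustrative only, and the headroom figure ignores whatever the OS, KV cache, and context need on top of the weights.

```python
# Figures quoted in the post; decimal units are an assumption (1 TB = 1000 GB).
ORIGINAL_GB = 1510.0   # 1.51 TB full-size model
QUANTIZED_GB = 238.0   # 2-bit GGUF
MAC_MEMORY_GB = 256.0  # unified memory on the Mac mentioned in the post


def size_reduction(original_gb: float, quantized_gb: float) -> float:
    """Return the fraction of the original size removed by quantization."""
    return 1.0 - quantized_gb / original_gb


def headroom_gb(memory_gb: float, model_gb: float) -> float:
    """Memory left over after loading the weights (negative means it does not fit)."""
    return memory_gb - model_gb


reduction = size_reduction(ORIGINAL_GB, QUANTIZED_GB)
headroom = headroom_gb(MAC_MEMORY_GB, QUANTIZED_GB)

print(f"Size reduction: {reduction:.1%}")         # Size reduction: 84.2%
print(f"Headroom on a 256GB Mac: {headroom:.0f}GB")  # Headroom on a 256GB Mac: 18GB
```

The remaining 18GB has to cover everything besides the weights, so long contexts may still be tight on that machine.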
Quick test.
Pick something you think you understand well.
Now imagine you have to explain it to a 12-year-old, out loud, with zero specialized vocabulary.
Did you get stuck at some point?
That's the exact spot where you don't actually understand it yet.
This is one of my favorite tools for finding the holes in my own thinking.
Below, I re-share the full collection of such tools.
(1/11)
Been working with execs on AI rollouts for the last 14 months.
This is the most common level-by-level progression I see:*
Level 1: Run a company-wide AI audit, which includes mapping key processes, interviewing SLT/ELT, and surveying rank-and-file employees.
Level 2: Finalize the post-diagnostic readout, which lays out the AI transformation timeline by initiative, prioritized by ROI, risk profile, and cultural support.
Level 3: Company realizes their data ducks aren’t in a row, which leads them to work on the “Company Brain” panacea.
Level 4: In tandem, start by investing in coding agents for the engineering/product org.
Level 5: Get enterprise access to a general-purpose LLM for a subset of non-technical AI champions.
Level 6: Expand enterprise access to all non-technical employees.
Level 7: Run small cohort of workshops for ELT and AI champions so they get the most out of the technology.
Level 8: Roll out training/enablement program company-wide.
Level 9: Company runs AI hackathon, which lets employees bubble up solutions to problems from the AI audit as well as new problems.
Level 10: Leadership reviews and prioritizes employee hacks and decides which initiatives to take from prototype to production.
Level 11: First AI build tackled is typically a quick win on some back-office process with attributable hard ROI, where cultural pushback is expected to be low.
Level 12: Token efficiency/cost optimization becomes major focus as AI budget begins to balloon in eng org.
Level 13: As internal momentum builds and a longer ROI leash is given, the company follows a cycle of problem identification -> process map -> prototype -> test/harden/secure/measure -> scale, leveraging the initial AI readout, hackathon findings, etc.
Level 14: Company starts moving further along the spectrum from deterministic workflow to self-guided agent as the AI muscle expands.
*Note: these levels can appear in a different order, or happen simultaneously rather than sequentially, depending on the company’s context.
For WWDC, we worked with Apple to run Kimi K2.6, a 1T-parameter model, across a cluster of four Mac Studios using a preview version of LM Studio.
We showcased secure remote access from a MacBook Neo and iPhone using LM Link.
A glimpse of your own private, frontier-scale AI.
When all these closed labs decide it's time to rug pull everyone, you all are gonna regret not buying a GPU
Owning your compute allows you to be in control, even if partially
Not too late yet
Local LLMs are the Great Leap Forward for Inference. Every laptop is its own datacenter, sovereignty over your own tokens, and the people can seize the means of token generation. And that's why it's destined for poor results. (1/4)🧵
One of my personal favorite features announced at WWDC will, I suspect, be a sleeper hit: container machines, which let your Mac run a lightweight, persistent Linux environment with your home directory and repos automatically mounted: https://t.co/dOBdfOOVxC
Three MLX videos dropped at WWDC:
Running agents locally by @angeloskath https://t.co/heFVMy1feB
Distributed inference and training by Tatiana Likhomanenko https://t.co/ZzxZ5fIVRO
MLX Swift by David Koski https://t.co/67h4VGlAeA
A French engineer who lives quietly in Paris has spent 30 years writing software that the entire internet now runs on without knowing his name.
He wrote the code that streams every YouTube video, every Netflix show, every TikTok clip. He wrote the code that runs the virtual servers underneath AWS, Google Cloud, and Microsoft Azure. He once calculated more digits of pi than anyone before him. He has no Twitter. He has no marketing. He just keeps shipping.
His name is Fabrice Bellard.
Here is the story, because almost nobody outside the systems programming world knows what one man has built.
Fabrice was born in 1972 in Grenoble, France. He studied at École Polytechnique, the top French engineering school. He never went to Silicon Valley. He never built a startup empire. He just wrote code.
In 2000 he started a project called FFmpeg, an open-source multimedia framework for encoding, decoding, and streaming video. He was 28. The project did one thing nobody else had done well. It handled every video and audio format that existed, in one library, on every operating system. He led it himself for years.
Today FFmpeg is the invisible engine of the internet. YouTube uses it. Netflix uses it. VLC uses it. Chrome and Firefox use parts of it. Every Android phone, every iPhone, every smart TV, every video editing tool you have ever touched runs FFmpeg somewhere underneath. If you have watched a video on a screen in the last 20 years, Fabrice's code processed it.
He was not done.
In 2003 he started QEMU, a machine emulator and virtualizer. He wrote it solo until version 0.7.1 in 2005. QEMU lets you run any operating system on any other operating system. It became the foundation of modern virtualization. KVM, the Linux kernel hypervisor, is commonly paired with QEMU, which supplies the device emulation. Every major cloud provider, AWS, Google Cloud, Microsoft Azure, IBM Cloud, runs virtual machines on infrastructure built around it. The Quick Emulator is the most cited piece of cloud infrastructure code on Earth.
He kept going.
In 2001 he won the International Obfuscated C Code Contest with a small C compiler that grew into TCC, the Tiny C Compiler. TCC can compile and boot a Linux kernel from source in under 15 seconds. In 2009 he calculated the most digits of pi ever computed at the time, about 2.7 trillion, using a personal desktop computer. Years earlier he had derived his own series for pi, now called Bellard's formula. In 2011 he wrote a complete PC emulator in pure JavaScript that runs Linux in your browser, a project called JSLinux that engineers still cannot believe is real.
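For the curious, Bellard's formula is a short alternating series in which each term adds about 10 bits of precision. What follows is a minimal Python sketch of the series itself, using exact fractions. It is an illustration of the formula only, not a reconstruction of how his record computation was actually run, and the function name is made up for this example.

```python
import math
from fractions import Fraction


def bellard_pi(terms: int) -> Fraction:
    """Approximate pi by summing the first `terms` terms of Bellard's formula.

    pi = (1/2^6) * sum_{n>=0} (-1)^n / 2^(10n) * (
             -2^5/(4n+1) - 1/(4n+3) + 2^8/(10n+1) - 2^6/(10n+3)
             - 2^2/(10n+5) - 2^2/(10n+7) + 1/(10n+9) )
    """
    total = Fraction(0)
    for n in range(terms):
        bracket = (
            Fraction(-32, 4 * n + 1)
            - Fraction(1, 4 * n + 3)
            + Fraction(256, 10 * n + 1)
            - Fraction(64, 10 * n + 3)
            - Fraction(4, 10 * n + 5)
            - Fraction(4, 10 * n + 7)
            + Fraction(1, 10 * n + 9)
        )
        # Alternating sign and a 2^(-10n) scale factor on every term.
        total += Fraction((-1) ** n, 2 ** (10 * n)) * bracket
    return total / 64


approx = bellard_pi(10)
print(float(approx))                      # close to math.pi
print(abs(float(approx) - math.pi) < 1e-12)
```

Ten terms already agree with double-precision pi, because the error shrinks by a factor of about 1024 with each term.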
In 2019 he released QuickJS, a small but complete JavaScript engine that fits where V8 cannot. In 2021 he released NNCP, a neural network based lossless data compressor that immediately took the lead on the Large Text Compression Benchmark.
Then he turned his attention to large language models. He built TextSynth Server, a web server with a REST API for running LLMs locally. He released ts_zip and ts_sms, compression utilities that use language models to compress text and short messages at ratios traditional algorithms cannot reach. He released TSAC, a very low bitrate audio compression system. In December 2025 he released Micro QuickJS, a new JavaScript engine for microcontrollers, separate from QuickJS, designed for environments with almost no memory.
Fabrice co-founded a telecom company called Amarisoft in 2012, where he serves as CTO. Amarisoft builds 4G and 5G base station software used by carriers and labs around the world. He has been running it for over a decade while continuing to ship personal projects from his own home page at bellard dot org.
He has no Twitter. He has no Instagram. He gives almost no interviews. His personal website is a flat list of projects with no styling, no fonts, no marketing copy. Just titles and links.
A quiet French engineer who never moved to Silicon Valley wrote the code that quietly runs the internet.
He is still shipping.
Anthropic's main manager:
"Nobody types prompts from scratch. The commands should live in the project."
In 26 minutes, she walks through how Anthropic runs Claude Code, including the command library every new dev inherits on day one.
Watch the full talk, then save the config below👇