1-bit GLM-5.2 GGUF vs. Claude 4.8 Opus vs. GPT-5.5
We gave 3 models the same prompt and compared one-shot outputs.
The 1-bit GLM-5.2 GGUF ran locally on a Mac Studio M3 Ultra with 256GB RAM at ~21.6 tok/s.
Which output do you like best?
GGUF: https://t.co/BMkxswdj5N
With agentic coding, complexity compounds in a mechanical way: unnecessary code ends up in the codebase, moves into the context window, degrades the model's reasoning, and leads to more unnecessary code (often written to fix issues caused by the earlier unnecessary code). It's exponential.
Introducing GLM-5.2: Frontier Intelligence, Open Weights
- Significant improvements in coding and agentic tasks
- Strong long-horizon capabilities with a 1M context window
- Two levels of reasoning effort: GLM-5.2 (max) pushes the limits, while GLM-5.2 (high) strikes a strong balance between performance and token efficiency
- MIT-licensed open weights
- Same API pricing as GLM-5.1
Tech Blog: https://t.co/LAsxUdN0JZ
Weights: https://t.co/g0A1C4UWx4
API: https://t.co/Kc3E22cbN7
Coding Plan: https://t.co/Nk8Y98HNhU
Chat: https://t.co/WCqWT0qCQb
From one ArUco marker I got the relative positions of my wrist and global cameras and of my robot's kinematics chain.
The ArUco is flat on the table so I can project the wrist camera's intrinsics on the table plane, and get an estimate of the wrist cam from global pixels only.
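One building block behind this kind of setup is intersecting a pixel's viewing ray with the table plane. A minimal sketch, assuming a pinhole camera with no lens distortion, the marker frame as the world frame, and the table at z = 0; the function name, `R_wc` and `cam_pos` are illustrative and not taken from the original setup:

```python
def pixel_to_table(u, v, fx, fy, cx, cy, R_wc, cam_pos):
    """Intersect the viewing ray of pixel (u, v) with the table plane z = 0.

    fx, fy, cx, cy: pinhole intrinsics (assumed, no distortion model).
    R_wc: 3x3 rotation taking camera-frame vectors into the marker frame.
    cam_pos: camera centre expressed in the marker frame.
    """
    # Ray direction in the camera frame (x right, y down, z forward).
    d_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)
    # Rotate the ray into the marker (table) frame.
    d_world = tuple(sum(R_wc[i][j] * d_cam[j] for j in range(3)) for i in range(3))
    if abs(d_world[2]) < 1e-12:
        raise ValueError("ray is parallel to the table plane")
    # Solve cam_pos.z + s * d_world.z = 0 for the ray parameter s.
    s = -cam_pos[2] / d_world[2]
    if s <= 0:
        raise ValueError("table plane is behind the camera")
    return tuple(cam_pos[i] + s * d_world[i] for i in range(3))


# Hypothetical example: a camera 1 m above the marker, looking straight down.
R_down = [[1, 0, 0], [0, -1, 0], [0, 0, -1]]
point = pixel_to_table(820, 240, 500, 500, 320, 240, R_down, (0.0, 0.0, 1.0))
```

With the camera pose recovered from the marker, every pixel of the global camera maps to a table point this way, which is what makes a flat marker so convenient.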
@tenderizzation Btw this is how billions of user interactions are processed every day and millions of people see recommendations where the retrieval is done by 2 sparse matrix-vector multiplications 😀😀
https://t.co/3MS8aGuzqX
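The post does not say which two matrices are involved. One plausible reading, and purely an assumption here, is a two-hop walk on a user-item interaction matrix M: first M x (items to users), then Mᵀ (M x) (users back to items). A toy sketch with made-up data and hypothetical helper names:

```python
def spmv(rows, x):
    """Multiply a sparse matrix (list of {col: value} rows) by a sparse vector {index: value}."""
    y = {}
    for i, row in enumerate(rows):
        acc = sum(val * x[j] for j, val in row.items() if j in x)
        if acc:
            y[i] = acc
    return y


def transpose(rows, n_cols):
    """Return the transpose in the same list-of-dicts layout."""
    cols = [{} for _ in range(n_cols)]
    for i, row in enumerate(rows):
        for j, val in row.items():
            cols[j][i] = val
    return cols


# Made-up interactions: 3 users x 4 items.
interactions = [
    {0: 1.0, 1: 1.0},  # user 0 touched items 0 and 1
    {1: 1.0, 2: 1.0},  # user 1 touched items 1 and 2
    {2: 1.0, 3: 1.0},  # user 2 touched items 2 and 3
]
interactions_T = transpose(interactions, 4)


def recommend(history, k=2):
    """Score items for a sparse history vector using two sparse mat-vecs."""
    user_overlap = spmv(interactions, history)    # mat-vec 1: items -> users
    scores = spmv(interactions_T, user_overlap)   # mat-vec 2: users -> items
    unseen = (i for i in scores if i not in history)
    return sorted(unseen, key=lambda i: (-scores[i], i))[:k]
```

Calling `recommend({1: 1.0})` ranks the items co-consumed with item 1. A production system would use a proper sparse format such as CSR rather than Python dicts.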
Many people think any given ML project is 99% training.
In reality, it’s 50% evaluation, 40% data cleaning, 8% integration, and 2% training.
The first two set the noise floor for learning. No ML magic matters; the model cannot lower the noise floor, as that’s the optimal bound of Shannon encoding of your data.
Thus, not a single day goes by without me thinking about ontology. Even the old labels have to be constantly reviewed.
Major drone industry shakeup as @ExynTech released @SkydioHQ level autonomy for a fraction of the price.
The industry wars are here.
Remember competition is great for pilots.
Anthropic Quant Andrej Karpathy:
"Most people use tools that they don't understand
- the ones who strip everything down to basics - end up faster than everyone else"
"the best code is the code anyone can read"
he couldn't fix a bug in 2 hours, so instead of googling - he rewrote the entire system from scratch
no frameworks. no dependencies. it ended up faster
that's the difference between using AI and understanding it
25-min masterclass - bookmark and watch
Depth Anything 3 now runs as pure C++/ggml (@ggml_org). No Python, no PyTorch, no CUDA toolkit at inference, just one self-contained GGUF.
It's faster than PyTorch on CPU and ties it on GPU. The CPU win came from the last place I'd have looked.
Quantized GGUF on @huggingface🤗
Shout out to @ggerganov for ggml (we are building a ggml-world! ❤️) and to @ByteDanceOSS and the Depth Anything 3 authors @bingyikang @jhliew91 @donydchen!
Can a VLM see without a vision encoder?
We trained one for $100, inspired by Gemma 4 12B.
Latency on an M3 Pro MacBook:
112 ms -> 1.1 ms for the image path
30% lower end-to-end latency (image + LLM)
The architecture is just:
patchify the image -> linear projection with pos embeddings -> LLM
Writeup:
https://t.co/yt0IKzsF7O
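The patchify -> linear projection -> positional embedding path can be sketched in a few lines. This is a dependency-free illustration, not the code from the writeup; the image size, patch size, embedding width and random initialisation are all assumptions:

```python
import random


def patchify(image, patch):
    """Split an H x W x C image (nested lists) into flattened non-overlapping patches."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image size must be divisible by patch size")
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            flat = [chan
                    for r in range(top, top + patch)
                    for c in range(left, left + patch)
                    for chan in image[r][c]]
            patches.append(flat)
    return patches


def embed(patches, weight, bias, pos):
    """Compute tokens[i] = patches[i] @ weight + bias + pos[i]."""
    d_model = len(bias)
    tokens = []
    for i, p in enumerate(patches):
        tok = [sum(p[k] * weight[k][d] for k in range(len(p))) + bias[d] + pos[i][d]
               for d in range(d_model)]
        tokens.append(tok)
    return tokens


# Hypothetical sizes: 8x8 RGB image, 4x4 patches, 16-dim embeddings.
rng = random.Random(0)
H = W = 8
C, P, D = 3, 4, 16
image = [[[rng.random() for _ in range(C)] for _ in range(W)] for _ in range(H)]
patches = patchify(image, P)
weight = [[rng.gauss(0, 0.02) for _ in range(D)] for _ in range(P * P * C)]
bias = [0.0] * D
pos = [[rng.gauss(0, 0.02) for _ in range(D)] for _ in range(len(patches))]
tokens = embed(patches, weight, bias, pos)  # one embedding per patch, fed to the LLM
```

The only learned pieces on the image path here are `weight`, `bias` and `pos`, so it amounts to a single matrix multiply per image, which fits the latency drop quoted above.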
LOCAL LLM GUIDE (June 2026)
Cheapest full build: 1× used RTX 3090 (24GB) + rest of PC ≈ $1000-1500
16GB all-rounder → Gemma 4-12B
32GB all-rounder → Qwen3.6-27B
Agents & tool use → Qwen3.6-27B
Deep reasoning → Nex-N2-Mini
Why TF does AI-optimised metal look like bone?
Ask AI to optimise a bracket for strength and weight and it hands back something that looks grown, not built.
The physics behind this is very cool:
•What the software does. You give it a bounding box, the loads, and the anchor points. It models stress through the block as thousands of tiny springs - finite element analysis - then reinforces the cells carrying load and dials down the ones sitting idle. Over hundreds of iterations only the load-bearing material survives. It’s called topology optimisation, and the output is the most efficient distribution of material for that exact load case.
•Why does force move in arcs, not straight lines? Any sharp corner creates a stress concentration - a local spike that fails first. The path of least resistance through a loaded solid is always curved, following the principal stress trajectories. Like water finding its way downhill, force takes the smoothest route. Lay material on those arcs and you get maximum strength per gram. Which is why the result always curves.
•Why does this look natural? Because evolution runs the identical loop: it deposits bone along stress lines and dissolves it where it sits idle and wastes energy.
The shapes aren’t biological or mechanical. They’re optimal. Evolution converged on them because wasting energy gets you outcompeted.
Different processes, same selection pressure.
What’s cool is we proved that these shapes were optimal in 1904. We just couldn’t build them until 3D printing arrived.
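The "reinforce what carries load, dial down what sits idle" loop can be shown on a toy far simpler than real topology optimisation. The sketch below is not finite element analysis. It assumes a handful of parallel elastic members of different lengths sharing one load, resizes each by its stress relative to the peak, and keeps total material volume fixed. All names and numbers are illustrative:

```python
def resize_members(lengths, areas, load, youngs=1.0, iterations=200, prune=1e-9):
    """Stress-ratio resizing of parallel members under a fixed material volume."""
    volume = sum(a * l for a, l in zip(areas, lengths))
    areas = list(areas)
    for _ in range(iterations):
        # Parallel members share one displacement: load / total stiffness.
        stiffness = [youngs * a / l for a, l in zip(areas, lengths)]
        displacement = load / sum(stiffness)
        # Stress in each surviving member (strain = displacement / length).
        stress = [youngs * displacement / l if a > 0 else 0.0
                  for a, l in zip(areas, lengths)]
        peak = max(stress)
        # Grow well-used members, shrink idle ones.
        areas = [a * s / peak for a, s in zip(areas, stress)]
        # Remove members that have become negligible.
        biggest = max(areas)
        areas = [a if a > prune * biggest else 0.0 for a in areas]
        # Rescale so the total amount of material stays constant.
        scale = volume / sum(a * l for a, l in zip(areas, lengths))
        areas = [a * scale for a in areas]
    return areas


# Three members of length 1.0, 1.5 and 2.0 that start with equal areas.
final_areas = resize_members([1.0, 1.5, 2.0], [1.0, 1.0, 1.0], load=1.0)
```

In this toy the material migrates to the shortest, stiffest load path and the other members are pruned away. Real optimisers run the same kind of loop over thousands of cells in a finite element mesh.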