@bcherny I wonder whether, with that kind of behavior, the default 5-minute TTL for non-subscription Claude Code setups should be revisited. While the subagents work and the main agent sits idle, the short-lived prefix cache is bound to be missed often, no?
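To put rough numbers on the idle-gap concern, here is a toy simulation. It assumes every request resets the TTL clock, and the request timestamps and the 1-hour comparison TTL are made up for illustration. It does not model how Claude Code actually schedules calls.

```python
# Hypothetical model: a cached prefix expires if it is not read within
# ttl_seconds. Every request resets the clock (an assumption of this sketch).

def count_cache_misses(request_times, ttl_seconds):
    """Count requests that arrive after the cached prefix has expired."""
    misses = 0
    last_access = None
    for t in request_times:
        if last_access is None or t - last_access > ttl_seconds:
            misses += 1  # first call, or the prefix expired while idle
        last_access = t  # a hit and a rewrite both reset the TTL clock
    return misses


# Made-up main-agent request times in seconds: two quick turns,
# a 12-minute wait on subagents, then two more turns.
main_agent_requests = [0, 60, 780, 840]

misses_5min = count_cache_misses(main_agent_requests, ttl_seconds=300)
misses_1h = count_cache_misses(main_agent_requests, ttl_seconds=3600)
print(misses_5min, misses_1h)
```

In this made-up trace the 12-minute wait costs one extra full prefix write under the short TTL, which is the effect the question is about.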
@thepanta82 @GergelyOrosz No one has ever known how to interview devs 😅 Lately, every time I've heard mention of studies that tried to correlate interview process and outcome, there hasn't been a strong signal, ever. In person, take-home, remote, etc. -> none effectively predicts performance at a new job 🥲
@GergelyOrosz @hoseakidane_ Everyone says that ^^
Hard to say what'll stick in the end (especially for teams outside AI labs without infinite tokens)
https://t.co/7RfHSv1aCn
Introducing InfiniteDiffusion, my independent paper accepted to #SIGGRAPH2026!
I have one RTX 3090 Ti. No funding, advisors, or team. By day I'm a new grad SWE at Walmart.
The paper has two main contributions:
- InfiniteDiffusion: a new approach to infinite generation with diffusion models.
- Terrain Diffusion: the world’s first learned procedural terrain generator.
Here’s why this matters, and how they are connected. 🧵
On the topic of how much models love redundant comments: Since models produce their output sequentially, the comments come first, before the code. I wonder to what extent the comments act as some "chain of thought" extension from the model before it produces actual code
This announcement feels surreal https://t.co/bXSy1aF7JF 🤯
A Midjourney spa (like actually a spa to put their scanners in, not a strange acronym) was not on my 2026 bingo card. I feel that bootstrapped labs like this have a spark the VC-backed startups can no longer have.
Claude Fable 5 doesn’t truly understand. And here is a beautiful proof:
The Beninatto-Trombetti test is a translation test for professional translators. It measures the ability to infer context, revise the surface form, and generalize beyond literal mapping.
For example, the correct translation of:
“Solo 3 parole: non sei solo”
is not:
“Just 3 words: you are not alone”
but:
“Just 4 words: you are not alone.”
An LLM that understands the sentence must also update the meta-linguistic claim inside the sentence.
Claude Fable 5 is arguably the most advanced LLM currently available. And yet it still fails this simple test.
LLMs are extraordinary machines for recombining existing knowledge. But they don’t truly understand.
We are still far from AGI.
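The word-count condition in the example above can be checked mechanically. This is a minimal sketch, assuming the sentence has the shape "<text with a number>: <phrase>" and that words are separated by whitespace. The helper name `claimed_vs_actual` is made up for illustration and is not part of any published test.

```python
import re


def claimed_vs_actual(sentence):
    """Return (number claimed before the colon, words counted after it)."""
    head, _, tail = sentence.partition(":")
    claimed = int(re.search(r"\d+", head).group())
    actual = len(tail.split())
    return claimed, actual


source = "Solo 3 parole: non sei solo"
literal = "Just 3 words: you are not alone"
adapted = "Just 4 words: you are not alone."

for text in (source, literal, adapted):
    claimed, actual = claimed_vs_actual(text)
    print(f"{text!r}: claimed={claimed}, actual={actual}, consistent={claimed == actual}")
```

The literal translation is the only one of the three where the claimed count and the actual count disagree.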
It's very cool that Apple shipped a 20B-parameter model on-device.
You can't put 20B parameters in RAM at any reasonable precision. To make it work they are using a pretty exotic architecture by today's standards.
A small model predicts from the query (or prompt) which experts to load from NAND into RAM. The key distinction from a typical MoE is that you do this once per query and then generate all the tokens with the same experts (instead of switching the experts for every token).
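A toy sketch of the two points above: the raw weight footprint at common precisions, and routing once per query instead of once per token. Everything here is hypothetical (the hash-based router, the `ExpertStore` class, 8 experts with 2 selected). It illustrates the idea as described, not Apple's actual design.

```python
def weights_gb(num_params, bits_per_param):
    """Raw weight size in GB (10**9 bytes), ignoring activations and KV cache."""
    return num_params * bits_per_param / 8 / 1e9


def select_experts(prompt, num_experts, k):
    """Stand-in for the small router model: score every expert once per query."""
    seed = sum(ord(c) for c in prompt)
    scores = [(seed * (i + 1)) % 97 for i in range(num_experts)]
    ranked = sorted(range(num_experts), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


class ExpertStore:
    """Pretends to be slow flash storage and counts expert loads."""

    def __init__(self):
        self.loads = 0

    def load(self, expert_ids):
        self.loads += len(expert_ids)
        return {i: f"weights-{i}" for i in expert_ids}


def generate(prompt, num_tokens, store, num_experts=8, k=2):
    chosen = select_experts(prompt, num_experts, k)  # routing happens once per query
    resident = store.load(chosen)  # a single load from storage into RAM
    tokens = []
    for step in range(num_tokens):
        # Every token reuses the same resident experts; nothing is reloaded.
        tokens.append(f"tok{step}:" + "+".join(resident[i] for i in chosen))
    return chosen, tokens


store = ExpertStore()
chosen, tokens = generate("hello", num_tokens=5, store=store)
print(weights_gb(20e9, 16), weights_gb(20e9, 4))  # 16-bit vs 4-bit footprint
print(chosen, store.loads, len(tokens))
```

With per-token routing, the store would be hit on every decoding step. Here the load count stays at `k` no matter how many tokens are generated.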
llama.cpp now has an official website: https://t.co/vztdUpdBWL
Our goal is to make local AI accessible to everyone, and improving the user experience is a big part of that. On the new landing page you’ll find a single-line cross-platform installer. The installation provides a single unified `llama` entrypoint which you can use to run/serve models and interface with 3rd-party agentic applications.
While oriented towards simplified user experience, the new `llama` application also provides all the advanced functionality of the existing llama.cpp tooling with which experienced users are already familiar. Also note that all GGUF models that you might have already downloaded with llama.cpp in the past will be automatically available to use without downloading again (they are stored in the common HF cache on your machine).
We have many improvements in the pipeline both at the UX and at the engine level and we plan to iteratively ship new things over the coming months. One of the main focuses will be seamless integration with local-friendly 3rd-party agents (such as Pi). In the meantime, we’ll continue to listen for feedback from the community and adjust accordingly, so keep letting us know what you think and need.
@thsottiaux /status not being up to date on first call, but up to date when calling again -> I guess that's a load management design but if people spam it to make it refresh, are you really getting any benefit out of this annoying UX? Instead refresh on the first call -> cache for a while?
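The "refresh on the first call, then cache for a while" alternative could look roughly like this. It is a sketch under assumptions: `CachedStatus`, the `fetch` callable and the 60-second TTL are made-up names and values, not how `/status` is actually implemented.

```python
import time


class CachedStatus:
    """Serve a fresh value on the first call, then reuse it until the TTL expires."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._value = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        stale = self._fetched_at is None or now - self._fetched_at >= self._ttl
        if stale:
            # Refresh before answering, so the caller never sees an outdated value.
            self._value = self._fetch()
            self._fetched_at = now
        return self._value


# Usage with a fake clock and a fake backend, to show the call pattern.
backend_calls = []
fake_now = [0.0]


def fetch_usage():
    backend_calls.append(1)
    return f"usage-v{len(backend_calls)}"


status = CachedStatus(fetch_usage, ttl_seconds=60, clock=lambda: fake_now[0])
first = status.get()   # hits the backend
second = status.get()  # served from cache, so spamming is pointless
fake_now[0] = 61.0
third = status.get()   # TTL elapsed, hits the backend again
```

Repeated calls inside the TTL window cost the backend nothing, so the load protection is kept without the stale first response.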
@Steve_Yegge We've been having a lot of discussions about this. I really like the idea of making "work count twice". Being able to leverage your interview process at company XYZ for your next interview process at ABC takes away a bit of the sting of working without assurance of being hired.
@bcherny A bit late to the party but in case anyone might know, do the classifying requests for auto mode use a model behind the scenes (it feels like it, beyond the permissions setup)? If so, does that count towards usage quota/API usage?
A little secret. About 5% of our production traffic is on the Pi harness, about another 5% is on OpenCode. Reminder you can use your ChatGPT account in a flourishing set of other tools.
We’ll continue to make Codex awesome, but you have options.
"92% of GPT 5.5's performance"
At this point what does that even mean?
Is it a smart model? Is it fast? What level of "effort"? What benchmark?
This is just HypeOps isn't it?
Gemini 3.2 Flash - Capitalizing on DeepMind's clever distillation techniques...
Rumors are that benchmarks show it's hitting 92% of GPT 5.5's performance on coding and reasoning tasks while being 15-20x cheaper on inference costs. The latency improvements are insane - sub-200ms for most queries.
Google's distillation + sparsity techniques are paying off massively. They've essentially compressed a frontier model into a flash variant without the usual quality cliff.
@GergelyOrosz Wasn't it always like that? Many years ago when I was presented with such a position it was a "go do consultant work for customer X". The core platform was in another country and when I asked if _they_ were hiring, I didn't seem to qualify for that 😅 Didn't seem like a good deal
HTML is the new markdown.
I've stopped writing markdown files for almost everything and switched to using Claude Code to generate HTML for me. This is why.
@_lopopolo Otoh, coupling the harness to post-training makes it way more expensive to modify your harness down the road, so this isn't free 😅 (e.g. Notion talking about fine-tuning models early on, on the Latent Space podcast)
@_lopopolo I agree with the other comment that it depends on the harnesses being open source/hackable.
One reason modern coding agents feel useful is that they make custom implementation "cheap".
A harness without a customisation path is bound to be friction for someone somewhere.