As with fat tails, LLMs are frequency machines that fail to extrapolate outside the sample set. What they know is the VISIBLE.
Almost as bad as economists, almost worse than psychologists.
Each time we release a model, we run the same test: give it code that trains a small AI model, ask the new model to speed it up. It takes a skilled human 4-8 hours to reach 4x faster.
In May 2025, Claude Opus 4 averaged a ~3x speedup. This April, Mythos Preview achieved ~52x.
Wish I had found this one for you guys before today!
If you're not a pure mathematician and you're struggling with functional analysis, check out this masterpiece titled "An Introduction to Functional Analysis for Science and Engineering" by David Miller (Stanford University).
If you're looking for a non-formal primer, this is it. I'll keep it brief: go and check it out!
🔗👇
Pulled the trigger today and switched 100% of Lindy traffic to DeepSeek v4, churning from Anthropic models.
Saves us millions of dollars and we're actually seeing an *increase* in performance on many core use cases. Transformative for the business.
I want to do small experiments with ceramic structures enclosed in textiles, textile-only spaces, woven roofs, and combinations of all of them.
My biggest idea, and the possible culmination of this project, is to knit a human-scale house.
In the meantime, I've been knitting, embroidering and weaving baskets because I love the idea of taking a thread and making it into something usable and three-dimensional.
So I started thinking, what do these things have in common?
This may not be broadly known, but if instead of causal attention
yᵢ = xᵢ + attn(norm(x))
you do causal EMA
yᵢ = xᵢ + α ∑ⱼ βⁱ⁻ʲxⱼ (summing over j ≤ i)
where α, β are fixed scalars, e.g. α=0.1, β=0.9,
it still works, with a healthy loss curve that converges to a nontrivial value.
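For anyone who wants to poke at the causal EMA mixing, here is a minimal pure-Python sketch. It assumes the sum runs over j ≤ i and mixes the raw token vectors with no norm, since the post doesn't say either way. The function names are made up for illustration. The first version uses a running state (O(n) per sequence), the second spells out the sum directly as a reference.

```python
def causal_ema_mix(x, alpha=0.1, beta=0.9):
    """Causal EMA token mixing via a running state.

    x is a list of token vectors (lists of floats).
    Returns y with y[i] = x[i] + alpha * sum_{j<=i} beta**(i-j) * x[j].
    Assumption: the sum includes j == i and no normalization is applied.
    """
    dim = len(x[0]) if x else 0
    state = [0.0] * dim  # holds sum_{j<=i} beta**(i-j) * x[j]
    y = []
    for token in x:
        # Decay the past by beta, then add the current token.
        state = [beta * s + t for s, t in zip(state, token)]
        # Residual connection plus the scaled EMA.
        y.append([t + alpha * s for t, s in zip(token, state)])
    return y


def causal_ema_mix_direct(x, alpha=0.1, beta=0.9):
    """Same quantity, computed by writing out the sum (O(n^2) reference)."""
    y = []
    for i, token in enumerate(x):
        mixed = [0.0] * len(token)
        for j in range(i + 1):  # only positions j <= i, so it stays causal
            w = beta ** (i - j)
            mixed = [m + w * v for m, v in zip(mixed, x[j])]
        y.append([t + alpha * m for t, m in zip(token, mixed)])
    return y


if __name__ == "__main__":
    tokens = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5]]
    for row in causal_ema_mix(tokens):
        print(row)
```

The running-state form makes the point clear: every position sees the same fixed, input-independent weighting of the past, with nothing learned in the mixing itself.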
Recently met @srush_nlp and he started giving me an impromptu lecture on how targeted on-policy self-distillation works.
I asked him if I could record it on my iPhone.
The basic idea is this: if the model made a mistake at some point in the rollout (for example, calling a tool that doesn't exist), we want to discourage this specific error, but we don't want to just learn from the final reward, because it's a very noisy signal spread out over the whole trajectory.
So we have another model read this trajectory and figure out where the error was made. It simply inserts some hint tokens into the trajectory right before the point where the mistake was made.
Now, with these injected hint tokens, have the model run a forward pass. You don't have to generate a new rollout, i.e. no new decode is required.
The hint causes the model to assign lower probabilities to the error tokens. You then train the original model to match these new probabilities, teaching it to downweight that specific mistake.
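A rough sketch of what the "match these new probabilities" step could look like, on toy numbers. Everything here is an assumption for illustration: the function names are hypothetical, the loss is taken to be a KL divergence from the hinted distribution to the un-hinted one, and it is applied only at the positions flagged as the error. The lecture as described doesn't pin down the exact loss. Logits are plain lists, one row per trajectory position.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one position's logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(target, pred):
    """KL(target || pred) for two probability vectors."""
    return sum(t * math.log(t / p) for t, p in zip(target, pred) if t > 0)


def targeted_distill_loss(hinted_logits, plain_logits, error_positions):
    """Mean KL(hinted || plain) over the flagged positions only.

    hinted_logits: forward pass WITH the injected hint (treated as a fixed target).
    plain_logits:  forward pass of the original model without the hint.
    error_positions: indices another model flagged as the mistake (assumed given).
    """
    losses = []
    for pos in error_positions:
        target = softmax(hinted_logits[pos])
        pred = softmax(plain_logits[pos])
        losses.append(kl_divergence(target, pred))
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    # Toy vocab of 3 tokens; token 2 at position 1 is the "bad tool call".
    plain = [[2.0, 0.0, 0.0], [0.0, 0.0, 3.0], [1.0, 1.0, 1.0]]
    hinted = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.5], [1.0, 1.0, 1.0]]
    print(targeted_distill_loss(hinted, plain, error_positions=[1]))
```

In a real setup the gradient would flow only through the un-hinted pass, and both sets of logits come from teacher-forced forward passes over the existing rollout, which is why no new decode is needed.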
Hi. Over the last 24 hours we had three separate small incidents that affected Codex reliability. That is three too many, and we are taking active steps to keep them from recurring.
I have reset usage limits for Codex across all paid plans. May the tokens flow again.
One of the most heroic things I've seen recently is one little town in northern Michigan that kept a bird from going extinct.
The town is Mio, population of about 1800. The bird is the Kirtland's warbler, a small gray-and-yellow songbird that breeds in exactly one kind of habitat, mostly in a single corner of Michigan.
In 1974, the entire global population dropped to 167 singing males. The bird was one of the first species listed under the original 1966 Endangered Species Preservation Act, and it looked like the species was going to be extinct within a generation.
The problem was the habitat. Kirtland's warblers need fire-disturbed jack pine. Their entire breeding range is one specific successional stage of a fire-adapted forest. Decades of fire suppression had let the jack pine grow up past the age the birds could use. The birds had nowhere left to nest.
Mio became the staging point for the recovery. They built a forest management program: clear-cut, replant, burn, repeat. About 76,000 hectares are now managed on roughly six-year rotations to keep a continuous supply of young pine in the bird's preferred age range.
The work has paid off: the total population is now estimated at over 4,500 birds. The Kirtland's warbler was removed from the endangered species list in 2019, a rare full delisting.
The bird still requires active management. If the work stopped, the jack pine would age out within 20 years and the species would collapse again.