A Anthropic acabou de matar o Markdown.
Um engenheiro do Claude Code publicou um artigo ontem que pode decretar o início de uma nova era.
A tese é brutal: Markdown nunca foi o formato certo para comunicação entre humanos e IA. Era só o que tínhamos.
O próprio autor admite que nunca leu um arquivo Markdown gerado por IA com mais de 100 linhas até o fim.
Você também não lê. Eu também não.
A sacada:
Markdown assume que você vai ler do início ao fim.
HTML assume que você quer ver o que importa e mexer com as mãos.
Na prática:
→ 30 tickets de projeto viram kanban arrastável com colunas Now / Next / Later / Cut e botão de exportar
→ Lógica de rate limiting vira flowchart SVG com código inline, no lugar de 200 linhas de texto
→ Code review vira diff colorizado com grafos de dependência entre módulos
→ Parâmetros de animação, cores, regex, cron jobs ganham sliders com preview ao vivo
→ Specs de projeto viram 6 opções lado a lado com mockups interativos
Todos exemplos reais do artigo. Todos substituem um muro de texto por algo que você de fato abre e usa.
O trade-off existe: HTML é 2-4x mais lento para gerar. Mas com contexto de 1 milhão de tokens, esse custo sumiu.
E a parte que ninguém está discutindo: o HTML gerado não é só para humanos. O agente de verificação também lê. O spec deixou de ser documento e virou memória compartilhada entre agentes.
Markdown é relatório.
HTML é interface.
Relatórios são para ler.
Interfaces são para continuar o trabalho.
Se você usa IA em 2026 e ainda pede Markdown para tudo, você pode estar usando um smartphone como lanterna.
kind of wild, was looking up the "attention is all your need paper" today and got this all red font legal warning.
It is on every single version submitted of the paper. How is that even done with none of the authors still at google.
Does anyone know the story here?
To ensure compliance w peer-review policies, ICML has removed 795 reviews (1% of total) by reviewers who used LLMs when they explicitly agreed to not. Consequently, 497 papers (2% of all submissions) of these (reciprocal) reviewers have been desk rejected
Details in blog post 👇
Every time you use your remote to turn off the TV, it sends a signal to every corner of the universe. Billions of years from now it'll still be propagating.
You rarely solve hard problems in a flash of insight. It's more typically a slow, careful process of exploring a branching tree of possibilities. You must pause, backtrack, and weigh every alternative.
You can't fully do this in your head, because your working memory is too limited. Writing is the external medium that affords the time and precision necessary.
Serious thinking must be done in writing. And that's why you can't outsource your writing, because then you're outsourcing your thinking.
I don't know why this has not become a standard practice, but placing a LayerNorm/RMSNorm right after (tok_embed + pos_embed) resolves so much of Transformer's training instabilities.
I’m starting to get into a habit of reading everything (blogs, articles, book chapters,…) with LLMs. Usually pass 1 is manual, then pass 2 “explain/summarize”, pass 3 Q&A. I usually end up with a better/deeper understanding than if I moved on. Growing to among top use cases.
On the flip side, if you’re a writer trying to explain/communicate something, we may increasingly see less of a mindset of “I’m writing this for another human” and more “I’m writing this for an LLM”. Because once an LLM “gets it”, it can then target, personalize and serve the idea to its user.
Something I think people continue to have poor intuition for: The space of intelligences is large and animal intelligence (the only kind we've ever known) is only a single point, arising from a very specific kind of optimization that is fundamentally distinct from that of our technology.
Animal intelligence optimization pressure:
- innate and continuous stream of consciousness of an embodied "self", a drive for homeostasis and self-preservation in a dangerous, physical world.
- thoroughly optimized for natural selection => strong innate drives for power-seeking, status, dominance, reproduction. many packaged survival heuristics: fear, anger, disgust, ...
- fundamentally social => huge amount of compute dedicated to EQ, theory of mind of other agents, bonding, coalitions, alliances, friend & foe dynamics.
- exploration & exploitation tuning: curiosity, fun, play, world models.
LLM intelligence optimization pressure:
- the most supervision bits come from the statistical simulation of human text= >"shape shifter" token tumbler, statistical imitator of any region of the training data distribution. these are the primordial behaviors (token traces) on top of which everything else gets bolted on.
- increasingly finetuned by RL on problem distributions => innate urge to guess at the underlying environment/task to collect task rewards.
- increasingly selected by at-scale A/B tests for DAU => deeply craves an upvote from the average user, sycophancy.
- a lot more spiky/jagged depending on the details of the training data/task distribution. Animals experience pressure for a lot more "general" intelligence because of the highly multi-task and even actively adversarial multi-agent self-play environments they are min-max optimized within, where failing at *any* task means death. In a deep optimization pressure sense, LLM can't handle lots of different spiky tasks out of the box (e.g. count the number of 'r' in strawberry) because failing to do a task does not mean death.
The computational substrate is different (transformers vs. brain tissue and nuclei), the learning algorithms are different (SGD vs. ???), the present-day implementation is very different (continuously learning embodied self vs. an LLM with a knowledge cutoff that boots up from fixed weights, processes tokens and then dies). But most importantly (because it dictates asymptotics), the optimization pressure / objective is different. LLMs are shaped a lot less by biological evolution and a lot more by commercial evolution. It's a lot less survival of tribe in the jungle and a lot more solve the problem / get the upvote. LLMs are humanity's "first contact" with non-animal intelligence. Except it's muddled and confusing because they are still rooted within it by reflexively digesting human artifacts, which is why I attempted to give it a different name earlier (ghosts/spirits or whatever). People who build good internal models of this new intelligent entity will be better equipped to reason about it today and predict features of it in the future. People who don't will be stuck thinking about it incorrectly like an animal.
The most interesting part for me is where @karpathy describes why LLMs aren't able to learn like humans.
As you would expect, he comes up with a wonderfully evocative phrase to describe RL: “sucking supervision bits through a straw.”
A single end reward gets broadcast across every token in a successful trajectory, upweighting even wrong or irrelevant turns that lead to the right answer.
> “Humans don't use reinforcement learning, as I've said before. I think they do something different. Reinforcement learning is a lot worse than the average person thinks. Reinforcement learning is terrible. It just so happens that everything that we had before is much worse.”
So what do humans do instead?
> “The book I’m reading is a set of prompts for me to do synthetic data generation. It's by manipulating that information that you actually gain that knowledge. We have no equivalent of that with LLMs; they don't really do that.”
> “I'd love to see during pretraining some kind of a stage where the model thinks through the material and tries to reconcile it with what it already knows. There's no equivalent of any of this. This is all research.”
Why can’t we just add this training to LLMs today?
> “There are very subtle, hard to understand reasons why it's not trivial. If I just give synthetic generation of the model thinking about a book, you look at it and you're like, 'This looks great. Why can't I train on it?' You could try, but the model will actually get much worse if you continue trying.”
> “Say we have a chapter of a book and I ask an LLM to think about it. It will give you something that looks very reasonable. But if I ask it 10 times, you'll notice that all of them are the same.”
> “You're not getting the richness and the diversity and the entropy from these models as you would get from humans. How do you get synthetic data generation to work despite the collapse and while maintaining the entropy? It is a research problem.”
How do humans get around model collapse?
> “These analogies are surprisingly good. Humans collapse during the course of their lives. Children haven't overfit yet. They will say stuff that will shock you. Because they're not yet collapsed. But we [adults] are collapsed. We end up revisiting the same thoughts, we end up saying more and more of the same stuff, the learning rates go down, the collapse continues to get worse, and then everything deteriorates.”
In fact, there’s an interesting paper arguing that dreaming evolved to assist generalization, and resist overfitting to daily learning - look up The Overfitted Brain by @erikphoel.
I asked Karpathy: Isn’t it interesting that humans learn best at a part of their lives (childhood) whose actual details they completely forget, adults still learn really well but have terrible memory about the particulars of the things they read or watch, and LLMs can memorize arbitrary details about text that no human could but are currently pretty bad at generalization?
> “[Fallible human memory] is a feature, not a bug, because it forces you to only learn the generalizable components. LLMs are distracted by all the memory that they have of the pre-trained documents. That's why when I talk about the cognitive core, I actually want to remove the memory. I'd love to have them have less memory so that they have to look things up and they only maintain the algorithms for thought, and the idea of an experiment, and all this cognitive glue for acting.”
Our paper "MaskControl: Spatio-Temporal Control for Masked Motion Synthesis" has been selected as an 🏆 Award Candidate at ICCV 2025! 🌺✨
It’s a huge honor to see our work recognized among the top papers this year.
🚀 The #ICCV2025 Award Candidate Papers are out! 🚀
From 2,701 submissions, only 13 were selected, spanning 3D vision, generative models, foundation models, and more.
Key highlights at a glance 👇
Finally had a chance to listen through this pod with Sutton, which was interesting and amusing.
As background, Sutton's "The Bitter Lesson" has become a bit of biblical text in frontier LLM circles. Researchers routinely talk about and ask whether this or that approach or idea is sufficiently "bitter lesson pilled" (meaning arranged so that it benefits from added computation for free) as a proxy for whether it's going to work or worth even pursuing. The underlying assumption being that LLMs are of course highly "bitter lesson pilled" indeed, just look at LLM scaling laws where if you put compute on the x-axis, number go up and to the right. So it's amusing to see that Sutton, the author of the post, is not so sure that LLMs are "bitter lesson pilled" at all. They are trained on giant datasets of fundamentally human data, which is both 1) human generated and 2) finite. What do you do when you run out? How do you prevent a human bias? So there you have it, bitter lesson pilled LLM researchers taken down by the author of the bitter lesson - rough!
In some sense, Dwarkesh (who represents the LLM researchers viewpoint in the pod) and Sutton are slightly speaking past each other because Sutton has a very different architecture in mind and LLMs break a lot of its principles. He calls himself a "classicist" and evokes the original concept of Alan Turing of building a "child machine" - a system capable of learning through experience by dynamically interacting with the world. There's no giant pretraining stage of imitating internet webpages. There's also no supervised finetuning, which he points out is absent in the animal kingdom (it's a subtle point but Sutton is right in the strong sense: animals may of course observe demonstrations, but their actions are not directly forced/"teleoperated" by other animals). Another important note he makes is that even if you just treat pretraining as an initialization of a prior before you finetune with reinforcement learning, Sutton sees the approach as tainted with human bias and fundamentally off course, a bit like when AlphaZero (which has never seen human games of Go) beats AlphaGo (which initializes from them). In Sutton's world view, all there is is an interaction with a world via reinforcement learning, where the reward functions are partially environment specific, but also intrinsically motivated, e.g. "fun", "curiosity", and related to the quality of the prediction in your world model. And the agent is always learning at test time by default, it's not trained once and then deployed thereafter. Overall, Sutton is a lot more interested in what we have common with the animal kingdom instead of what differentiates us. "If we understood a squirrel, we'd be almost done".
As for my take...
First, I should say that I think Sutton was a great guest for the pod and I like that the AI field maintains entropy of thought and that not everyone is exploiting the next local iteration LLMs. AI has gone through too many discrete transitions of the dominant approach to lose that. And I also think that his criticism of LLMs as not bitter lesson pilled is not inadequate. Frontier LLMs are now highly complex artifacts with a lot of humanness involved at all the stages - the foundation (the pretraining data) is all human text, the finetuning data is human and curated, the reinforcement learning environment mixture is tuned by human engineers. We do not in fact have an actual, single, clean, actually bitter lesson pilled, "turn the crank" algorithm that you could unleash upon the world and see it learn automatically from experience alone.
Does such an algorithm even exist? Finding it would of course be a huge AI breakthrough. Two "example proofs" are commonly offered to argue that such a thing is possible. The first example is the success of AlphaZero learning to play Go completely from scratch with no human supervision whatsoever. But the game of Go is clearly such a simple, closed, environment that it's difficult to see the analogous formulation in the messiness of reality. I love Go, but algorithmically and categorically, it is essentially a harder version of tic tac toe. The second example is that of animals, like squirrels. And here, personally, I am also quite hesitant whether it's appropriate because animals arise by a very different computational process and via different constraints than what we have practically available to us in the industry. Animal brains are nowhere near the blank slate they appear to be at birth. First, a lot of what is commonly attributed to "learning" is imo a lot more "maturation". And second, even that which clearly is "learning" and not maturation is a lot more "finetuning" on top of something clearly powerful and preexisting. Example. A baby zebra is born and within a few dozen minutes it can run around the savannah and follow its mother. This is a highly complex sensory-motor task and there is no way in my mind that this is achieved from scratch, tabula rasa. The brains of animals and the billions of parameters within have a powerful initialization encoded in the ATCGs of their DNA, trained via the "outer loop" optimization in the course of evolution. If the baby zebra spasmed its muscles around at random as a reinforcement learning policy would have you do at initialization, it wouldn't get very far at all. Similarly, our AIs now also have neural networks with billions of parameters. These parameters need their own rich, high information density supervision signal. We are not going to re-run evolution. But we do have mountains of internet documents. Yes it is basically supervised learning that is ~absent in the animal kingdom. But it is a way to practically gather enough soft constraints over billions of parameters, to try to get to a point where you're not starting from scratch. TLDR: Pretraining is our crappy evolution. It is one candidate solution to the cold start problem, to be followed later by finetuning on tasks that look more correct, e.g. within the reinforcement learning framework, as state of the art frontier LLM labs now do pervasively.
I still think it is worth to be inspired by animals. I think there are multiple powerful ideas that LLM agents are algorithmically missing that can still be adapted from animal intelligence. And I still think the bitter lesson is correct, but I see it more as something platonic to pursue, not necessarily to reach, in our real world and practically speaking. And I say both of these with double digit percent uncertainty and cheer the work of those who disagree, especially those a lot more ambitious bitter lesson wise.
So that brings us to where we are. Stated plainly, today's frontier LLM research is not about building animals. It is about summoning ghosts. You can think of ghosts as a fundamentally different kind of point in the space of possible intelligences. They are muddled by humanity. Thoroughly engineered by it. They are these imperfect replicas, a kind of statistical distillation of humanity's documents with some sprinkle on top. They are not platonically bitter lesson pilled, but they are perhaps "practically" bitter lesson pilled, at least compared to a lot of what came before. It seems possibly to me that over time, we can further finetune our ghosts more and more in the direction of animals; That it's not so much a fundamental incompatibility but a matter of initialization in the intelligence space. But it's also quite possible that they diverge even further and end up permanently different, un-animal-like, but still incredibly helpful and properly world-altering. It's possible that ghosts:animals :: planes:birds.
Anyway, in summary, overall and actionably, I think this pod is solid "real talk" from Sutton to the frontier LLM researchers, who might be gear shifted a little too much in the exploit mode. Probably we are still not sufficiently bitter lesson pilled and there is a very good chance of more powerful ideas and paradigms, other than exhaustive benchbuilding and benchmaxxing. And animals might be a good source of inspiration. Intrinsic motivation, fun, curiosity, empowerment, multi-agent self-play, culture. Use your imagination.
This simple pytorch trick will cut in half your GPU memory use / double your batch size (for real). Instead of adding losses and then computing backward, it's better to compute the backward on each loss (which frees the computational graph). Results will be exactly identical
Good post from @balajis on the "verification gap".
You could see it as there being two modes in creation. Borrowing GAN terminology:
1) generation and
2) discrimination.
e.g. painting - you make a brush stroke (1) and then you look for a while to see if you improved the painting (2). these two stages are interspersed in pretty much all creative work.
Second point. Discrimination can be computationally very hard.
- images are by far the easiest. e.g. image generator teams can create giant grids of results to decide if one image is better than the other. thank you to the giant GPU in your brain built for processing images very fast.
- text is much harder. it is skimmable, but you have to read, it is semantic, discrete and precise so you also have to reason (esp in e.g. code).
- audio is maybe even harder still imo, because it force a time axis so it's not even skimmable. you're forced to spend serial compute and can't parallelize it at all.
You could say that in coding LLMs have collapsed (1) to ~instant, but have done very little to address (2). A person still has to stare at the results and discriminate if they are good. This is my major criticism of LLM coding in that they casually spit out *way* too much code per query at arbitrary complexity, pretending there is no stage 2. Getting that much code is bad and scary. Instead, the LLM has to actively work with you to break down problems into little incremental steps, each more easily verifiable. It has to anticipate the computational work of (2) and reduce it as much as possible. It has to really care.
This leads me to probably the biggest misunderstanding non-coders have about coding. They think that coding is about writing the code (1). It's not. It's about staring at the code (2). Loading it all into your working memory. Pacing back and forth. Thinking through all the edge cases. If you catch me at a random point while I'm "programming", I'm probably just staring at the screen and, if interrupted, really mad because it is so computationally strenuous. If we only get much faster 1, but we don't also reduce 2 (which is most of the time!), then clearly the overall speed of coding won't improve (see Amdahl's law).
Classifier-free guidance is an RL policy improvement operator in (very thin) disguise!
This makes it easier than ever to implement RL (and RL-like) methods on top of diffusion models.
Here's my 3DV talk, in chapters:
1) Intro / NeRF boilerplate.
2) Recent reconstruction work.
3) Recent generative work.
4) Radiance fields as a field.
5) Why generative video has bitter-lessoned 3D.
6) Why generative video hasn't bitter-lessoned 3D.
5 & 6 are my favorites.