Introducing paper replications for arXiv
Research codebases are notoriously hard to set up
We deploy autoresearch agents to ingest popular arXiv repos, resolve setup issues, and get the core claim running. You can now sort papers by ease of implementation ๐
Transformers keep their entire input history available and can look back into it for any fact, which suits retrieval but doesn't work well with a task the authors call state tracking.
Imagine a model playing a guessing game: after each guess it says "higher" or "lower," so it must carry along the range the hidden number could still be in and narrow that range with every guess.
That range is a running summary the model has to keep and revise as the conversation goes; each guess moves it one step forward; and each narrowed range is computed from the previous range plus the latest guess, so the values form a chain, every one depending on the last.
In this recent arXiv paper from Google, the authors show transformers handle this badly: a model gives contradictory hints in this game or could flips between the two meanings of "bank" mid-conversation.
The reason is in how the information flows in a transformer. As a transformer reads input, it holds a separate vector of numbers for every word at every layer of the network, and information only ever flows upward through the layers, never back downโso once the model computes the current range and parks it at some layer, the next range, which depends on that one, can only land at the same layer or deeper, never shallower.
Each step pushes the running summary one level up the stack, and since the stack has a fixed number of layers, a long enough conversation runs it off the top with nowhere left to put the next update.
The author argue that a proper state tracking needs recurrenceโfeeding a deep representation back down to a shallow one, as older recurrent networks didโand explain why this implicit recurrence beats making a model "think out loud" in extra tokens for routine bookkeeping.
https://t.co/uCmLHZxRDI
been thinking about why I don't feel comfortable with AI being a true personal assistant still, even though the models have gotten so much better. i realized a lot of the reason for me is actually git / version control.
using an agent that is 99.999999% accurate vs 97% accurate doesn't really matter to me when the errors are high impact. with code, i can yolo a PR & know that its will not be destructive to my existing codebase. there's no similar staging mechanism for my calendar, shopping cart, etc.
in theory you can create calendar event drafts, but very few of the other useful activities (esp spending money) have a draft / easy review paradigm. the solves here will most likely be on the infra / scaffolding level. once you start making a universal shopping cart draft, you're basically inventing a new web browser, or an agent-first wallet / credit card.
Continual learning is widely discussed right now, but mostly as improving on the job or avoiding catastrophic forgetting. But it has a different, difficult, and already urgent form:
Given nothing but a corpus of documents, how should AI systems develop expertise in a new, unfamiliar domain? We call this problem Machine Studying.
This system's key feature: direct SVG export.
What looks 3D is really just a 2D projection โ so you edit it straight in Illustrator (right).
The whole thing thinks in paths. Let it follow the brush, and it takes on any style.
Not a filter !! teaching a computer how humans draw.
ICML 2026: Latent Reasoning in TRMs is Secretly a Policy Improvement Operator
Why does recursive reasoning, especially latent reasoning, actually work? The theory is still young, and even mechanistic explanations are limited.
We close part of this gap by showing that latent reasoning is secretly doing policy improvement. Each recursion pushes the model steadily toward the target.
Based on this view, we propose an algorithm that boosts learning and inference efficiency by up to 18x.
6/ Free Lunch: Speculative Decoding.
Because NextLat learns a latent dynamics, it unlocks ๐๐ฎ๐ฟ๐ถ๐ฎ๐ฏ๐น๐ฒ-๐น๐ฒ๐ป๐ด๐๐ต ๐๐ฝ๐ฒ๐ฐ๐๐น๐ฎ๐๐ถ๐๐ฒ ๐ฑ๐ฒ๐ฐ๐ผ๐ฑ๐ถ๐ป๐ด: we can draft a flexible number of future tokens recursively in latent space.
๐ This accelerates inference by up to 3.3x, much faster than MTP-style drafting!
A strong predictor of who does extraordinary work is being terrible with everyday life; they get peopleโs names wrong, forget meetings, constantly lose things or donโt remember to eat because all their mental bandwidth is going elsewhere.
We met a researcher with 87000+ unread emails and a passport that expired the day before an AI conference he was speaking at, and a founder who had worn pretty much the same outfit every day for 10yrs because choosing took up way too much mental effort.
Maintaining a well run life is a part-time job hours-wise. For someone who is juggling a hard problem in their head 24-7, there is no spare capacity to run these background processes. Society treats this as a flaw to coach out (the command is โbe more presentโ), but this level of absorption is necessary to solve a problem nobody else has.
@notch@igelblork Nailed it.
For me the big trip was realizing that 3D APIs usually write q*v, but mean rotate(v) = q v q^-1. So i/j/k are 180ยฐ rotors there.
But Hamiltonโs multiplication is a 90ยฐ turn between perpendicular units, which makes ij=k and ijk=-1 way more intuitive.
built a tiny RL rock-stacking experiment
trained on 200k sims via PufferLib (@jsuarez)
observes rock geometry, mass, friction, velocity, stack contour, support intervals, roll torque, balance, etc
spent half of my PhD working on optimization research, I only published one negative result paper showing beating SGD with momentum is hard especially when noise dominated regime ๐คฃ
it's not done if it's not implemented
it's not done if the implementation is ugly
it's not done if it's not documented
it's not done if users can't discover it
it's not done if you can't market it
New personal blog!
(n, k)-Local Polynomial Simplicial Attention, which generalizes LLA & 2-simplicial attention as two orthogonal design axes that jointly shape the regression landscape.