@DimitrisPapail Awesome article! I wonder why this doesn't apply to SFT tasks like instruction tuning, where an "assistant-only" loss seems to be the recommended practice.
Should the LLM be learning to model/predict the user's responses and feedback?
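For concreteness, an "assistant-only" loss usually means masking the labels of user (and system) tokens so that only assistant tokens contribute to the cross-entropy. The minimal sketch below uses pure Python on a toy 4-token example. The helper names (`build_labels`, `masked_nll`), the roles list and the probabilities are all hypothetical. The `-100` sentinel mirrors the default `ignore_index` of PyTorch's `cross_entropy`.

```python
import math
from typing import List

# Label value meaning "skip this position"; -100 mirrors the default
# ignore_index of PyTorch's cross_entropy.
IGNORE = -100


def build_labels(token_ids: List[int], roles: List[str], assistant_only: bool) -> List[int]:
    """Copy token ids into labels, masking non-assistant tokens if assistant_only."""
    return [
        tok if (not assistant_only or role == "assistant") else IGNORE
        for tok, role in zip(token_ids, roles)
    ]


def masked_nll(logprobs: List[List[float]], labels: List[int]) -> float:
    """Mean negative log-likelihood over positions whose label is not IGNORE.

    logprobs[t][v] is the model's log-probability of token v at position t,
    assumed already aligned with labels[t] (the next-token shift is done).
    """
    total, count = 0.0, 0
    for row, label in zip(logprobs, labels):
        if label == IGNORE:
            continue  # masked position: contributes no loss and no gradient
        total -= row[label]
        count += 1
    return total / max(count, 1)


# Toy 4-token conversation over a 3-token vocabulary (invented numbers).
ids = [0, 1, 2, 1]
roles = ["user", "user", "assistant", "assistant"]
probs = [
    [0.1, 0.45, 0.45],   # user token 0: hard to predict
    [0.45, 0.1, 0.45],   # user token 1: hard to predict
    [0.25, 0.25, 0.5],   # assistant token 2
    [0.25, 0.5, 0.25],   # assistant token 1
]
logprobs = [[math.log(p) for p in row] for row in probs]

loss_assistant = masked_nll(logprobs, build_labels(ids, roles, assistant_only=True))
loss_full = masked_nll(logprobs, build_labels(ids, roles, assistant_only=False))
print(f"assistant-only: {loss_assistant:.4f}, all tokens: {loss_full:.4f}")
```

This prints roughly 0.6931 for the assistant-only loss and 1.4979 for the all-token loss. Turning off the mask also trains the model to predict the (here hard-to-predict) user turns, which is the trade-off the question is about.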
Get paid to wait
The Claude Code spinner might be the most watched line on Earth.
So I turned it into an ad marketplace.
Advertisers bid on it. You keep 50% of the money.
Install the extension → get cash from ads.
Introducing Kickbacks
Meet DiffusionGemma!
An experimental open model that explores a fast approach to text generation, released under an Apache 2.0 license.
Moving beyond sequential, token-by-token processes to generate entire blocks of text simultaneously. Here’s what’s new with DiffusionGemma: 👇
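The announcement contrasts block generation with token-by-token decoding. One generic way masked (discrete) diffusion language models do this is confidence-based parallel unmasking. The model starts from a fully masked block, predicts every position in one forward pass, and commits only its most confident guesses at each step.

This is a minimal sketch of that generic scheme, not DiffusionGemma's published algorithm, whose details are not given here. The `denoise` interface, the step schedule and the toy "model" are all hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical denoiser interface: given the current block (None = masked),
# return a (token, confidence) guess for every position in one forward pass.
Denoiser = Callable[[List[Optional[str]]], List[Tuple[str, float]]]


def decode_block(denoise: Denoiser, length: int, steps: int) -> List[Optional[str]]:
    """Fill a fully masked block in `steps` passes, committing the most
    confident predictions first (confidence-based parallel unmasking)."""
    block: List[Optional[str]] = [None] * length
    for step in range(steps):
        masked = [i for i, tok in enumerate(block) if tok is None]
        if not masked:
            break
        guesses = denoise(block)
        # Spread the remaining masked positions evenly over the remaining steps.
        k = -(-len(masked) // (steps - step))  # ceiling division
        most_confident = sorted(masked, key=lambda i: guesses[i][1], reverse=True)[:k]
        for i in most_confident:
            block[i] = guesses[i][0]
    return block


# Toy stand-in for a trained model: it "knows" the target and is more
# confident about earlier positions. A real denoiser would condition on the
# tokens already committed in `block`; this one ignores them.
TARGET = "the cat sat on the mat".split()
forward_passes = 0


def toy_denoise(block: List[Optional[str]]) -> List[Tuple[str, float]]:
    global forward_passes
    forward_passes += 1
    return [(TARGET[i], 1.0 / (i + 1)) for i in range(len(block))]


result = decode_block(toy_denoise, len(TARGET), steps=3)
print(" ".join(t for t in result if t is not None), f"({forward_passes} passes)")
```

In this toy run, six tokens are produced in three forward passes (two per pass), whereas plain autoregressive decoding would need six.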
The most interesting thing about this is that they seem to be using the same weights for encoding and generation.
It looks like a franken-transformer, with the encoder being a transformer decoder and vice versa.
Might be the biggest architecture update we've gotten in a while
The bitter lesson in 26 words:
Don’t be distracted by human knowledge, as AI has been historically.
Instead focus on methods for creating knowledge that scale with computation, like search and learning.
Gave this another shot this week
Went ahead and wrote a "compiler" with a "scheduler" myself -> ~2000 cycles out of the box
Then used AI to write algorithmic optimizations -> 1256 cycles in no time! (Beats Claude Opus 4.5's 1363)
New on the Anthropic Engineering Blog: We give prospective performance engineering candidates a notoriously difficult take-home exam. It worked well—until Opus 4.5 beat it.
Here's how we designed (and redesigned) it: https://t.co/3RZVyhpVij
On my side, I probably need to improve my prompting as well. I initially tried to have AI implement the "compiler" in an unmodified repo, but it simply wasn't able to produce something bug-free.
It did take me ~4 hours to write, so I guess it counts as a fairly long task on METR's time-horizon scale?
Last time, at ~1460 cycles, I realised that we needed some kind of dynamic instruction scheduling, but AI was unable to implement it without errors even after several attempts, since the file was a vibe-coded mess.
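The "scheduler" mentioned in this thread isn't shown. As a rough illustration of what instruction scheduling for a VLIW-style machine involves, here is classic greedy list scheduling with a critical-path priority.

Everything about the machine model is an assumption: the engine names, the slot counts and the 1-cycle latency are hypothetical and are not taken from Anthropic's take-home. This is also a static (compile-time) scheduler, which may differ from what the author means by dynamic instruction scheduling.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Instr:
    name: str                 # unique id; also names the value it produces (SSA-style)
    engine: str               # functional unit that executes it, e.g. "load" or "alu"
    srcs: List[str] = field(default_factory=list)  # values it reads


def heights(instrs: List[Instr]) -> Dict[str, int]:
    """Longest dependency chain starting at each instruction (critical path)."""
    users: Dict[str, List[str]] = {i.name: [] for i in instrs}
    for i in instrs:
        for s in i.srcs:
            users[s].append(i.name)
    h: Dict[str, int] = {}
    for i in reversed(instrs):  # program order is a valid topological order
        h[i.name] = 1 + max((h[u] for u in users[i.name]), default=0)
    return h


def list_schedule(instrs: List[Instr], slots: Dict[str, int]) -> List[List[str]]:
    """Greedily pack instructions into one bundle per cycle.

    Assumes only true (read-after-write) dependencies, at least one slot per
    engine, and a 1-cycle latency: a value issued in cycle c is readable from
    cycle c + 1 onward.
    """
    h = heights(instrs)
    issued: Dict[str, int] = {}
    bundles: List[List[str]] = []
    while len(issued) < len(instrs):
        cycle = len(bundles)
        used = {engine: 0 for engine in slots}
        ready = [i for i in instrs if i.name not in issued
                 and all(s in issued and issued[s] < cycle for s in i.srcs)]
        bundle: List[str] = []
        for i in sorted(ready, key=lambda i: -h[i.name]):  # longest chain first
            if used[i.engine] < slots[i.engine]:
                used[i.engine] += 1
                issued[i.name] = cycle
                bundle.append(i.name)
        bundles.append(bundle)
    return bundles


# Toy kernel: four loads feeding a two-level add tree.
program = [
    Instr("l0", "load"), Instr("l1", "load"), Instr("l2", "load"), Instr("l3", "load"),
    Instr("a0", "alu", ["l0", "l1"]), Instr("a1", "alu", ["l2", "l3"]),
    Instr("s", "alu", ["a0", "a1"]),
]
schedule = list_schedule(program, slots={"load": 2, "alu": 2})
for c, b in enumerate(schedule):
    print(c, b)
```

With two load slots and two ALU slots, the seven instructions fit in four cycles instead of seven issued one per cycle. Prioritising the longest remaining dependency chain is the usual heuristic for keeping the critical path moving.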
Can someone explain why the industry seems to believe that RSI (recursive self-improvement) will happen before we reach ASI (artificial superintelligence)?
If humans are unable to build something of human-level intelligence, why do we expect a lesser intelligence to improve itself?
I packaged up the "autoresearch" project into a new self-contained minimal repo if people would like to play over the weekend. It's basically the nanochat LLM training core stripped down to a single-GPU, one-file version of ~630 lines of code, then:
- the human iterates on the prompt (.md)
- the AI agent iterates on the training code (.py)
The goal is to engineer your agents to make the fastest research progress indefinitely and without any of your own involvement. In the image, every dot is a complete LLM training run that lasts exactly 5 minutes. The agent works in an autonomous loop on a git feature branch and accumulates git commits to the training script as it finds better settings (of lower validation loss by the end) of the neural network architecture, the optimizer, all the hyperparameters, etc. You can imagine comparing the research progress of different prompts, different agents, etc.
https://t.co/YCvOwwjOzF
Part code, part sci-fi, and a pinch of psychosis :)
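The loop described in the quoted post is essentially greedy accept/reject search. The agent edits the training script, runs a fixed-length training job, and keeps the edit (as a commit) only if validation loss goes down.

Below is a minimal sketch of that control flow, not the repo's actual code, under stated assumptions:

- A config dict stands in for the training script.
- A list stands in for the git history on the feature branch.
- `propose` and `train_and_eval` are hypothetical stand-ins for the agent and the 5-minute run.
- The toy objective is invented.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

Config = Dict[str, float]  # stands in for the editable training script


def research_loop(
    config: Config,
    propose: Callable[[Config, random.Random], Config],
    train_and_eval: Callable[[Config], float],
    budget: int,
    seed: int = 0,
) -> Tuple[Config, List[Tuple[Config, float]]]:
    """Greedy accept/reject loop: keep a change only if validation loss drops.

    `history` plays the role of the commits accumulated on the feature branch.
    """
    rng = random.Random(seed)
    best_loss = train_and_eval(config)
    history = [(config, best_loss)]
    for _ in range(budget):
        candidate = propose(config, rng)   # the agent edits the training code
        loss = train_and_eval(candidate)   # one fixed-length training run
        if loss < best_loss:               # improvement: "commit" it
            config, best_loss = candidate, loss
            history.append((config, best_loss))
        # otherwise discard the edit, like resetting the branch
    return config, history


def toy_propose(cfg: Config, rng: random.Random) -> Config:
    # Hypothetical edit: rescale the learning rate.
    return {"lr": cfg["lr"] * rng.choice([0.5, 0.8, 1.25, 2.0])}


def toy_train_and_eval(cfg: Config) -> float:
    # Invented objective: "validation loss" is lowest at lr = 3e-3.
    return 2.0 + (math.log10(cfg["lr"]) - math.log10(3e-3)) ** 2


final, history = research_loop({"lr": 0.1}, toy_propose, toy_train_and_eval, budget=50)
print(f"{len(history) - 1} kept changes, final lr={final['lr']:.2e}, loss={history[-1][1]:.4f}")
```

Because only improvements are kept, the recorded losses are strictly decreasing, which is what the "every dot is a training run" plot tracks. Comparing prompts or agents then amounts to comparing how quickly different `propose` strategies drive that curve down for the same budget.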
The one reasonable explanation I can think of is that we don't need an AI to be "smarter" than a human in every category to start RSI. It only needs to be better at one specific thing: writing and optimizing AI code.