Aananth V @aananth_ - Twitter Profile

12 days ago

@DimitrisPapail Awesome article! I wonder why this doesn't apply to SFT tasks like instruction tuning where an "assistant-only" loss seems to be the recommended practice? Should the LLM be learning to model/predict the user's responses and feedback?

0

6
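For readers unfamiliar with the "assistant-only" loss mentioned in the reply above, here is a minimal sketch of the idea: the per-token negative log-likelihood is averaged only over tokens that belong to assistant turns. The function name `masked_nll`, the role labels, and the log-probability values are all hypothetical, chosen for illustration. In practice this is usually done by masking labels inside the training framework, and the pure-Python version below only shows the arithmetic.

```python
import math

def masked_nll(token_logprobs, roles, train_on=("assistant",)):
    """Mean negative log-likelihood over tokens whose role is in train_on."""
    kept = [-lp for lp, role in zip(token_logprobs, roles) if role in train_on]
    if not kept:
        raise ValueError("no tokens selected for the loss")
    return sum(kept) / len(kept)

# Hypothetical per-token log-probabilities for a short two-role chat.
logprobs = [math.log(0.5), math.log(0.25), math.log(0.5), math.log(0.125)]
roles = ["user", "user", "assistant", "assistant"]

# Assistant-only loss: user tokens contribute nothing to the objective.
assistant_only = masked_nll(logprobs, roles)

# Full-sequence loss: the model is also trained to predict the user's tokens.
all_tokens = masked_nll(logprobs, roles, train_on=("user", "assistant"))
```

The two values differ because the second objective also rewards the model for predicting the user's text, which is the trade-off the question is about.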

Aananth V @aananth_

21 days ago

You're stopping training just before your model starts grokking

0

2

0

123

Aananth V @aananth_

22 days ago

Does anyone still use a single CC session? I'm usually context-switching between ~4-5, so I rarely look at the spinner :)

Andrew McCalip

@andrewmccalip

23 days ago

Get paid to wait. The Claude Code spinner might be the most watched line on Earth. So I turned it into an ad marketplace. Advertisers bid on it. You keep 50% of the money. Install the extension → get cash from ads. Introducing Kickbacks

1K

13K

509

10K

8M

0

31

Aananth V @aananth_

24 days ago

The diffusion model is fine tuned from a Gemma4 27b model??! Might get nerdsniped today trying to implement it from a smaller model.

Google Gemma

@googlegemma

24 days ago

Meet DiffusionGemma! An experimental open model that explores a fast approach to text generation, released under an Apache 2.0 license. Moving beyond sequential, token-by-token processes to generate entire blocks of text simultaneously. Here’s what’s new with DiffusionGemma: 👇

166

5K

809

2K

962K

0

63

Aananth V @aananth_

24 days ago

The most interesting thing about this is that they are using the same weights for encoding and generation? Looks like a franken-transformer with the Encoder being a transformer decoder and vice versa. Might be the biggest architecture update we've gotten in a while

Google Gemma

@googlegemma

24 days ago

Meet DiffusionGemma! An experimental open model that explores a fast approach to text generation, released under an Apache 2.0 license. Moving beyond sequential, token-by-token processes to generate entire blocks of text simultaneously. Here’s what’s new with DiffusionGemma: 👇

166

5K

809

2K

962K

0

1

0

32

Aananth V @aananth_

24 days ago

@osanseviero Feels like a natural first step towards a JEPA-like architecture?

0

214

Aananth V @aananth_

about 1 month ago

Writing LLM kernels is hard, but the dopamine reward is wild

0

1

0

14

Aananth V @aananth_

about 1 month ago

@willccbb The definition of "aren't very good" also changes when AI is significantly better than even the best humans.

0

262

aananth_ retweeted

Richard Sutton

@RichardSSutton

about 2 months ago

The bitter lesson in 26 words: Don’t be distracted by human knowledge, as AI has been historically. Instead focus on methods for creating knowledge that scale with computation, like search and learning.

138

7K

981

3K

595K

Aananth V @aananth_

4 months ago

1162 today! :)

Aananth V @aananth_

4 months ago

Gave this another shot this week. Went ahead and wrote a "compiler" with a "scheduler" myself -> ~2000 cycles out of the box. Then used AI to write algorithmic optimizations -> 1256 cycles in no time! (Beats Claude 4.5 at 1363)

2

3

0

158

1

3

0

69

Aananth V @aananth_

4 months ago

Quite well packed, but there are still some obvious improvements :)

0

35

Aananth V @aananth_

4 months ago

Gave this another shot this week. Went ahead and wrote a "compiler" with a "scheduler" myself -> ~2000 cycles out of the box. Then used AI to write algorithmic optimizations -> 1256 cycles in no time! (Beats Claude 4.5 at 1363)

Anthropic

@AnthropicAI

5 months ago

New on the Anthropic Engineering Blog: We give prospective performance engineering candidates a notoriously difficult take-home exam. It worked well—until Opus 4.5 beat it. Here's how we designed (and redesigned) it: https://t.co/3RZVyhpVij

82

2K

222

2K

952K

2

3

0

158

Aananth V @aananth_

4 months ago

From my side, I probably need to improve my prompting as well? I did initially try to make AI implement the "compiler" in an unmodified repo, but it simply wasn't able to create something bug-free. It did take me ~4 hours to write, so I guess it is quite long on METR scales?

0

36

Aananth V @aananth_

4 months ago

Last time, at ~1460 cycles, I realised that we need to implement some kind of dynamic instruction scheduling... but AI was unable to do it without errors even after several attempts, since the file was a vibecoded mess

1

0

30
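As a rough illustration of what an instruction scheduler does, the sketch below implements greedy list scheduling: on each cycle it issues as many ready instructions as the machine has slots. This is a generic textbook technique, not the author's implementation and not the take-home's actual machine model. The instruction names, the one-cycle latency, and the `slots_per_cycle` parameter are assumptions made for the example.

```python
def list_schedule(deps, slots_per_cycle):
    """Greedy list scheduling.

    deps maps an instruction name to the set of instructions it waits for.
    Every instruction is assumed to take exactly one cycle.
    Returns a list of cycles, each a list of instruction names issued together.
    """
    remaining = {name: set(d) for name, d in deps.items()}
    done = set()
    cycles = []
    while remaining:
        # An instruction is ready once all of its dependencies have completed.
        ready = sorted(n for n, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("unsatisfiable dependencies (cycle or unknown name)")
        issued = ready[:slots_per_cycle]
        cycles.append(issued)
        done.update(issued)
        for name in issued:
            del remaining[name]
    return cycles

# Hypothetical dependency graph for (a * b) + c followed by a store.
deps = {
    "load_a": set(),
    "load_b": set(),
    "load_c": set(),
    "mul": {"load_a", "load_b"},
    "add": {"mul", "load_c"},
    "store": {"add"},
}

wide = list_schedule(deps, slots_per_cycle=2)    # two issue slots per cycle
narrow = list_schedule(deps, slots_per_cycle=1)  # strictly sequential
```

Packing independent instructions into the same cycle is where the cycle-count savings come from: the two-slot schedule finishes in fewer cycles than the one-slot schedule for the same program.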

Aananth V @aananth_

4 months ago

But even for this, we need AI to become better than humans at AI research. Very catch-22.

0

14

Aananth V @aananth_

4 months ago

Can someone explain why the industry seems to believe that RSI will happen before we reach ASI? If humans are unable to build something of human level intelligence, why do we expect a lesser intelligence to improve itself?

Andrej Karpathy

@karpathy

4 months ago

I packaged up the "autoresearch" project into a new self-contained minimal repo if people would like to play over the weekend. It's basically nanochat LLM training core stripped down to a single-GPU, one file version of ~630 lines of code, then:

- the human iterates on the prompt (.md)
- the AI agent iterates on the training code (.py)

The goal is to engineer your agents to make the fastest research progress indefinitely and without any of your own involvement. In the image, every dot is a complete LLM training run that lasts exactly 5 minutes. The agent works in an autonomous loop on a git feature branch and accumulates git commits to the training script as it finds better settings (of lower validation loss by the end) of the neural network architecture, the optimizer, all the hyperparameters, etc. You can imagine comparing the research progress of different prompts, different agents, etc.

https://t.co/YCvOwwjOzF
Part code, part sci-fi, and a pinch of psychosis :)

1K

28K

4K

39K

11M

1

0

26
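The loop described in the quoted tweet (propose a change, run a fixed-budget training job, keep the change only if validation loss improves) can be sketched as a simple keep-if-better search. Everything below is a toy stand-in and not the autoresearch repo's code: `validation_loss` is a made-up quadratic instead of a real 5-minute training run, and `propose` is a random nudge instead of an AI agent editing the training script.

```python
import random

def validation_loss(config):
    # Toy stand-in for a fixed-budget training run; minimum at lr = 0.3.
    return (config["lr"] - 0.3) ** 2

def propose(config, rng):
    # Toy stand-in for the agent's edit: nudge one hyperparameter.
    return {"lr": config["lr"] + rng.uniform(-0.1, 0.1)}

def autoresearch(start, steps, seed=0):
    rng = random.Random(seed)
    best, best_loss = start, validation_loss(start)
    history = [best_loss]
    for _ in range(steps):
        candidate = propose(best, rng)
        loss = validation_loss(candidate)
        if loss < best_loss:
            # "Commit" the change only when it lowers validation loss.
            best, best_loss = candidate, loss
        history.append(best_loss)
    return best, history

best, history = autoresearch({"lr": 1.0}, steps=200)
```

Because only improvements are kept, the best loss recorded in `history` never goes up. How fast it goes down depends entirely on the quality of the proposals, which is the part the agent is responsible for.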

Aananth V @aananth_

4 months ago

The one reasonable explanation I can think of is that we don't need an AI to be "smarter" than a human in every category to start RSI. It only needs to be better at one specific thing: writing and optimizing AI code.

1

0

14

Aananth V @aananth_

4 months ago

@anujg https://t.co/YOmLrjwfNt

0

103
