AAAAAAAAAA @aaaaaaaaaaorg - Twitter Profile

about 2 months ago

introducing cloudy, a platform where ai researchers can describe an experiment that they want to run, and cloudy handles everything else - generating code, the gpu & storage infrastructure, retries, and logs. more details below:

https://t.co/lrk9k70FVy

5

64

8

22

4K

aaaaaaaaaaorg retweeted

naklecha

@naklecha

3 months ago

i’m building something new for teams that want to automate 100s of tiny research experiments on gpus (think @karpathy’s auto research). if you want to try out an early version, reach out to me!

2

22

1

3

2K

aaaaaaaaaaorg retweeted

naklecha

@naklecha

5 months ago

for people who run experiments on gpus: i built a cli tool that gives coding agents access to cloudy's infra (~2x cheaper than other sandbox / serverless clouds). with this cli, you can ask claude to: "finetune kimi k2 on 64 h100s & to save money test your finetune on 8 h100s"

5

68

8

40

8K

aaaaaaaaaaorg retweeted

naklecha

@naklecha

6 months ago

introducing simple-llm: a ~950 line, powerful & extensible inference engine that performs on par with vllm. enjoy :)
performance (gpt-oss-120b, on an h100):
- batch=1: 135 tok/s (vllm: 138)
- batch=64: 4,041 tok/s (vllm: 3,846)
https://t.co/x0aLi4rcAP

24

768

74

613

68K

aaaaaaaaaaorg retweeted

Jacob Rintamaki

@jacobrintamaki

6 months ago

THE FINAL OFFSHORING
If you try to look to the past to understand the future of automation, the future is bleak.

On one hand, if we don’t go forward with robotics, then we’ll likely enter a stagnant, malthusian hellscape of a world.

But on the other hand, if we do go forward with robotics, then society might not survive the transition intact.

If these are the two default futures, then it's no wonder so many people are pessimistic. Is there an option to just have neither of these, please?

Yes, actually. But it won’t be easy. It won’t be simple. And it will require reading a lot of footnotes.

50

525

45

451

178K

aaaaaaaaaaorg retweeted

naklecha

@naklecha

6 months ago

saturday night random idea: i'm vibe reimplementing vllm into a single & simple inference file + a folder with custom kernels.
rules:
- allowed libs: torch, numpy, flash-attn
- must match vllm's gpt-oss-120b inference speed
- can optimise for the model and hardware (h100)

9

191

6

66

12K

aaaaaaaaaaorg retweeted

naklecha

@naklecha

6 months ago

2026, ????

10

727

35

711

71K

aaaaaaaaaaorg retweeted

naklecha

@naklecha

7 months ago

Introducing "public volumes" on Cloudy - with this update, open-source projects can be shared as reproducible sandboxes that can be forked and mounted on GPU instances within seconds. Share, fork & mount 100TB+ sandboxes seamlessly with Cloudy. Here is a demo:

1

29

4

6

6K

aaaaaaaaaaorg retweeted

naklecha

@naklecha

7 months ago

i miss working on projects like these... we are working on something interesting @cloudysoftwares that will hopefully incentivise more projects like these on the timeline :')

0

57

4

14

6K

aaaaaaaaaaorg retweeted

naklecha

@naklecha

9 months ago

Announcing Cloudy, a platform that seamlessly handles your training infrastructure. You can rent a single H100 or a cluster of 1000 H100s, manage petabyte-scale storage volumes & seamlessly go from running experiments to managing large scale training runs on a single interface.

25

259

17

150

33K

aaaaaaaaaaorg retweeted

naklecha

@naklecha

9 months ago

today, i'm shutting down @aaaaaaaaaaorg. although i deeply believe in a10's mission, the only right way to build a10 would be to stay open, focus on education & stay non-profit. it will make a comeback someday, self-funded. until then, i'm focused on making fuck you money.

9

93

2

11

7K

aaaaaaaaaaorg retweeted

naklecha

@naklecha

about 1 year ago

i'm working on an open-source repo that teaches people llm inference optimisations. rn the fastest files on the repo are at 6.5-7k tps, as compared to vllm's 10k for the same batch size. i added a c++ implementation today. feel free to try it out (wip): https://t.co/Sb4Nq3HDS7

https://t.co/drlJ2dRzFM

6

222

10

126

15K

aaaaaaaaaaorg retweeted

naklecha

@naklecha

about 1 year ago

i'm working on a repository that implements increasingly complex llm inference optimizations in zero abstraction single file implementations. to start with, i put up 4 files on a repository that implements llama3.2 1b at 15tps, 60tps, 180tps, 1385tps on my 4090. the project will end when i am close to vllm's performance (7218 tps on my testset). then, i will do the same thing again with multiple gpus and implement parallel inference (maybe on a larger model). once i'm done with that, i'll write a blog / make a video explaining it all. i have a few more working optimisations and a few buggy ones that i'll fix and push to the repository this week. i wanted to get the ball rolling & set up the repository for you guys to check out. feedback/thoughts are welcome :)

6

128

3

51

10K

aaaaaaaaaaorg retweeted

Pramod Goyal

@goyal__pramod

over 1 year ago

Llama 3 implementation each matrix multiplication explained 1 by 1.

One of the greatest resources to learn about LLMs from a very low level. https://t.co/X5CkmaeenD

4

280

36

263

15K

aaaaaaaaaaorg retweeted

naklecha

@naklecha

over 1 year ago

i love writing!! it gives me an excuse to learn topics and is my small way of giving back to the community (inspired by @karpathy's videos).
my past blogs:
1. a blog on lcms (10k readers)
2. llama3 from scratch (14k github stars, ~100k readers)
3. an rl guide (20k readers)

0

17

2

4

1K

aaaaaaaaaaorg retweeted

naklecha

@naklecha

over 1 year ago

for my next blog post -- i am working on a comprehensive guide on data parallelism. the guide will explain relevant parallelization tricks, training on thousands of h100s, networking within gpus and nodes, some relevant cuda concepts & a bunch of pytorch code :) eta ~2 months.

https://t.co/l5zhGFV1A2

4

104

7

40

6K

aaaaaaaaaaorg retweeted

Antaripa Saha

@doesdatmaksense

over 1 year ago

with deepseek's r1 release, now is the best time to learn about reinforcement learning

@naklecha already dropped a bomb guide covering almost everything for someone to get started with RL. crazy thing is it covers both theory plus code examples https://t.co/0R5j7htsTk

8

1K

100

1K

279K

aaaaaaaaaaorg retweeted

naklecha

@naklecha

over 1 year ago

notes on my reinforcement learning for rocket league side quest so far: i’m in awe that — we live in a time where i can train a reinforcement learning model that is learning to play rocket league and at the same time, test community made rocket league models. where, both processes can run in parallel & locally on my hardware. in the game window — i’m playing against a reinforcement learning ppo model that’s running locally. and in the vscode window — i’m training my own reinforcement learning model (just a dummy example script for now). also, the rocket league botting ecosystem has incredible tooling where you can test out different reinforcement learning ideas against other algorithms + it has great documentation for linux and windows. it’s honestly a beautiful hidden gem of a rabbit hole :)

17

482

33

234

47K

aaaaaaaaaaorg retweeted

naklecha

@naklecha

over 1 year ago

my new ml side quest -- i'm going to build rocket league bots, using reinforcement learning. my goal is to train a bot (in 1v1s) that is better than ~95% of rocket league's player base (diamond/champ rank).

https://t.co/8bholYljHs

7

140

4

19

47K
