a.desi.penguin @_notapenguin - Twitter Profile

Pinned Tweet

a.desi.penguin

7 months ago

Loved this set of lectures by @a_kazemnejad and team on recreating the 'aha moment' from the deepseek paper, it was an absolute treat to watch these guys tackle the problems in real time (and I got to see them use vllm, which was fun). Gonna spend some time this week to recreate this and share my journey as a blog post or a series of tweets :). It's always nice to spend some time on things that aren't related to my work and just enjoy learning.

0

9

1

5

2K

a.desi.penguin

@_notapenguin

about 5 hours ago

@chickenalakiev dw, its only uphill from her-

0

32

a.desi.penguin

@_notapenguin

about 5 hours ago

@DevanshAr05 @original_ngv @Rchit_d ISTG YOUR BOSS IS THE REAL GAJINI

2

1

0

22

a.desi.penguin

@_notapenguin

about 5 hours ago

@original_ngv @DevanshAr05 @Rchit_d WHICH CLUB ARE YOU TALKING ABOUT?

1

0

32

a.desi.penguin

@_notapenguin

about 5 hours ago

@original_ngv @Rchit_d @DevanshAr05 YOU CALLED ME GAJINI, HOW DARE YOU 😭

0

1

0

10

a.desi.penguin

@_notapenguin

about 5 hours ago

@original_ngv @Rchit_d @DevanshAr05 Wait what, which club?

0

19

a.desi.penguin

@_notapenguin

about 5 hours ago

@original_ngv @IndiaToday yes

0

11

a.desi.penguin

@_notapenguin

about 6 hours ago

@original_ngv @IndiaToday I'll just go, man, why do you even need this, friend

1

0

23

a.desi.penguin

@_notapenguin

about 14 hours ago

@akshitwt @kreepkroop Flex

0

21

a.desi.penguin

@_notapenguin

1 day ago

@sarthak2143 What's auto formalisation?

0

36

a.desi.penguin

@_notapenguin

2 days ago

In other words: do fucking everything

Jimmy Heaters

@CathPoaster

3 days ago

new grads often ask me what they should be doing so they don't fall behind in the ai space. there's a lot, but its honestly super manageable. become intimate with model internals. proof based linear algebra. non-convex optimization. this is stuff you could've done in undergrad. it definitely takes some time and work, but its doable. have taste, have opinions. train a small model, then train a big one. vLLM internals, tensor parallelism. hand roll kernels. cluster orchestration. do you have opinions on synthetic data? why don't you? SFT, PPO, you should know this. learn Triton. everyone is reproducing papers now so you need to be doing more. do you know the semi supply chain? where are the bottlenecks? hardware, man, hardware. your little gpu rig erector set in your basement isnt gonna cut it. build a cluster, a big one. pretrain a 800B model. now postrain it. serve it to millions of people. you should be able to beat deepseek on some benchmarks now. its a lot to take in but it all snowballs. this what job security looks like from now on. do you want to work in tech or not

101

4K

252

6K

730K

0

13

0

1

798

a.desi.penguin

@_notapenguin

4 days ago

@thsottiaux just general curiosity, paired with external sentiment (youtube, friends, public opinion). Tbh a lot of them feel the same to me so i go off based on curiosity and not the urgency to try everything new that comes out.

0

37

a.desi.penguin

@_notapenguin

5 days ago

@silicognition @apartresearch would love to join

0

36

a.desi.penguin

@_notapenguin

5 days ago

@original_ngv I know the number, and holy fuck i'd like to earn that much one day (maybe in 30-40 years)

1

0

168

a.desi.penguin

@_notapenguin

6 days ago

A more important Q: now that we have opus 4.8 how many more months till 6 months away from software engineering dying?

ThePrimeagen

@ThePrimeagen

6 days ago

Now that we have Claude 4.8 how long until 4.6 is called trash?

218

2K

18

27

115K

0

1

0

55

a.desi.penguin

@_notapenguin

6 days ago

Pro tip for anyone who wants to use LLMs for research - ground it. The problem is not that the ideas are bad, but that they're not grounded. Any idea that cannot be linked to existing work will sound like the person is talking in air to me.
What I do every day when trying to talk through my ideas is:
1. Ask it to critique my idea: What is good? What is bad? What can we do to implement an MVP of this?
2. Ground it: Does this idea already exist? If so, what are its flaws? What can we do to improve on existing work? Which area of work does this idea come from?
And while you describe your idea to the LLM, be sure to elaborate on the story of the idea. Don't just say "I think X can be done, and I expect Y results". You can try: "Because I saw Z paper, I think X can be done and Y results can be expected".
This will also help your LLM understand where your idea comes from, and it will sometimes do the survey without you telling it to (if you constantly use opus it does this a lot).

Mathieu

@miniapeur

7 days ago

I’m all for using AI in research. But in the last two months, I’ve had three people come up to me with “novel ideas”, and none of them made sense. When I asked how they came up with them, they all eventually admitted that AI had told them to do it.

19

188

13

11

14K

0

37

a.desi.penguin

@_notapenguin

6 days ago

It comes down to a few things. Mainly what I am seeing here is:
1. Costs of 0-1: The models keep getting bigger and bigger, so hosting them alone is costly. GPUs ain't cheap.
2. Token efficiency: Models these days love to reason a lot and pretty much do nothing (GPT models don't as much, but claude loves to think for long periods of time). This makes inference cost a lot of money.
3. Optimizations: What made deepseek special when it came out is not just that they were on par with the frontier labs (i am talking about R1) but also the way training was done. They released a suite of repos and papers (mHC, engram and so on) and also kernels that help reduce costs to some extent. You can argue that the US labs are also doing the same thing, but not to the same extent that deepseek is, because deepseek doesn't have access to the same type of infrastructure that openai and anthropic have.
There are a lot of other factors that I am not able to recall right now. Hope this helps!

0

2

0

292

a.desi.penguin

@_notapenguin

6 days ago

@thsottiaux since you're feeling good today, can you bless us with a limits reset?

0

1

0

67

a.desi.penguin

@_notapenguin

6 days ago

I am gonna crash out a lil bit about codex's flickering here. tagging @reach_vb and @thsottiaux First of all, codex is awesome. Love what you guys are building! For some reason on windows terminals (I mainly use powershell and the terminal inside zed), codex loves to just flicker a LOT. This doesn't happen all the time, only when I split the screen between the terminal and another file open on the side. Sometimes I have to wait a good 2-3 minutes for it to stop flickering and restore itself. Sometimes it doesn't even stop and I have to close the terminal, open another one and resume the chat.

ClaudeDevs

@ClaudeDevs

7 days ago

First for our new full-screen renderer (which should get rid of bugs like screen flickering), we’ve made a number of fixes for different environments and terminals. You can turn it on with the command: /tui feedback We're working on making it the default in Claude Code soon.

28

1K

7

175

277K

0

50

