driss guessous @drisspg - Twitter Profile

driss guessous @drisspg

about 6 hours ago

Okay, that's enough codex for now -> at least now I can finally look at the generation step

0

148

driss guessous @drisspg

about 9 hours ago

I am trying to make ideogram usable on my spark. Problem 1: https://t.co/q2LRrDOcFJ Problem 2: bitsandbytes is unbelievably slow

3

6

2

965

driss guessous @drisspg

about 7 hours ago

@gazorp5 Okay, turns out it was not bnb but actually safetensors using mmap by default, with no option in 0.7 to break this without explicit clones.

0

2

0

11
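A rough sketch of the mmap point in the reply above: a memory-mapped file is only paged in from disk when its bytes are touched, so the "load" looks cheap and the cost shows up later, while an explicit copy pays it up front and detaches the data from the file. This uses only Python's standard `mmap` module as a stand-in; it is not the safetensors API, and the helper names (`load_lazy`, `load_eager`, `make_demo_file`) are hypothetical.

```python
import mmap
import os
import tempfile


def make_demo_file(size=1 << 20):
    """Write a throwaway binary file standing in for a weights file."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as f:
        f.write(bytes(range(256)) * (size // 256))
    return path


def load_lazy(path):
    """Map the file read-only; pages are read from disk only when accessed."""
    with open(path, "rb") as f:
        # The mapping keeps its own handle, so closing f here is fine.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def load_eager(path):
    """Map the file, then copy every byte into ordinary process memory."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # Slicing an mmap returns a new bytes object: the explicit "clone".
            return mapped[:]
```

The eager version is the analogue of cloning each tensor after loading: the result no longer depends on the mapping, at the price of one full read and copy.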

driss guessous @drisspg

about 8 hours ago

@gazorp5 I have not even looked at the actual generation step yet. If you try run_inference.py it takes 85 seconds (on my spark) just to start gen. A lot of this is just low-hanging fruit; I'm currently at 28.6s

1

3

0

54

driss guessous @drisspg

about 8 hours ago

@typedfemale @SkyLi0n https://t.co/sAVOTGfGDJ

0

4

1

0

196

driss guessous @drisspg

about 8 hours ago

@gaunernst I'm just trying `run_inference.py` lols

1

2

0

237

driss guessous @drisspg

2 days ago

@cHHillee https://t.co/j0GeSh2UYH I have some vibe in here but works great. Export to .pftrace

0

5

0

5

428

driss guessous @drisspg

2 days ago

"The purpose of abstracting is not to be vague, but to create a new semantic level in which one can be absolutely precise." This really is a nice quote

Modal @modal

3 days ago

Reinforcement learning has exploded on Modal, and we've been cooking. Here's a review of lessons learned helping teams train at scale, the patterns we kept seeing, and an open-source library to get started with RL on Modal quickly.

2

266

27

199

95K

1

14

0

3

2K

driss guessous @drisspg

3 days ago

Lol it took 7 hours for it to find GemmUniversal and do some hyperparameter tuning. Can't you feel the AGI!!!

3

57

2

18

4K

driss guessous @drisspg

4 days ago

@_seemethere Dude you don’t even have a manager

3

10

0

624

driss guessous @drisspg

5 days ago

@tenderizzation Bruhhh this is a solved problem

3

6

0

714

driss guessous @drisspg

10 days ago

@KuterDinel Not serious I thought it was just a fun sentence

1

2

0

86

driss guessous @drisspg

11 days ago

To hell with big TMA long live ld/st

3

38

4

7

3K

driss guessous @drisspg

10 days ago

@gaunernst @snowclipsed

1

5

0

285

driss guessous @drisspg

11 days ago

@maharshii
> it's unfortunate that torch scaled mm api does not provide a global scale dequantization argument
Can you elaborate here? https://t.co/wa3EYcBZkk This does support global scales. We should probably expand a lil in the docs but here is the gist: https://t.co/rrpwcAlZjh

1

6

1

4

519

driss guessous @drisspg

11 days ago

@tenderizzation

0

5

0

284

driss guessous @drisspg

12 days ago

@_seemethere @difficultyang Yeah, the big caveat is that when you first use it, it's gonna suck, but if you stick with it and actually muck around with the system prompt + extensions you end up with something that feels very tailored to your preferences

1

0

87

driss guessous @drisspg

12 days ago

@difficultyang

0

1

0

1

199

drisspg retweeted

Han Guo

@HanGuo97

14 days ago

LLM training is built on fast MatMuls. But many surrounding ops still run as memory-bound kernels. CODA reparameterizes them to hide in the matmul’s shadow, fused into its epilogue before results leave the chip. Bonus: LLMs can write fast CODA kernels too (approaching SoLs). https://t.co/cOTeMUr4py

15

678

103

531

196K

driss guessous @drisspg

15 days ago

Omni looks really cool, everything else is so mehh

0

4

0

1

621
