Stephen Panaro @flat - Twitter Profile

Pinned Tweet

Stephen Panaro @flat

over 10 years ago

“We won’t run it in digital because we’re purists and maniacs.”

2

5

0

Stephen Panaro @flat

3 months ago

Excellent rundown of the current state of LLM quantization.

Artur Chakhvadze

@norpadon

3 months ago

We are doing really cool hard tech at @trymirai, but until recently our social media feeds were full of linkedinish cringe. We decided to fix it and share more technical content. I am currently working on our quantization pipeline, so here is a thread about LLM quantization.

3

60

7

43

5K

0

1

0

1

435

Stephen Panaro @flat

3 months ago

@mweinbach Thought it was a cleaner summary. Possible I’ve just never looked beyond that though 🤔

1

0

56

Stephen Panaro @flat

3 months ago

@mweinbach Huh weird. I also got it from Opus today but had never seen it before. Assumed it was an Anthropic issue

1

0

60

Stephen Panaro @flat

7 months ago

@mattcassinelli @tylerangert Oh for sure some cool non-MLX stuff this year. When Google released their on-device base model + LoRA, I crossed my fingers Apple would do the same. And they did! Am actually curious now if anyone has shipped an AFM LoRA.

0

42

Stephen Panaro @flat

7 months ago

@mattcassinelli @tylerangert Having been away from MLX for a bit, what did I miss? (Or is this about non-MLX like CoreML/Foundation Models?)

1

0

61

Stephen Panaro @flat

9 months ago

https://t.co/gpuCdSfrzu

0

1

0

771

Stephen Panaro @flat

9 months ago

This ButterflyQuant paper looks neat, but also a little sus:
- no code
- no comparison against its closest relative (SpinQuant)
A good test project for coding agents?

2

0

1K

Stephen Panaro @flat

11 months ago

@anemll How’s the speed compare to float16 inputs?

1

0

149

Stephen Panaro @flat

11 months ago

@anemll

1

0

121

Stephen Panaro @flat

11 months ago

Incoming new coremltools looks like it has some nice bits:
- 8-bit input/output tensors (previously all 8-bit compute was kept internal)
- >1 input can be enumerated shapes (👀ANE)

1

8

2

3

2K

Stephen Panaro @flat

11 months ago

https://t.co/i3sCue1Prt

0

3

0

357

Stephen Panaro @flat

11 months ago

Looks like it’s happening!

Stephen Panaro @flat

12 months ago

Wonder if we’re gonna get a new version of coremltools. Last year it dropped on Monday.

1

5

0

2K

2

5

0

1K

Stephen Panaro @flat

11 months ago

@AaronWeiHuang 👋 Any chance y’all plan to release code for DBellQuant?

0

25

Stephen Panaro @flat

11 months ago

@cloneofsimo Can use this for quantization too: https://t.co/6YjulZBg0J

0

33

Stephen Panaro @flat

12 months ago

🐙: https://t.co/M6F5EjzFrS
📄: https://t.co/aaPSq8iCmq
(R₅ is a rotation matrix, so its transpose is its inverse and it naturally cancels out in Q@K.T)

0

1

0

3

224
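The cancellation described in the tweet above can be checked numerically. What follows is a minimal sketch, not the code from the linked gist or paper. It assumes a RoPE variant that rotates adjacent dimension pairs, and it uses a hypothetical stand-in for R₅: a block-diagonal matrix of 2x2 rotations, one per RoPE pair, chosen because such a matrix commutes with RoPE. All function names are illustrative.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def block_rotation(angles):
    """Block-diagonal matrix of 2x2 rotations, one block per angle (orthogonal)."""
    n = 2 * len(angles)
    m = [[0.0] * n for _ in range(n)]
    for i, theta in enumerate(angles):
        c, s = math.cos(theta), math.sin(theta)
        m[2 * i][2 * i], m[2 * i][2 * i + 1] = c, -s
        m[2 * i + 1][2 * i], m[2 * i + 1][2 * i + 1] = s, c
    return m


def apply_rope(x, base=10000.0):
    """Rotate row `pos` of x by position-dependent angles (adjacent-pair RoPE, an assumption)."""
    head_dim = len(x[0])
    out = []
    for pos, row in enumerate(x):
        angles = [pos * base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]
        out.append(matmul([row], block_rotation(angles))[0])
    return out


def attention_scores(q, k):
    """Pre-softmax scores: RoPE(Q) @ RoPE(K).T"""
    return matmul(apply_rope(q), transpose(apply_rope(k)))


random.seed(0)
seq_len, head_dim = 5, 8
q = [[random.gauss(0, 1) for _ in range(head_dim)] for _ in range(seq_len)]
k = [[random.gauss(0, 1) for _ in range(head_dim)] for _ in range(seq_len)]

# Hypothetical stand-in for R5: one arbitrary angle per RoPE pair.
r = block_rotation([random.uniform(0, 2 * math.pi) for _ in range(head_dim // 2)])

baseline = attention_scores(q, k)
# R is fused into Q and K only; no inverse is applied anywhere.
rotated = attention_scores(matmul(q, r), matmul(k, r))

max_diff = max(abs(a - b) for ra, rb in zip(baseline, rotated) for a, b in zip(ra, rb))
print(f"max |difference| in scores: {max_diff:.2e}")
```

The difference should sit at floating-point noise level: the rotation commutes with RoPE, and then R times its transpose is the identity, so it drops out of the score product. A general orthogonal matrix that mixes dimensions across RoPE pairs would not commute with RoPE, so this check covers only the pair-aligned case assumed here.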

Stephen Panaro @flat

12 months ago

Turns out you don’t need R₅⁻¹ at all. 🫠 Fusing into Q and K is enough! Cool paper from Qualcomm explains this and a few similar transforms. No code in the paper, so gist proof👇

Stephen Panaro @flat

over 1 year ago

Liking the line of research where you multiply LLM weights by rotation matrices and the model still works. Most do it in between layers, but you can also sneak one between Q/K and RoPE. Extra parameters? None. Useful? …Maybe. Cool? I think so. (See R₅ below.)

flat's tweet photo: https://t.co/OYFtjZoRaO

1

5

0

1

1K

2

4

0

2

830

Stephen Panaro @flat

12 months ago

The python library is interesting too. “Download files”: https://t.co/swlL1C8AFf

0

1

0

246

Stephen Panaro @flat

12 months ago

Curious about the Apple Foundation Model architecture? I updated my netron fork to visualize the draft model*.
*they say it might differ from the real model but looks convincing to me

1

10

2

3

2K

Stephen Panaro @flat

12 months ago

See for yourself:
1. Get the adapter training toolkit: https://t.co/FBoPMOyOFd
2. Clone: https://t.co/KBnvGSQh5I
3. Edit https://t.co/NwMlsQrn2a:
   - delete all functions except the first
   - rename it to: func main<ios18>(
4. Follow readme to start netron, and open the .mil

1

5

1

2

345
