mobicham @mobicham - Twitter Profile

8 days ago

@zhyncs42 Congrats! Would you guys share the right config to run Qwen3.6 models on B300s? Would be nice to have some recipes, I couldn't find enough info on GitHub

1

0

247

mobicham @mobicham

12 days ago

Can't wait for something like Claude Code for Ableton 👀, the days of manually EQing and editing MIDI should be over

0

4

0

314

mobicham @mobicham

21 days ago

@EmrickSini It uses a vLLM plugin actually, that's why it works by just setting an env variable, and it's easy to add it to a docker image: https://t.co/zgU1qlzgJC

1

0

117

mobicham @mobicham

22 days ago

https://t.co/orcApi6B7J You can now easily load various pre-quantized models (block FP8, NVFP4, AWQ/GPTQ/HQQ, certain GGUFs) via a vLLM plugin! You can also run on-the-fly quantization, and it's easy to use: 1 or 2 flags to enable it!

1

19

5

13

2K

mobicham @mobicham

23 days ago

There's a very simple one-shot trick that seems to improve the quality of low-bit weight quantization by quite a bit in some cases: simply reordering the rows. It doesn't require changing the matmul kernel, only reshuffling the activations. https://t.co/e4qpTof51u

0

8

0

4

318
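
A rough sketch of the row-reordering idea from the tweet above, in plain Python. The tweet does not say how the rows are ordered or how the groups are laid out, so everything here is an assumption for illustration: the weight is stored as [in_features][out_features], each quantization group covers 4 consecutive rows within one column, rows are sorted by their largest absolute value, and the toy weights are hand-built so that the effect is visible. It only shows that permuting the weight rows and the activation columns with the same permutation leaves the matmul unchanged, and that grouping similar-magnitude rows can lower the group-wise rounding error.

```python
def matmul(x, w):
    """Plain-Python matmul: x is [batch][k], w is [k][n]."""
    return [[sum(row[i] * w[i][j] for i in range(len(w)))
             for j in range(len(w[0]))] for row in x]

def quantize_groups(w, group_size=4, bits=4):
    """Symmetric round-to-nearest quantize/dequantize.
    Assumed layout: one scale per `group_size` consecutive rows, per column."""
    qmax = 2 ** (bits - 1) - 1
    k, n = len(w), len(w[0])
    out = [[0.0] * n for _ in range(k)]
    for j in range(n):
        for start in range(0, k, group_size):
            rows = range(start, min(start + group_size, k))
            # Fall back to 1.0 if the whole group is zero.
            scale = max(abs(w[i][j]) for i in rows) / qmax or 1.0
            for i in rows:
                out[i][j] = round(w[i][j] / scale) * scale
    return out

def mse(a, b):
    """Mean squared error between two equally shaped nested lists."""
    diffs = [(u - v) ** 2 for ra, rb in zip(a, b) for u, v in zip(ra, rb)]
    return sum(diffs) / len(diffs)

# Toy weights: large and small rows interleaved, so every group of 4
# consecutive rows mixes both magnitudes.
large = [[7.0, -3.0], [-5.0, 7.0], [2.0, 6.0], [-7.0, 1.0]]
small = [[v / 10 for v in row] for row in large]
w = [row for pair in zip(large, small) for row in pair]
x = [[float(i + 1) for i in range(8)], [(-1.0) ** i for i in range(8)]]

# Assumed ordering criterion: sort rows by their largest magnitude.
perm = sorted(range(len(w)), key=lambda i: max(abs(v) for v in w[i]))
w_perm = [w[i] for i in perm]                   # reorder weight rows once, offline
x_perm = [[row[i] for i in perm] for row in x]  # reshuffle activation columns

# Same permutation on both sides: the matmul result is unchanged
# (up to floating-point summation order).
max_diff = max(abs(u - v)
               for ra, rb in zip(matmul(x, w), matmul(x_perm, w_perm))
               for u, v in zip(ra, rb))

err_plain = mse(w, quantize_groups(w))
err_perm = mse(w_perm, quantize_groups(w_perm))
print("max matmul difference:", max_diff)
print("quantization MSE, original order:", err_plain)
print("quantization MSE, reordered:", err_perm)
```

On real weights the gain depends on how the row magnitudes are distributed, which matches the "in some cases" caveat in the tweet.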

mobicham @mobicham

24 days ago

Skimming through the PolarQuant paper by Google and found HQQ is still alive 👀

1

22

6

4

2K

mobicham @mobicham

about 1 month ago

@cHHillee 😂

0

179

mobicham @mobicham

about 1 month ago

@cHHillee 😂

0

138

mobicham @mobicham

about 1 month ago

@GPU_MODE @tbpn @AnushElangovan 🔥

0

1

0

45

mobicham @mobicham

about 1 month ago

@PatrickToulme Goat 🫡

0

161

mobicham @mobicham

about 1 month ago

Babe, wake up, new GemLite update. Up to 1.7x+ faster FP8 block quantization on the RTX PRO 6000 end-to-end in vLLM!

1

32

3

7

2K

mobicham @mobicham

about 1 month ago

What kind of FP4 format does the new TPU8 use? MXFP4 quality is pretty poor, and NVFP4 is specific to Nvidia, so I'm guessing it uses a smaller group size (<32) to achieve better quality? 🤔

0

5

0

338

mobicham @mobicham

about 1 month ago

@elliotarledge Did you run it end-to-end in the full-stack vLLM/SGLang? You'll only see real perf issues when you do that and not just the kernel in isolation

0

50

mobicham @mobicham

about 1 month ago

@BoyuanChen0 Sick results 🫡

0

43

mobicham @mobicham

about 1 month ago

@vikhyatk Not supporting tcgen instructions doesn't matter at all. The RTX Pro 6000 is an excellent card for its price.

0

2

0

500

mobicham @mobicham

about 2 months ago

@fchollet The best of both worlds: https://t.co/RTf0M7vdxK

0

4

0

1

305

mobicham @mobicham

about 2 months ago

Some great on-device multi-vector work at Dropbox, check it out!

Josh Clemm

@joshclemm

about 2 months ago

Open sourcing something fun from @Dropbox: Witchcraft. It's a local search engine built in Rust with no API keys or vector DB required. Think: ColBERT / late interaction style retrieval, but packaged to run locally (perfect for coding agents). Let's dive in👇

19

467

37

617

115K

0

2

0

254