Topi Santakivi @sandst1 - Twitter Profile

4 days ago

@ivanfioravanti Yup, rtk is great! Using it in pretty much all of my LLM workflows 👌 Configuring it for unknown/custom commands is a bit of work but it handles the common patterns nicely and the coverage is growing too.

Topi Santakivi @sandst1

4 days ago

@ChicouTiMix @ivanfioravanti Memory bandwidth is 600 GB/s and DGX Spark has 273 GB/s, so on a single decode stream that's roughly 2x the speed.
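Why single-stream decode tracks memory bandwidth: each new token needs (roughly) one full read of the active weights, so tokens/s is capped near bandwidth divided by bytes read per token. A minimal sketch under that assumption; the 15 GB weight size is a hypothetical placeholder, and KV-cache reads and compute are ignored, so real numbers land below this ceiling. The ratio between the two machines is just 600/273 whatever model size you assume.

```python
def decode_tokens_per_s_upper_bound(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Bandwidth-bound ceiling for single-stream decode.

    Assumes every generated token reads all weights once; ignores KV-cache
    traffic and compute, so this is an upper bound, not a prediction.
    """
    return bandwidth_gb_s / weights_gb


MODEL_WEIGHTS_GB = 15.0  # hypothetical: roughly a ~27B model at ~4-bit plus overhead

fast = decode_tokens_per_s_upper_bound(600.0, MODEL_WEIGHTS_GB)
spark = decode_tokens_per_s_upper_bound(273.0, MODEL_WEIGHTS_GB)
print(f"600 GB/s: {fast:.1f} tok/s | 273 GB/s: {spark:.1f} tok/s | ratio {fast / spark:.2f}x")
```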

Topi Santakivi @sandst1

8 days ago

@_Aria_2210 32. n x n x 2

Topi Santakivi @sandst1

8 days ago

@hicasamadim 72. n x n x 2

Topi Santakivi @sandst1

12 days ago

@antirez Ha, missed the details of this thread earlier. Nice approach!

Topi Santakivi @sandst1

12 days ago

@waltonoemi @Youssofal_ 4-bit

Topi Santakivi @sandst1

12 days ago

@agitbackprop @antirez My experiences with it are mixed. It sometimes works but also breaks down, often when the model has to make multiple edits to the same file, or larger edits. The usual error is using old hashes. Larger/cloud LLMs are better at it. Also interested to hear @antirez's thoughts.

Topi Santakivi @sandst1

16 days ago

@PoetzlPtzl @aijoey only cloud versions published so far.

Topi Santakivi @sandst1

17 days ago

@mr_r0b0t @NVIDIAAI @Alibaba_Qwen Working nicely! Tried this model yesterday and got pretty much the same numbers 🥳 The unsloth NVFP4 version has 0% acceptance (I guess the MTP heads are in the wrong format or something); sakamakismile/Qwen3.6-27B-Text-NVFP4-MTP is working nicely.

Topi Santakivi @sandst1

17 days ago

@karpathy Wow, that’s awesome! Congrats!

Topi Santakivi @sandst1

18 days ago

@MichaelZima @KuittinenPetri @antirez On Mac I use OMLX, and a bit of MTPLX too. It runs nicely; there's been a lot of progress in the past months, and the multi-token prediction stuff has been boosting inference speeds on Mac as well. If you already have a Studio, there's plenty of good stuff to run.

Topi Santakivi @sandst1

18 days ago

@KuittinenPetri @MichaelZima Qwen 3.6 27B, different quants, atm via llama.cpp as it just landed MTP support. Also the DwarfStar by @antirez is running nicely on the GB10. Those are my main models atm, then also a bit of Qwen 3.6-35B-A3B and Qwen3-Coder-Next.

Topi Santakivi @sandst1

18 days ago

@KuittinenPetri @mr_r0b0t Wait, did Spark/sm121 NVFP4 support already land on the vLLM main branch, or did you build a custom setup?

Topi Santakivi @sandst1

18 days ago

@MichaelZima @KuittinenPetri That's exactly what I'm doing atm.

Topi Santakivi @sandst1

20 days ago

@techedgedaily @rohanpaul_ai Yeah. Now anybody can vibe-code their software into existence, and if it's good software, they can have other people pay them to use it. Oh wait..

Topi Santakivi @sandst1

20 days ago

@MemoryReboot_ The only thing missing is the M3 Ultra in stock.

Topi Santakivi @sandst1

20 days ago

@indes_yo @nash_su MTP is speculative decoding, specifically for token generation.
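To make "MTP is speculative decoding" concrete: a cheap draft proposes k tokens, the full model verifies them, and the longest matching prefix is kept. With MTP the draft comes from extra prediction heads on the same model; in this minimal greedy sketch the draft and target are hypothetical toy functions, not a real model or library API. Note that even at 0% acceptance each step still yields one correct token from the target, so output is unchanged and only the speedup disappears.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]  # toy greedy "model": context -> next token id


def speculative_step(context: List[int], draft: NextToken, target: NextToken,
                     k: int) -> Tuple[List[int], int]:
    """One greedy speculative-decoding step.

    Returns (tokens appended this step, number of draft tokens accepted).
    The output always equals what greedy decoding with `target` alone would produce.
    """
    # 1. Draft proposes k tokens autoregressively (cheap).
    proposal: List[int] = []
    ctx = list(context)
    for _ in range(k):
        tok = draft(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # 2. Target verifies each position (a real system does this in one batched pass).
    accepted: List[int] = []
    ctx = list(context)
    for tok in proposal:
        expected = target(ctx)
        if expected != tok:
            # First mismatch: keep the target's token instead and stop.
            return accepted + [expected], len(accepted)
        accepted.append(tok)
        ctx.append(tok)

    # All k accepted: the verification pass also gives one bonus token.
    return accepted + [target(ctx)], len(accepted)


def target(ctx: List[int]) -> int:
    """Hypothetical toy target: next token is (last + 1) mod 10."""
    return (ctx[-1] + 1) % 10


def draft(ctx: List[int]) -> int:
    """Hypothetical toy draft: agrees with the target except after token 4."""
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


print(speculative_step([0], draft, target, k=4))
print(speculative_step([3], draft, target, k=4))
```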

Topi Santakivi @sandst1

21 days ago

@0xSero With MoE, the experts are not activated just based on prompts, the expert selection is done for _each token_ separately. This is why the full MoE usually needs to be loaded in memory even if a fraction of it is used per token.
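A minimal sketch of per-token routing, assuming a standard top-k router: each token gets its own expert choice, so across even a short sequence the union of selected experts grows well beyond k, which is why the whole expert set has to stay resident. The expert counts and the random router logits are hypothetical stand-ins; in a real model the logits come from a learned router applied to each token's hidden state, typically at every MoE layer.

```python
import random
from typing import List


def top_k_experts(router_logits: List[float], k: int) -> List[int]:
    """Indices of the k highest-scoring experts for ONE token."""
    return sorted(range(len(router_logits)), key=lambda e: router_logits[e], reverse=True)[:k]


def route_sequence(token_logits: List[List[float]], k: int) -> List[List[int]]:
    """Routing is decided per token: each token's logits get their own top-k."""
    return [top_k_experts(row, k) for row in token_logits]


random.seed(0)
NUM_EXPERTS, TOP_K, NUM_TOKENS = 64, 4, 32  # hypothetical sizes

# Stand-in for router outputs, one row of expert scores per token.
logits = [[random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)] for _ in range(NUM_TOKENS)]
routes = route_sequence(logits, TOP_K)
touched = {e for r in routes for e in r}

print(f"each token uses {TOP_K}/{NUM_EXPERTS} experts; "
      f"the {NUM_TOKENS}-token sequence touched {len(touched)} distinct experts")
```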

Topi Santakivi @sandst1

21 days ago

@usr_bin_roygbiv @LottoLabs Parallel processing. The memory bandwidth is low for a single token stream, but there's a lot of compute in that box to handle X tasks at a time: parallel subagents etc. If you only need single-user single-stream, go M5 Max. https://t.co/6MjBCDXZ2T
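Why parallel streams help on a low-bandwidth, high-compute box: one decode step reads the weights once for the whole batch, while compute grows with the number of streams, so aggregate throughput rises almost linearly until compute becomes the bottleneck. A simple roofline sketch; 273 GB/s comes from the thread above, while the 15 GB weight size, ~2 FLOPs per parameter for a 27B model, and the 100 TFLOP/s effective compute are hypothetical placeholders rather than measured or published specs. KV-cache traffic and attention cost are ignored.

```python
def aggregate_tokens_per_s(batch: int, weights_bytes: float, bandwidth_bytes_s: float,
                           flops_per_token: float, compute_flops_s: float) -> float:
    """Roofline estimate of total decode throughput across `batch` parallel streams.

    Memory part: weights are read once per step, shared by all streams.
    Compute part: grows linearly with the number of streams.
    Ignores KV-cache reads, attention cost and scheduling overhead.
    """
    memory_time = weights_bytes / bandwidth_bytes_s
    compute_time = batch * flops_per_token / compute_flops_s
    step_time = max(memory_time, compute_time)
    return batch / step_time


WEIGHTS = 15e9           # hypothetical bytes of weights
BANDWIDTH = 273e9        # bytes/s, the DGX Spark figure from the thread
FLOPS_PER_TOKEN = 54e9   # hypothetical: ~2 FLOPs per parameter for a 27B model
COMPUTE = 100e12         # hypothetical effective FLOP/s

for b in (1, 4, 16, 64, 256):
    tps = aggregate_tokens_per_s(b, WEIGHTS, BANDWIDTH, FLOPS_PER_TOKEN, COMPUTE)
    print(f"batch {b:>3}: ~{tps:7.1f} tok/s total")
```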
