kapicode @kapicode - Twitter Profile

Pinned Tweet

kapicode

@kapicode

24 days ago

My #ralph implementation

1

2

0

119

kapicode

@kapicode

about 3 hours ago

Benchmarking rule I’m trying to follow: If a result is inside run-to-run noise, call it noise. Not “breakthrough.” Not “secret flag.” Not “RDMA is worse.” Noise. Local LLM work needs more boring honesty and fewer victory laps.

0

21
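A minimal sketch of the "if it is inside run-to-run noise, call it noise" rule from the tweet above. The function name `classify_delta`, the threshold `k`, and the sample numbers are illustrative assumptions, not anything the author has published. It compares the gap between two mean throughputs against the standard error of that gap.

```python
import statistics


def classify_delta(baseline, candidate, k=2.0):
    """Label a benchmark difference as "noise" or "signal".

    baseline, candidate: repeated measurements of the same metric
        (for example tokens/sec) for two configurations.
    k: how many standard errors the gap must exceed before it is
        called a signal (an assumed, adjustable threshold).
    """
    if len(baseline) < 2 or len(candidate) < 2:
        raise ValueError("need at least two runs per configuration")

    delta = statistics.mean(candidate) - statistics.mean(baseline)

    # Standard error of the difference between the two sample means.
    se = (
        statistics.variance(baseline) / len(baseline)
        + statistics.variance(candidate) / len(candidate)
    ) ** 0.5

    label = "signal" if abs(delta) > k * se else "noise"
    return label, delta, se


# Hypothetical tokens/sec numbers, made up for illustration only.
runs_a = [40.1, 39.8, 40.3, 40.0]
runs_b = [40.2, 39.9, 40.4, 40.1]
label, delta, se = classify_delta(runs_a, runs_b)
```

With only a handful of runs per configuration the standard error is itself a rough estimate, so a borderline result is better re-run than reported.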

kapicode

@kapicode

about 5 hours ago

👀

Photo: https://t.co/aYinh6jLe7

0

17

kapicode

@kapicode

about 13 hours ago

@binh Are you sure you don’t have a config problem? What quants are you using? What is causing the stoppage in your harnesses?

1

0

669

kapicode

@kapicode

1 day ago

If you run local LLMs: what is your actual setup? Not the dream build. The daily driver. GPU/APU? Memory/VRAM? Model size? Serving stack? What breaks most often? I’m trying to compare practical local AI systems, not leaderboard screenshots.

121

71

2

56

15K

kapicode

@kapicode

about 13 hours ago

@manibatra That’s an awesome setup!

0

1

0

162

kapicode

@kapicode

about 24 hours ago

DeepSeek-V4-Flash on one Strix Halo box was more interesting than I expected. Not because it was the fastest thing in the world. Because it made 256K context feel plausible on a local machine.

12

40

0

8

5K

kapicode

@kapicode

about 13 hours ago

@whatup9911 What is the wrapper?

0

399

kapicode

@kapicode

about 13 hours ago

I want to push local hardware to be as capable as possible. And I want to push my harness to be productive with the smallest/dumbest models possible. No leaning on the model for my harness. Caveat: there is a minimum model quality that I simply can’t avoid. Qwen3.5-9b breaks at q4 but not at q8 (for now)

2

20

0

2

1K

kapicode

@kapicode

about 13 hours ago

I actually want to try baking RL into my harness lol. Small model gives two options, big model chooses better option, over time small model makes better choices? I already have multi-model requests built-in so you can compare the decisions various models make (one main “real” model, and many “ghost” models which don’t actually have the ability to run tool calls). It’s proven somewhat effective at sussing out the strengths/weaknesses of models. But uhh… can be expensive.

Taelin

@VictorTaelin

about 21 hours ago

RL is a mistake, thinking is a mistake, and if we just put all the money into crafting an astronomically good, massive dataset, we'd pretrain a model that outperforms everything that exists by a considerable margin source: my ass (I have no idea what I'm talking about)

61

561

8

105

47K

1

2

0

133
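The "small model gives two options, big model chooses" loop from kapicode's tweet above could be sketched as a preference-collection step. Everything here is hypothetical: `propose` and `judge` are stand-in callables for the small and big models, and `Preference` is just one plausible record shape (prompt, chosen, rejected) that preference-tuning methods commonly consume. It is not the author's harness.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Preference:
    prompt: str
    chosen: str
    rejected: str


def collect_preferences(
    prompts: Iterable[str],
    propose: Callable[[str], Tuple[str, str]],
    judge: Callable[[str, str, str], int],
) -> List[Preference]:
    """Build preference records from a proposer and a judge.

    propose(prompt) -> two candidate answers from the small model.
    judge(prompt, a, b) -> 0 if the big model prefers a, 1 if it prefers b.
    """
    records = []
    for prompt in prompts:
        a, b = propose(prompt)
        if a == b:
            continue  # identical candidates carry no preference signal
        pick = judge(prompt, a, b)
        if pick not in (0, 1):
            raise ValueError(f"judge returned {pick!r}, expected 0 or 1")
        chosen, rejected = (a, b) if pick == 0 else (b, a)
        records.append(Preference(prompt, chosen, rejected))
    return records


# Stub models so the sketch runs without any inference backend.
def stub_propose(prompt):
    return prompt + " -> short", prompt + " -> longer answer"


def stub_judge(prompt, a, b):
    return 0 if len(a) >= len(b) else 1


pairs = collect_preferences(["list files", "read config"], stub_propose, stub_judge)
```

Each record costs one big-model call, which is where the expense mentioned in the tweet comes from; logging the records lets the same judgments be reused across training runs.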

kapicode

@kapicode

about 14 hours ago

@LottoLabs If someone has the bandwidth, I’d be curious to see if it could be a potential runtime optimization. Also:
- Tiered K/V (hottest in VRAM, medium hot in RAM, cold on SSD)
- The CPU inference optimizations noted by Sakura Yuki

I think there is yet meat on the bone for local AI.

1

51
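A toy sketch of the tiered K/V idea from the tweet above: hottest blocks in VRAM, warm blocks in RAM, cold blocks on SSD. The class `TieredKVCache` is hypothetical and only models placement with in-memory dictionaries and least-recently-used demotion; a real runtime would move tensors between devices and storage, which is not shown here.

```python
from collections import OrderedDict


class TieredKVCache:
    """Three-tier placement with least-recently-used demotion."""

    def __init__(self, hot_capacity, warm_capacity):
        self.hot = OrderedDict()   # stands in for VRAM
        self.warm = OrderedDict()  # stands in for system RAM
        self.cold = {}             # stands in for SSD, unbounded here
        self.hot_capacity = hot_capacity
        self.warm_capacity = warm_capacity

    def put(self, block_id, block):
        # A written block is the most recently used, so it goes to hot.
        self.warm.pop(block_id, None)
        self.cold.pop(block_id, None)
        self.hot[block_id] = block
        self.hot.move_to_end(block_id)
        self._demote()

    def get(self, block_id):
        # Any access promotes the block back to the hot tier.
        for tier in (self.hot, self.warm, self.cold):
            if block_id in tier:
                block = tier.pop(block_id)
                self.put(block_id, block)
                return block
        raise KeyError(block_id)

    def tier_of(self, block_id):
        if block_id in self.hot:
            return "hot"
        if block_id in self.warm:
            return "warm"
        if block_id in self.cold:
            return "cold"
        raise KeyError(block_id)

    def _demote(self):
        # Push the least recently used blocks down one tier at a time.
        while len(self.hot) > self.hot_capacity:
            key, value = self.hot.popitem(last=False)
            self.warm[key] = value
        while len(self.warm) > self.warm_capacity:
            key, value = self.warm.popitem(last=False)
            self.cold[key] = value


cache = TieredKVCache(hot_capacity=2, warm_capacity=2)
for i in range(5):
    cache.put(i, f"kv-block-{i}")
```

After the loop, the two newest blocks sit in the hot tier and the oldest has been pushed to cold. Whether this pays off in practice depends on how expensive promotion from SSD is relative to recomputing the block.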

kapicode

@kapicode

about 15 hours ago

Is it possible to eGPU 3090 for prefill, then generate on the Strix Halo?

Akash Alpha

@akashalpha_

about 15 hours ago

@Code_Fault @kapicode Hey! Are these two connected? eGPU? Would do again?

1

4

0

2K

2

5

0

2K

kapicode

@kapicode

about 14 hours ago

@LottoLabs Yeah I’ve seen them—can you do the prefill split about which I asked? Could I pull it off with an eGPU and my existing BeeLink?

1

0

69

kapicode

@kapicode

about 15 hours ago

Pushing the idea further... manual REAP based on which tasks get used more and/or require more intelligence.

kapicode

@kapicode

about 15 hours ago

Thinking out loud. Could domain-specific small dense models (SLMs) be trained and then combined together as a MoE? Almost certainly you could put a manual router model in front, but it would be kind of cool if the whole thing could be consolidated into one model. Imagine cherry-picking the experts you need for your use case.

0

64

0

50
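The "manual router model in front" option mentioned in the quoted tweet can be illustrated with a very small dispatcher. This is a sketch under loud assumptions: the experts are plain callables standing in for domain-specific small models, and the routing is keyword overlap rather than a learned router. The names `make_router`, `experts`, and `keywords` are invented for the example.

```python
def make_router(experts, keywords, default):
    """Return a function that sends a prompt to one expert.

    experts: dict of expert name -> callable(prompt) -> str
    keywords: dict of expert name -> set of lowercase trigger words
    default: expert name used when no keyword matches
    """
    def route(prompt):
        words = set(prompt.lower().split())
        best, best_hits = default, 0
        for name, triggers in keywords.items():
            hits = len(words & triggers)
            if hits > best_hits:
                best, best_hits = name, hits
        return best, experts[best](prompt)

    return route


# Stub experts so the sketch runs without any model backend.
experts = {
    "code": lambda p: "[code expert] " + p,
    "sql": lambda p: "[sql expert] " + p,
    "general": lambda p: "[general expert] " + p,
}
keywords = {
    "code": {"python", "function", "bug"},
    "sql": {"select", "join", "table"},
}
route = make_router(experts, keywords, default="general")
name, answer = route("fix this python function")
```

Cherry-picking experts for a use case then amounts to editing the `experts` and `keywords` dictionaries; consolidating them into a single MoE model, as the tweet wonders about, is a separate and harder problem that this sketch does not address.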

kapicode

@kapicode

about 15 hours ago

Strix Halo is vindicated by price/performance. 4x256GB Mac Studios barely beat it. You can essentially get the same speed on 1xSH with q5 with chadrock.

Lifetimize

@lifetimization

about 17 hours ago

4 node Mac Studio 256 GB cluster achieves 1.92x the speed with Qwen3.6 27B 8bit model via exo RDMA over thunderbolt 5. @exolabs @0xSero

Photo: https://t.co/LcSgUpZl2I

1

12

0

3

12K

0

2

0

1

139

kapicode

@kapicode

about 15 hours ago

@0xSero Honestly this is just showing me that for 27b Strix Halo is still under-appreciated.

0

30

kapicode

@kapicode

about 16 hours ago

Coming soon... @FlyCockpitApp

0

2

0

14

kapicode

@kapicode

about 17 hours ago

@0xRaghuboi Idk if this applies to NVIDIA hardware (I have a 3090 but have experimented more with Strix Halo), but I find that compressing KV less is actually faster, and if you have the headroom to not compress, it is often better not to. Results may vary fwiw

0

82

kapicode

@kapicode

about 17 hours ago

@sakurayukiai I remember you posting about CPU cache optimization for running models on CPU. What is the best CPU-only setup you’ve seen so far?

0

35

kapicode retweeted

Whimsicali

@Whimsicali

about 17 hours ago

I am currently running two split, separate models. Qwopus3.6-27B-v2-Q4_K_M with vulkan. I get better results with vulkan but I need to revisit rocm again. Both 7900xtx's hold their own in developing my projects. MTP is a big help. Not quite the speed of a 3090 but about $400-500 cheaper than is currently available. They are solid GPUs for local models.

0

2

1

0

87

kapicode

@kapicode

about 17 hours ago

@bakerybr02 @jack Llama? What kind of results do you get? What is the experience like?

0

1

0

337

kapicode

@kapicode

about 17 hours ago

@pierrelezan You are a baller

0

550
