Prasanna Kanagasabai (PK) @prasannain - Twitter Profile

about 2 months ago

@vllm_project built a KVConnectorBase_V1 connector that restores evicted blocks faster than GPU cache hits. pip install tierkv https://t.co/5vMYPxIJtm #vLLM #LocalLLM #LLMInference #opensource

Prasanna Kanagasabai (PK) @prasannain

about 2 months ago

Your LLM just recomputed 30,000 tokens it already did 5 minutes ago. Every cache eviction = full prefill from scratch. 10–30 seconds. Again. tierKV fixes this. Here's how 🧵
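The eviction problem described above can be illustrated with a toy two-tier cache. This is a minimal sketch and not tierKV's actual implementation: the class name `TieredBlockCache`, the `compute` callback, and the string stand-ins for KV blocks are all hypothetical. The point is only that demoting an evicted block to a slower tier turns a later miss into a restore instead of a recompute.

```python
from collections import OrderedDict


class TieredBlockCache:
    """Toy two-tier cache (hypothetical, for illustration only)."""

    def __init__(self, fast_capacity):
        self.fast = OrderedDict()  # stands in for GPU-resident KV blocks (LRU order)
        self.slow = {}             # stands in for a slower tier (CPU RAM, disk, ...)
        self.fast_capacity = fast_capacity
        self.recomputes = 0        # how many times a block had to be rebuilt

    def _put(self, block_hash, kv):
        self.fast[block_hash] = kv
        self.fast.move_to_end(block_hash)
        while len(self.fast) > self.fast_capacity:
            old_hash, old_kv = self.fast.popitem(last=False)
            self.slow[old_hash] = old_kv  # demote instead of discarding

    def get(self, block_hash, compute):
        if block_hash in self.fast:        # fast-tier hit
            self.fast.move_to_end(block_hash)
            return self.fast[block_hash]
        if block_hash in self.slow:        # restore from the slow tier
            kv = self.slow.pop(block_hash)
        else:                              # true miss: pay for prefill again
            self.recomputes += 1
            kv = compute(block_hash)
        self._put(block_hash, kv)
        return kv


cache = TieredBlockCache(fast_capacity=2)
for h in ["a", "b", "c", "a"]:
    cache.get(h, lambda x: f"kv({x})")
print(cache.recomputes)  # 3: the second "a" is restored, not recomputed
```

Without the slow tier, the second request for "a" would count as a fourth recompute, which is the repeated prefill cost the thread is about.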

Prasanna Kanagasabai (PK) @prasannain

about 2 months ago

6/ Getting started:
pip install tierkv
One command. Hooks into vLLM or EXO automatically. No source changes.
📖 https://t.co/6aI0jq7v0k
⭐ https://t.co/VteKzPlgzX

Prasanna Kanagasabai (PK) @prasannain

about 2 months ago

@wuhanbat @alexabelonix Works well with larger prompts and repeated use, such as iterating over a report. Shared more on metrics here: https://t.co/Ke2nd54ppP

Prasanna Kanagasabai (PK) @prasannain

3 months ago

Results: 95.2% detection (2k payloads, 110 cats), 14× FP drop, 8.6 ms p95. Paper: https://t.co/fWJ1l5C3nr Endorsement: https://t.co/zqDJ2xJAH5 Mech-interp thoughts welcome! #MechInterp #LLMSecurity cc @NeelNanda5 @edoardo_debe @goodside 2/2

Prasanna Kanagasabai (PK) @prasannain

3 months ago

What if we stopped treating Sparse Autoencoders as fragile single-feature detectors for prompt injection? Instead, I mined **conjunctive co-activation patterns** — groups of features that only fire together in real attacks. On Gemma Scope (layers 6/12/18). 1/2
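One way to picture conjunctive co-activation mining is a small frequent-itemset pass over sets of active feature IDs. This is a rough sketch under stated assumptions, not the method from the paper: the function `mine_conjunctions`, its thresholds, and the toy feature IDs are hypothetical, and a real pipeline would first threshold SAE activations to obtain these sets.

```python
from collections import Counter
from itertools import combinations


def mine_conjunctions(attack_acts, benign_acts, size=2, min_support=2, max_benign=0):
    """Return feature groups that co-fire in attacks but (almost) never in benign prompts.

    attack_acts / benign_acts: lists of sets of active feature IDs, one set per prompt.
    """
    attack_counts = Counter()
    for feats in attack_acts:
        attack_counts.update(combinations(sorted(feats), size))

    benign_counts = Counter()
    for feats in benign_acts:
        benign_counts.update(combinations(sorted(feats), size))

    return sorted(
        group
        for group, n in attack_counts.items()
        if n >= min_support and benign_counts[group] <= max_benign
    )


# Toy data: features 1 and 2 each fire alone on benign prompts,
# but only fire together on attack prompts.
attacks = [{1, 2, 3}, {1, 2, 5}, {2, 3}]
benign = [{1, 4}, {2, 3}, {1, 5}]
print(mine_conjunctions(attacks, benign))  # [(1, 2)]
```

In the toy data, the pair (2, 3) is rejected because it also appears in a benign prompt, while neither feature 1 nor feature 2 would be a reliable detector on its own.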

Prasanna Kanagasabai (PK) @prasannain

5 months ago

🔗 Full technical breakdown: https://t.co/nnPqhsnPZT 🤗 Ready-to-use model (HF): PKSGIN/qwen3-30b-selective-quant-MixedMPW-mlx

Prasanna Kanagasabai (PK) @prasannain

5 months ago

🧵 MLX said "pick one precision for all experts." We needed 9 at FP16, 119 at 4-bit. So we split what wasn't meant to be split. Here's how we got Qwen3-MoE-32B running in 64GB on Apple Silicon 👇 1/🧵

Prasanna Kanagasabai (PK) @prasannain

5 months ago

Why this works on MLX:
✅ gather_mm and gather_qmm are independent kernels
✅ Each block only sees local indices [0-N]
✅ mx.where on Metal GPUs is basically free
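The local-index idea above can be sketched in plain Python, without calling MLX at all. Everything here is an assumption made for illustration: the helper names `split_experts` and `select_outputs`, the choice of which 9 of 128 experts stay at FP16, and the strings standing in for expert outputs. Each block is addressed only by its own local indices, both candidates are looked up, and a per-token mask picks the right one, which is the role `mx.where` plays in the thread.

```python
def split_experts(num_experts, fp16_ids):
    """Map global expert IDs to local indices inside an FP16 block and a 4-bit block."""
    fp16_set = set(fp16_ids)
    fp16_order = sorted(fp16_set)
    quant_order = [e for e in range(num_experts) if e not in fp16_set]

    is_fp16 = [e in fp16_set for e in range(num_experts)]
    # Experts that do not belong to a block get dummy local index 0;
    # their result from that block is discarded by the mask later.
    fp16_local = [0] * num_experts
    quant_local = [0] * num_experts
    for i, e in enumerate(fp16_order):
        fp16_local[e] = i
    for i, e in enumerate(quant_order):
        quant_local[e] = i
    return is_fp16, fp16_local, quant_local


def select_outputs(expert_ids, is_fp16, fp16_local, quant_local, fp16_block, quant_block):
    """Look up both blocks with local indices, then select per token (a 'where')."""
    out = []
    for e in expert_ids:
        hi = fp16_block[fp16_local[e]]    # stand-in for the FP16 kernel result
        lo = quant_block[quant_local[e]]  # stand-in for the 4-bit kernel result
        out.append(hi if is_fp16[e] else lo)
    return out


# Hypothetical choice of the 9 experts kept at FP16 (out of 128).
FP16_IDS = [3, 17, 40, 41, 64, 77, 90, 101, 126]
is_fp16, fp16_local, quant_local = split_experts(128, FP16_IDS)
fp16_block = [f"fp16:{e}" for e in sorted(FP16_IDS)]
quant_block = [f"q4:{e}" for e in range(128) if e not in FP16_IDS]

print(select_outputs([3, 4, 127], is_fp16, fp16_local, quant_local, fp16_block, quant_block))
# ['fp16:3', 'q4:4', 'q4:127']
```

The trade-off in this sketch is that both lookups run for every token and one result is thrown away; that only pays off if the selection step is cheap, which is the claim the tweet makes about Metal.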

Prasanna Kanagasabai (PK) @prasannain

5 months ago

Total cost: ~$2 GPU rental, one weekend, 4 Python scripts.
Model: PKSGIN/qwen3-30b-selective-quant on HuggingFace
Full technical writeup (profiling, quantization, MLX conversion, benchmarks): https://t.co/PFL0JE80py
Security teams deserve local models that actually work.

Prasanna Kanagasabai (PK) @prasannain

5 months ago

Running a 30B LLM for security analysis means choosing between:
Cloud APIs (ship sensitive data off-prem)
Local quantized models (too degraded to be useful)
I spent a weekend building a third option. 47 tok/s on a MacBook. Zero data leakage. 🧵

Prasanna Kanagasabai (PK) @prasannain

5 months ago

Unified memory changes the game.
Traditional GPU: model must fit in VRAM (RTX 4090 = 24GB)
Apple Silicon: CPU/GPU/Neural Engine share 32GB pool
18 GB model loads once. GPU runs inference in-place. Headroom for OS + KV cache. Air-gapped ready.
