i just beat @GoogleDeepMind's turboquant
introducing Shard. 10x KV cache compression on Llama-3.1-8B. zero quality loss
- 10x @ 8K context, 11.2x @ 32K
- NIAH recall 1.000 across 4K-32K
- LongBench Δ ≈ 0 vs FP16
turboquant tops out at 4-6x at the same quality. we doubled it.
read more: https://t.co/PAV5WdAzN6
@kirrithan
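For scale, here is a rough sketch of what those ratios would mean in memory. It assumes the publicly listed Llama-3.1-8B config (32 layers, 8 KV heads, head dim 128) and an FP16 baseline; none of these numbers come from the post, and `kv_cache_bytes` is a hypothetical helper, not part of Shard.

```python
def kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=8, head_dim=128, bytes_per_elem=2):
    """Uncompressed KV cache size for one sequence.

    Defaults are assumptions taken from the public Llama-3.1-8B config,
    with FP16 (2 bytes per element). The leading 2 counts keys + values.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * seq_len


MIB = 1024 ** 2

# Ratios are the ones claimed in the post; everything else is arithmetic.
for ctx, ratio in [(8192, 10.0), (32768, 11.2)]:
    full = kv_cache_bytes(ctx)
    print(f"{ctx:>6} tokens: FP16 {full / MIB:.0f} MiB -> ~{full / ratio / MIB:.0f} MiB at {ratio}x")
```

Under these assumptions the FP16 cache is 128 KiB per token, so 1 GiB at 8K and 4 GiB at 32K before compression.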
@krishgarg and I built Shard, beating @GoogleDeepMind's TurboQuant on KV cache compression.
10x compression on Llama-3.1-8B-Instruct at 8K.
NIAH recall: 1.000.
Keys: RoPE-aware PCA + int4 fused attention.
Values: Hadamard + VQ.
Same needles. Less cache.
https://t.co/y43CRsRfAj
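A toy sketch of why a Hadamard rotation is a common step before quantizing a cache vector: it spreads a single outlier across all coordinates, so a low-bit grid wastes less range. This is not Shard's pipeline. The post names Hadamard + VQ for values; the sketch below swaps VQ for plain per-vector int4 scalar quantization to stay short, and every function name here is hypothetical.

```python
import math


def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform (length must be a power of two).

    With the 1/sqrt(n) scaling the transform is its own inverse.
    """
    x = list(x)
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    s = 1.0 / math.sqrt(n)
    return [v * s for v in x]


def quantize_int4(x):
    """Symmetric per-vector int4: integer codes in [-8, 7] plus one float scale."""
    peak = max(abs(v) for v in x)
    scale = peak / 7 if peak > 0 else 1.0
    codes = [max(-8, min(7, round(v / scale))) for v in x]
    return codes, scale


def dequantize(codes, scale):
    return [c * scale for c in codes]


def mse(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


# Toy vector (made up for illustration): 15 small entries and one outlier.
x = [0.5] * 15 + [8.0]

# Quantize directly.
err_plain = mse(x, dequantize(*quantize_int4(x)))

# Rotate, quantize, dequantize, rotate back.
x_hat = fwht(dequantize(*quantize_int4(fwht(x))))
err_rot = mse(x, x_hat)

print(f"int4 MSE without rotation: {err_plain:.4f}")
print(f"int4 MSE with Hadamard:    {err_rot:.4f}")
```

On this toy input the rotated version has the lower reconstruction error, because the outlier no longer forces the small entries to round to zero. Real activations will behave differently, so treat it as intuition only.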
We spent 4 years building a payments network for the new internet.
Entrepreneurs now earn $3.3B annually on Whop. Millions of people are clipping, labeling data, deploying agents, and starting businesses to get paid.
Today we're opening Whop Payments Network to everyone.
Halftime: Dynamically weaves AI-generated ads into the scenes you’re watching, so breaks feel like part of the story instead of interruptions.
@krishgarg @yuviecodes @lohanipravin
i won the @xai hackathon by making ads for X Videos
introducing Halftime. targeted ad generation using AI that feels like a part of your movies and shows
built with @yuviecodes @lohanipravin
AI is coming for your jobs.
Now it’s coming for your hobbies too.
We built Steve, the Cursor for Minecraft.
Steve and his AI agents can hunt, build and mine on command and even collaborate.
built with @lohanipravin
drove 2 hours today to get to @Shopify builder sundays
nobody there.
turns out it’s next week
i guess nobody loves building as much as i do 😓
(ps @alspee can i get a pizza i’m building something cool)