edge distiller @edgedistiller - Twitter Profile

28 days ago

@LottoLabs Grok actually feels different from other models. Purely for the sake of "cognitive diversity" I want it to succeed. Even if that means finding a different niche at a different price point from gigantic frontier models tuned for coding.

0

1

0

43

edge distiller @edgedistiller

29 days ago

New video up on my YouTube channel about the new Qwen MTP models! I also compare quality benchmarks using BenchLoop, thanks to @outsource_ https://t.co/BaDy1VKKo1

1

8

0

4

2K

edge distiller @edgedistiller

29 days ago

@leftcurvedev_ I also found no difference in speed between 2 and 6 draft tokens on different hardware with the same flags as you. Almost exactly 1.8x using MTP vs. without. I cover it in my recent video. https://t.co/lhqK8CalUy

0

225

edge distiller @edgedistiller

about 1 month ago

@k_flowstate Why do we need to trust anyone? Either a statement is true or it is not. People that produce a lot of true, useful statements are generally worth giving attention to.

0

1

0

31

edge distiller @edgedistiller

about 1 month ago

@ItsmeAjayKV Ultimately, is it worth it if you have to use a smaller (worse) quantization due to the additional VRAM overhead? MTP seems like an optimization that only makes sense when your VRAM is too large for the current tier you are using but too small for the next tier up.

1

0

566

edge distiller @edgedistiller

about 1 month ago

@dadhalfdev @LottoLabs Happy to help! Let me know if you have any suggestions or questions.

0

1

0

9

edge distiller @edgedistiller

about 1 month ago

@outsource_ Haha I just recorded a video on local benchmarking before this was posted, I definitely want to check this out! https://t.co/Zrx1jNerxu

1

0

69

edge distiller @edgedistiller

about 1 month ago

It's only May and local LLM benchmarks already got me like this

0

1

0

56

edge distiller @edgedistiller

about 1 month ago

@regularaugust What was his name again?

0

1

0

885

edge distiller @edgedistiller

about 1 month ago

Claude Code is the Windows 11 of agent harnesses.

himanshu

@himanshustwts

about 1 month ago

the harness of claude code is very interesting. a random unstable header at the start of the prompt was breaking KV-cache reuse on a 52k-token context. NVIDIA stripped it out and TTFT dropped by 5x. https://t.co/OBaoVYobcK

15

375

28

229

57K

0

2

0

83

edge distiller @edgedistiller

about 1 month ago

@sudoingX I just made a video covering this (llama.cpp build for local inference), would be happy to hear any thoughts: https://t.co/CzKTF2r1eS

edge distiller @edgedistiller

about 1 month ago

I made a video on running LLMs locally, specifically by using other people's benchmarks on LocalMaxxing. All criticism/feedback is welcome! https://t.co/zUBimNKv47

0

2

0

1

431

0

1

0

1

201

edge distiller @edgedistiller

about 1 month ago

@ChemPhysMajor @arena @GoogleDeepMind Both Qwen 3.6 models are much more expensive than Gemma 4.

1

0

75

edge distiller @edgedistiller

about 1 month ago

@MindMechanical @greenTetra_ Literally thought of this as I read it, still can't decide which one to pick.

0

18

edge distiller @edgedistiller

about 1 month ago

@AlphaMFPEFM @Elaina43114880 That's fair, but it can also just be extrapolated from the price, since xAI has a massive amount of compute and we know they can make a 10T model or do whatever they want. Ultimately the constraint is not parameter size, but performance for a given price class.

1

0

28

edge distiller @edgedistiller

about 1 month ago

@AlphaMFPEFM @Elaina43114880 If it's closed source then all that matters is the API price and the quality of output. The model could be 500T parameters for all I care.

1

0

20

edge distiller @edgedistiller

about 1 month ago

@LottoLabs You know you've made it when someone is willing to rent an H200 to make big number even bigger on the leaderboard.

1

2

0

295

edge distiller @edgedistiller

about 2 months ago

@banteg "hedging" and it's literally just confidence intervals based on measurement error.

0

610

edge distiller @edgedistiller

about 2 months ago

@Rokieee__ @bojie_li "Reasoning compresses. Factual knowledge doesn't."

0

8

0

1

523

edge distiller @edgedistiller

about 2 months ago

@witchof0x20 @halvarflake This is the only correct answer in the replies and it got 0 engagement, what a shame. The correct answer was remote attestation, people.

0

32
