Heinrich Wendel @hmw147 - Twitter Profile

5 days ago

@pip_net That’s all correct. BUT they have always been terrible at product. Besides basic chat AI and some TPU rental, I have a very hard time believing they will really win anything here.

Heinrich Wendel @hmw147

about 1 month ago

@outsource_ Qwen 3.6 35B A3B next please 🙏

Heinrich Wendel @hmw147

about 1 month ago

@ModelScope2022 My only concern with Qwen 3.6 was the verbosity, which slows down every task. This might be the solution.

Heinrich Wendel @hmw147

about 1 month ago

This is big for self-hosting of OSS models.

vLLM

@vllm_project

about 1 month ago

KV cache shouldn't disappear every time vLLM restarts. With @novita_labs, we're sharing PegaFlow: a production-grade external KV cache service that plugs into vLLM through the external KV connector interface. PegaFlow runs as a standalone Rust daemon owning the host KV pool, SSD cache, and RDMA resources. vLLM workers attach via CUDA IPC + gRPC, and cache survives engine crashes, upgrades, and model switches.

In production-oriented evaluations:
🚀 2.15× faster vLLM startup with a pre-warmed 500 GiB host pool
📈 56% higher throughput for 8 Qwen3-8B instances sharing one cache
⚡ 72% higher throughput for DeepSeek-V3.2 MLA TP8 (logical KV stored once, not per rank)
🌐 194 GB/s average remote-read throughput across nodes

Three-level hierarchy: pinned DRAM, remote DRAM over RDMA, local SSD on io_uring. Integrates through the existing `kv_transfer_config` path, with no vLLM source changes.

📖 https://t.co/rf2VmevP7J
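The three-level hierarchy in the quoted post (pinned DRAM, remote DRAM over RDMA, local SSD) can be pictured as a tiered lookup. The Python below is a toy sketch, not PegaFlow's implementation or API: the class name `TieredKVCache`, the per-tier entry limits, and the LRU promote-on-hit / demote-on-evict policy are all assumptions for illustration, and plain dictionaries stand in for the real memory tiers.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional


class TieredKVCache:
    """Toy three-tier cache (hypothetical, not PegaFlow): look in the fastest
    tier first, promote on hit, demote LRU entries to the next tier on overflow."""

    def __init__(self, tiers: list[tuple[str, int]]):
        # tiers: (name, max_entries) ordered from fastest to slowest
        self.tiers = [(name, cap, OrderedDict()) for name, cap in tiers]

    def get(self, key: Hashable) -> tuple[Optional[Any], Optional[str]]:
        for name, _cap, store in self.tiers:
            if key in store:
                value = store.pop(key)
                self._insert(0, key, value)  # promote the hit to the fastest tier
                return value, name  # report the tier the hit came from
        return None, None  # miss in every tier: KV would have to be recomputed

    def put(self, key: Hashable, value: Any) -> None:
        for _name, _cap, store in self.tiers:
            store.pop(key, None)  # keep at most one copy across all tiers
        self._insert(0, key, value)

    def _insert(self, level: int, key: Hashable, value: Any) -> None:
        if level >= len(self.tiers):
            return  # evicted from the slowest tier: dropped entirely
        _name, cap, store = self.tiers[level]
        store[key] = value  # a new key lands at the most-recently-used end
        if len(store) > cap:
            old_key, old_value = store.popitem(last=False)  # least recently used
            self._insert(level + 1, old_key, old_value)  # demote one tier down


cache = TieredKVCache([("pinned_dram", 1), ("remote_dram", 1), ("local_ssd", 1)])
for k, v in [("a", 1), ("b", 2), ("c", 3)]:
    cache.put(k, v)
print(cache.get("a"))  # (1, 'local_ssd')
```

With one slot per tier, "a" has been pushed down to the SSD tier by the time it is read back; the hit promotes it to DRAM, so a second read would be served from `pinned_dram`.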

Heinrich Wendel @hmw147

about 1 month ago

@pupposandro I thought OSS took a bit of a backseat when Lin Junyang left?

Heinrich Wendel @hmw147

about 1 month ago

@AIInvestorHQ Does that mean that NVDA GPUs will double in price?

Heinrich Wendel @hmw147

about 1 month ago

@saltyAom It’s just a beautiful piece of engineering

Heinrich Wendel @hmw147

about 2 months ago

@__tinygrad__ Large enough for a Qwen 3.6 27B model, but not large enough for the KV cache.
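Why a card can hold the weights but not the KV cache comes down to a rough sizing formula: per token, every attention layer caches one key and one value vector per KV head. The sketch below assumes plain full attention with grouped-query heads and an unquantized 16-bit cache; the function name is illustrative, and the layer/head numbers are hypothetical placeholders, not Qwen 3.6 27B's actual configuration. Architectures that use linear-attention layers or compressed KV (such as MLA) would need far less.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   context_tokens: int, concurrent_seqs: int,
                   bytes_per_elem: int = 2) -> int:
    """Size in bytes of a standard, uncompressed attention KV cache."""
    # Each layer stores one K and one V vector of num_kv_heads * head_dim values per token.
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem
    return per_token * context_tokens * concurrent_seqs


GiB = 2**30
# Hypothetical config, NOT the real Qwen 3.6 27B architecture:
# 64 layers, 8 KV heads (GQA), head_dim 128, 16-bit cache (2 bytes per value).
one = kv_cache_bytes(64, 8, 128, context_tokens=32_768, concurrent_seqs=1)
twenty = kv_cache_bytes(64, 8, 128, context_tokens=32_768, concurrent_seqs=20)
print(one / GiB, twenty / GiB)  # 8.0 160.0
```

Under these placeholder numbers a single 32k-token sequence needs 8 GiB of cache, and 20 concurrent ones need 160 GiB on top of the weights.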

Heinrich Wendel @hmw147

about 2 months ago

@__tinygrad__ Red Pro?

Wccftech

@wccftech

about 2 months ago

AMD launches MI350P, its first PCIe "Instinct" in four years – packs CDNA 4 GPU with 4.6 PFLOPs AI compute, 144 GB HBM3E at 600W. https://t.co/uLAh7eokph

Heinrich Wendel @hmw147

about 2 months ago

@GobindSinghDeo 💯

Heinrich Wendel @hmw147

about 2 months ago

@akmalnasir @EkonomiMalaysia There is also huge complexity in the software layer needed to operate data centers, especially when it comes to AI. That would require local operators, though, to own the data centers, not foreign hyperscalers.

Heinrich Wendel @hmw147

about 2 months ago

@Soya_Cincau @MITIMalaysia With fuel prices surging, this seems to set exactly the wrong incentives.

Heinrich Wendel @hmw147

about 2 months ago

@TeksEdge Maybe it actually works 🤔

Heinrich Wendel @hmw147

about 2 months ago

@ivanfioravanti How many generations could I run in parallel at 32k input / 150 output tokens while maintaining 10s response times?

Heinrich Wendel @hmw147

about 2 months ago

What workstation / home server hardware do I need to serve Qwen 3.6 27B / Gemma 4 31B with a 32k context window for 20 concurrent generations of ~150 output tokens at 10s latency? CC @DrFriesOfficial @jun_song @__tinygrad__ @sudoingX
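That question implies concrete throughput targets. A back-of-envelope sketch, assuming all 20 requests arrive at once with full 32k prompts, "32k" means 32,768 tokens, and a hypothetical 50/50 split of the 10 s budget between prefill and decode (real schedulers overlap the two phases, so this is only a rough bound). The function and field names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    prefill_tokens_per_s: float         # aggregate prompt-processing rate
    decode_tokens_per_s_per_seq: float  # generation speed each stream needs
    decode_tokens_per_s_total: float    # aggregate generation rate


def required_throughput(concurrent: int, input_tokens: int, output_tokens: int,
                        latency_s: float, prefill_share: float = 0.5) -> Requirements:
    """Rates needed if every request arrives together and must finish within latency_s.
    prefill_share is a hypothetical split of the latency budget, not a measured value."""
    if not 0 < prefill_share < 1:
        raise ValueError("prefill_share must be strictly between 0 and 1")
    prefill_time = latency_s * prefill_share
    decode_time = latency_s - prefill_time
    per_seq = output_tokens / decode_time
    return Requirements(
        prefill_tokens_per_s=concurrent * input_tokens / prefill_time,
        decode_tokens_per_s_per_seq=per_seq,
        decode_tokens_per_s_total=per_seq * concurrent,
    )


r = required_throughput(concurrent=20, input_tokens=32_768, output_tokens=150, latency_s=10.0)
print(r)
```

Under those assumptions the box would need about 131k tokens/s of prefill and 30 tokens/s per stream (600 tokens/s aggregate) of decode, so prefill is by far the larger target, and prefill is typically compute-bound rather than memory-bandwidth-bound.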

Heinrich Wendel @hmw147

about 2 months ago

@ApurvaSanghi @HectorPollitt @MuthukumaraMani Very interesting. What does it mean to align agriculture & irrigation with climate goals? How does it actually help?

Heinrich Wendel @hmw147

about 2 months ago

@jarredsumner Mind sharing your harness?

Heinrich Wendel @hmw147

about 2 months ago

@brandonjcarl Token costs have stayed constant during this time, even though token production is 50x cheaper on new systems like NVL72. These savings will eventually be passed on to consumers, once sufficient compute supply is available.

Heinrich Wendel @hmw147

about 2 months ago

@jun_song What would Apple have to do to fill the prefill gap?

Heinrich Wendel @hmw147

about 2 months ago

@pmddomingos We’re just very efficient. When we work, we actually work.
