Steffen Röcker @sroecker - Twitter Profile

Pinned Tweet

about 2 months ago

Your Hermes Agent can now delegate to RLMs 🙌 Recreated the document analyzer example with the converted skill. 136 PDF pages analyzed. Best part: Auto-configures from HERMES_MODEL / HERMES_PROVIDER env vars @NousResearch @Teknium https://t.co/ReL1NqwlCQ

Gabriel Lespérance

@GabLesperance

about 2 months ago

https://t.co/NVq0bYi6s9

sroecker retweeted

Tomasz Tunguz

@ttunguz

2 days ago

Open-weight models have overtaken closed models on OpenRouter. 69.1% of token volume now goes to open-weight models. 30.9% to closed. Competition is a discovery procedure — and developers are discovering the value of open models. 🧵

Steffen Röcker

@sroecker

4 days ago

@IterIntellectus Men used to go to war.

sroecker retweeted

Red Hat AI

@RedHat_AI

6 days ago

Speculators v0.5.0 just dropped with 3 big updates:
- DFlash training support. Draft all tokens in one pass via block diffusion
- Unified online/offline training powered by @vllm_project's hidden states extraction system
- Docs & tutorials overhaul for faster onboarding
https://t.co/tCMZIrgQf2

sroecker retweeted

Matt Hicks

@matthicksj

8 days ago

Project Lightwell is a $5 billion investment that marks a fundamental shift in how we think about our role as open source stewards. I believe it will define the next chapter of Red Hat's engineering mission. We are applying the same discipline, upstream-always commitment, and engineering rigor across all active application layers that modern enterprise environments depend on.

sroecker retweeted

spacy

@dosco

8 days ago

the trick is not to do native tool calling instead do code gen in a RLM style REPL

sroecker retweeted

Red Hat AI

@RedHat_AI

9 days ago

EAGLE 3.1 is out. The team identified attention drift as the root cause of acceptance-length degradation at deeper speculation steps. Fix: FC normalization + post-norm hidden-state feedback. Result: 2x longer acceptance length in long-context workloads, 2.03x per-user throughput on Kimi K2.6. Already in @vllm_project nightly. Native support lands in the next release v0.22.0. Open source draft model available now.

sroecker retweeted

Arnav Chavan @ArnavChavan6

16 days ago

🚀 Organizing the Efficient Qwen Competition @icmlconf !
Goal: Minimize LLM inference latency for a single GPU without breaking model quality.
Prizes: $3K / $2K / $1K + present at ICML 2026, Seoul
Getting Started - https://t.co/On1yK4fnu9
Leaderboard - https://t.co/7HUbO2oA3A

sroecker retweeted

Julien Chaumond

@julien_c

16 days ago

What hardware actually powers open-source AI?

Not benchmarks.
Not vendor marketing.
Real-world community usage.

We’re launching @huggingface Hardware:
→ trending GPUs & CPUs
→ VRAM distribution
→ inference hardware trends
→ what the OSS AI ecosystem really runs on

sroecker retweeted

Dan Alistarh @DAlistarh

17 days ago

Weight-only quantization powers local LLMs like llama.cpp or Ollama. But SOTA quantized accuracy requires complex kernels that are notoriously hard to implement. Can we get SOTA accuracy and keep things simple? Our new GSQ (Gumbel-Softmax Quantization) method says yes. 🧵

sroecker retweeted

Daniel Han

@danielhanchen

23 days ago

We released experimental MTP Qwen3.6 Unsloth GGUFs!

Qwen3.6 27B MTP now runs at 140 tokens/s. Qwen3.6 35B-A3B MTP gets 220 tokens/s generation on a single GPU.

Qwen3.6 27B and 35B-A3B have >1.4x speed-up over the original GGUFs without any change in accuracy.

Guide + GGUFs + Benchmarks: https://t.co/x9BYC3iXCL

In terms of average speedup, we see a 1.4x for dense models at draft tokens = 2 and for the MoE around 1.15 to 1.2x.

We do not recommend more than 2 draft tokens because the acceptance rate drops precipitously from 83% to 50% with 4 draft tokens, and the forward passes for MTP become less beneficial.

Use `--spec-type mtp --spec-draft-n-max 2`

Thanks to Aman for https://t.co/0WKkIC0kyW!

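Here is a back-of-the-envelope sketch of why more draft tokens can backfire. It assumes the textbook speculative-decoding model: each drafted token is accepted independently with probability alpha, and the verification pass always contributes one extra token. Reading the quoted 83% and 50% as per-token acceptance rates is an assumption, since llama.cpp may define its acceptance metric differently. `expected_tokens_per_step` is a hypothetical helper and is not part of Unsloth or llama.cpp.

```python
def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens emitted per target-model verification step.

    Idealized model (an assumption): each of the k drafted tokens is
    accepted independently with probability alpha, drafts are checked in
    order, and the verification pass always yields one extra token
    (a correction or a bonus token). The result ranges from 1 (first
    draft rejected) to k + 1 (all drafts accepted).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    if alpha == 1.0:
        return float(k + 1)
    # Geometric series: 1 + alpha + alpha^2 + ... + alpha^k
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


if __name__ == "__main__":
    # Rates quoted in the tweet, treated here as per-token acceptance probabilities.
    for alpha, k in [(0.83, 2), (0.50, 4)]:
        tokens = expected_tokens_per_step(alpha, k)
        print(f"alpha={alpha:.2f}, k={k}: {tokens:.2f} tokens/step")
```

Under these assumptions, 2 drafts at 83% yield roughly 2.52 tokens per verification step. 4 drafts at 50% yield roughly 1.94, even though that setting also runs more MTP draft passes per step.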

sroecker retweeted

Tom Turney

@no_stp_on_snek

24 days ago

appreciate the comprehensive write-up from @_EldarKurtic, @mgoin_, @RedHat_AI on TurboQuant. data on H100 with native FP8 Tensor Cores looks right for what was tested. few things to add from the non-H100 side, where most of my testing lives:

sroecker retweeted

Eldar Kurtić

@_EldarKurtic

25 days ago

For more details and results check the full blog at https://t.co/ncAaF5V7vF . This is joint work with @mgoin_ and Alexandre Marques from @RedHat_AI and @vllm_project .

sroecker retweeted

Eldar Kurtić

@_EldarKurtic

25 days ago

TurboQuant has drawn a lot of attention recently, but the accompanying evals didn't tell the full story. So we ran what I believe is the first comprehensive study of TurboQuant: where it helps, where it falls short, and how it impacts accuracy, latency, and throughput. Findings:

sroecker retweeted

Armin Ronacher ⇌

@mitsuhiko

28 days ago

I think @antirez ds4.c is important! I wrote down my thoughts on why I built pi-ds4 and why we need to focus our local model efforts stronger than we do currently. https://t.co/61h4JDHTZL

sroecker retweeted

tender

@tenderizzation

28 days ago

wow https://t.co/s0PorYacxE

sroecker retweeted

antirez @antirez

28 days ago

Welcome to DS4, a specialized inference engine for DeepSeek v4 Flash. https://t.co/UrUJz5I2R1 This project would have been impossible without the existence of llama.cpp and GGML and the work of @ggerganov and all the other contributors. Thanks!

sroecker retweeted

Yannick Nick

@keennay

29 days ago

>new AMD Instinct MI350P GPU
>CDNA 4
>PCIe Gen 5 x16
>144GB HBM3E 4TB/s
>native MXFP6 and MXFP4 support

Steffen Röcker

@sroecker

29 days ago

@spiritbuun Nice! Hope decode doesn't suffer too much.

sroecker retweeted

0xSero

@0xSero

29 days ago

New careers will be born, this one is mine.

Steffen Röcker

@sroecker

29 days ago

First entry to @LottoLabs localmaxxing: 28k prefill with vLLM serving Qwen 3.6 35B A3B REAP (0.5 ratio) in NVFP4 on a 5070 Ti with 16 GiB VRAM https://t.co/1zuyNZ4oZ7 Check out the model and instructions here https://t.co/wDPaCXGVF8
