Djordje Madic @dmadic - Twitter Profile

So SGLang supports platform plugins now. I wanted to get a feel for the per-token overhead in the streaming phase. For a mocked model streaming at 500 tok/s, I measured 0.3 ms of overhead per token: https://t.co/ZZBRWGgjmF What's not yet perfect is having to build SGLang from source to use platform plugins.
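
For context on the 0.3 ms figure: at 500 tok/s the mocked model emits one token every 2 ms, so 0.3 ms of serving overhead is about 15% of the per-token budget. Below is a minimal Python sketch of that arithmetic, plus one hypothetical way to estimate the overhead from client-side token arrival timestamps. The helper names are illustrative only and are not part of SGLang; the sketch assumes the mocked model emits at exactly the stated rate.

```python
from typing import Sequence


def overhead_fraction(tokens_per_second: float, overhead_ms: float) -> float:
    """Share of the per-token time budget taken by a fixed per-token overhead."""
    budget_ms = 1000.0 / tokens_per_second  # 500 tok/s -> 2.0 ms per token
    return overhead_ms / budget_ms


def per_token_overhead_ms(arrival_times_s: Sequence[float], tokens_per_second: float) -> float:
    """Mean gap between consecutive streamed tokens beyond the mocked model's ideal interval.

    Hypothetical helper: assumes arrival_times_s are client-side timestamps (in seconds)
    of each streamed token, and that the mocked model itself emits at exactly
    tokens_per_second, so any extra gap is attributed to the serving stack.
    """
    if len(arrival_times_s) < 2:
        raise ValueError("need at least two token timestamps")
    ideal_ms = 1000.0 / tokens_per_second
    gaps_ms = [(b - a) * 1000.0 for a, b in zip(arrival_times_s, arrival_times_s[1:])]
    return sum(gaps_ms) / len(gaps_ms) - ideal_ms


print(overhead_fraction(500, 0.3))  # -> 0.15
```

Averaging the gaps folds any jitter from the mocked model into the estimate, so this is an approximation rather than a precise profile of where the time goes.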

Djordje Madic

@dmadic

about 1 month ago

I was surprised that Dynamo does not sit on top of the vLLM/SGLang HTTP servers but instead uses them as Python API engines.

Djordje Madic

@dmadic

about 1 month ago

Made a PoC today to understand how Mooncake works. I was able to run Qwen 0.6B on CPU with Mooncake. Setup: Nano vLLM inspired, 2x prefill + 1x decode nodes, TCP transport mechanism. https://t.co/x1A44VWzaP Entrypoint: pd_two_prefill_chat.py
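
A toy, in-process sketch of the 2x prefill / 1x decode topology described above: requests are dispatched to two prefill nodes, each hands its KV cache to the single decode node, and the decode node generates the output tokens. Everything here is hypothetical (class names, round-robin dispatch, a dict standing in for the TCP KV transfer, a fake "+1" model); it is not Mooncake's or Nano vLLM's API and may not match how the PoC actually schedules requests.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Dict, List, Sequence, Tuple


@dataclass
class KVCache:
    """Stand-in for a KV cache: here just the prompt token ids the prefill node saw."""
    request_id: int
    prompt_tokens: List[int]


class PrefillNode:
    def __init__(self, name: str) -> None:
        self.name = name
        self.handled: List[int] = []  # request ids this node prefilled

    def prefill(self, request_id: int, prompt_tokens: List[int]) -> KVCache:
        # A real prefill node would run the forward pass over the whole prompt here.
        self.handled.append(request_id)
        return KVCache(request_id, list(prompt_tokens))


class DecodeNode:
    def __init__(self) -> None:
        self.store: Dict[int, KVCache] = {}  # KV caches received from prefill nodes

    def receive(self, kv: KVCache) -> None:
        # In the PoC this hand-off goes over a TCP transport; here it is a dict insert.
        self.store[kv.request_id] = kv

    def decode(self, request_id: int, max_new_tokens: int) -> List[int]:
        kv = self.store.pop(request_id)  # free the cache once decoding starts
        last = kv.prompt_tokens[-1]
        out: List[int] = []
        for _ in range(max_new_tokens):
            last += 1  # toy "model": next token is previous token + 1
            out.append(last)
        return out


def serve(
    requests: Sequence[Tuple[int, List[int]]],
    prefill_nodes: Sequence[PrefillNode],
    decode_node: DecodeNode,
    max_new_tokens: int = 3,
) -> Dict[int, List[int]]:
    """Assumed policy: round-robin across prefill nodes, one shared decode node."""
    rr = cycle(prefill_nodes)
    results: Dict[int, List[int]] = {}
    for request_id, prompt in requests:
        kv = next(rr).prefill(request_id, prompt)
        decode_node.receive(kv)
        results[request_id] = decode_node.decode(request_id, max_new_tokens)
    return results


p0, p1 = PrefillNode("p0"), PrefillNode("p1")
print(serve([(0, [1, 2]), (1, [5]), (2, [9])], [p0, p1], DecodeNode()))
# -> {0: [3, 4, 5], 1: [6, 7, 8], 2: [10, 11, 12]}
```

Splitting the roles this way lets the compute-heavy prefill scale out independently of decode, which is the point of running two prefill nodes against one decode node.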

Djordje Madic

@dmadic

about 1 month ago

👀

Cirrascale Cloud Services

@Cirrascale

about 1 month ago

What if you could cut inference costs in half without giving up production-scale performance? That's what @Tenstorrent Galaxy Blackhole, now live on the Cirrascale AI Innovation Cloud, was built to deliver:
✅ Approximately half the cost of leading GPU alternatives
✅ Bare-metal access (no virtualization overhead)
✅ 90% of HuggingFace models run as-is, no rewrites required
✅ Latency-optimized for large-context LLM inference and video generation

👉 Request access: https://t.co/C5PscwsnqX

Djordje Madic

@dmadic

about 2 months ago

@kevinmi920 @tenstorrent @prodia How can we try this out?

dmadic retweeted

Kevin Mi

@kevinmi920

about 2 months ago

Introducing Infinite Studio ♾. Last week, @tenstorrent x @prodia announced the fastest Wan 2.2 video generation in the world. We built a demo to show what that speed unlocks: directing an infinite movie in real time. Demo 👇

Djordje Madic

@dmadic

about 2 months ago

WHAT A DAY @tenstorrent

Djordje Madic

@dmadic

2 months ago

@wesbos Better home page 🙏

Djordje Madic

@dmadic

2 months ago

It's super fun working on this; can't wait to ship some more amazing things to TT customers.

Sally Ward-Foxton @sallywf

2 months ago

.@tenstorrent is launching Galaxy Blackhole servers and clusters for fast LLM inference (DeepSeek-671B at up to 350 t/s/u), among other applications - no disaggregation needed, according to @jimkxa: https://t.co/G2yu85tXs2
