Zoe Carver @_zoecarver - Twitter Profile

about 1 month ago

Tenstorrent just dropped serious benchmarks on DeepSeek 671B, and they are worth watching.

On decode, they hit 350 tokens per second. That is more than double Fireworks and Google Vertex at 144 tsu, and over 12x faster than Novita at 28 tsu. On prefill (100k sequence length), they clock 4.0 seconds, right in the top tier, behind only Google Vertex at 1.4 seconds while beating everyone else between 6.0 and 8.5 seconds.

The cost story is even bigger. At high throughput, Tenstorrent delivers at $6 per million tokens while NVIDIA GPU setups jump to $30 and keep climbing. They compare Galaxy Blackhole with the GB300 NVL72 rack from NVIDIA, quoting SemiAnalysis benchmarks.

The advantage comes from Tenstorrent’s purpose-built inference architecture. It is engineered from the ground up to keep cost per token low even as throughput scales, unlike general-purpose GPUs that were originally designed for training workloads. For DeepSeek 671B specifically, this translates into dramatically better efficiency on the metrics that matter most to AI companies: speed + real dollar cost at high volume.

This is the structural edge they are betting the entire inference market on, and the real question is whether Tenstorrent could serve inference at scale, provided that they gain traction within the infrastructure industry.
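The ratios claimed in the post can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above; the dictionary and variable names are illustrative assumptions, not anything published by Tenstorrent or the benchmark sources.

```python
# Figures as quoted in the post; all names here are hypothetical labels.
decode_rate = {
    "Tenstorrent": 350,
    "Fireworks": 144,
    "Google Vertex": 144,
    "Novita": 28,
}
cost_per_million_usd = {"Tenstorrent": 6, "NVIDIA GPU": 30}

# Decode speedup of Tenstorrent relative to each other provider.
speedups = {
    name: round(decode_rate["Tenstorrent"] / rate, 2)
    for name, rate in decode_rate.items()
    if name != "Tenstorrent"
}

# How many times more the quoted NVIDIA figure costs per million tokens.
cost_ratio = cost_per_million_usd["NVIDIA GPU"] / cost_per_million_usd["Tenstorrent"]

print(speedups)
print(cost_ratio)
```

On these numbers, "more than double" corresponds to roughly 2.4x over Fireworks and Google Vertex, "over 12x" to 12.5x over Novita, and the quoted cost gap to 5x.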

mzuhair123's tweet photo.

4

175

18

105

26K

_zoecarver retweeted

Jim Keller

@jimkxa

about 1 month ago

We landed on @ArtificialAnlys It’s pretty fast

14

405

36

58

166K

_zoecarver retweeted

Jim Keller

@jimkxa

about 1 month ago

SuperCluster 36 up and running. 4 Galaxy all to all in a torus. 9 Quads all to all connected. Looks like one computer to software. More silicon, faster computer

jimkxa's tweet photo. https://t.co/lqwutU9JXO

19

331

34

52

221K

_zoecarver retweeted

Jim Keller

@jimkxa

about 2 months ago

Cuda is another last war problem https://t.co/Ri0mKmXMFP

0

60

2

13

6K

_zoecarver retweeted

Ai2 @allen_ai

2 months ago

MolmoBot, our open robotic manipulation suite trained entirely in simulation, now has code, training data, a data generation pipeline, & evals all available. This puts our robotics models within reach of any research lab—no extensive real-world data collection required. 🧵

allen_ai's tweet photo. https://t.co/LfgQYhkIjp

9

235

36

144

54K

_zoecarver retweeted

Swift Language @SwiftLang

about 2 years ago

Programming microcontrollers with Swift has never been easier. @kubamracek introduces a new repository of example projects to help get you started. https://t.co/yf4fNpoEtl

4

236

52

65

26K

_zoecarver retweeted

Saleem Abdulrasool @compnerd

over 2 years ago

We’re dedicated to sharing our work @browsercompany - so today we’re publishing our first post on building rich native experiences on Windows with Swift & open sourcing our swift-firebase repo First up, interoperability! Windows APIs, COM, C++ and how they integrate with Swift🧵

compnerd's tweet photo. https://t.co/ID5KeUZ6eW

15

310

41

36

34K

_zoecarver retweeted

Kuba (Brecka) Mracek @kubamracek

almost 3 years ago

Embedded Swift -- a vision for a new compilation mode of Swift with first class support for embedded/low-level environments https://t.co/GePjVlh0Su

2

167

45

21

22K

_zoecarver retweeted

Swift Language @SwiftLang

over 3 years ago

Where’s the Swift project going in 2023? @pathofshrines looks ahead with a summary from across the community. https://t.co/0HF2Yg2zsK

1

183

65

16

0

Zoe Carver @_zoecarver

over 3 years ago

https://t.co/3QfioMFnOx

0

2

0

_zoecarver retweeted

JP Simard @simjp

over 3 years ago

We’ve updated SwiftSyntax to use the new parser written in Swift! Performance is up to 15% faster, release binaries are half their previous size, and binaries are more portable on Linux. Kudos to the SwiftSyntax folks who fixed several bugs we reported very quickly.

1

64

10

1

0

Zoe Carver @_zoecarver

almost 4 years ago

@krisjusiak @ziglang @rustlang @D_Programming @seanbax I assume this is using libstdc++ because it's running on linux? Would be interested to see how libc++ compares.

0

Zoe Carver @_zoecarver

about 4 years ago

https://t.co/7lCCmzjsYP

0

1

0

_zoecarver retweeted

SZA

@sza

over 4 years ago

Very important to surround urself w ppl that see u . Fully

365

140K

51K

2K

0

_zoecarver retweeted

Kavon Farvardin @call1cc

over 4 years ago

We have internships available in the programming languages, compilers, debuggers, and development infrastructure teams at Apple! You'll learn firsthand about these thrilling topics from some really awesome folks; no prior experience required! See here: https://t.co/6XqWGuMXw0

0

22

7

4

0