Eric Quinnell @divBy_zero - Twitter Profile

@corsix @opinali This. Also all the different EDA tools only support different random subsections of Verilog. Part of the fun is finding the sub-sub-set that actually works in them all without losing chip intent

Eric Quinnell

@divBy_zero

20 days ago

@Leik0w0 It’s usually better for hardware systolic arrays. Hw doesn’t execute matmuls in the same physical directions as paper and pencil matrices - the row and col weights come from the same physical place, so it has to be row+row or col+col. Someone is doing the xpose somewhere
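
A toy Python illustration of the transpose point. The textbook product reads a row of A against a column of B. A "row+row" datapath reads both operands as rows from the same layout, so one of them has to be stored or fed already transposed. This is a sketch under simplifying assumptions, not a model of any real systolic array, and the function names are hypothetical.

```python
def transpose(m):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def matmul_textbook(a, b):
    """Pencil-and-paper C = A x B: row i of A dotted with column j of B."""
    inner = len(b)
    return [
        [sum(a[i][p] * b[p][j] for p in range(inner)) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def matmul_row_row(a, b_rows):
    """Row+row form: both operands are consumed as rows from the same layout.

    b_rows must already hold B transposed (row j of b_rows == column j of B).
    Whoever prepares b_rows is "doing the xpose somewhere".
    """
    return [[sum(x * y for x, y in zip(ra, rb)) for rb in b_rows] for ra in a]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

print(matmul_textbook(A, B))            # [[19, 22], [43, 50]]
print(matmul_row_row(A, transpose(B)))  # [[19, 22], [43, 50]]
print(matmul_row_row(A, B))             # [[17, 23], [39, 53]]: B never transposed
```

The last line shows what goes wrong if nobody performs the transpose: the row+row datapath silently computes A x Bᵀ instead of A x B.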

Eric Quinnell

@divBy_zero

24 days ago

@insane_analyst He who controls the SPICE controls the universe

Eric Quinnell

@divBy_zero

28 days ago

To be fair, at low enough FP quantization it does indeed return to associative behavior. Bc it’s INT

SemiAnalysis

@SemiAnalysis_

28 days ago

Floating point math is not associative! And many of the highest performance kernels split the workload among SMs and accumulate partial results in a nondeterministic order. Many AI labs just accept this, or pay a huge performance penalty for determinism. DeepSeek decided to do neither. (1/4) 🧵
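
A minimal Python sketch of both sides of this exchange. Float addition rounds after every step, so summing the same values in different orders (as when partial results from different SMs arrive in a nondeterministic order) can give different answers. One reading of the "it's INT" reply: once values sit on a small integer grid with a shared scale, partial sums can be accumulated as integers, and integer addition is associative (two's-complement wraparound included), so order stops mattering. The values and the shared scale below are made up for illustration; real kernels use fixed-width hardware accumulators, not Python ints.

```python
import itertools
from functools import reduce
from operator import add


def ordered_sum(values):
    """Strict left-to-right sum. Built-in sum() is avoided on purpose:
    since Python 3.12 it uses compensated summation for floats."""
    return reduce(add, values)


# Classic example: grouping changes the rounded result.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False

# The same four floats, accumulated in every possible order.
floats = [1e16, 1.0, -1e16, 3.0]
float_results = {ordered_sum(p) for p in itertools.permutations(floats)}
print(sorted(float_results))  # more than one distinct answer

# Hypothetical quantized values: small ints that share one scale factor.
scale = 0.25
quantized = [7, -3, 5, -8]
int_results = {ordered_sum(p) for p in itertools.permutations(quantized)}
print(int_results, [r * scale for r in int_results])  # {1} [0.25]
```

The float case diverges because 1e16 + 1.0 rounds back to 1e16 in double precision, so whether the 1.0 survives depends on when it meets the large terms.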

Eric Quinnell

@divBy_zero

about 2 months ago

@itsclivetime Still true

divBy_zero retweeted

Clive Chan

@itsclivetime

about 2 months ago

i believe it was @divBy_zero that wisely said that, contrary to common sense, it is always easier to fix performance problems in silicon than change an entrenched sw stack

Eric Quinnell

@divBy_zero

about 2 months ago

@yacineMTB Confirmed anecdotal data point. (Mid 40s, but close enough. This vibe coding stuff is legit)

Eric Quinnell

@divBy_zero

about 2 months ago

Callout to the many leads and engineers who worked this over the years, esp @rawat_ritvik @aaronsrogers and Pete. This will be a massive (and needed) upgrade to all cars and bots.

Eric Quinnell

@divBy_zero

about 2 months ago

Congrats AI5 team, I know it was a rocky road

Elon Musk

@elonmusk

about 2 months ago

Congrats to the @Tesla_AI chip design team on taping out AI5! AI6, Dojo3 & other exciting chips in work.

divBy_zero retweeted

NASA

@NASA

about 2 months ago

Hello, Moon. It’s great to be back. Here’s a taste of what the Artemis II astronauts photographed during their flight around the Moon. Check out more photos from the mission: https://t.co/rzM1P0QbOl

Eric Quinnell

@divBy_zero

2 months ago

@LottoLabs @ptremblay Pedantically you are correct, yes. Way less loss than current quantization, and that detail would derail non-technical folks even further, so I didn’t split the hairs

Eric Quinnell

@divBy_zero

2 months ago

@CliffLattner Yes, exactly. If doing inference, the compute will hide under the dram loads, even at high batch. For training, it is extra compute to pay.
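
A toy roofline model of the inference-vs-training point, in Python. It assumes compute and DRAM traffic overlap perfectly, so a step takes whichever of the two is longer. Every number here, including the "extra FLOPs per byte" cost of decompression, is hypothetical and chosen only to show the shape of the argument; it also ignores how larger batches push inference toward being compute-bound.

```python
def step_time_s(flops, bytes_moved, peak_flops, dram_bw):
    """Toy roofline: time is set by the slower of math and memory,
    assuming the two overlap perfectly (real kernels rarely do)."""
    return max(flops / peak_flops, bytes_moved / dram_bw)


PEAK_FLOPS = 100e12  # hypothetical accelerator: 100 TFLOP/s
DRAM_BW = 1e12  # hypothetical DRAM bandwidth: 1 TB/s
DECOMP_FLOPS_PER_BYTE = 8  # hypothetical cost to decompress each loaded byte

# Inference decode step (hypothetical): stream 1 GB, do only 2 GFLOP of math.
infer_bytes, infer_flops = 1e9, 2e9
infer_extra = DECOMP_FLOPS_PER_BYTE * infer_bytes
infer_plain = step_time_s(infer_flops, infer_bytes, PEAK_FLOPS, DRAM_BW)
infer_decomp = step_time_s(infer_flops + infer_extra, infer_bytes, PEAK_FLOPS, DRAM_BW)

# Training step (hypothetical): same bytes, but 200 TFLOP of math.
train_bytes, train_flops = 1e9, 200e12
train_extra = DECOMP_FLOPS_PER_BYTE * train_bytes
train_plain = step_time_s(train_flops, train_bytes, PEAK_FLOPS, DRAM_BW)
train_decomp = step_time_s(train_flops + train_extra, train_bytes, PEAK_FLOPS, DRAM_BW)

print(infer_plain, infer_decomp)  # equal: extra math hides under the DRAM load
print(train_plain, train_decomp)  # decomp is larger: extra math is paid in full
```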

Eric Quinnell

@divBy_zero

2 months ago

Two days of weird takes, so I must: “8x perf” is a 32-bit baseline vs. 4-bit compressed. “KV cache” is merely a use case, and it’s hard to capitalize on the full perf there. And yes, many already compress here. But it’s lossless. The others aren’t. We should have been using this all along.

Google Research

@GoogleResearch

2 months ago

Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: https://t.co/CDSQ8HpZoc
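
The “8x perf” figure in the quote above comes down to bit-width arithmetic: against a 32-bit baseline, 4-bit values mean 32 / 4 = 8x fewer bytes to store and move. A minimal Python sketch of that arithmetic for a KV cache, assuming the usual layout of one key and one value vector per layer, KV head, and token. The model dimensions are hypothetical, and this does not implement TurboQuant itself.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch, bits):
    """Bytes for a KV cache: one key and one value vector per layer,
    KV head, and token (hence the leading 2)."""
    total_bits = 2 * layers * kv_heads * head_dim * seq_len * batch * bits
    return total_bits // 8


# Hypothetical model shape, for illustration only.
shape = dict(layers=32, kv_heads=8, head_dim=128, seq_len=32_768, batch=1)

fp32 = kv_cache_bytes(**shape, bits=32)
fp16 = kv_cache_bytes(**shape, bits=16)
int4 = kv_cache_bytes(**shape, bits=4)

print(fp32 / 2**30, "GiB at 32-bit")  # 8.0 GiB at 32-bit
print(fp32 // int4, "x smaller at 4-bit vs a 32-bit baseline")  # 8
print(fp16 // int4, "x smaller at 4-bit vs a 16-bit baseline")  # 4
```

The same 4-bit cache is only 4x smaller than a 16-bit baseline, which is why the choice of baseline matters when reading a headline speedup.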

Eric Quinnell

@divBy_zero

2 months ago

My flight lands, I exit the plane, look for the Baggage Claim sign, and then there it was: A 448 Gbps PAM4 Keysight waveform analyzer advertisement, on an LED-backlit mega screen. Thank you, SJC
