Mike Gunter

3 months ago

Very excited to share more about what we've been working on.

3 months ago

We’re building an LLM chip that delivers much higher throughput than any other chip while also achieving the lowest latency. We call it the MatX One.

The MatX One chip is based on a splittable systolic array, which has the energy and area efficiency that large systolic arrays are famous for, while also getting high utilization on smaller matrices with flexible shapes. The chip combines the low latency of SRAM-first designs with the long-context support of HBM. These elements, plus a fresh take on numerics, deliver higher throughput on LLMs than any announced system, while simultaneously matching the latency of SRAM-first designs. Higher throughput and lower latency give you smarter and faster models for your subscription dollar.

We’ve raised a $500M Series B to wrap up development and quickly scale manufacturing, with tapeout in under a year. The round was led by Jane Street, one of the most tech-savvy Wall Street firms, and Situational Awareness LP, whose founder @leopoldasch wrote the definitive memo on AGI. Participants include @sparkcapital, @danielgross and @natfriedman’s fund, @patrickc and @collision, @TriatomicCap, @HarpoonVentures, @karpathy, @dwarkesh_sp, and others. We’re also welcoming investors across the supply chain, including Marvell and Alchip.

@MikeGunter_ and I started MatX because we felt that the best chip for LLMs should be designed from first principles with a deep understanding of what LLMs need and how they will evolve. We are willing to give up on small-model performance, low-volume workloads, and even ease of programming to deliver on such a chip.

We’re now a 100-person team with people who think about everything from learning rate schedules, to Swing Modulo Scheduling, to guard/round/sticky bits, to blind-mated connections, all in the same building. If you’d like to help us architect, design, and deploy many generations of chips in large volume, consider joining us.
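To illustrate the utilization point about smaller matrices, here is a toy sketch, not MatX's actual design. It assumes a simple tiling model in which a k x n weight matrix is padded up to whole array-sized tiles. The array sizes (512 and 128) and the function name `utilization` are hypothetical, chosen only for the example.

```python
import math

def utilization(k, n, rows, cols):
    """Fraction of processing elements doing useful work when a k x n
    weight matrix is tiled onto a rows x cols systolic array.
    Toy model (an assumption): partial tiles are padded to full size."""
    tiles = math.ceil(k / rows) * math.ceil(n / cols)
    return (k * n) / (tiles * rows * cols)

# Hypothetical sizes: one large 512 x 512 array versus the same array
# operated as independent 128 x 128 sub-arrays.
monolithic = utilization(128, 128, 512, 512)  # small matrix on the big array
split = utilization(128, 128, 128, 128)       # same matrix on one sub-array

print(f"monolithic: {monolithic:.4f}, split: {split:.4f}")
```

Under this model the small matrix keeps only 1/16 of the large array busy, while a sub-array of matching size is fully used. A real design would also have to account for data movement and scheduling, which this sketch ignores.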

8 months ago

The FT's calculation is off by a factor of eight. Rental pricing is per chip, not per eight-chip server.
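A short sketch of the arithmetic behind that factor of eight. The hourly rate is a made-up number for illustration, not a figure from the FT or from any provider. Only the eight-chip server size comes from the post.

```python
CHIPS_PER_SERVER = 8        # from the post: an eight-chip server
quoted_rate = 2.00          # hypothetical rental price in $/hour

# Correct reading: the quoted rate is already per chip.
per_chip_correct = quoted_rate

# Misreading: treating the same rate as the price of a whole server.
per_chip_misread = quoted_rate / CHIPS_PER_SERVER

error_factor = per_chip_correct / per_chip_misread
print(f"misread per-chip cost is off by {error_factor:.0f}x")
```

Whatever the actual rate, the two readings differ by exactly the number of chips per server.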

about 1 year ago

@eladgil L'Industrie and also get gelato. Boichik is a lot better than one of the NYC places reputed to be best. So, maybe prioritize pizza over bagels?

MikeGunter_ retweeted

about 1 year ago

MatX is designing chips and systems to 10x the computing power for the world’s largest AI workloads. Today, we are pleased to announce the closing of a >$100M Series A funding round led by @sparkcapital, with participation from @JaneStreetGroup, @danielgross and @natfriedman, @TriatomicCap, @HarpoonVentures, and @adamdangelo. In two years, we proved out all our technical bets across ML numerics, chip design and implementation, software, and system design—and secured all the necessary partnerships—to develop our chip. With this round of investment, we are now sufficiently funded to bring our systems to market.

almost 2 years ago

@wjtweet @TheStalwart @tracyalloway @reinerpope I'm optimistic about ML helping on both the front and back ends of ASIC design. We're not betting the company on it, though. The architecture (eliminated now) and execution risks were enough.

almost 2 years ago

I really enjoyed talking about the process and business of semiconductor design with @tracyalloway and @TheStalwart on the Odd Lots podcast. Joe and Tracy were wonderful hosts: they put me at ease and guided the conversation with the lightest of touches. We talked about what doing semiconductor design is like, why LLMs are hungry for as many FLOPS/$ as they can get, how @MatXComputing can provide that, and how NVIDIA's moat might be bridged. I particularly liked that @reinerpope got to communicate some of the sense of beauty that I also feel about good design. Helping lead MatX's engineering team (and meeting with our customers) is a humbling honor: it's talking with people who are the world experts in what we're chatting about. Being on Odd Lots was talking with grandmasters of conversation.

Joe Weisenthal

@TheStalwart

almost 2 years ago

NEW ODD LOTS: Two Veteran Chip Designers Have A Plan To Take On Nvidia @tracyalloway and I talked to @reinerpope and @MikeGunter_, both formerly of Alphabet, about their new company MatX that's aiming to build the ultimate semiconductor just for LLMs https://t.co/xCIMrfByU9

MikeGunter_ retweeted

MatX @MatXComputing

about 2 years ago

Introducing MatX: we design hardware tailored for LLMs, to deliver an order of magnitude more computing power so AI labs can make their models an order of magnitude smarter.

Our hardware would make it possible to train GPT-4 and run ChatGPT, but on the budget of a small startup.

Our founding team has designed chips at Google and Amazon, and we’ve built chips with 1/10 the team size typically needed. Here’s how we’re approaching the problem of inefficient and insufficient compute.

While other chips treat all models equally, we dedicate every transistor to maximizing performance on the world’s largest models. Our goal is to make the world’s best AI models run as efficiently as allowed by physics, bringing the world years ahead in AI quality and availability. A world with more widely available intelligence is a happier and more prosperous world: picture people of all socioeconomic levels having access to an AI staff of specialist MDs, tutors, coaches, advisors, and assistants.

Our design focuses on cost efficiency for high-volume pre-training and production inference for large models.

This means:

1/ We’ll support training and inference. Inference first.

2/ We optimize for performance-per-dollar first (we’ll be best by far), and for latency second (we’ll be competitive).

3/ We offer excellent scale-out performance, supporting clusters with hundreds of thousands of chips.

4/ Peak performance is achieved for these workloads: large Transformer-based models (both dense and MoE), ideally 20B+ parameters, and inference having thousands of simultaneous users.

5/ We give you low-level access to the hardware.

We believe that the best hardware is designed jointly by ML hardware experts and LLM experts. Everyone on the MatX team, from new grad to industry veteran, is exceptional. Our industry veterans have built ML chips, ML compilers, and LLMs, at Google or Amazon or various startups.

Our CEO, @reinerpope, was Efficiency Lead for Google PaLM, where he designed and implemented the world’s fastest LLM inference software.

Our CTO, @mikegunter_, was Chief Architect for one of Google’s ML chips (at the time, Google’s fastest) and was an Architect for Google’s TPUs.

Our CDO Silicon, @avinashgmani, has over 25 years of experience in building products and world-class engineering teams in silicon and software at Amazon, Innovium and Broadcom.

We’re backed by $25M of investment from specialist investors and operators who share our vision, including: @danielgross and @natfriedman (lead investors, and experts in the AI space), @rkhemani (CEO at Auradine), @amasad (CEO at Replit), @outsetcap, @homebrew, @svangel. Additionally we have investment from leading AI and LLM researchers including @IrwanBello, @jekbradbury, @achowdhery, @liamfedus, and @hardmaru.

almost 3 years ago

@EricHallahan @reinerpope We should be OK if the trends from the last decade hold!

MikeGunter_ retweeted

almost 3 years ago

I’m excited to announce our new company, MatX, started with @MikeGunter_. We want to make AI better, faster, and cheaper by building more powerful hardware. Read on for a short introduction, or see our full announcement here: https://t.co/1HOx8uYyCA.
