Mahekdeep Singh

@MahekdeepS

Tech Banker I guess… and Engineer💡

San Francisco, CA

Joined August 2012

1.7K Following

186 Followers

864 Posts

Mahekdeep Singh

@MahekdeepS

about 1 month ago

@demian_ai @nebiustf @jatin_batra1

313

Mahekdeep Singh

@MahekdeepS

about 2 months ago

@jatin_batra1 @signulll I’m in!

Mahekdeep Singh

@MahekdeepS

2 months ago

@jatin_batra1 @covered_call Yikes

Mahekdeep Singh

@MahekdeepS

2 months ago

@jatin_batra1 @naval @tferriss Will need to look closely

2 months ago

@jatin_batra1 @naval @tferriss In Naval I trust!

MahekdeepS retweeted

Apoorv Agrawal

@apoorv03

2 months ago

One of the most substantive classes with @ChaseLochmiller at Stanford. We went deep on economics of the datacenter:
- Where is the ~$650B of AI infra capex actually going this year?
- Who's capturing the margin, who's getting squeezed?
- How the bottleneck has moved from GPUs to power, and where it goes next
- The economics of neoclouds

136

240K

Mahekdeep Singh

@MahekdeepS

2 months ago

@jatin_batra1 @daviddorg 😂

Mahekdeep Singh

@MahekdeepS

2 months ago

@lauradang0 Please add me! Thank you!

Mahekdeep Singh

@MahekdeepS

2 months ago

@jatin_batra1 @Rane3560 @adocomplete Respect.

MahekdeepS retweeted

jack

@jack

3 months ago

https://t.co/jgZkBvYOPt

567

11K

24K

Mahekdeep Singh

@MahekdeepS

4 months ago

@jatin_batra1 @mercury Ty!

MahekdeepS retweeted

Matt Shumer

@mattshumer_

4 months ago

https://t.co/ivXRKXJvQg

119K

28K

182K

87M

MahekdeepS retweeted

Naval

@naval

5 months ago

Envy is the acknowledgment of having lost a secret race.

613

20K

618K

MahekdeepS retweeted

Bourbon Capital

@BourbonCap

5 months ago

$MU is going to build the largest semiconductor manufacturing facility in U.S. history and the world's most advanced memory fab You know the rest... they gonna need more semiconductor equipment: bigger orders for $ASML $AMAT and $KLAC

https://t.co/j1dP4RK1Z9

260

862

172K

Mahekdeep Singh

@MahekdeepS

6 months ago

@pitdesi @grandabbang Accidentally went there a few years ago, some of the best Indian food I’ve ever had.

323

MahekdeepS retweeted

Jaya Gupta

@JayaGup10

6 months ago

Part 2:

MahekdeepS retweeted

Rohan Paul

@rohanpaul_ai

6 months ago

Very insightful post by Gavin below on Nvidia's $20B Groq licensing deal.

AI inference has 2 steps, Prefill & Decode.

Prefill means the model reads your whole prompt and context. Decode means it writes the reply one small chunk of text at a time.

These 2 steps favor different hardware. Prefill benefits from large memory capacity, so it can hold long context; speed matters less. Decode benefits from extremely fast memory bandwidth and very low delay; memory size matters less.

---

Now Groq’s LPU is a complete departure from the GPU/TPUs. It doesn’t use HBM (External Memory) at all. Instead, it uses SRAM (Static Random Access Memory), which is built directly into the silicon of the chip.

Groq’s big advantage is how quickly it runs single-user inference. Thanks to its compute layout and only-local memory, it hits one of the fastest single-user tokens-per-second rates on the market.

SRAM is up to 100x faster than the HBM found in GPUs. Because the data is right there on the chip, there is zero “fetch time.”

The downside of the chip is that it has no external DDR or HBM memory, only onboard SRAM. While fast, the 230 MB capacity per chip has been super low, and implies that a reasonably small open source model like Llama 70B requires 10 racks of processors and over 100 kW of power to run.

Groq’s LPUs don’t need liquid cooling, and that’s a pretty big deal. Most data centers around the world still use air cooling, not liquid. Nvidia’s Blackwell and upcoming chips, on the other hand, will mainly rely on liquid cooling since they’re built for top performance.

---

Nvidia’s 3 Rubin chips line up with that. Rubin is the main GPU with HBM4, very high bandwidth, used for training and high-throughput decode. It is usually a dual-die part with NVLink and large HBM capacity.

Rubin CPX is a sibling tuned for prefill. It is single-die, uses 128GB GDDR7, about 30 PFLOPS NVFP4, cheaper and cooler, and accelerates long-context attention. It trades bandwidth for capacity and cost to make prefill efficient.

Rubin with HBM is the balanced workhorse for training and high-throughput inference.

Nvidia’s plan is to run prefill on CPX, then hand off decode to standard Rubin, all orchestrated in the same rack. That is the connection. They are designed to work together.

A Groq-style Rubin with SRAM will be very fast for token-by-token decode, but holds less. Systems can use CPX or normal Rubin to do prefill, then hand off to the SRAM Rubin for the fast typing part.

The result: you get faster answers for interactive apps while keeping costs sensible by using the right chip for each step.

After the deal, Groq will remain an independent company, hold the IP, and service GroqCloud (its online neocloud business) with all the Middle Eastern deals it has done over the years.

Now, about competition. Nvidia understands that if the HBM, energy, liquid cooling, and CoWoS limits choke the market and lead to a serious compute shortage, both customers and rivals will hunt for workarounds. In that situation, Groq, which doesn’t depend on the same supply chain constraints, becomes an obvious alternative.
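
A rough back-of-the-envelope sketch of the capacity and bandwidth points in the post above. Only the 230 MB of SRAM per chip and the 70B parameter count come from the post. The bytes-per-parameter values, the function names, and the bandwidth figure in the usage line are illustrative assumptions. The post does not state weight precision or chips per rack, so this does not reproduce its "10 racks" figure. It also ignores KV cache and activations.

```python
import math


def chips_to_hold_weights(params_billion: float,
                          bytes_per_param: float,
                          sram_mb_per_chip: float) -> int:
    """Minimum number of chips whose combined SRAM can hold the weights.

    Assumes decimal units (1 MB = 1e6 bytes) and counts weights only.
    """
    model_bytes = params_billion * 1e9 * bytes_per_param
    chip_bytes = sram_mb_per_chip * 1e6
    return math.ceil(model_bytes / chip_bytes)


def decode_tokens_per_sec_upper_bound(model_bytes: float,
                                      bandwidth_bytes_per_sec: float) -> float:
    """Rule-of-thumb ceiling for single-user decode speed.

    Assumes every generated token requires reading all weights once, so
    throughput cannot exceed memory bandwidth divided by model size.
    """
    return bandwidth_bytes_per_sec / model_bytes


if __name__ == "__main__":
    # 70B parameters, 230 MB SRAM per chip (from the post);
    # 1 and 2 bytes per parameter are assumed precisions.
    print(chips_to_hold_weights(70, 1.0, 230))
    print(chips_to_hold_weights(70, 2.0, 230))
    # Hypothetical 7e12 bytes/s of memory bandwidth, purely for illustration.
    print(decode_tokens_per_sec_upper_bound(70e9, 7e12))
```

Under these assumptions the weights alone need several hundred chips, which is the sense in which a small per-chip SRAM capacity translates into many racks, and the second function shows why decode speed scales with memory bandwidth rather than capacity.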

645

102

551

134K

MahekdeepS retweeted

Jaya Gupta

@JayaGup10

6 months ago

https://t.co/uPXcTUEsnc

415

937

19K

MahekdeepS retweeted

Max Weinbach

@mweinbach

6 months ago

JP Morgan has a great design flow chart to explain how ASIC vs. merchant semi design works as well as customer owned tooling

https://t.co/MzrfSjPCOF

101

188K

MahekdeepS retweeted

Dev Shah

@0xDevShah

6 months ago

Google will never sell TPUs. The moment Google sells TPUs at scale, they transform their architectural advantage into a commodity.

Google's internal teams have first-order claims on TPU capacity because those workloads directly generate revenue and strategic moats. Any TPU sold externally is a TPU not used to defend Google's primary profit engines.

Right now, TPUs are Google's proprietary edge, vertical integration that lets them operate AI infrastructure at costs competitors can't match. DeepMind can burn through compute budgets that would bankrupt OpenAI because Google doesn't pay retail GPU prices; they pay internal TPU marginal cost.

If Google starts selling TPUs externally:
- They have to price competitively vs Nvidia GPUs, which means revealing their cost structure. Suddenly, everyone knows Google's true AI compute costs aren't magic.
- Selling bare metal TPUs means publishing detailed specs, performance benchmarks, and programming interfaces. This is handing competitors a blueprint for "how Google actually does AI at scale." Right now, that's proprietary. The moment it's a product, it becomes studied, reverse-engineered, and eventually replicated.
- Google Cloud already sells TPU access via GCP at premium prices. If they start selling bare TPUs, they're competing with their own higher-margin cloud offering. No sophisticated buyer would pay GCP markup when they could buy TPUs directly and run them cheaper.

GCP TPU pricing is not aggressive compared to GPU alternatives, but it is premium. This isn't incompetence; it's intentionally priced to discourage massive external adoption. Google makes TPUs available enough to avoid antitrust "hoarding infrastructure" accusations and to capture some high-margin cloud revenue, but they don't actually want external customers consuming capacity at scale.

Compare this to AWS, which sells every chip they can manufacture (Graviton, Trainium, Inferentia) because AWS is a commodity infrastructure business. Google's core business is ads and consumer products that depend on AI infrastructure. Selling the infrastructure is like McDonald's selling their supply chain to Burger King: even if it generates revenue, you're strengthening competitors and weakening your primary business.

You can't simultaneously be a commodity chip vendor AND maintain proprietary infrastructure advantage. The moment you sell, you commoditize. The moment you commoditize, your advantage evaporates.

Given that selling TPUs appears strategically unsound, why is there speculation that Google will pursue it anyway? I think because cloud divisions at every hyperscaler have perpetual "we need differentiation" anxiety, and custom chips look like differentiation. But differentiation only matters if it protects margins or captures share without destroying your core business. Google selling TPUs would be differentiation that destroys more value than it creates.

133

703

290K
