Ajay Singh

b/acc, context platform engineer

12 days ago

New @OracleCloud Infrastructure benchmarks show NeuralMesh with Augmented Memory Grid delivered:
⚡ 10x more concurrent users
⚡ 10x higher token throughput
⚡ 7x more tokens served per GPU
Take a closer look: https://t.co/0efFUSRCv1

Ajay_sf retweeted

@AccBalanced

about 2 months ago

proud to be partnering with the cracked #inferencex team @SemiAnalysis_ & @huggingface on real-world AI benchmarks

stay tuned - much more to come from @weka https://t.co/2AgdNxqdXX

Ajay_sf retweeted

b/acc, context platform engineer

@AccBalanced

about 2 months ago

kudos to @callanjfox and his repo - the catalyst for context memory platform engineering: https://t.co/rNe7Hq1N4W

Astrophotographer based in Northern California. /he / him

3 months ago

It's exciting to see what YTL AI Cloud is building for Malaysia’s sovereign AI future. With @WEKA's NeuralMesh, they’re scaling secure, high-performance infrastructure to power next-gen AI, including the country’s first locally developed LLM. https://t.co/oSflqalBHX

Michael Kieran

@michaelkieran

🧑‍💻Techie with a big mouth 🌱Startup junkie 🎥 TFD presenter ⚽️ Bristol City FC 🎮Retro gamer 🚀 $HBAR investor

3 months ago

Excited about this upcoming conversation with John Fragalla from @nvidia and Shimon Ben-David from @weka. Get a spot here 👇 https://t.co/yORDX3pIiF

Ajay_sf retweeted

Vikram Sekar

@vikramskr

5 months ago

If you’re tired of Claude summarizing your convo, all that is about to change when Context Memory Storage comes online. Agentic AI requires an enormous amount of context and KV cache. If LLMs constantly get amnesia, it’s hard to get anything done. Now we can add petabytes of storage for AI to remember EVERYTHING, and this changes tokenomics going forward. Special thanks to @AccBalanced for in-depth conversations about context memory storage. This article is not possible otherwise. Read all about it here: https://t.co/01lVSDrJqZ

Ajay_sf retweeted

SemiAnalysis

@SemiAnalysis_

7 months ago

The economics of AI has been a big question mark in many investors' minds - What does the value chain look like? How do you model out the ROIC of AI? What would the ROIC look like? We built up an end-to-end economics stack to answer this question - how we go from a chip’s silicon cost, through full system integration, all the way down to the dollar cost per million inference tokens.(1/4)🧵

Ajay_sf retweeted

Gavin Baker

@GavinSBaker

7 months ago

Yesterday, @RealJimChanos posited that Tesla’s relatively low capex meant that they were not a serious competitor in real world AI and Robotics.

This is *exactly* the wrong way to look at it and the implications of this fact are actually positive for Tesla IMO.

Tesla’s inference definitionally happens in the car so their customers are effectively paying for the inference compute “capex,” which is now probably the majority of hyperscaler capex spend.

Tesla’s capex might be an order of magnitude higher if they had to synthetically generate relevant driving data in a datacenter. Customer subsidized vertical integration is beautiful.

This is also why at some point Tesla customers will be able to put their cars into a pool of distributed edge compute and earn money when the car is not driving - same way that Akamai and Cloudflare are putting single GPUs in their edge nodes.

The Tesla fleet as the world’s largest, most distributed CDN for AI (and only AI as obviously can’t cache content in cars) is a real possibility. BYD will have a similar opportunity and similar inference cost advantage.

Beyond this significant inference cost advantage, Tesla has the second largest coherent Hopper cluster - behind only xAI - in the world for pre-training. You only need one coherent cluster *if* it is large enough. Coherent cluster size drives capital efficiency for pre-training.

No one has been able to match the xAI and Tesla clusters from a coherence, speed and cost perspective with coherence being the most important. This is why Jensen described their datacenter design and execution as “superhuman.” Should note that Tesla also has an AI4 cluster for post-training or mid-training or whatever we are calling it these days.

Tesla also has a significant data advantage for training Chinchilla optimal FSD models as real world video scales infinitely and this data advantage further lowers their capitalized training cost - less synthetic data generation and 3P data sourcing/labeling vs. labs training LLMs.

This relative capital efficiency as a result of all these advantages - the largest coherent cluster, customers paying for inference, dataset size and ongoing data generation cost - is likely to matter vs. Robotics and FSD competitors who are less capital efficient.

Cost per token is everything for AI. Google is the low cost producer of LLM tokens (with xAI as #2) but Tesla is the lowest cost producer of tokens that matter for FSD and Robotics.

AI is the first time in my career that being the low cost producer has mattered as token quantity effectively drives quality in a reasoning world. I think this dynamic is very underappreciated by the market.

Tesla might very well be outcompeted by an FSD competitor - unlikely from my perspective but anything is possible - but this will not happen because of their relative capex spend.

If LLM inference happened at the edge on phones and PCs as with FSD, hyperscaler capex would be *much* lower. This is the real risk to datacenter spending, not all the value/macro takes. Btw - memory is the biggest winner in this scenario which is years out if scaling laws continue to hold.

Jim is a smart guy but I humbly think his AI takes are misinformed.

Also so strange to me that anyone is focused on AI as a bubble given the extremely obvious quantum and nuclear bubbles where there are loads of equities that can decline 99% and still be overvalued.

Ajay_sf retweeted

Austin Lyons

@austinsemis

8 months ago

“A lot of attention is given to compute, memory and networking in an AI data center.

What gets less attention is the design of high capacity storage for AI workloads.” https://t.co/4E5CVEuVKr

Ajay_sf retweeted

8 months ago

Token prices ⬇️, infra bills ⬆️? The problem isn’t your #agenticAI; it’s inefficient design. Benchmarks on @CoreWeave show WEKA’s Augmented Memory Grid delivers ⚡ 4.2x capacity, ⏱️ 6x lower latency, 💰 lower costs. Learn more: https://t.co/0OAMn5vk6V

b/acc, context platform engineer

9 months ago

A visual guide to inference patterns of AI Agents. https://t.co/Oj0B0bNUp5

Ajay_sf retweeted

@AccBalanced

11 months ago

How to turn your AI infrastructure from a cost center to a profit center. It’s about leverage in your data infrastructure at the @weka storage and memory layers, to radically maximize token unit economics: https://t.co/DkS6YfKMvr

Ajay_sf retweeted

12 months ago

🚨 Live from #RAISESummit: WEKA unveils NeuralMesh Axon, breakthrough storage for exascale #AI.
⚡ 10x faster checkpointing
⚡ 20x faster time-to-first-token
📈 90%+ GPU utilization
Built for LLMs, agentic AI & real-time inference. https://t.co/BwB1Nl9G37

Ajay_sf retweeted

Gavin Baker

@GavinSBaker

12 months ago

Given the massive - and increasing - importance of test-time compute and post-training RL shown by Grok-4’s absolute dominance, being the low cost producer of tokens is more important than ever. As an aside, this is the first time in my career as a tech investor that being the low cost producer of anything has mattered.

Today, the lowest cost producers of tokens are Google (TPUs) and xAI (largest coherent cluster, lowest capex $ per deployed GPU, almost certainly highest MFU and have made some really smart architectural decisions). I am obviously biased when it comes to xAI.

From a solely technical perspective, having the best scale-up networking and most efficient KV cache offload are most important to both cost and latency for the increasingly large models and context windows. These are the most important axes of competition in AI infrastructure today - not compute. Note that on-package memory bandwidth is most important when you can fit the model on a single chip (@cerebras) but for any really large model that requires multiple packages, scale-up and kv cache offload are most important. As everyone working on ASICs is slowly beginning to understand.

This is why Dynamo and open-sourcing NVLink were both important and smart. The latter could increasingly lead to ASIC share migrating to NVLink partners. Not to mention the natural negotiating benefits of having a second supplier. Likely to see more of these IMHO:

Ajay_sf retweeted

Chris_Mellor

@Chris_Mellor

12 months ago

NeuralMesh Axon - WEKA ports NeuralMesh to a GPU server’s local SSDs - https://t.co/sG6PyiI9Ae

12 months ago

NAND Research looks at WEKA's NeuralMesh, a new AI-native storage architecture built to address the performance, elasticity, and latency demands of real-time inference and agentic AI: https://t.co/HJZg68tqFk

about 1 year ago

WEKA just open-sourced the GPUDirect Storage (GDS) integration from its Augmented Memory Grid - now available for the vLLM and LMCache frameworks. The combination lets you cut TTFT by 20x, and extend KV Cache TTL from an hour to weeks. WEKA is excited to share this with the open-source community and would love your feedback. Join the conversation in the new #WEKA-GDS-Integration channel in the vLLM Slack. https://t.co/YQYIlgVwNW

about 1 year ago

Oracle Cloud (OCI) saw TTFT for Llama3.1-70B drop from 39 seconds to 2 seconds by using Augmented Memory Grid to extend KV Cache. https://t.co/I07TNDu9GM

about 1 year ago

Frustrated with LLM inference latency and token efficiency? Here's a way to dramatically speed up inference using a KV Cache extension:
- Speeds up Time To First Token by 20x
- Also allows significantly higher token throughput per GPU
https://t.co/ZfBRr3offS

Ajay_sf retweeted