Suvinay Subramanian

@suvinay

👨‍💻 Building AI systems (TPUs) @google | 🎓 @MIT_CSAIL (Ph.D.), @iitmadras (BTech) | 🎙️ Co-host the Computer Architecture Podcast | Views my own

California, USA

Joined November 2008

213 Following

256 Followers

1.2K Posts

Pinned Tweet

Suvinay Subramanian @suvinay

about 1 year ago

Starting with this exciting line of work from @jintian and colleagues at MIT. We tackle the question of: Can we train LLMs to parallelize autoregressive decoding automatically, backed by a performant runtime to exploit this parallelism for improved inference speedup?

Tian Jin

@jintian

over 1 year ago

Introducing Learned Asynchronous Decoding w/ friends from MIT/Google! LLM responses often have chunks of tokens that are semantically independent. We train LLMs to identify and decode them in parallel, speeding up inference by 1.46x geomean (AlpacaEval) w/ only 1.3% quality loss.

19K

405

Suvinay Subramanian @suvinay

11 months ago

A short article on our #ICML2025 paper (led by @jintian, @ellieyhc MITxGoogle): PASTA teaches LLMs to adaptively parallelize their own decode, optimizing quality & latency in concert. No hand-crafted heuristics -> learned parallelism, with realized latency improvements on GPUs.

Tian Jin

@jintian

11 months ago

Asynchronous decoding: multiple LLM threads write different parts of an answer in parallel. In Feb we (MIT×Google) introduced PASTA—the first async-dec method that uses policy learning to optimize latency & quality end-to-end. See us @ E-2600, East Hall A-B, Tue 11pm #ICML.

404

Suvinay Subramanian @suvinay

about 1 year ago

Scaling Laws provide a valuable lens in guiding model design and computational budgets. Our recent work extends this lens to the realm of _fine-grained_ sparsity. Check out our #ICLR2025 paper, and the thread below from lead-author @jintian summarizing our findings.

Tian Jin

@jintian

about 1 year ago

📣 The Journey Matters: Our #ICLR2025 paper shows how to pretrain sparse LLMs with half the size of dense LLMs while maintaining quality. We found that the average parameter count during sparse pre-training predicts quality, not final size. An MIT/Rice/Google/ISTA collab 🧵 1/N

19K

575

Suvinay Subramanian @suvinay

about 1 year ago

TPUs have been a key enabler for the Gemini models -- from large-scale training, to fast and cost-effective serving. Our latest generation TPUs (Ironwood) will bring more exciting compute capabilities to the fore: https://t.co/uillylLyT1

Alberto Romero

@Alber_RomGar

about 1 year ago

Breaking news: Google is winning on every AI front. This is not just about Gemini 2.5 but about a reality that OpenAI and Anthropic fans have ignored for too long. Here's a non-exhaustive list:
- Gemini 2.5 Pro is the best model in the world according to benchmarks, vibe checks, high-taste testers, and firsthand testimonies. It's also fast and cheap compared to similar models (Google offers it for free on the Gemini app!)
- Gemini 2.5 Flash (to be announced soon) is much faster and much cheaper, so it captures perfectly the Pareto frontier of cost-performance of cost-efficient models.
- Gemma 3 is a highly competitive open-source model, as good or better than Llama 4 and DeepSeek models.
- That's just LLMs. Google is world-class in image (Imagen 3), video (Veo 2), voice (Chirp 3), and music (Lyria). They're integrating them all in Vertex AI.
- Deep Research with Gemini 2.5 Pro is *twice as good* as OpenAI's Deep Research, according to human testers. Other agents? Yes: Project Astra (assistant) and Project Mariner (computer interaction)
- They just launched Agent2Agent, compatible and complementary to Anthropic's MCP, which they will build in-house as well.
- And they keep publishing papers in top journals (Nature) and going to the top conferences (ICLR, NeurIPS), whereas others jealously keep their most important stuff for themselves.
- That's just the AI stuff, but Google is also a consumer software company with seven 2+ billion monthly users: Search, YouTube, Gmail, Android, Chrome, Maps, and Play Store
- A hyperscaler (Google Cloud)
- A hardware company (TPUs, Ironwood)
- And a phone company (Pixel).
How can OpenAI or Anthropic or even Meta fight such a beast? Let’s wait for their responses to this. I’ll be here to cover any newsworthy release, even if I’ve already made my bet on who’s most likely to win. (Read the full post in the link below.)

700

274

175K

376

Suvinay Subramanian @suvinay

about 1 year ago

@SabaMugazambi @JeffDean @NormJouppi And finally, for those interested in more technical details, and codesign across multiple layers of the stack from hardware, circuits to software and all the way up to the datacenter: https://t.co/gVuV978G6U

150

Suvinay Subramanian @suvinay

about 1 year ago

@Google announced the latest generation of our AI supercomputers (TPUs) -- Ironwood -- this week. Check out the blogpost in quote for the highlights. https://t.co/TMns1HbEkA Pointers to deep-dives and more technical details in thread. [contd...👇]

124

Suvinay Subramanian @suvinay

about 1 year ago

@SabaMugazambi @JeffDean @NormJouppi A couple of fun videos that provide a sneak peek into TPUs and how they are plugged into our datacenters: [1] https://t.co/V43HD2SKad, [2] https://t.co/7RoiKy59WZ

167

Suvinay Subramanian @suvinay

about 1 year ago

Together with Lisa Hsu (Meta), we have been hosting the Computer Architecture Podcast -- we recently crossed 50K downloads. Check out our latest episode with Prof. Arka Basu: https://t.co/RRg6gq7ZMa -- we discuss GPUs, but from a different vantage point than AI, which is all the rage.

221

Suvinay Subramanian @suvinay

about 1 year ago

A couple of excellent resources on how to think about AI systems performance, parallelism, and scaling. The below is from colleagues at Google and focused on TPUs. Another resource that dropped in the past-month is the Ultra-scale Playbook from HF: https://t.co/xuE8SZIPzu.

Jacob Austin @jacobaustin132

over 1 year ago

Making LLMs run efficiently can feel scary, but scaling isn’t magic, it’s math! We wanted to demystify the “systems view” of LLMs and wrote a little textbook called “How To Scale Your Model” which we’re releasing today. 1/n

391

465K

138

Suvinay Subramanian @suvinay

about 1 year ago

In addition to the ArchReasoning Challenge, please submit your work at the intersection of ML, Computer Architecture and Systems to the MLArchSys Workshop at ISCA'25 (Tokyo). CFP and topics in-quote.

Amir Yazdan

@ayzddzya

over 1 year ago

Please consider submitting your best work. MLArchSys is the best place to showcase your work at the intersection of ML, Computer Architecture, and Systems. Check out the call for papers and look for new topics we included this year 🚀🔥 https://t.co/QiYsSFVcny 1/3

303

Suvinay Subramanian @suvinay

about 1 year ago

High-quality data is a key enabler for effective, useful, and actionable use of AI. We are working towards collecting and curating such a dataset for the computer architecture domain. Submit your favorite architecture qns to ArchReasoning Challenge (https://t.co/5JG0DUOpta).

Amir Yazdan

@ayzddzya

about 1 year ago

We're excited to launch the 𝐀𝐫𝐜𝐡𝐑𝐞𝐚𝐬𝐨𝐧𝐢𝐧𝐠 𝐂𝐡𝐚𝐥𝐥𝐞𝐧𝐠𝐞 (https://t.co/SS4EuHt5wA). Design complex, reasoning-based questions that expose the current limitations of LLMs and contribute to the broader effort of improving AI reasoning for comp. arch. and systems.

176

Suvinay Subramanian @suvinay

about 1 year ago

Returning to Twitter/X after a decade hiatus. My excellent intern(s) at Google with whom I have had the pleasure of working, were kind enough to nudge me to help signal-boost their work. Will also try to share updates on TPUs, AI chips & systems, and computer architecture.

251

Suvinay Subramanian @suvinay

almost 11 years ago

[1/3] Former President of India, A.P.J.Abdul Kalam passes away. Dr.Abdul Kalam was a rare individual -- a man of intellectual brilliance,

Suvinay Subramanian @suvinay

almost 11 years ago

[2/3] great scientific zeal, a sagacious statesman and truly a people's President. While many will remember him for spearheading India's

Suvinay Subramanian @suvinay

almost 11 years ago

[3/3] nuclear program, his true legacy will be inspiring several generations of Indians to dream & to work towards a brighter India. RIP.

Suvinay Subramanian @suvinay

almost 11 years ago

Science professors need leadership training. http://t.co/zjrEeylpsK Couldn't agree more. #read

Suvinay Subramanian @suvinay

almost 11 years ago

In defense of Millennials: http://t.co/ndnAf9VtBF

Suvinay Subramanian @suvinay

almost 11 years ago

Two excellent (but unrelated) graphs a) On global warming http://t.co/HSGivx2z4N b) Evolution of Silicon Valley firms http://t.co/y3y1AzDpXf
