Graham Steele @grahamrsteele - Twitter Profile

Graham Steele @GrahamRSteele

about 1 month ago

Link to Blog: https://t.co/3WiRKCQros

Graham Steele @GrahamRSteele

about 1 month ago

What does it actually cost to run an agent? We traced a Claude Code session: 283 inference requests in 33 mins, context peaking past 150K tokens. The economics break under conventional serving. New blog w/ Eduardo Alvarez and @benklieger on what fixing it takes 👇 https://t.co/diSNvijKPp

NVIDIA AI

@NVIDIAAI

about 1 month ago

What does it actually take to run agentic workloads at scale? ⚡Agents push token consumption, context length, and latency into extremely demanding regions. Extreme co-design on the Vera Rubin platform is built for these complex workloads, delivering 400+ tokens/sec/user on trillion-parameter MoE models. Tech blog ➡️ https://t.co/DIxW96omML

GrahamRSteele retweeted

NVIDIA AI Infrastructure

@NVIDIAAIInfra

about 2 months ago

Evaluating AI inference TCO? Look beyond compute costs and evaluate cost per token which reflects end-to-end system performance and actual utilization across the entire AI factory—spanning GPUs, CPUs, storage, networking, software, and more. Lowest cost per token isn’t achieved by optimizing peak chip specs alone. It’s the result of deep, end-to-end co-design with our partners including @CoreWeave, @Nebiusai, @Nscale, and @togethercompute across the full stack.

Graham Steele @GrahamRSteele

10 months ago

Prompt caching is rolling out today on @GroqInc starting first with Kimi K2 🚀 What will you buy with all the money you save?

Groq Inc

@GroqInc

10 months ago

🚨Today we’re rolling out Prompt Caching on GroqCloud. Keep hot prompts in memory, cut cached token costs by 50% and slash latency. Faster response, smarter inference. Learn more 👇

Graham Steele @GrahamRSteele

10 months ago

From 0 -> 2M devs in 17 months 📈 and we're just getting started! Thank you to the entire @GroqInc community for being a part of the journey 🎉

Groq Inc

@GroqInc

10 months ago

Groq just hit 2M devs. Threw a party…

Graham Steele @GrahamRSteele

10 months ago

Groq Code CLI is here 🚢 Shout out to @lee_x64 for putting this together!

Lee @lee_x64

10 months ago

Introducing: Groq Code CLI “Hold on!”, you say. Coding CLIs are everywhere. The Groq Code CLI is different. It is a template and building block for developers looking to customize and extend a CLI to be entirely their own. Leading open-source CLIs are all fantastic yet gigantic. Feature-rich: yes, but local development with such a large codebase can be unfriendly and overwhelming. This is a project for those developers looking to dive in. Link to the code: https://t.co/cKdIJXm8aS And yes, it created that in 14.6s on Kimi K2. @GroqInc

Graham Steele @GrahamRSteele

10 months ago

🔥Hats off to @_p0lymath_ for putting this together! Open-source repo coming soon

Groq Inc

@GroqInc

10 months ago

Meet your full-stack AI assistant, powered by OpenAI’s new open gpt-oss models (20B + 120B) and Groq. Includes search, code execution, STT/TTS, code-gen editor, AI notes, and custom tool calling. Build Fast.

GrahamRSteele retweeted

Groq Inc

@GroqInc

10 months ago

OpenAI’s open models are live and already running on Groq. Try gpt-oss-20B and gpt-oss-120B today. Groq delivers 128K context and built-in tools such as code execution and browser search. For the first time, developers and enterprises can deploy open models backed by OpenAI instantly, anywhere, at scale. Start building now. Links in comments.

Graham Steele @GrahamRSteele

10 months ago

RT @mattshumer_: It's over. We have an o3-level open-source model running on @GroqInc at 500 tokens per second. Watch it build an entire…

GrahamRSteele retweeted

Stuart Pitts @stpitts

10 months ago

Two PhD students build a voice agent from scratch—and take on a $350B industry. Enter @GroqInc: the catalyst. I can’t get enough of real stories like this one. Congrats to the @phonely_ai team on the beginnings of your journey. Onward.

Graham Steele @GrahamRSteele

about 1 year ago

Gearing up for RAISE your HACK in Paris this July with @ozenhati. Sign up today to take part in the world's largest AI hackathon. Details below 👇

lablab.ai

@lablabai

about 1 year ago

⚡ @GroqInc powers up RAISE your HACK - the World’s Largest AI Hackathon, giving you direct access to their groundbreaking LPU architecture for blazing-fast LLM performance! Prepare to experience inference at an entirely new level. And guess what? They’re bringing… quite a prize pool to the table!

Graham Steele @GrahamRSteele

about 1 year ago

@ycombinator @youlearnai @1davidyu1 @achyut_benz Congrats @KapadiaSoami

Graham Steele @GrahamRSteele

about 1 year ago

Great Llama 4 use case created by @_p0lymath_ 🔥

Groq Inc

@GroqInc

about 1 year ago

Ready to build with the official Llama API? Groq's got you covered 👀👇 Demo from @_p0lymath_ and access link in 🧵

Graham Steele @GrahamRSteele

about 1 year ago

We are posted up at UCLA this weekend for @LAHacks. Stop by the @GroqInc table for increased rate limits and free SWAG. Build fast 🚀

GrahamRSteele retweeted

Chris Ho @officialchrisho

about 1 year ago

👉 In case you missed it, @GroqInc just released Compound Beta, their first compound AI system, built by @benklieger! Here is an open sourced demo with voice-in capability to kickstart your development work. Link in the comments below! 🔥

Graham Steele @GrahamRSteele

about 1 year ago

Incredible multilingual voice demo by @elevenlabs leveraging Llama 4 Scout, powered by @GroqInc

Hatice Ozen

@ozenhati

about 1 year ago

llama 4 scout on @groqinc paired with @elevenlabs is incredible for multilingual voice agents. insanely smooth even switching between different languages thanks to low latency. and for those who have been asking about its turkish - i've been testing and it's pretty good. :)

Graham Steele @GrahamRSteele

about 1 year ago

@wacheeeee @GroqInc @metaai Can't wait to see what you build man!

Graham Steele @GrahamRSteele

about 1 year ago

Now your speech-to-speech apps can analyze images at @GroqInc speed with @metaai's Llama 4 Scout

Chris Ho @officialchrisho

about 1 year ago

🔊🔛🔥@GroqInc just dropped support for @MetaAI's Llama 4 🚀 But what if you want to have real-time dynamic conversation with your images? Now you can thanks to STT, LLM, Image-to-Text, and TTS models powered by Groq and the @LiveKit SDK. Repo and details below 👇

GrahamRSteele retweeted

Ashpreet Bedi

@ashpreetbedi

about 1 year ago

Llama4 + MCP find me an Airbnb 🦙✨ - Giving Llama4 a "thinking" scratchpad is a superpower - @GroqInc is so fast that it makes the additional "think" step tolerable. I love everything about this! Full code below 👇
