Ryan Pollock

@ryanpollock

🤖 AI Inference & Infrastructure Exec ✨ Currently VP Marketing & GTM Strategy @ FriendliAI 🚀 Prev @ Together AI, Google Cloud, Oracle, Vultr, DigitalOcean 🌥

San Francisco, CA

Joined July 2006

679 Following

557 Followers

1.2K Posts

Ryan Pollock @ryanpollock

14 days ago

Two frontiers in production AI: 🧠 capability (which model) and ⚡ inference performance (throughput × latency). Picking the model is easy compared with finding your operating point on the second. 📘 Friendli Guide to Inference Performance Optimization — out now. ⤵️

friendliai

@friendliai

14 days ago

More GPUs won't fix your latency.

But methodology will — and that's why the Friendli Guide to Inference Performance Optimization is out now. 📘

Most teams over-provision by 30–40%, miss SLA anyway, and conclude they need a bigger cluster. The bottleneck isn't capacity — it's the lack of a framework tying sizing, benchmarking, and SLA targets together.

The 4 steps inside:
• 📐 Size capacity correctly — model weights, KV cache, and headroom
• 📊 Benchmark against realistic concurrency, not synthetic prompts
• 🔍 Diagnose the real bottleneck — TTFT vs TPOT, compute-bound vs memory-bandwidth-bound
• 🎯 Pick an operating point on the Pareto frontier that hits your SLA at the lowest $/M tokens

Free, no gate beyond an email. Link in comment. 👇
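
The first step in the list (model weights, KV cache, and headroom) comes down to simple arithmetic. Below is a minimal sketch of that arithmetic, not the guide's own method, which this post does not show. It assumes a standard decoder-only transformer that stores one key and one value tensor per layer, and weights and KV cache held at the same precision. The `ModelShape` class, the function names, the 10% default headroom, and the example model shape are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelShape:
    """Hypothetical description of a decoder-only transformer."""
    params_billion: float      # total parameters, in billions
    num_layers: int            # number of transformer layers
    num_kv_heads: int          # key/value heads (fewer than query heads under GQA)
    head_dim: int              # dimension of each attention head
    bytes_per_value: int = 2   # assumption: 2 bytes (FP16/BF16) for weights and KV cache


def weight_bytes(m: ModelShape) -> float:
    """Memory needed to hold the model weights."""
    return m.params_billion * 1e9 * m.bytes_per_value


def kv_bytes_per_token(m: ModelShape) -> int:
    """KV cache per token: one key and one value vector per layer per KV head."""
    return 2 * m.num_layers * m.num_kv_heads * m.head_dim * m.bytes_per_value


def max_cached_tokens(m: ModelShape, gpu_mem_gib: float, num_gpus: int = 1,
                      headroom: float = 0.10) -> int:
    """Tokens of KV cache that fit after reserving headroom and loading weights."""
    total = gpu_mem_gib * 2**30 * num_gpus
    usable = total * (1 - headroom) - weight_bytes(m)
    if usable <= 0:
        return 0  # the weights alone do not fit in the memory budget
    return int(usable // kv_bytes_per_token(m))


# Illustrative shape only (an 8B-parameter model using grouped-query attention).
example = ModelShape(params_billion=8, num_layers=32, num_kv_heads=8, head_dim=128)
tokens = max_cached_tokens(example, gpu_mem_gib=80)
print(f"KV cache per token: {kv_bytes_per_token(example) / 1024:.0f} KiB")
print(f"Tokens that fit on one 80 GiB GPU: {tokens:,}")
print(f"Concurrent 4,096-token sequences: {tokens // 4096}")
```

Dividing the token budget by an expected sequence length gives a rough ceiling on concurrency. Real serving engines also spend memory on activations and fragmentation, which is what the headroom term stands in for here.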

184

Ryan Pollock @ryanpollock

15 days ago

I've been bullish on open source, aka open-weight, models for years. And I think @OpenRouter's explosive growth is proof that open models were always destined to become an economically reasonable alternative to closed APIs. It's just taken a bit longer than I thought to get there.

friendliai

@friendliai

15 days ago

🏒 @OpenRouter's $113M raise — alongside growth from 5T to 25T weekly tokens in just 6 months — validates a major shift happening in AI infrastructure.

Open-weight models are rapidly becoming a credible, economical alternative to closed-model APIs, especially as AI agents begin generating massive volumes of inference traffic.

As part of the OpenRouter ecosystem, FriendliAI has consistently ranked among the top-performing providers for models like GLM‑5.1 and DeepSeek V3.2 across:
⚡ Throughput
⏱️ Latency / TTFT
🛡️ Reliability
🧠 Tool-calling performance

Models are increasingly portable.

Inference quality is becoming the differentiator.

Congrats to the OpenRouter team 🥳

And see the link in the comments to try Friendli Model APIs on OpenRouter.

245

ryanpollock retweeted

friendliai

@friendliai

3 months ago

Introducing Friendli InferenceSense™: the "AdSense for GPUs." 🏭💸

InferenceSense helps GPU cloud operators automatically fill idle compute cycles with paid AI inference workloads, just as AdSense helps digital publishers fill empty website space with ads to generate revenue. 🪙💰

Modern data centers are often portrayed as AI factories. Yet, most GPU clouds are still missing the crucial inference "assembly line" that produces intelligence—turning raw compute into generated tokens and revenue. When bursty training jobs finish, expensive hardware simply goes dark—but the massive costs of power, cooling, and depreciation never stop.

Today, we are thrilled to officially launch the industry’s first inference monetization platform purpose-built to fix this: Friendli InferenceSense™.

Powered by the highly optimized engine built by the inventors of continuous batching, InferenceSense automatically detects idle GPU capacity in your fleet and instantly fills it with paid inference requests for popular open-weight models. We bring the global demand; you simply plug in and earn.

"Most GPU operators still act like traditional landlords, watching revenue evaporate every time a workload finishes or a contract ends," says FriendliAI CEO, @bgchun. "InferenceSense provides the missing assembly line. Every idle GPU-hour becomes a chance to serve real AI demand and capture token revenue. The AI factory build-out only makes sense when it actually makes cents."

Why GPU clouds choose InferenceSense:
📈 Monetize Underutilized Infrastructure: Stop losing margin on dark hardware and transform idle compute cycles into an active, revenue-generating asset that can even surpass traditional rental revenue.
🔒 Zero Disruption: Your jobs ALWAYS come first. Immediate preemption guarantees zero downtime for your core workloads.
⚙️ Frictionless Integration: You retain full control over participating nodes and schedules, with no upfront costs or minimum commitments.

Heading to NVIDIA GTC? We are currently accepting applications from qualified GPU cloud operators.

📰 Read the full blog here https://t.co/iuv4i80GHS
📩 Contact partners@friendli.ai to schedule an executive briefing with us at GTC

#InferenceSense #NVIDIAGTC

136

ryanpollock retweeted

Cartesia

@cartesia

8 months ago

New model just dropped 👀

293

119

67K

ryanpollock retweeted

Together AI @togethercompute

over 1 year ago

🚀 Together AI Achieves 90% Faster BF16 Training with NVIDIA Blackwell Platform and Together Kernel Collection

Today we are announcing immediate access to Together GPU Clusters accelerated by the NVIDIA Blackwell platform, with an AI acceleration stack optimized for the latest GPU architecture.

Together GPU Clusters featuring NVIDIA HGX B200 deliver 90% faster training than NVIDIA HGX H100, powered by Together Kernel Collection to optimize performance at scale.

🏎 Test Drive NVIDIA Blackwell Platform – Apply by February 26. Work with NVIDIA and Together AI researchers to optimize performance.

📢 Apply for a free test drive today → https://t.co/at132RN5zD

📖 Read more in the full blog → https://t.co/GQqOkrJSFk

23K

ryanpollock retweeted

Together AI @togethercompute

over 1 year ago

🚀 DeepSeek-R1 now available on Together AI!

DeepSeek-R1 sets a new standard for open-source reasoning models, rivaling OpenAI-o1. It’s faster than closed-source alternatives, delivers the same top-tier performance across math, code, and reasoning tasks, and operates at a fraction of the cost.

Now available on Together AI with:
🔒 Opt-out privacy controls
📃 Full 160K context
⚡ Deploy seamlessly on serverless, dedicated endpoints, or enterprise environments

🌊 Big kudos to @deepseek_ai for continuing to make waves in the open-source community 🙌

509

182

136K

ryanpollock retweeted

Together AI @togethercompute

over 1 year ago

1/ 🚀 Big news from Together AI: We’re co-building a massive @nvidia GB200 Cluster with 36K Blackwell GPUs!

Read more👇

https://t.co/MHDAX0mvjq https://t.co/lHUSZsnWLw

123

29K

Ryan Pollock @ryanpollock

over 1 year ago

I've been reading @tldrnewsletter 📰 for years to keep up with tech. And what's the top story in today's TLDR AI? 🤖 🦙 @togethercompute AI has partnered with @AIatMeta to offer a FREE API for the new Llama 3.2 vision model 😲 Try it for free 👇 https://t.co/8X9HCiG03j

ryanpollock retweeted

Vipul Ved Prakash

@vipulved

over 1 year ago

Together API now provides _free_ access to Llama 3.2 11B Vision model. Since we founded Together AI, I've wanted to offer a great foundation model for free, so creative developers everywhere can build AI-enabled apps without cost. We are able to do this now. Thank you to @AIatMeta for a great open model capable of language and vision.

ryanpollock retweeted

Together AI @togethercompute

over 1 year ago

🚀 Big news! We’re thrilled to announce the launch of Llama 3.2 Vision Models & Llama Stack on Together AI.

🎉 Free access to Llama 3.2 Vision Model for developers to build and innovate with open source AI. https://t.co/FVlp2euqrC

➡️ Learn more in the blog https://t.co/kUxdw5VUC3

252

60K

ryanpollock retweeted

Entail AI @entail_ai

over 2 years ago

How do you stay ahead when Gen AI is changing the way we approach marketing every day? @ryanpollock shares key insights in the latest episode of The Entail AI Podcast: https://t.co/oP5gMnDjnE #GenerativeAI #MarketingInsights #AIinMarketing

119

Ryan Pollock @ryanpollock

over 2 years ago · San Francisco

@sama Just give me a robot butler and we’re good.

Ryan Pollock @ryanpollock

over 2 years ago

@sama 🍿

Ryan Pollock @ryanpollock

almost 3 years ago

@levie Funny, just yesterday I talked with someone who has exactly the technology necessary to underpin this.

Ryan Pollock @ryanpollock

almost 3 years ago · San Francisco

💯

Greg Kogan

@gregkogan

almost 3 years ago

Don’t confuse PLG with “SMB.” You can still target and land huge customers.

299

177

Ryan Pollock @ryanpollock

over 3 years ago

@jeffboudier Nice 🤗 Pls look for my email, hope you agree 😉

Ryan Pollock @ryanpollock

over 3 years ago · San Francisco

Great #generativeai meetup @awscloud featuring @jeffboudier from 🤗

@HuggingFace and @OpenAI remind me very much of Android and iOS, respectively.

And now is the time for both platforms to suit up and start selling...while holding on to what made them great 😉 https://t.co/HNSstOcHUy

Ryan Pollock @ryanpollock

over 3 years ago

Big moment @OpenAI In addition to these new APIs - their home page and website structure now reflect a company that is actually trying to market and sell their platform. Well done @sama https://t.co/btTdyQVWj5

170

Ryan Pollock @ryanpollock

over 3 years ago

I’m @aicamp and my short answer to the question posed by @cohere’s Hemant Jain: a lot. #aicamp #nlp #llm

451

Ryan Pollock @ryanpollock

over 3 years ago

Automated summarization of what’s going on within a business in Slack, JIRA, Git, etc. is about to be a super useful new product area that anyone in an exec or product role will want. Slack in particular needs summaries badly. https://t.co/nYgwaDjbey

120
