🚀 World record performance: SambaNova is running Llama 3.1 405B at 114 t/s with full precision accuracy, in only one rack. Verified by @ArtificialAnlys! 🦙
This speed unlocks so many use cases for enterprises and developers that we cannot wait to see them built on our platform.
Apply for early access today: https://t.co/CSlbJbTFVj
At @LipBuTan1's #COMPUTEX2026 keynote today, @RodrigoLiang stepped onstage with @RFS_Vista to power up the world's first disaggregated inference cloud, VectorCore Compute (VC2), launched by @Vista_Equity and Cambium Capital.
Three chips ran disaggregated inference, live from the VC2 datacenter in LA:
➡️ NVIDIA B200 GPUs — prefill, high-compute burst
➡️ SambaNova RDUs — decode, high-throughput, low-latency token generation at scale
➡️ Intel® Xeon® 6 CPUs — tool execution, end-to-end orchestration
“GPUs are powerful. RDUs are fast. CPUs orchestrate. But disaggregate all three — and you get speed, performance, and economics no single chip can touch. That's the unlock.” – @RodrigoLiang
You know we always come through with the speed 🚀
@AIatMeta's #Llama4 Maverick is now available on SambaNova Cloud with the FASTEST inference!
✈️ 655 t/s — independently verified by @ArtificialAnlys
Comparing DeepSeek V3-0324 APIs: We are now tracking 10 APIs for DeepSeek’s new model, including DeepSeek’s first-party API and offerings from Fireworks, DeepInfra, Hyperbolic, Nebius, CentML, https://t.co/VHqKQuuZfq, Novita, Replicate and SambaNova
DeepSeek V3-0324 is open weights and there is now a healthy ecosystem of providers offering APIs! Congrats to all the inference providers for making it rapidly available. Given V3-0324’s status as the leading non-reasoning model, we will not be surprised to see more providers make this model available in the coming days.
Key info from our benchmarking:
➤ We’re seeing the fastest output speeds on SambaNova (267 tokens/s), Fireworks (82 tokens/s) and CentML (76 tokens/s)
➤ We’re seeing the best prices on DeepSeek’s own API ($0.27/$1.10 per million input/output tokens), followed closely by DeepInfra ($0.40/$0.89)
➤ Not all providers are supporting V3-0324’s full 128K context window: notably, DeepSeek’s own API only supports a 64K token context window (as does Novita’s API) and SambaNova currently only supports an 8K token context window
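To make the price comparison above concrete, here is a minimal sketch of per-request cost at the two quoted rates. The request sizes are hypothetical and chosen only for illustration; the per-million-token prices are the ones listed above.

```python
# Per-request cost at the quoted per-million-token prices.
# The token counts used below are hypothetical examples, not benchmark data.

PRICES_USD_PER_MILLION = {
    "DeepSeek": (0.27, 1.10),   # (input, output), as quoted above
    "DeepInfra": (0.40, 0.89),  # (input, output), as quoted above
}

def request_cost_usd(provider: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request in USD for the given provider."""
    price_in, price_out = PRICES_USD_PER_MILLION[provider]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# Two hypothetical request shapes: prompt-heavy and generation-heavy.
for input_tokens, output_tokens in [(2_000, 500), (500, 2_000)]:
    for provider in PRICES_USD_PER_MILLION:
        cost = request_cost_usd(provider, input_tokens, output_tokens)
        print(f"{provider}: {input_tokens} in / {output_tokens} out -> ${cost:.6f}")
```

Because DeepInfra's output price is lower while its input price is higher, which provider is cheaper depends on the input/output mix of the workload.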
DeepSeek V3 and R1 are particularly hard models to host compared to most other open weights models because they are so large: at 671B total parameters, they cannot fit on a single 8xH100 node. This can make it very expensive for providers to offer APIs at small scale.
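A rough back-of-the-envelope check of the "cannot fit on a single 8xH100 node" point, as a sketch. It assumes 80 GB of HBM per H100 and counts model weights only (no KV cache, activations, or runtime overhead); neither assumption comes from the post itself.

```python
# Do 671B parameters fit in the memory of one 8xH100 node?
# Assumptions (not from the post): 80 GB HBM per GPU, weights only.

TOTAL_PARAMS = 671e9
GPUS_PER_NODE = 8
HBM_PER_GPU_GB = 80  # assumed H100 80 GB variant

def weight_footprint_gb(params: float, bytes_per_param: float) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9

node_capacity_gb = GPUS_PER_NODE * HBM_PER_GPU_GB

for label, bytes_per_param in [("8-bit weights", 1), ("16-bit weights", 2)]:
    needed_gb = weight_footprint_gb(TOTAL_PARAMS, bytes_per_param)
    fits = needed_gb <= node_capacity_gb
    print(f"{label}: {needed_gb:.0f} GB needed vs {node_capacity_gb} GB per node, fits: {fits}")
```

Under these assumptions, even one byte per parameter exceeds the node's total memory before any working memory is accounted for, so a deployment has to span multiple nodes.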
SN40L crushes H200 in real-world #AI inference! 🦾
We measured @deepseek_ai's R1 with SGLang 0.4.2 on 1 node of H200, & guess what - SN40L completely smashes H200's Pareto frontier:
☑️ 5.7x faster (201 tps vs 35 tps)
☑️ Reasoning model (30s vs 171s to generate 6k tokens)
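The two figures above are consistent with each other, which a quick calculation can confirm. This is a simplified sketch that assumes generation time is just token count divided by output speed, ignoring time to first token and any queueing.

```python
# Relating the quoted throughput numbers to the quoted generation times.
# Assumption: time = tokens / tokens_per_second (no time-to-first-token).

SN40L_TPS = 201   # tokens/s, as quoted above
H200_TPS = 35     # tokens/s, as quoted above
NUM_TOKENS = 6_000

def generation_time_s(num_tokens: int, tokens_per_second: float) -> float:
    """Seconds needed to emit num_tokens at a steady output rate."""
    return num_tokens / tokens_per_second

speedup = SN40L_TPS / H200_TPS
print(f"speedup: {speedup:.1f}x")
print(f"SN40L: {generation_time_s(NUM_TOKENS, SN40L_TPS):.0f} s")
print(f"H200:  {generation_time_s(NUM_TOKENS, H200_TPS):.0f} s")
```

Rounded, this reproduces the 5.7x, 30 s, and 171 s figures in the post.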
blackbox beast mode powered by @SambaNovaAI is the fastest reasoning model on earth!
transform the way you learn, work and ship products and get started on blackbox!
DeepSeek R1 671B just broke speed records
at 198 t/s it is now the fastest reasoning model available
you can try it in coding mode on anychat soon
prompt: make two fancy D3.js animated looping speedometers, first one is title deepseek R1 671B and second one is OpenAI o3-mini, deep seek should go up to 198 tokens/second and openai o3-mini should go up to 174 tokens/second
Try it out here for blazing fast @OpenAI o1-style test-time scaling, powered by the @SambaNovaAI cloud API, on your favorite open-source models from @AIatMeta.
Many thanks to @_akhaliq, who put tremendous work into getting the Gradio app working! 🤗
https://t.co/29LN5onzic
We've loved the conversations around our newly launched SambaNova Cloud. ☁️
In his article, @capacitymedia's @benwodecki gives his insight:
“The SambaNova Cloud is similar to services from rivals... however, the hardware is optimized to a point where it can run on a single rack consisting of just eight trays containing SN40Ls – reducing the infrastructure footprint required to run it.”
He also shares some thoughts from prominent leaders in the industry:
"The service appears to have caught the eye of one Andrew Ng, a machine learning pioneer who co-founded Google Brain, who described SambaNova Cloud as an 'impressive technical achievement.'"
Read more ⤵️
https://t.co/nG6TziAAdz
#AI #API @AndrewNg
The team at @SambaNovaAI are GETTING AFTER IT!
Some people believe in weekends, Blackbox AI and SambaNova don't!
We met over the past 2 weekends to test at scale and finalize our partnership.
Results are LEGIT!
more to come…
I've been playing with @SambaNovaAI's API serving fast Llama 3.1 405B tokens. Really cool to see a leading model running at speed. Congrats to SambaNova for hitting a 114 tokens/sec speed record (and also thanks @KunleOlukotun for getting me an API key!) https://t.co/GuBfYsfizJ
SambaNova is serving Llama 3.1 405B at 114 output tokens/s with their custom chips! This is the fastest we have benchmarked and 4X faster than the median of providers on Artificial Analysis.
Larger models with higher quality come at the cost of speed. New AI-focused custom silicon technologies like @SambaNovaAI's custom RDU chips, and others like Groq, are running bigger models faster - allowing developers to use bigger models at similar speeds to what they were getting on smaller models on GPUs. SambaNova has also declared the model is being stored and served at FP16, indicating quality has not been compromised in achieving these speeds.
While SambaNova is not listed on our public leaderboards due to not having an openly accessible endpoint with per token pricing, we understand they are expanding access and per token pricing is coming soon.
See below for a link to their chat interface where you can try it out for yourself, and also a link to where API access can be requested 👇
📣 Hey, #Developers! In 24 hours, we have the new Llama 3.1 405B running on our SN40L RDUs. Armed with higher memory capacity on our state-of-the-art architecture, we can run the model with:
🎯 The highest precision
🖥️ Fewer chips
💡 Less energy
Sign up to our API Program now and enjoy early access: https://t.co/uZzArIlsDb