Dawei Huang @dawei_huang - Twitter Profile

Pinned Tweet

almost 2 years ago

Can’t be more proud of our team for this amazing achievement in such a short time - BF16 Llama 3.1 405B on 16 chips running at a world-record 114 tokens/s! @SambaNovaAI Team - Keep pushing and let the data flow! #GenAI #Llama #LLM #Llama3 #performance #ai #MondayMotivaton

SambaNova

@SambaNovaAI

almost 2 years ago

🚀 World record performance: SambaNova is running Llama 3.1 405B at 114 t/s with full precision accuracy, in only one rack. Verified by @ArtificialAnlys! 🦙 This speed unlocks so many use cases for enterprises and developers that we cannot wait to see them built on our platform. Apply for early access today: https://t.co/CSlbJbTFVj

Dawei_Huang retweeted

SambaNova

@SambaNovaAI

12 days ago

At @LipBuTan1's #COMPUTEX2026 keynote today, @RodrigoLiang stepped onstage with @RFS_Vista to power up the world's first disaggregated inference cloud, VectorCore Compute (VC2), launched by @Vista_Equity and Cambium Capital.

Three chips ran disaggregated inference, live from the VC2 datacenter in LA:
➡️ NVIDIA B200 GPUs: prefill, high-compute burst
➡️ SambaNova RDUs: decode, high-throughput, low-latency token generation at scale
➡️ Intel® Xeon® 6 CPUs: tool execution, end-to-end orchestration

“GPUs are powerful. RDUs are fast. CPUs orchestrate. But disaggregate all three, and you get speed, performance, and economics no single chip can touch. That's the unlock.” - @RodrigoLiang

Dawei_Huang retweeted

Hume AI

@hume_ai

11 months ago

We’re excited to be partnering with @SambaNovaAI to deliver the fastest high quality TTS and Conversational AI. Try it out at https://t.co/NmkBkPTPyJ

Dawei_Huang retweeted

SambaNova

@SambaNovaAI

about 1 year ago

You know we always come through with the speed 🚀 @AIatMeta's #Llama4 Maverick is now available on SambaNova Cloud with the FASTEST inference! ✈️ 655 t/s — independently verified by @ArtificialAnlys

Dawei_Huang retweeted

Artificial Analysis

@ArtificialAnlys

about 1 year ago

Comparing DeepSeek V3-0324 APIs: We are now tracking 10 APIs for DeepSeek’s new model, including DeepSeek’s first-party API and offerings from Fireworks, DeepInfra, Hyperbolic, Nebius, CentML, https://t.co/VHqKQuuZfq, Novita, Replicate and SambaNova

DeepSeek V3-0324 is open weights and there is now a healthy ecosystem of providers offering APIs! Congrats to all the inference providers for making it rapidly available. Given V3-0324’s status as the leading non-reasoning model, we will not be surprised to see more providers make this model available in the coming days.

Key info from our benchmarking:
➤ We’re seeing the fastest output speeds on SambaNova (267 tokens/s), Fireworks (82 tokens/s) and CentML (76 tokens/s)
➤ We’re seeing the best prices on DeepSeek’s own API ($0.27/$1.1 per million input/output tokens), followed closely by DeepInfra ($0.4/$0.89)
➤ Not all providers are supporting V3-0324’s full 128K context window: notably, DeepSeek’s own API only supports a 64K token context window (as does Novita’s API) and SambaNova currently only supports an 8K token context window

DeepSeek V3 and R1 are particularly hard models to host compared to most other open weights models because they are so large - at 671B total parameters, they cannot fit on a single 8xH100 node. It can be very expensive for providers to offer APIs at small scale.
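
The sizing claim above can be checked with a rough back-of-the-envelope sketch. It assumes 80 GB of HBM per H100 and 1 or 2 bytes per parameter for FP8 or BF16 weights; these figures and the helper name `weight_footprint_gb` are illustrative assumptions, not from the tweet. KV cache and activation memory are ignored, so real requirements would be higher.

```python
def weight_footprint_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes); weights only."""
    return params_billions * bytes_per_param


# Assumption: one node = 8 GPUs with 80 GB of HBM each.
NODE_HBM_GB = 8 * 80

# 671B total parameters is the figure quoted in the tweet.
for precision, nbytes in [("FP8", 1), ("BF16", 2)]:
    need = weight_footprint_gb(671, nbytes)
    fits = need <= NODE_HBM_GB
    print(f"{precision}: {need:.0f} GB of weights vs {NODE_HBM_GB} GB of HBM, fits={fits}")
```

Under these assumptions the weights alone exceed one node's memory even at one byte per parameter, which is consistent with the tweet's point that hosting requires more than a single 8xH100 node.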

Dawei Huang

@Dawei_Huang

about 1 year ago

@XianliangWu @SambaNovaAI @deepseek_ai We will investigate - would you please provide some example prompts with the issue?

Dawei_Huang retweeted

SambaNova

@SambaNovaAI

over 1 year ago

SN40L crushes H200 in real-world #AI inference! 🦾

We measured @deepseek_ai's R1 with SGLang 0.4.2 on 1 node of H200, & guess what - SN40L completely smashes H200's Pareto frontier:

☑️ 5.7x faster (201 tps vs 35 tps)
☑️ Reasoning model (30s vs 171s to generate 6k tokens) https://t.co/OPxgCJYzeO
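
The quoted figures are mutually consistent, as a short sketch shows. It assumes "6k tokens" means exactly 6,000 output tokens generated at a constant rate, with time to first token ignored; the helper name `generation_seconds` is hypothetical.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to emit num_tokens at a constant decode rate (ignores time to first token)."""
    return num_tokens / tokens_per_second


SN40L_TPS = 201  # tokens/s, as quoted in the tweet
H200_TPS = 35    # tokens/s, as quoted in the tweet
NUM_TOKENS = 6000  # assumption: "6k" taken as exactly 6,000

speedup = SN40L_TPS / H200_TPS
t_sn40l = generation_seconds(NUM_TOKENS, SN40L_TPS)
t_h200 = generation_seconds(NUM_TOKENS, H200_TPS)

print(f"speedup: {speedup:.1f}x")
print(f"SN40L: {t_sn40l:.0f} s, H200: {t_h200:.0f} s")
```

Rounded, this gives the 5.7x ratio and the 30 s versus 171 s generation times stated in the tweet.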

Dawei_Huang retweeted

Robert Rizk

@RobRizk1

over 1 year ago

blackbox beast mode powered by @SambaNovaAI is the fastest reasoning model on earth! transform the way you learn, work and ship products and get started on blackbox!

Dawei_Huang retweeted

AK

@_akhaliq

over 1 year ago

DeepSeek R1 671B just broke speed records at 198 t/s. It is now the fastest reasoning model available. You can try it in coding mode on anychat soon.

prompt: make two fancy D3.js animated looping speedometers, first one is title deepseek R1 671B and second one is OpenAI o3-mini, deep seek should go up to 198 tokens/second and openai o3-mini should go up to 174 tokens/second

Dawei_Huang retweeted

Dawn Song

@dawnsongtweets

over 1 year ago

🎉 Thrilled by the incredible enthusiasm for our LLM Agents MOOC: 12K+ registered learners & 5K+ Discord members!

📣 Excited to launch today the LLM Agents MOOC Hackathon, open to all, with $200K+ in prizes & credits!

🔗 Sign up now: https://t.co/vDVJ0AF28a & join us virtually or in person @UCBerkeley!

Huge thanks to our sponsors: @OpenAI @GoogleAI @AMD @LambdaAPI @Intel @SierraPlatform @OrbyAI (and more to come)

🚀 Explore 5 exciting tracks:
1️⃣ Applications: Build cutting-edge LLM agents!
2️⃣ Benchmarks: Create innovative AI agent evaluation benchmarks!
3️⃣ Fundamentals: Strengthen core agent capabilities!
4️⃣ Safety: Address critical safety challenges in AI!
5️⃣ Decentralized & Multi-Agents: Push the boundaries of multi-agent systems!

Special thanks to my co-instructor @xinyun_chen_ @GoogleDeepMind & our amazing guest speakers for making this a great MOOC: @denny_zhou @GoogleDeepMind; @PercyLiang @Stanford; @8enmann @AnthropicAI; @ShunyuYao12 @OpenAI; @chi_wang_ @GoogleDeepMind; @jerryjliu0 @llama_index; @lateinteraction @Databricks; @gneubig @CarnegieMellon; @NicolasChapados @ServiceNow; @tydsh @AIatMeta; @drjimfan @NVIDIA; Burak Gokturk @Google

Join us to shape the future of LLM Agents! https://t.co/hbU2jzkRKQ 🤖✨ #AI #Hackathon #LLMAgents #UCberkeley

Dawei_Huang retweeted

Kaizhao Liang

@KyleLiang5

over 1 year ago

Try it out here for blazing-fast @OpenAI o1-style test-time scaling powered by the @SambaNovaAI cloud API on your favorite open-source models from @AIatMeta. Many thanks to @_akhaliq, who put tremendous work into getting the Gradio app working! 🤗 https://t.co/29LN5onzic

Dawei_Huang retweeted

SambaNova

@SambaNovaAI

almost 2 years ago

We've loved the conversations around our newly launched SambaNova Cloud. ☁️

In his article, @capacitymedia's @benwodecki gives his insight:

“The SambaNova Cloud is similar to services from rivals... however, the hardware is optimized to a point where it can run on a single rack consisting of just eight trays containing SN40Ls – reducing the infrastructure footprint required to run it.”

He also shares some thoughts from prominent leaders in the industry:

"The service appears to have caught the eye of one Andrew Ng, a machine learning pioneer who co-founded Google Brain, who described SambaNova Cloud as an 'impressive technical achievement.'"

Read more ⤵️
https://t.co/nG6TziAAdz
#AI #API @AndrewNg

Dawei Huang

@Dawei_Huang

almost 2 years ago

Really enjoy working with the Blackbox team. Very proud of what we were able to achieve in such a short time!

BLACKBOX AI

@blackboxai

almost 2 years ago

The team at @SambaNovaAI are GETTING AFTER IT! Some people believe in weekends, Blackbox AI and Sambanova don't! We met over the past 2 weekends to test at scale and finalize our partnership. Results are LEGIT! more to come…

Dawei_Huang retweeted

Andrew Ng

@AndrewYNg

almost 2 years ago

I've been playing with @SambaNovaAI's API serving fast Llama 3.1 405B tokens. Really cool to see a leading model running at speed. Congrats to SambaNova for hitting a 114 tokens/sec speed record (and also thanks @KunleOlukotun for getting me an API key!) https://t.co/GuBfYsfizJ

Dawei_Huang retweeted

Artificial Analysis

@ArtificialAnlys

almost 2 years ago

SambaNova is serving Llama 3.1 405B at 114 output tokens/s with their custom chips! This is the fastest we have benchmarked and 4X faster than the median of providers on Artificial Analysis.

Larger models with higher quality come at the cost of speed. New AI-focused custom silicon technologies like @SambaNovaAI's custom RDU chips, and others like Groq, are running bigger models faster - allowing developers to use bigger models at similar speeds to what they were getting on smaller models on GPUs. SambaNova has also declared the model is being stored and served at FP16, indicating quality has not been compromised in achieving these speeds.

While SambaNova is not listed on our public leaderboards due to not having an openly accessible endpoint with per token pricing, we understand they are expanding access and per token pricing is coming soon.

See below for a link to their chat interface where you can try it out for yourself, and also a link to where API access can be requested 👇

Dawei Huang

@Dawei_Huang

almost 2 years ago

Running Llama 3.1 405B BF16 on 16 chips is incredible - go check it out at https://t.co/xlNCvagRL9 and surprise yourself! #genAI #llm #Llama

SambaNova

@SambaNovaAI

almost 2 years ago

📣 Hey, #Developers! In 24 hours, we have the new Llama 3.1 405B running on our SN40L RDUs. Armed with higher memory capacity on our state-of-the-art architecture, we can run the model with:
🎯 The highest precision
🖥️ Fewer chips
💡 Less energy
Sign up to our API Program now and enjoy early access: https://t.co/uZzArIlsDb
