Phil Howes

7 months ago

so much potential in this model, and @aqaderb is coming out of the gate just ripping up the landscape on perf

7 months ago

It’s Monday, and we could all use a little help thinking. Thankfully we have the new Kimi K2 Thinking to do it for us.

Kimi K2 Thinking is now live in our Model APIs with the fastest TTFT (0.3 sec) and highest TPS (140) on @openrouter and @ArtificialAnlys. If you’re looking for an alternative to GPT-5, working on coding, or building agentic AI, you *need* to give this model a try.

Congrats @Kimi_Moonshot, you all are astounding.

Get access in the comments ➡️

8 months ago

speculative decoding, in this case EAGLE-3, remains one of the biggest levers to go from good to great. amazing job leapfrogging the market and getting the most out of our GPUs

8 months ago

This week, Baseten's model performance team unlocked the fastest TPS and TTFT for gpt-oss-120b on @nvidia hardware. When gpt-oss launched, we sprinted to offer it at 450 TPS... now we've exceeded 650 TPS with 0.11 sec TTFT... and we'll keep raising the bar.

We are proud to offer the best E2E latency available, with near-limitless scale, incredible performance, and the highest uptime (99.99%).

10 months ago

💪🫡 still plenty of juice to squeeze out of this one

Amir Haghighat

@amiruci

10 months ago

It's important to support newly released open-weight models on day 1. But it's not noteworthy. What's noteworthy is having the inference optimization muscle to immediately blow the competition out of the water on latency and throughput.

As measured by OpenRouter: https://t.co/cxt9rFNMCn

12 months ago

you can just do things faster

12 months ago

We're excited to introduce the Baseten Performance Client, a new open-source Python library that delivers up to 12x higher throughput for high-volume embedding tasks!

Stand up a new vector database, preprocess text, and run massive workloads in <2 minutes (vs. 15+ minutes with AsyncOpenAI). https://t.co/Q7on09jirV

over 1 year ago

@jxmnop if you read this and still want to learn CUDA anyway, we’re hiring for this at @baseten to get more brrrr/dollar. dms open

saltyph retweeted

Michael Feil

@feilsystem

over 1 year ago

New Qwen QwQ running at 90 tokens/s generation speed on a single H100 @baseten using a new speculative decoding stack. Around 2x faster than the rest of the leaderboard (https://t.co/StCzjaZ1i0).

over 1 year ago

hit new peak demand today, 3 million RPS. thanks for stress-testing our infra, anon internet friend

over 1 year ago

@tuhinone what were your <think> tokens?

saltyph retweeted

about 2 years ago

The models are available at the following links:
Llama 3 8B Instruct: https://t.co/y9D0VhSjxW
Llama 3 70B Instruct: https://t.co/1MRAHEdsgc

saltyph retweeted

Conviction @conviction

about 2 years ago

Congrats to Conviction and Embed companies @baseten @Figure_robot @harvey__ai @langchain @MistralAI @sierraplatform @pika_labs (and our many pioneering friends) for making the #ForbesAI50 list!

Ground floor of the revolution that will lead to many massive companies. https://t.co/hfoq6ZJ76E

saltyph retweeted

abu

@aqaderb

about 2 years ago

2 things.
1. i have loved working on this team. model performance is so much fun and so rewarding.
2. persistence is key. we started working on model performance at the end of 2023, and watching us slowly get better and better has been an incredible experience.

over 2 years ago

@saranormous i strongly recommend @Fall_of_Civ_Pod for you both, great production quality, long-form history

over 2 years ago

when i tell people working in infra is like being a plumber, people assume it’s because of all the pipe connecting, when in fact it’s because i spend most of my day digging through shit

over 2 years ago

@thomasschiavone @saranormous @awscloud @baseten gotta get GPUs somewhere. welcome aboard, happy to bear the brunt of the pain

over 2 years ago

@aqaderb couldn't do it without you friend

over 2 years ago

every day i get to work with a world-class team supporting customers with world-class products. today we get to dream a little bigger

over 2 years ago

We're excited to announce that we've raised a $40M Series B to help power the next generation of AI-native products with performant, reliable and scalable inference infrastructure. https://t.co/NAn8LduZ6I

saltyph retweeted

over 2 years ago

Ready to try open source LLMs? Switch from GPT to Mistral 7B in the smallest refactor you'll ever ship: just 3 tiny code changes. If you're making the jump, DM us for $1,000 in free credits. https://t.co/izLK8UUJBZ