prem prakash @premp2006 - Twitter Profile

premp2006 retweeted

about 11 hours ago

When deploying voice agents, users notice latency above 500ms. Above one second, they hang up. @rish_bhargava walks through the full pipeline at that level of specificity, including why 75ms of network latency adds 30% overhead and how colocating everything drops it to 5ms. https://t.co/s6yqpFpsaA

1

11

3

2

2K

premp2006 retweeted

Together AI @togethercompute

about 8 hours ago

MiniMax-M3 from @MiniMax_AI is now available on Together AI. It’s an open-weight native multimodal model with 1M context, MiniMax Sparse Attention, and thinking / non-thinking modes. Together AI is MiniMax’s preferred cloud partner, with inference optimizations delivering up to 125% higher throughput across concurrency levels.

togethercompute's tweet photo. MiniMax-M3 from @MiniMax_AI is now available on Together AI.

It’s an open-weight native multimodal model with 1M context, MiniMax Sparse Attention, and thinking / non-thinking modes.

Together AI is MiniMax’s preferred cloud partner, with inference optimizations delivering up to 125% higher throughput across concurrency levels.

1

16

2

3

7K

premp2006 retweeted

Together AI @togethercompute

1 day ago

Frontier model performance on an open model, post-trained in under 24 hours. @trajectorylabs is showing what's possible when great open models meet the right training infrastructure. Proud to power the compute behind this work alongside @nvidia .

4

19

5

6K

premp2006 retweeted

Hassan

@nutlope

3 days ago

I asked 8 AI models (including Fable 5) for their world cup predictions. Going to keep an updated leaderboard based on match results to see which AI model performed the best! Launching tomorrow, right before the world cup kicks off!

10

47

4

28

9K

Who to follow

snap:king.45 go subscribe to my YouTube channel https://t.co/tzm7QjnubK go tune in

premp2006 retweeted

3 days ago

As vertically integrated platforms start to dominate they lock out third party access to the most valuable portions of the platform. Of course, Anthropic is has the right to implement whatever policy they want. But this is why open-weights are critical for human progress.

3

43

6

3

4K

premp2006 retweeted

Together AI @togethercompute

4 days ago

The best AI infrastructure shouldn't be reserved for the biggest companies. Together AI is partnering with @pax8 to bring powerful, cost-efficient AI and leading open-source models to small and mid-sized businesses worldwide. https://t.co/fxib9K20DE

5

12

3

0

1K

premp2006 retweeted

Vipul Ved Prakash

@vipulved

4 days ago

PSA: Just added a few thousand chips, including B200s and B300s to our Dedicated Model Inference (https://t.co/PiJhHYHVZh). With Dedicated Model Inference, you can now on-click deploy our Blackwell optimized inference engine with auto-scale on frontier OSS models including Nemotron, Minimax, Kimi, DeepSeek, GLM and Qwen.

7

83

12

8K

premp2006 retweeted

Together AI @togethercompute

11 days ago

MiniMax-M3 combines 1M context, native multimodality, and MiniMax Sparse Attention. The next layer is serving it efficiently: KV-block-major sparse attention, paged MSA decode, optimized index scoring, and multimodal preprocessing before the GPU worker. Together’s Inference and Kernel teams improved throughput by 81–125% across common agentic-shape traffic. We go deeper in this deep dive from @ywangfirstlean, @zhyncs42, @realDanFu and the team.

1

30

10

12

10K

premp2006 retweeted

Together AI @togethercompute

11 days ago

Going live now with @MiniMax_AI 🚀 https://t.co/wPayfOWmNg

2

9

1

2K

premp2006 retweeted

Together AI @togethercompute

12 days ago

MiniMax M3 is live and Together AI is powering its inference 🚀 Tomorrow at 6pm PT we're going live on X Spaces with the teams behind the model and the infrastructure to give you a deep dive. https://t.co/wPayfOWmNg

7

58

8

11

399K

premp2006 retweeted

LangChain

@LangChain

15 days ago

The latest finding in the LangSmith Signal: Open Models are having a moment. 1 in 3 AI teams ran an open-weights model in April 2026, up from 1 in 5 nine months ago. The overall number of teams using open weights grew 3x. We’re seeing newer users choose open models at a higher rate than those who came before.

LangChain's tweet photo. The latest finding in the LangSmith Signal: Open Models are having a moment.

1 in 3 AI teams ran an open-weights model in April 2026, up from 1 in 5 nine months ago.

The overall number of teams using open weights grew 3x.

We’re seeing newer users choose open models at a higher rate than those who came before.

17

126

34

42

68K

premp2006 retweeted

Together AI @togethercompute

14 days ago

We took the Hot Wings Challenge to NVIDIA GTC 🌶️ @realDanFu (VP of Kernels) and @sarung (VP of Customer Success) answered some questions around AI, one spicy wing at a time. Some people sweat. Some people talk. Watch to see who did both.

1

17

3

0

6K

premp2006 retweeted

Together AI @togethercompute

15 days ago

Together AI serves the two fastest STT models measured by @ArtificialAnlys NVIDIA Parakeet-TDT 0.6B v3 can transcribe 20 hours of speech in under 10 seconds. This deep dive shows the systems work behind the leaderboard: TensorRT profiles, conditional CUDA graphs, evented I/O, shared memory, and Python GC control.

3

41

8

21

7K

premp2006 retweeted

Together AI @togethercompute

17 days ago

The best research labs are building what comes after static models. Congrats to @trajectorylabs on the launch! Excited to have them training on the AI Native Cloud as they push the frontier on Continual Learning!

6

26

2

7

4K

premp2006 retweeted

Ce Zhang

@ce_zhang

15 days ago

🔜

ce_zhang's tweet photo. 🔜 https://t.co/Gw0Y7rNZcR

20

275

13

29

31K

premp2006 retweeted

Vipul Ved Prakash

@vipulved

20 days ago

Our inference stack, optimized for Blackwells, with a novel attention kernel and many new optimizations has started rolling out! It's already charting on Artificial Analysis, eg: #1 speed and latency for @Kimi_Moonshot Kimi 2.6. #1 on latency on @MiniMax_AI, and miles ahead of other GPU endpoints. https://t.co/Yx6rIcZPyk https://t.co/AdORQ3GLu9

vipulved's tweet photo. Our inference stack, optimized for Blackwells, with a novel attention kernel and many new optimizations has started rolling out!

It's already charting on Artificial Analysis, eg: #1 speed and latency for @Kimi_Moonshot Kimi 2.6. #1 on latency on @MiniMax_AI, and miles ahead of other GPU endpoints.

https://t.co/Yx6rIcZPyk
https://t.co/AdORQ3GLu9

8

146

15

45

14K

premp2006 retweeted

Vipul Ved Prakash

@vipulved

22 days ago

PSA: Just added a thousand H100s and H200s to Together on-demand GPU clusters and Dedicated Endpoints: https://t.co/fr7yzZpPP8

3

87

13

14

13K

premp2006 retweeted

Together AI @togethercompute

22 days ago

Introducing Qwen3.7-Max from @Alibaba_Qwen, Qwen’s flagship model for the agent era with 1M context and leading performance across agentic coding, reasoning, and long-horizon autonomy. AI natives can now use Qwen3.7-Max on Together Serverless Inference for production-scale agent workflows.

togethercompute's tweet photo. Introducing Qwen3.7-Max from @Alibaba_Qwen, Qwen’s flagship model for the agent era with 1M context and leading performance across agentic coding, reasoning, and long-horizon autonomy.

AI natives can now use Qwen3.7-Max on Together Serverless Inference for production-scale agent workflows.

8

45

6

3

6K

premp2006 retweeted

Together AI @togethercompute

24 days ago

"One thing that we've been seeing recently is that inference benchmarks don't really match production workloads that well." - @realDanFu, VP of Kernels When you're running dozens of concurrent coding agents — each with 45k–200k token contexts — the benchmarks that matter are the ones that stress KV cache, scheduler limits, and throughput under real load. We ran those benchmarks. Our Inference Engine delivered: → 31% higher TPS than the next fastest OSS engine → 2× better time-to-first-token at saturation → 76% lower cost per request vs. Claude Opus 4.6 Read the full technical breakdown → https://t.co/KPUq4jYAFH

12

46

3

27

13K

premp2006 retweeted

Together AI @togethercompute

25 days ago

Congrats to the @cursor_ai team on Composer 2.5 — a huge milestone for agentic coding models. Together AI, the AI Native Cloud, is proud to partner on this launch. Composer 2.5 is pushing the frontier for coding agents and turning heads for its speed and quality. Excited to keep building with the Cursor team!

3

115

10

11

13K

prem prakash

@premp2006

Who to follow

Last Seen Users on Sotwe

Trends for you

Most Popular Users