Together AI

@togethercompute

Accelerate inference, model shaping, and pre-training on a research-optimized platform.

San Francisco, CA

Joined November 2022

375 Following

56.5K Followers

2.8K Posts

Pinned Tweet

Together AI @togethercompute

1 day ago

Closed-source models aren't worth the premium. We generated 12 landing pages with Kimi K2.7 Code and Claude Fable 5. Kimi came in 16x cheaper with comparable quality, especially once we gave it visual context through a design MCP server. Open-source models are already a practical choice for this kind of workflow.

1 day ago

https://t.co/GQk5s4qdDt

Together AI @togethercompute

about 12 hours ago

This is what open-model tokenomics look like in production. When teams are running billions of tokens, small differences in caching, throughput, and serving efficiency become product-level economics. MiniMax M3 on Together AI is a strong example: frontier-adjacent quality, open-model economics, and a serving stack built for scale.

1 day ago

M3 by @MiniMax_AI is the best value in AI. At @HedyAI_ we run close to a billion tokens through the model each day, and @togethercompute's input caching brings our cost down to $0.128/million input tokens. For a model that is close to the frontier in intelligence and the second most powerful open source model. Unbelievable 🤯

Together AI @togethercompute

about 14 hours ago

🤝

1 day ago

let's go open models! ❤️

Together AI @togethercompute

about 16 hours ago

Link to attend: https://t.co/zSpXibADtM

Together AI @togethercompute

about 16 hours ago

Open models are what make collective agent intelligence possible. James Zou from Together AI and Venkat Srinivasan from NVIDIA are joining us July 1 at AI Engineer World's Fair to dig into exactly that. https://t.co/FV0bnSVgwg

togethercompute retweeted

1 day ago

GLM 5.2 is available now on @togethercompute! Very fast speeds (200+ tps), try it out & let me know what you think! Video is not sped up. https://t.co/pggGhspSdj

Together AI @togethercompute

1 day ago

@Zai_org GLM-5.2 is live on Together AI. Try it now: https://t.co/Ex49cKrpQR

Together AI @togethercompute

1 day ago

Introducing GLM-5.2 from @Zai_org, https://t.co/2XG0WEPpHd’s latest flagship open model for long-horizon tasks with 1M context, flexible thinking effort, and stronger agentic coding. Now available on Together AI, GLM-5.2 runs on research-powered inference for long-context, tool-heavy agent workloads.

Together AI @togethercompute

1 day ago

@Zai_org Highlights:

👉 1M context built to sustain long-horizon work

👉 Stronger coding with flexible effort levels to balance latency and depth

👉 Improved architecture with IndexShare, reducing per-token FLOPs 2.9x at 1M context

👉 MIT-licensed open weights for broad technical access

Together AI @togethercompute

1 day ago

@_AustinO1 Soon!

Together AI @togethercompute

2 days ago

We tested closed and open models by asking them to build small, playable games. Open models were much cheaper and faster, while producing games that were often close in quality.

→ Opus 4.8 was 15x more expensive than MiniMax M3

→ GPT-5.5 was 10x more expensive than Nemotron Ultra

→ Kimi K2.7 Code was 7x cheaper than Opus 4.8

For more workloads, the closed-to-open shift is becoming hard to ignore: strong quality, better tokenomics, and faster inference.

Together AI @togethercompute

1 day ago

@OrganicGPT What about Kimi K2.7 Code vs. Fable 5 🫣 https://t.co/NCmhPlImud

1 day ago

https://t.co/GQk5s4qdDt

togethercompute retweeted

Vipul Ved Prakash

1 day ago

Just added gobs of H100s, H200s and B200s on our on-demand compute platform. https://t.co/QkB0qMIhSu

togethercompute retweeted

Decagon @DecagonAI

2 days ago

🤝

togethercompute retweeted

3 days ago

Built a visual benchmark where I asked closed and open source models to build small games. Main takeaway: OSS models were a lot faster, cheaper, & produced games with similar quality. Specifically:

* Opus 4.8 was 15x more expensive than MiniMax M3

* GPT-5.5 was 10x more expensive than Nemotron Ultra

* Kimi K2.7 Code was 7x cheaper than Opus 4.8

You can even play the generated games yourself, the quality gap is surprisingly small (even non existent for some games). Open source models are getting hard to ignore!

Of course, this doesn't extend to all tasks. There are definitely certain hard tasks where you'd benefit from using an Opus 4.8 level model. But increasingly, you're able to do more and more tasks with cheaper and faster open source models which is a trend I'm seeing with our customers too.

Together AI @togethercompute

3 days ago

.@DecagonAI cut voice agent cost per turn nearly 6x with Together AI. They moved from closed models to fine-tuned open models, while keeping latency low enough for real-time voice:

→ <400ms p95 model latency per turn

→ custom speculators and prompt caching

→ optimized serving on NVIDIA Blackwell

→ weekly, sometimes daily model deployment velocity

This is the closed-to-open shift: more control, better tokenomics, and production performance without being locked into proprietary APIs.

Together AI @togethercompute

3 days ago

https://t.co/LcNvOz8yV7

Together AI @togethercompute

3 days ago

It’s a great question! End-user TPS does come at the cost of concurrency. It’s essentially a dial you can tune for speed, concurrency, and cost. That said, we’ve also built a lot of optimizations into our inference stack, so we can provide higher per-user TPS at equivalent concurrency, or higher concurrency at equivalent end-user TPS (two sides of the same coin), compared with leading OSS inference engines. Check out a recent deep dive on our work optimizing coding agents for production: https://t.co/aFTWY0IWEE

Together AI @togethercompute

3 days ago

Optimizing GLM 5.1 came down to three things:

-> Rewrote the indexer topk kernel

-> Fused the indexer kernel to reduce memory and launch overhead

-> Eliminated CPU overhead that was gating prefill throughput

The bigger win was in the indexer. Once we fixed that, the rest made it even faster.

GLM 5.1 is available on Together AI.

Together AI @togethercompute

3 days ago

@Listen987 Soon!

Together AI @togethercompute

3 days ago

@latentlocal Stay tuned - see you at the top of that leaderboard too
