Clarifai

Verified account

@clarifai

Create and Control Your AI Workloads On Any Compute.

Washington, DC

Joined March 2014

2.1K Following

10.9K Followers

10.4K Posts

clarifai retweeted

23 days ago

Huge news: Clarifai has agreed to license our AI inference & compute orchestration IP, plus the patent portfolio behind it, to @nebiusai.

Our core team is joining them to keep building.

Our technology becomes a key part of Nebius Token Factory, the inference platform inside their full-stack AI cloud. Faster inference. Bigger scale.

Can't wait to ride this rocket ship and build the foundation for the next decade of AI inference. 🚀

about 1 month ago

Learn more about Nemotron 3 Nano Omni here: https://t.co/1TQJbwX64F

about 1 month ago

NVIDIA Nemotron 3 Nano Omni is now available on Clarifai with day-zero support.

A 30B-A3B (roughly 30B total parameters, about 3B active per token) multimodal reasoning model built for agent workflows across documents, images, video, audio, and text.

Why it stands out:

• Multimodal input across text, image, video, and audio
• Hybrid MoE + Transformer-Mamba architecture
• 300K context window
• Runs on a single H100, H200, or B200
• 400 tokens/sec on Clarifai Reasoning Engine

about 1 month ago

Try the new model here: https://t.co/EdyzyLNAVH

about 1 month ago

Try it now in our playground: https://t.co/Cc45z9jiXh

about 1 month ago

New SotA open-source model, new pole position for Clarifai Reasoning Engine.

Kimi K2.6 cooking at 164 tokens per second. https://t.co/yoS97DnhIH

about 2 months ago

Try the model here: https://t.co/vYbdp9iCeO

about 2 months ago

Qwen3.6-35B-A3B is now live on Clarifai. 🚀

The model delivers frontier-level agentic coding performance with a mixture-of-experts (MoE) architecture that activates only 3B of its 35B total parameters per token.

Apache 2.0 licensed and runs locally.

Key capabilities:

- 73.4% on SWE-Bench Verified for real-world GitHub issue resolution
- 51.5% on Terminal-Bench 2.0 (highest among open models)
- 37.0% on MCPMark for tool use and function calling
- Multimodal support (text, image, video)
- 262K native context, extendable to 1M tokens
- Thinking Preservation: retains reasoning traces across multi-turn conversations for better agent consistency

Access via API or deploy on your own dedicated compute.
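The post says the MoE design activates only 3B of 35B parameters per token. As a rough illustration of how that works, here is a toy top-k gating sketch. It is not Qwen3.6's actual layer: the 8 scalar "experts", the router logits, and k=2 are made-up assumptions, whereas real MoE layers route each token's hidden-state vector through learned router weights into feed-forward experts.

```python
import math
from typing import Callable, List, Tuple


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits: List[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalize their gate weights."""
    ranked = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    weights = softmax([router_logits[i] for i in ranked])
    return list(zip(ranked, weights))


def moe_forward(x: float, experts: List[Callable[[float], float]],
                router_logits: List[float], k: int = 2) -> float:
    """Weighted sum over the selected experts only; unselected experts never run."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))


# Toy setup (hypothetical): 8 scalar "experts" plus a call counter
# to show how few of them actually execute for one token.
calls = [0] * 8


def make_expert(scale: int) -> Callable[[float], float]:
    def expert(x: float) -> float:
        calls[scale] += 1
        return scale * x
    return expert


experts = [make_expert(s) for s in range(8)]
logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]
y = moe_forward(1.0, experts, logits, k=2)  # runs experts 4 and 1 only
print(round(y, 4), calls)
```

Only the selected experts execute (the counter shows two of eight), which is why per-token compute tracks the active parameters rather than the total: 3B of 35B is roughly 9% (3/35).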

about 2 months ago

Clarifai 12.3: Introducing KV Cache-Aware Routing! 🚀

LLM inference with multiple replicas typically uses random load balancing. But replicas aren't interchangeable. Each builds up different KV cache state from the requests it processes.

When requests land on replicas that don't have the relevant context cached, the model has to recompute that context from scratch. This wastes GPU cycles and increases latency.

KV Cache-Aware Routing automatically detects prompt overlap and routes requests to replicas with relevant cache state already loaded. Shared system prompts are computed once and reused. RAG context is cached when multiple users query similar documents.

Zero configuration required. Deploy a model with multiple replicas and it works.
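The post describes routing by prompt overlap without saying how overlap is detected. One common way to approximate it, assumed here rather than taken from Clarifai's implementation, is to hash the prompt in fixed-size token blocks (chaining each hash over the whole prefix) and send the request to the replica that already holds the longest run of matching blocks. `Replica`, `route`, and `BLOCK_SIZE` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Set

BLOCK_SIZE = 4  # tokens per cache block; illustrative only


def block_hashes(tokens: List[int], block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Chained hash per full block: hash i covers tokens[0 : (i+1)*block_size],
    so two prompts share hash i only if they share that whole prefix."""
    hashes, prev = [], b""
    full = len(tokens) - len(tokens) % block_size
    for start in range(0, full, block_size):
        block = ",".join(map(str, tokens[start:start + block_size])).encode()
        prev = hashlib.sha256(prev + b"|" + block).digest()
        hashes.append(prev)
    return hashes


@dataclass
class Replica:
    name: str
    cached: Set[bytes] = field(default_factory=set)  # block hashes resident in KV cache
    in_flight: int = 0  # current load, assumed to be maintained elsewhere


def cached_prefix_blocks(replica: Replica, hashes: List[bytes]) -> int:
    """Number of leading prompt blocks this replica can reuse without recompute."""
    n = 0
    for h in hashes:
        if h not in replica.cached:
            break
        n += 1
    return n


def route(replicas: List[Replica], tokens: List[int]) -> Replica:
    hashes = block_hashes(tokens)
    # Longest reusable prefix wins; ties go to the least-loaded replica.
    best = max(replicas, key=lambda r: (cached_prefix_blocks(r, hashes), -r.in_flight))
    best.cached.update(hashes)  # once served, this prompt's blocks are cached there
    return best


replicas = [Replica("a"), Replica("b")]
system_prompt = list(range(100, 108))  # 8 tokens = 2 shared blocks
first = route(replicas, system_prompt + [1, 2, 3, 4])
replicas[0].in_flight = 3  # replica "a" is now busier
second = route(replicas, system_prompt + [5, 6, 7, 8])  # cached prefix outweighs load
third = route(replicas, [9, 9, 9, 9])  # nothing cached anywhere: less-loaded "b" wins
print(first.name, second.name, third.name)
```

A production router would also cap how much traffic a cache-hot replica can absorb and would track cache evictions; this sketch ignores both.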

Also new: Warm Node Pools for faster scaling, Session-Aware Routing to keep user requests on the same replica, Prediction Caching for identical inputs, and Clarifai Skills for AI coding assistants.

Read the full release blog: https://t.co/5OqCpwDizQ
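Prediction Caching for identical inputs can be pictured as a lookup keyed on the full request. The sketch below is an assumption about the general technique, not Clarifai's code: it hashes a canonical JSON encoding of a hypothetical model id, inputs, and parameters, and keeps results in a bounded LRU map. `PredictionCache` and `get_or_compute` are illustrative names.

```python
import hashlib
import json
from collections import OrderedDict
from typing import Any, Callable, Dict


class PredictionCache:
    """Bounded LRU cache keyed on a hash of the canonicalized request."""

    def __init__(self, max_entries: int = 1024) -> None:
        self.max_entries = max_entries
        self._store: "OrderedDict[str, Any]" = OrderedDict()

    @staticmethod
    def key(model_id: str, inputs: Any, params: Dict[str, Any]) -> str:
        # sort_keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} hash identically.
        payload = json.dumps({"model": model_id, "inputs": inputs, "params": params},
                             sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def get_or_compute(self, model_id: str, inputs: Any, params: Dict[str, Any],
                       compute: Callable[[Any], Any]) -> Any:
        k = self.key(model_id, inputs, params)
        if k in self._store:
            self._store.move_to_end(k)  # mark as most recently used
            return self._store[k]
        result = compute(inputs)  # cache miss: run the model
        self._store[k] = result
        if len(self._store) > self.max_entries:
            self._store.popitem(last=False)  # evict the least recently used entry
        return result


cache = PredictionCache(max_entries=2)
runs = []


def fake_model(inputs: str) -> str:  # stand-in for a real inference call
    runs.append(inputs)
    return inputs.upper()


a = cache.get_or_compute("demo-model", "hello", {"temperature": 0, "max_tokens": 8}, fake_model)
b = cache.get_or_compute("demo-model", "hello", {"max_tokens": 8, "temperature": 0}, fake_model)
print(a, b, len(runs))  # HELLO HELLO 1
```

Sampling parameters belong in the key: with temperature above zero, a cache hit returns one fixed sample, so this kind of cache fits deterministic requests best.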

about 2 months ago

@EvanKirstel We'd love that, Evan. Love the show! Want to reach out to [email protected] and we'll set it up?

about 2 months ago

Do you feel the need for speed? Come learn how Clarifai breaks speed records for inference across multiple models at Booth #405 at #humanx.

NVIDIA AI Infrastructure

about 2 months ago

⚡ In just 10 days, leading inference providers propelled Kimi K2.5 up the @ArtificialAnlys leaderboard by adopting key elements of NVIDIA’s Inference Reference Architecture.

Incredible work by @basetenco, @clarifai, @DeepInfra, @Eigen_AI_Labs, @FireworksAI_HQ, @friendliai, @LightningAI, @nebiusai, @togethercompute, @wandb by @CoreWeave.

From custom kernel optimizations to NVFP4, disaggregated serving, and speculative decoding, this extreme full-stack co-design of the NVIDIA Blackwell platform is driving major gains in both performance and efficiency.

The result? The lowest token cost at scale.

Get started with your preferred NVIDIA partner or explore our Inference Reference Architecture documentation ➡️ https://t.co/bGlvFmq9iy

about 2 months ago

Day 2 at HumanX! 🚀

Clarifai Reasoning Engine is running at booth #405. Custom CUDA kernels, speculative decoding optimized for reasoning workloads, and adaptive optimization.

That's how we deliver 500+ TPS on Kimi K2.5 and lead on Qwen3.5 and other top reasoning models benchmarked by Artificial Analysis.

But software optimization is only half the story. Tomorrow at 11 AM, Alfredo Ramos (our CPTO) is presenting with @Vultr on the infrastructure side: how to design network architecture that scales from edge deployments to hyperscale data centers for AI workloads.

Vultr Theatre Session at booth #825.
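Speculative decoding, mentioned in the Day 2 post, lets a small draft model propose several tokens that the large target model then verifies. The sketch below shows the basic greedy-verification variant, with toy next-token functions standing in for models; how Clarifai tunes it for reasoning workloads is not described in the post, so nothing here reflects that. All names are hypothetical.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]  # toy "model": sequence -> next token id


def greedy_decode(model: NextToken, prompt: List[int], max_new: int) -> List[int]:
    out = list(prompt)
    for _ in range(max_new):
        out.append(model(out))
    return out


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       max_new: int, k: int = 4) -> List[int]:
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2. The target checks each proposed position. A real engine scores all
        #    k positions in one batched forward pass; this loop is for clarity.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok != expected:
                accepted.append(expected)  # keep the target's token and stop this round
                break
            accepted.append(tok)
        else:
            accepted.append(target(out + accepted))  # all k accepted: one extra token
        out.extend(accepted)
    return out[: len(prompt) + max_new]


def target(seq: List[int]) -> int:
    return (seq[-1] + 1) % 10


def draft(seq: List[int]) -> int:
    return 0 if seq[-1] == 5 else (seq[-1] + 1) % 10  # deliberately wrong after a 5


print(speculative_decode(target, draft, [0], 12) == greedy_decode(target, [0], 12))  # True
```

The output always matches plain greedy decoding with the target model; the speedup in a real engine comes from verifying several draft tokens per target forward pass. Sampling-based variants replace exact token matching with an accept/reject rule on the two models' probabilities.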

2 months ago

Day 1 at HumanX! 🚀

Clarifai is at booth #405 showing how we're running reasoning models in production.

We hit 500+ TPS on Kimi K2.5, the first provider to cross that threshold, and we're currently leading on Qwen3.5 at 290 TPS on Artificial Analysis benchmarks.

Getting that performance requires more than optimized inference code. You need network infrastructure built for AI workloads from the ground up.

Tomorrow at 11 AM, our VP of Strategy Sajai Krishnan is presenting on exactly that: building AI data centers that can handle edge to hyperscale deployments.

@Vultr Theatre Session at booth #825.

2 months ago

Vendor lock-in is the biggest AI infrastructure risk nobody's planning for.

The attempted federal ban on Anthropic in late February exposed what happens when you build your entire AI stack on a single vendor. Government agencies and contractors scrambled overnight. The ban was blocked by a federal judge as illegal retaliation, but the lesson stands: you're one policy decision away from a forced migration.

Building on a single vendor's API means that when that vendor becomes unavailable, you're rebuilding from scratch. Claude today. GPT tomorrow. Gemini if you have to. Your infrastructure should handle that swap without breaking.

That's how Clarifai is built. Deploy any open-source reasoning model on your own infrastructure: Kimi K2.5, Qwen3.5, GLM, or your own fine-tuned models.

We're leading on Kimi K2.5 and Qwen3.5 on Artificial Analysis benchmarks because the infrastructure is purpose-built for reasoning workloads, not locked to a single model.

Build your AI stack so you can swap the engine without rebuilding the car.
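The "swap the engine without rebuilding the car" argument amounts to coding against a narrow interface and treating providers as interchangeable implementations, tried in priority order. A minimal sketch with hypothetical names throughout (`ChatBackend`, `complete_with_fallback`, `StubBackend`); real provider clients, hosted or self-hosted, would sit behind the same interface.

```python
from typing import List, Protocol, Sequence, Tuple


class BackendUnavailable(Exception):
    """Raised when a provider cannot serve requests (outage, policy change, quota)."""


class ChatBackend(Protocol):
    """The only surface application code depends on; providers plug in behind it."""
    name: str

    def complete(self, prompt: str) -> str: ...


def complete_with_fallback(backends: Sequence[ChatBackend], prompt: str) -> Tuple[str, str]:
    """Try backends in priority order; return (backend name, completion)."""
    errors: List[str] = []
    for backend in backends:
        try:
            return backend.name, backend.complete(prompt)
        except BackendUnavailable as exc:
            errors.append(f"{backend.name}: {exc}")
    raise RuntimeError("all backends unavailable: " + "; ".join(errors))


class StubBackend:
    """Stand-in for a real provider client (hosted API or self-hosted open model)."""

    def __init__(self, name: str, available: bool = True) -> None:
        self.name = name
        self.available = available

    def complete(self, prompt: str) -> str:
        if not self.available:
            raise BackendUnavailable("provider unavailable")
        return f"[{self.name}] {prompt}"


backends = [StubBackend("hosted-api", available=False), StubBackend("self-hosted-open-model")]
print(complete_with_fallback(backends, "Summarize the contract."))
```

Because callers only see `complete`, replacing or reordering providers is a configuration change rather than a rewrite.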

2 months ago

Book a meeting with the team here: https://t.co/95cx8l7aJD

2 months ago

One week until HumanX 2026. 🚀

Clarifai is bringing production-ready AI infrastructure to San Francisco April 6-9.

We're now leading on both Kimi K2.5 (500+ TPS) and Qwen3.5-397B (290 TPS), benchmarked by Artificial Analysis.

Deploy any AI model, on any compute, at any scale. Our platform handles the complexity, from compute optimization to production inference.

See Clarifai Reasoning Engine live at booth #405. Schedule time with the team here.

2 months ago

Clarifai is heading to HumanX 2026 in San Francisco. 🚀

At GTC this week, we announced 414 tokens per second on Kimi K2.5, making us the first provider to reach that level of performance. We're also leading on Qwen3.5-397B with 290 TPS, benchmarked by Artificial Analysis.

Now we're bringing that momentum to #HumanX April 6-9.

Come see how Clarifai Reasoning Engine delivers production-ready performance for reasoning models and agentic workloads.

Find us at booth #405 or schedule time with Matthew Zeiler, Sajai Krishnan, or Douglas Shapiro.

Book a meeting here: https://t.co/gaquLGqzq0

3 months ago

If you're at GTC, meet with us to see the speed first hand. https://t.co/g5EFW1laYx

3 months ago

Day 3 of GTC is here, and there is a new pole sitter for Qwen3.5 performance.

Clarifai Reasoning Engine approaches 300 tokens per second, 245% faster than a vanilla install. https://t.co/XS32M6EPty

3 months ago

Verified on @ArtificialAnlys - check out the performance stats here: https://t.co/Xq9QwmTYG9
