@liquidai's LFM2.5 models are now live on ZeroGPU.
Access LFM2.5-1.2B-Instruct and LFM2.5-1.2B-Thinking through our global edge inference network to run efficient small language models.
Get started today:
https://t.co/TrrVaPqshD
TokenMaxxxing is out!!
"Token efficiency is going to be a big theme this year… because the spend has been ramping up way faster than enterprise customers thought." @DavidSacks said this on the latest @theallinpod
Most AI tasks don’t need frontier-model reasoning.
Small language models are bridging that gap.
That’s what we’re building at @ZeroGPU_AI.
$700 billion is being spent on AI compute this year.
Today a city voted to pause that spend.
The buildout is hitting a wall — and most of what it’s being built for never needed a data center at all. 🧵
So we stopped trying to build a data center, and started on a solution.
An edge inference network built around idle compute.
Run repeatable work on small and nano language models. Frontier models stay for reasoning.
→ https://t.co/kYQMQq27pv
Use frontier models like Claude for orchestration and reasoning.
For the high-volume, repeatable tasks that most enterprises are tapping into AI for today, use specialized models to complete work faster, more predictably and at a lower cost.
https://t.co/TrrVaPqshD
Here's how to reduce costs & improve results: pair Claude Code w/ a specialized small language model.
In this example cookbook, our specialized SLM redacts PII within Claude Code.
Our router plugin lets Claude decide which tasks are pushed to our specialized, cheaper models.
Claude Code processes a customer feedback export, automatically hands PII extraction and redaction to purpose-built models that generates:
→ A clean version that's safe to share
→ A complete audit log of every PII entity found and removed
👩🍳Cookbook:
https://t.co/K6IsFvi8N1
Useful for for customer feedback, support tickets, extraction, classification & more.
⭐️Please consider leaving us a 5-star review on GitHub⭐️
https://t.co/IaZ2OScdZX
Our latest Claude Code cookbook is live.
It shows how to pair frontier models like Claude with specialized small and nano language models for high-volume, repeatable tasks.
In this case, we show how to redact PII info with Claude Code + our SLMs. https://t.co/K6IsFvi8N1
With the ZeroGPU Router plugin, Claude Code can automatically route these tasks to purpose-built models.
You stay in Claude Code.
The repetitive work gets handed off to specialized models.
Are your AI costs too high?
We’re giving developers access to a growing catalog of more efficient, specialized AI models through a single API—including leading open-source models like Meta’s Llama 3.1.
We’ve added Llama 3.1 8B Instruct, a great fit for:
→ Summarization
→ Content transformation
→ Classification
→ Data extraction
→ Customer support workflows
→ Lightweight chat and agent experiences
With our router, let AI decide which models you choose to save on costs.
Not every task you run in @Claude Code needs frontier-model reasoning. But most AI coding workflows are still sending every request to the largest model available.
That's why we built a new plug-in that that routes lightweight workloads to specialized nano language models.
This has been our most requested feature to-date, perfect for:
- data enrichment
- classification
- offline analytics
- backfills
- so much more
Get started: https://t.co/TrrVaPqshD
Our Batch API is built for AI workloads that do not need to happen in real time, helping you save on costs.
Instead of sending each request one by one:
upload a JSONL file
submit it as a batch job
retrieve the results when processing is complete
It’s a cleaner way to run large AI workloads without managing queues, workers, retries, or GPU infrastructure yourself. ZeroGPU handles the execution. You focus on the data.
Read more: https://t.co/5a5SRrzjTt
@ZeroGPU_AI Batch Processing has been our most asked feature. We are already providing ~5x cost savings for our customers compared to frontier models(case study with real customer coming soon).
With Batch processing there is an additional savings layer on top. This is awesome @ZeroGPU_AI