How to keep AI spend flat while token usage grows exponentially: Not with friction and spend alerts. With better defaults, routing, and caching.
Better Defaults (not Usage Caps) – Engineers can choose any model they want, but defaults matter. We’re experimenting with defaulting to open weight models like GLM 5.2 and Kimi 2.7 through our LLM gateway, while still encouraging engineers to choose the right model for the task. 91% of our employees were never hitting their usage caps, so instead of lowering caps and driving up alerts, we're moving to cheaper defaults. Note that code reviews use a diversity of models, so they can check each other's work.
Better Routing – In our custom harnesses, we preprocess prompts and route to the best model for the job, considering cache hits and model pricing. For instance, you may want a frontier model for planning, but not for execution where they can be overkill. Ultimately, humans shouldn't be choosing models - AI can automate this task.
Better Caching – Cache misses are the easiest way to drive your cost up. All of our requests are cache aware, so we’re reusing a warm cache wherever possible. For example, our cache hit rate went from 5% → 60% in LibreChat once properly implemented.
Keep Context Lean – Start fresh sessions when switching tasks. Scope file context narrowly. Disconnect unused tools. Don't just compact. The goal isn't fewer tokens used, it's fewer tokens wasted.
Better Visibility – Our engineers can use as many tokens as they want, from whatever model they want, but we’ve made usage visible – and the more you spend on AI, the more impact we expect.
The goal isn't to suppress usage. It's to build the infrastructure that makes exponential growth sustainable.
Putting this into practice has cut our AI spend nearly in half, while our token usage continues to grow.
FOUNDERS: You've been paying to build on Claude. @AnthropicAI launched a program to change that.
@Claudeai for Startups - free API credits and priority rate limits for early-stage VC-backed founders:
- Free Claude API credits
- Highest rate limits, no throttling in production
- Hackathons, Founder Days, and meetups
- Early access to new model releases
Build with the full Claude stack:
Claude API, Claude Code, Claude Managed Agents, and Claude Cowork.
To qualify: your startup must be early-stage and backed by one of Anthropic's partner VCs. Ask your investors for a unique application link.
Apply → https://t.co/HckO4O93Ho
P.S. Founders using Claude to build - when you're ready to raise, @ThePageform is where your data room lives → https://t.co/RgOL0J0kt5
The homies from @rarible just went live with their Gacha on Solana, so we cracked some packs 🔥
They also were so kind to give me 2 x $25 packs to give away!
This is how you can win 👇
- retweet this tweet ♻️
- create an account through the link in the comments 🎁
- comment your wallet address💭
Drawing winners in 24 hours!
Introducing Claude Fable 5: a Mythos-class model that we’ve made safe for general use.
Its capabilities exceed those of any model we’ve ever made generally available.
JUST IN: Zcash crashes 48% after Claude AI finds critical vulnerability allowing unlimited minting of $ZEC.
It went unnoticed for 4 years until it was patched on June 1st.
Anthropic's Head of Product just dropped a 28-minute masterclass on agent production.
Prompt caching. Tool search. Programmatic tool calling. Compaction. Advisor strategy.
28 minutes. Free. Worth more than 100 YouTube videos combined.
Watch it first.
Then read this.
The masterclass teaches you how agents work.
This teaches you what to build with them — a 5-agent content pipeline that does the work of a $300K creative team.
Full pipeline below ↓
Bookmark this. Start this weekend.
YTD, $VVV is up ~9x and has become one of the most promising projects in the decentralized AI category.
❶ It addresses the core pain points of centralized AI providers:
- 100% privacy (no data retention or surveillance)
- Uncensored, bias-free open-source models
- Agent-optimized economics
- Inference at zero marginal cost, or mint $DIEM for perpetual credits
- Scalability + crypto alignment
- One API, 200+ models
❷ How it works:
- Stake VVV.
- Receive a pro-rata share of total platform capacity, perpetually.
- Stake 1% of all staked VVV → use 1% of inference for free (no per-token fees).
- Mint DIEM by locking your staked VVV (1 DIEM ≈ $1/day of credits, forever).
- Stake your DIEM → get automatic $1/day refreshing credits every epoch (00:00 UTC).
- DIEM is tradable as an ERC-20 token.
❸ The Ecosystem
Many decentralized AI projects used Venice as a core inference layer and migrated their workloads.
- Strike Robot / $SR / @StrikeRobot_ai
Inference API backend for robotics. Enables private VLM reasoning, vision, and decision-making for humanoids.
- Warden / $WARD / @wardenprotocol
Migrates onchain AI agent workloads to Venice models. Runs Warden App & Studio for private agents.
- Dolphin / $POD / @dphnAI
Team behind Dolphin Mistral 24B Venice Edition powering Venice Uncensored. Trained on Bittensor’s @TargonCompute and fully integrated as Venice’s private/uncensored model.
- Morpheus / $MOR / @MorpheusAIs
Decentralized p2p AI compute network Venice launched on. Foundation for private, permissionless inference.
- Bonfires AI / $KNOW (exp. Q2 2026) / @bonfiresai
AI agents for collective memory, knowledge graphs, and “judge” agents. Runs core reasoning/judging on Venice models (notably Dolphin-powered Venice Uncensored). Helped judge a Venice hackathon.
- Hyperbolic Labs / @hyperbolic_labs
On-demand GPU rentals + OpenAI-compatible inference + scalable clusters (H100/H200). Official AI inference infra partner.
- Fleek / $FLK / @fleek
“Shopify for AI agents”: decentralized hosting, deployment, and monetization for agents + virtual influencers. One-click launch and earn.
- BasedAI / @basedai_co
New ecosystem layer from Venice’s co-founder. Flagship @gethirebased brings persistent multi-agent automation. They acquired the full Warden App IP, orchestration stack, and builder team.
---
The Venice AI crypto ecosystem is one of the most vertically integrated plays in decentralized AI right now.
The ecosystem is still early and the flywheel is simple.
More agents and users → higher private inference demand → more VVV staking and DIEM usage → revenue buybacks/burns → stronger token economics.