Gemma 4 12B can now run locally on just 8GB RAM via Dynamic GGUFs.
Google's new model, Gemma 4 12B Unified, supports image and audio input and a 256K context window.
You can run and train the model via Unsloth Studio.
GGUF: https://t.co/8cL321pVDh
Guide: https://t.co/odRo9WjRpA
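As a rough sanity check on the 8GB figure: the weights of a quantized model take about params × bits-per-weight / 8 bytes. The sketch below uses illustrative average bit widths, not the actual per-tensor mix in Unsloth's Dynamic GGUFs. It counts weights only and ignores the KV cache, the image/audio encoders, and runtime overhead, all of which add to the total.

```python
def weight_gib(params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    total_bytes = params * bits_per_weight / 8
    return total_bytes / 2**30

# Illustrative average bit widths; real GGUF quants mix widths per tensor.
for label, bpw in [("16-bit", 16.0), ("~8-bit", 8.5), ("~4-bit", 4.5), ("~3-bit", 3.5)]:
    print(f"{label:>7}: {weight_gib(12e9, bpw):5.1f} GiB")
```

At ~4.5 bits per weight, a nominal 12B model's weights come to roughly 6.3 GiB, which is consistent with an 8GB budget. Long contexts need extra memory for the KV cache on top of this.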
RTX 5060 Ti 16GB. $429 GPU.
Last night I got 128 t/s on Qwen3.6-35B using ik_llama.cpp's R4 quant format. Crushing performance. Faster than the 5070 Ti on mainline llama.cpp. Performance stays consistent from 0 to 139K context, with no speculative decoding used! 🤯
Special thanks to @MakJoris for sharing ik_llama.cpp with us!
Today I wanted to know if it's actually *useful* at that speed. So I gave it a coding agent and 4 creative challenges.
Here's what it built. 🧵
I'd really like to pitch @xai a cowork desktop application that they could use or build on, giving @grok a desktop harness that works with your Grok plan.
How could one achieve this?
10 GitHub repos to spend 60-90% fewer tokens in Claude Code:
1. RTK (Rust Token Killer)
CLI proxy that filters terminal output before it hits your context
- 60-90% reduction on common dev commands
- one binary, zero dependencies
- works with Claude Code, Cursor, Copilot
Repo: https://t.co/WayvpBtyBH
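A minimal sketch of why filtering output before it reaches the context saves tokens. The filter policy here is made up for illustration: strip ANSI color codes, drop blank lines, collapse runs of identical lines, and cap the line count while keeping error-looking lines. This is not RTK's actual logic, which is written in Rust; `filter_output` is a hypothetical helper.

```python
import re

ANSI = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")  # color/cursor escape sequences

def filter_output(raw: str, max_lines: int = 20) -> str:
    """Compress terminal output before handing it to an LLM (illustrative policy)."""
    lines = [ANSI.sub("", line).rstrip() for line in raw.splitlines()]
    lines = [line for line in lines if line]  # drop blank lines

    # Collapse runs of identical lines into "line  [xN]".
    collapsed: list[str] = []
    prev, count = None, 0
    for line in lines + [None]:  # None sentinel flushes the final run
        if line == prev:
            count += 1
            continue
        if prev is not None:
            collapsed.append(prev if count == 1 else f"{prev}  [x{count}]")
        prev, count = line, 1

    # Keep error-looking lines even when they fall past the cap.
    errors = [l for l in collapsed[max_lines:] if re.search(r"error|fail", l, re.I)]
    kept = collapsed[:max_lines] + errors
    omitted = len(collapsed) - len(kept)
    if omitted > 0:
        kept.append(f"... {omitted} lines omitted")
    return "\n".join(kept)
```

For example, `filter_output("\x1b[32mok\x1b[0m\n\nok\nok\nerror: boom")` returns two lines, `ok  [x3]` and `error: boom`.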
2. Context Mode
Sandboxes raw tool output into SQLite instead of dumping it into context
- 98% context reduction on Playwright, GitHub, logs
- only clean summaries enter your conversation
- works as Claude Code plugin
Repo: https://t.co/YNbFIGQz7X
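A minimal sketch of the sandboxing pattern using Python's built-in `sqlite3`. The helpers `store_output` and `search_output`, the table schema, and the naive summary (line count plus the first few lines) are all hypothetical; Context Mode's real schema and summarization are not shown here. The idea is that the full text goes into SQLite and only a short handle and summary enter the conversation. Specific lines can be pulled back on demand.

```python
import sqlite3

def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS outputs (id INTEGER PRIMARY KEY, tool TEXT, body TEXT)"
    )
    return conn

def store_output(conn: sqlite3.Connection, tool: str, body: str, preview_lines: int = 3) -> str:
    """Persist raw tool output; return only a short summary for the model's context."""
    cur = conn.execute("INSERT INTO outputs (tool, body) VALUES (?, ?)", (tool, body))
    conn.commit()
    lines = body.splitlines()
    preview = "\n".join(lines[:preview_lines])
    return f"[{tool} output #{cur.lastrowid}: {len(lines)} lines, {len(body)} chars]\n{preview}"

def search_output(conn: sqlite3.Connection, output_id: int, needle: str) -> list[str]:
    """Pull back only the matching lines instead of the whole blob."""
    row = conn.execute("SELECT body FROM outputs WHERE id = ?", (output_id,)).fetchone()
    if row is None:
        return []
    return [line for line in row[0].splitlines() if needle in line]
```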
3. code-review-graph
Local knowledge graph that maps your codebase with Tree-sitter
- Claude reads only what matters, not the entire repo
- 49x token reduction on large monorepos
- 6.8x on average reviews
Repo: https://t.co/9gIzmAWN12
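A minimal sketch of the graph idea. It uses Python's built-in `ast` module as a stand-in for Tree-sitter, which parses many languages but needs a third-party binding. The sketch maps each top-level function to the plain names it calls. It then walks the reverse edges to find everything a change could affect, so only those functions need to be read. The helper names and the single-file scope are assumptions for illustration; the real tool's graph schema is not shown.

```python
import ast
from collections import defaultdict, deque

def call_graph(source: str) -> dict[str, set[str]]:
    """Map each top-level function name to the plain names it calls."""
    graph: dict[str, set[str]] = {}
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            graph[node.name] = {
                call.func.id
                for call in ast.walk(node)
                if isinstance(call, ast.Call) and isinstance(call.func, ast.Name)
            }
    return graph

def impacted_by(graph: dict[str, set[str]], changed: str) -> set[str]:
    """Functions that directly or transitively call `changed` (reverse BFS)."""
    callers: defaultdict[str, set[str]] = defaultdict(set)
    for fn, callees in graph.items():
        for callee in callees:
            callers[callee].add(fn)
    seen: set[str] = set()
    queue = deque([changed])
    while queue:
        for caller in callers[queue.popleft()]:
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen
```

If `parse` changes, `impacted_by` returns its callers and their callers. Unrelated functions never need to enter the context.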
4. Token Savior
MCP server that navigates code by symbols, not full files
- 97% reduction on code navigation
- persistent memory across sessions
- 69 tools, zero external deps
Repo: https://t.co/OtvhrMgGWh
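A minimal sketch of symbol-level navigation, assuming Python source and two hypothetical tools. `list_symbols` returns a cheap index of names and line numbers. `get_symbol` returns one symbol's source via the standard library's `ast.get_source_segment`. Token Savior's own tool names and behavior are not reproduced here. The saving comes from sending one function instead of the whole file.

```python
import ast
from typing import Optional

SYMBOL_TYPES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)

def list_symbols(source: str) -> list[tuple[str, int]]:
    """Cheap index: (name, line) for every function and class, at any depth."""
    found = [(node.name, node.lineno)
             for node in ast.walk(ast.parse(source))
             if isinstance(node, SYMBOL_TYPES)]
    return sorted(found, key=lambda item: item[1])

def get_symbol(source: str, name: str) -> Optional[str]:
    """Return only the source text of a symbol called `name`, or None."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, SYMBOL_TYPES) and node.name == name:
            return ast.get_source_segment(source, node)
    return None
```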
5. Caveman Claude
makes Claude talk like a caveman to cut output tokens
- 65-75% output reduction
- one-line install
- keeps full technical accuracy
Repo: https://t.co/onBeghTyfH
6. claude-token-efficient
one CLAUDE.md file that keeps responses terse
- drop-in, no code changes
- reduces output verbosity on heavy workflows
- best for output-heavy sessions
Repo: https://t.co/j6MKo9klQe
7. token-optimizer-mcp
MCP server with caching, compression, and smart tool intelligence
- 95%+ token reduction through intelligent caching
- compresses repeated tool outputs
Repo: https://t.co/0jIVQ4ANls
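A minimal sketch of caching repeated tool outputs, using a hypothetical `OutputCache` class keyed by a truncated SHA-256 digest of the output. The first time an output is seen, it passes through in full. A byte-identical repeat is replaced by a short reference that can be expanded later. The real server's caching and compression strategy is not described in the post, so this only illustrates the general idea.

```python
import hashlib

class OutputCache:
    """Replace byte-identical repeat tool outputs with a short reference."""

    def __init__(self) -> None:
        self._seen: dict[str, str] = {}  # digest -> full output

    def pass_through(self, tool: str, output: str) -> str:
        digest = hashlib.sha256(output.encode("utf-8")).hexdigest()[:12]
        if digest in self._seen:
            return f"[{tool}: unchanged since cache entry {digest}]"
        self._seen[digest] = output
        return output

    def expand(self, digest: str) -> str:
        """Recover the full text if the model asks for it."""
        return self._seen[digest]
```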
8. claude-token-optimizer
reusable setup prompts for optimizing any project
- 90% token savings in 5 minutes
- reduces doc token usage from 11K to 1.3K
Repo: https://t.co/puil9WwFGB
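Taking the stated figures at face value, and reading "11K" and "1.3K" as 11,000 and 1,300 tokens, the doc reduction works out to just under 90%:

```latex
\frac{11{,}000 - 1{,}300}{11{,}000} = \frac{9{,}700}{11{,}000} \approx 0.882 \;\Rightarrow\; \approx 88\%\ \text{reduction}
```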
9. token-optimizer
finds ghost tokens that silently eat your context
- survives compaction without losing quality
- fixes context quality decay
Repo: https://t.co/92G8e4yeGq
10. claude-context (by Zilliz)
code search MCP that makes your entire codebase the context
- ~40% reduction with equivalent retrieval quality
- hybrid BM25 + dense vector search
Repo: https://t.co/yjfiQOSy15
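A minimal sketch of hybrid retrieval over a tiny in-memory corpus of code snippets. The BM25 side uses the common Okapi formula (k1 = 1.5, b = 0.75) with a Lucene-style IDF. The "dense" side is a hypothetical stand-in: hashed character trigrams instead of a real embedding model, which claude-context would call instead. The two rankings are merged with reciprocal rank fusion (k = 60). That is one common fusion choice; the post does not say which method the tool uses.

```python
import math
import re
import zlib
from collections import Counter

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 with a Lucene-style (always positive) IDF."""
    toks = [tokenize(d) for d in docs]
    n = len(docs)
    avgdl = sum(len(t) for t in toks) / n
    df = Counter(term for t in toks for term in set(t))  # document frequency
    scores = []
    for t in toks:
        tf = Counter(t)
        s = 0.0
        for q in tokenize(query):
            if q in tf:
                idf = math.log(1 + (n - df[q] + 0.5) / (df[q] + 0.5))
                s += idf * tf[q] * (k1 + 1) / (tf[q] + k1 * (1 - b + b * len(t) / avgdl))
        scores.append(s)
    return scores

def toy_embed(text: str, dim: int = 256) -> list[float]:
    """HYPOTHETICAL embedding stand-in: hashed character trigrams, L2-normalized."""
    vec = [0.0] * dim
    padded = f"  {text.lower()}  "
    for i in range(len(padded) - 2):
        vec[zlib.crc32(padded[i:i + 3].encode()) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def hybrid_search(query: str, docs: list[str], top_k: int = 3, rrf_k: int = 60) -> list[int]:
    """Fuse BM25 and dense rankings with reciprocal rank fusion; return doc indices."""
    sparse = bm25_scores(query, docs)
    qvec = toy_embed(query)
    dense = [sum(x * y for x, y in zip(qvec, toy_embed(d))) for d in docs]  # cosine
    fused = [0.0] * len(docs)
    for scores in (sparse, dense):
        ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
        for rank, i in enumerate(ranked, start=1):
            fused[i] += 1.0 / (rrf_k + rank)
    return sorted(range(len(docs)), key=lambda i: fused[i], reverse=True)[:top_k]
```

Rank fusion avoids having to put BM25 scores and cosine similarities on the same scale: only each document's position in each list matters.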
[ how to stack them ]:
you don't need all 10. pick 2-3 based on your workflow:
> heavy terminal output? RTK
> big codebase? code-review-graph + Token Savior
> lots of MCP servers? Context Mode
> quick fix? Caveman + claude-token-efficient
most people are burning tokens without knowing it
run /context in a fresh session and see how much is gone before you even type a word
your pocket will thank me later :<)
it's awake. the way you interact with the web, information, and services is being rewritten.
introducing FlowithOS — the world's first operating system natively built for ai agents. self-evolving. memory-powered. lightning-fast.
beyond any ai browser, it's the SMARTEST agentic os that turns your browser into real-world value, from assisting you to acting for you.
let's witness together ⬇️
@omni__ventures Inflation-adjusted total in 2026 dollars: ~$228 trillion.
But very interesting at a flat rate. Just goes to show how bad the system is. Won't last.
Your P&L is a reflection of your psychology, not just your strategy.
If you keep breaking your rules, you don't need a new indicator, you need to fix your mental game.
Jared Tendler’s principles completely flip the script on trading psychology:
1. Awareness does not equal control.
2. Revenge trading is a neurological hijack.
3. You don't need optimism; you need presence.
Mastering these 10 concepts is the difference between a hobbyist and a professional.
Which number do you struggle with the most?