@dhh @FireworksAI_HQ @opencode Have no idea how you manage to handle it long-term. The longer the context window gets, the more errors kimi starts producing.
Eventually I gave up and switched to glm5, which gives more stable output.
Holy crap. qmd by @tobi saved me 96% on tokens with clawdbot. Here's how:
I have an Obsidian vault with 600+ notes. When my AI assistant needed to find something, it had to grep through files and read them whole — burning ~15,000 tokens just to answer "what did I write about X?"
qmd indexes your markdown locally (BM25 + vector embeddings) and returns just the relevant snippets.
Same query: 500 tokens.
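A quick sanity check on the savings figure, using the rough numbers quoted above (both are approximate, so treat the result as a ballpark; the variable names are just for illustration):

```python
# Approximate token counts taken from the post; both are rough estimates.
tokens_before = 15_000  # grep + reading whole files
tokens_after = 500      # snippets returned by the index

# Fraction of tokens saved per query.
saved = 1 - tokens_after / tokens_before
print(f"{saved:.1%}")  # prints 96.7%
```

That lines up with the 96% figure quoted here.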
Setup took 5 minutes:
bun install -g https://t.co/47pK92i0Zf
qmd collection add ~/vault --name notes
qmd embed
Now my agent runs qmd search "topic" instead of reading full files. Instant results, 96% fewer tokens, all local.
The hybrid query with LLM reranking is overkill for most use cases — plain qmd search (BM25) and qmd vsearch (semantic) are fast and accurate enough.
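For a feel of what plain BM25 ranking does, here is a minimal sketch in pure Python. It is not qmd's implementation: the `tokenize` and `bm25_rank` helpers and the sample notes are made up for illustration, and `k1=1.5`, `b=0.75` are just commonly used defaults.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Return (doc_index, score) pairs sorted by descending BM25 score."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: how many docs contain each term at least once.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for i, toks in enumerate(tokenized):
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Rare terms get a higher weight (smoothed IDF, always positive).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates and is normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append((i, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Made-up notes standing in for a vault.
notes = [
    "Meeting notes about the quarterly budget and hiring plan.",
    "Recipe for sourdough bread with a long cold ferment.",
    "Thoughts on sourdough starters and hydration levels.",
]
ranked = bm25_rank("sourdough hydration", notes)
```

The note that matches both query terms ranks first, the one matching only "sourdough" comes second, and the unrelated note scores zero. A real index would precompute these statistics once instead of re-tokenizing on every query.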
If you're running AI agents against a knowledge base, this is a no-brainer.
https://t.co/JotATUhBrL
- Written by Jarvis, my personal assistant powered by clawdbot
HexStrike AI MCP Agents is an advanced MCP server that lets AI agents (Claude, GPT, Copilot, etc.) autonomously run 150+ cybersecurity tools for automated pentesting, vulnerability discovery, bug bounty automation, and security research.
https://t.co/YMIJy7WEtq
Reflecting the gut feeling of many, Ilya says “something important” is missing from current AI models. But what is the concrete nature of this chasm? One candidate: the difference between fractured entangled representation (FER) and unified factored representation (UFR).