@edandersen Need help choosing my local model for Hermes Agent.
• Gemma 4 31B (Q6_K_M) • Qwen 3.6 27B (Q6_K_M) • Ornith 31B (Q6_K_M)
Mostly coding, terminal workflows, and agentic tasks. Running locally on a 4090.
@MiaAI_lab Need help choosing my local model for Hermes Agent.
• Gemma 4 31B (Q4_K_M) • Qwen 3.5 27B (Q4_K_M) • Ornith 31B (Q4_K_M)
Mostly coding, terminal workflows, and agentic tasks. Running locally on a 4090.
@ornith_ Need help choosing my local model for Hermes Agent.
• Gemma 4 31B (Q4_K_M) • Qwen 3.5 27B (Q4_K_M) • Ornith 31B (Q4_K_M)
Mostly coding, terminal workflows, and agentic tasks. Running locally on a 4090.
Has anyone compared these side by side?
Most company search tools are great at the names everyone already knows and fall apart past them. The value is rarely in the top 50 results. It's in the 200 small companies nobody indexed well. Filters find the obvious. The long tail needs real retrieval.
Everyone is building autonomous SDRs.
Few are building feedback loops.
Without loops, it's spam.
With loops, it's a revenue engine.
The moat isn't the model.
The moat is the loop.
@jasonzhou1993 The shared artifact layer is the part that matters most here. Once agents work off a shared store instead of each other's raw output, the loop stops drifting.
@maximelabonne Picking 5 of 151 tools is a retrieval problem, not a reasoning one. Tools are just another corpus. Filtering them in beats dumping all 151 in context and hoping the model picks right.
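A minimal sketch of treating tools as a corpus: score each tool description against the request and pass only the top k to the model. The tool names, descriptions, and the bag-of-words cosine scorer are all hypothetical stand-ins; a real setup would more likely use an embedding model or BM25.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two term-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_tools(query, tools, k=5):
    # Rank tools by similarity to the query; ties break alphabetically.
    q = Counter(tokenize(query))
    scored = [
        (cosine(q, Counter(tokenize(name + " " + desc))), name)
        for name, desc in tools.items()
    ]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [name for score, name in scored[:k] if score > 0]


# Hypothetical tool registry, standing in for the 151 tools.
TOOLS = {
    "git_commit": "Create a git commit with a message",
    "send_email": "Send an email to a recipient",
    "read_file": "Read a file from disk",
    "search_web": "Search the web for a query",
    "git_diff": "Show the git diff of the working tree",
}

selected = top_k_tools("commit my changes to git", TOOLS, k=2)
print(selected)
```

Only the names in `selected` would go into the prompt, so the model chooses among a handful of candidates instead of the full list.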
@jacobli99 @lateinteraction The "study during the exam" part is the interesting bit. If the corpus is always there at test time, expertise becomes knowing what to pull and when. For a non-expert agent the hard part isn't memory, it's finding the right passage when it's needed. More retrieval than learning.
@lateinteraction Most long-context benchmarks test whether the model can use a full window. The harder, real failure is one step earlier: did the right chunk even make it in? That's what recall@k catches.
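For reference, recall@k is the fraction of relevant chunks that appear in the top k retrieved results. A small sketch, with made-up chunk IDs:

```python
def recall_at_k(retrieved, relevant, k):
    # retrieved: ranked list of chunk IDs, best first.
    # relevant: set of chunk IDs that should have been retrieved.
    relevant = set(relevant)
    if not relevant:
        raise ValueError("recall is undefined with no relevant chunks")
    hits = set(retrieved[:k]) & relevant
    return len(hits) / len(relevant)


# Hypothetical ranking: two relevant chunks, only one in the top 2.
ranking = ["c7", "c2", "c9", "c4"]
gold = {"c2", "c4"}

print(recall_at_k(ranking, gold, k=2))
print(recall_at_k(ranking, gold, k=4))
```

If recall@k is low at the k that fits in the window, the model never sees the passage, however well it handles long context.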
@_avichawla Faster k-means matters for retrieval more than it sounds. k-means is how IVF and PLAID indexes get built, and that build is why re-indexing is a slow batch job. Make the clustering cheap and exact, and "rebuild the index" stops being something you schedule overnight.
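A toy sketch of where k-means sits in an IVF-style build: cluster the vectors, assign each vector to its nearest centroid's list, then probe only the closest lists at query time. This is plain Lloyd's k-means with a naive "first k points" initialisation on 2D points, chosen for readability; it is not the algorithm from the linked work or any library's implementation.

```python
def dist(a, b):
    # Squared Euclidean distance.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p, centroids):
    # Index of the centroid closest to point p.
    return min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))


def kmeans(points, k, iters=10):
    # Lloyd's algorithm; naive deterministic init (assumption for the demo).
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            buckets[nearest(p, centroids)].append(p)
        for i, bucket in enumerate(buckets):
            if bucket:  # keep the old centroid if a cluster empties out
                centroids[i] = [sum(xs) / len(bucket) for xs in zip(*bucket)]
    return centroids


def build_ivf(points, k):
    # The expensive step: clustering, then one inverted list per centroid.
    centroids = kmeans(points, k)
    lists = {i: [] for i in range(k)}
    for idx, p in enumerate(points):
        lists[nearest(p, centroids)].append(idx)
    return centroids, lists


def search(query, points, centroids, lists, nprobe=1):
    # Scan only the nprobe closest lists instead of every vector.
    order = sorted(range(len(centroids)), key=lambda i: dist(query, centroids[i]))
    candidates = [idx for c in order[:nprobe] for idx in lists[c]]
    return min(candidates, key=lambda idx: dist(query, points[idx]))


points = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
centroids, lists = build_ivf(points, k=2)
print(lists)
print(search((9, 9), points, centroids, lists))
```

Everything in `build_ivf` has to be redone when the corpus shifts enough, which is why the cost of the clustering step dominates re-indexing.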
Weird debate. Plenty of great companies were built by people who couldn't write a line of code. What makes a product yours was never who typed it. It's who owns the problem, who talks to users, and who fixes it when it breaks. The keyboard was never the hard part.
@Blum_OG One thing nobody says: once you stop being the loop, you become the spec. The loop will happily hit a sloppy "done" in ways you didn't mean. Writing a "done" the agent can't cheat is harder than the prompting it replaces.
@PawelHuryn The fact that you can swap models freely says it all: the model is a commodity now. But a loop is only a moat if it's closed. Most companies have logs, not loops. If yesterday's fixes don't change today's output, that's just storage, not learning.