Built a local agentic stack where Hermes orchestrates, Codex 5.3 designs workflows, and DGX Spark runs the model team.
Telegram is the front door.
Hermes turns messages into tasks.
Codex helps build and refine the pipelines.
Local models execute natively on Spark.
Current local lineup:
• gemma-4-26B-A4B
• qwen3.5-9b
• qwen3.5-4b
So the flow is:
Telegram → gateway channel → Hermes → Codex-assisted workflow logic → local harness → DGX Spark inference → validated response
Not just chatbot plumbing.
More like building an operating system for coordinated intelligence.
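A rough sketch of that flow as a chain of stages, to make the hand-offs concrete. Every name here (Task, stage, fake_inference, handle_message) is hypothetical and only for illustration; the real gateway, Hermes, and Spark calls are stubbed out, and the inference step just echoes the message.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Task:
    """A message travelling through the pipeline (hypothetical shape)."""
    text: str
    trace: List[str] = field(default_factory=list)
    response: str = ""


Stage = Callable[[Task], Task]


def stage(name: str, work: Callable[[Task], None] = lambda t: None) -> Stage:
    """Wrap a step so every hop is recorded in the task trace."""
    def run(task: Task) -> Task:
        task.trace.append(name)
        work(task)
        return task
    return run


def fake_inference(task: Task) -> None:
    # Stand-in for a local model call on DGX Spark; here it only echoes.
    task.response = f"echo: {task.text}"


def validate(task: Task) -> None:
    # Minimal validation: refuse to hand back an empty response.
    if not task.response.strip():
        raise ValueError("empty response")


# Stage order mirrors the flow described above.
PIPELINE: List[Stage] = [
    stage("gateway"),
    stage("hermes"),
    stage("workflow"),
    stage("harness"),
    stage("inference", fake_inference),
    stage("validate", validate),
]


def handle_message(text: str) -> Task:
    """Run one incoming message through every stage in order."""
    task = Task(text=text)
    for step in PIPELINE:
        task = step(task)
    return task
```

Calling handle_message("hello") returns a Task whose trace lists each hop in order, which is handy for seeing where a message stalled.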
@gospaceport @Ominousind Definitely. I watched your Threadripper Pro video today, and your points about the rules of homelabbing stood out, particularly around upgrading and replacing components strategically. There are definitely upsides with GPUs.
Qwen3.6 35B A3B can't fill out a paper form on its own. But give it NVIDIA's LocateAnything-3B (the #1 trending model on HuggingFace) as its eyes, and the two small models get it done together.
(The test: place each element at the right pixel position on a blank form image, not type into a field.)
Setup:
> Qwen is the brain (main model), LocateAnything is the eyes (helper model acting as a tool).
> I gave Qwen a new tool: ask "where's the email field?" and LocateAnything returns the exact x, y, width, height.
> The blue boxes on the screen are its detections. Look how tight they are: it nails every field.
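A minimal sketch of what that tool contract could look like, assuming the helper returns one box per query. The names (Box, locate, text_anchor) and the sample coordinates are made up for illustration; a real version would call the helper model instead of reading from a lookup table.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Box:
    """Pixel rectangle: top-left corner plus size."""
    x: int
    y: int
    width: int
    height: int


# Hypothetical detections standing in for the helper model's output.
FAKE_DETECTIONS: Dict[str, Box] = {
    "email": Box(x=120, y=340, width=300, height=40),
}


def locate(query: str) -> Box:
    """Tool the main model calls; this stub matches on a field label."""
    key = query.lower()
    for label, box in FAKE_DETECTIONS.items():
        if label in key:
            return box
    raise KeyError(f"no field found for: {query!r}")


def text_anchor(box: Box, padding: int = 8) -> Tuple[int, int]:
    """Left-padded, vertically centred point at which to start the value."""
    return (box.x + padding, box.y + box.height // 2)
```

With the sample data, text_anchor(locate("where's the email field?")) gives the point where the email value would start inside the detected box.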
Result:
> Qwen3.6 35B A3B + LocateAnything-3B: form completed, all info correct.
> Name, DOB, ID, gender, marital status, nationality, email, phone, address, postal code: all landed in the right field areas.
> Character-box alignment still a touch loose, but every value is where it belongs.
> 9m10s, 224.5k input, 24.3k output, 21 turns.
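For scale, here is the per-turn arithmetic on those run numbers, assuming the input and output figures are token counts summed over all 21 turns (the post does not say so explicitly). The function name is hypothetical.

```python
def per_turn_averages(input_tokens: float, output_tokens: float,
                      seconds: float, turns: int) -> dict:
    """Average a run's totals over its turns."""
    if turns <= 0:
        raise ValueError("turns must be positive")
    return {
        "input_per_turn": input_tokens / turns,
        "output_per_turn": output_tokens / turns,
        "seconds_per_turn": seconds / turns,
    }


# 9m10s = 550 seconds; 224.5k input, 24.3k output, 21 turns.
stats = per_turn_averages(224_500, 24_300, 9 * 60 + 10, 21)
```

That works out to roughly 10.7k input and 1.2k output per turn, at about 26 seconds per turn.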
Why it matters:
> Qwen alone can't finish this test. Bolt on a 3B model that does exactly one thing (locate) and suddenly it can.
> A combination of small models can do the work of a single large one.