Model routing for coding agents must happen at the subagent point of delegation.
Doing so is the highest point of leverage for quality-neutral cost efficiency, and it will become more important as frontier models become better managers of deep agent hierarchies.
@Arindam_1729@davidvgilmore@steipete It's implemented as an AI gateway, yes. Just a one-link change and you can run Claude Code as a multi-model, multi-provider agent harness
Introducing Rayline: a model router built specifically for Claude Code.
Claude Code's API rates are 8-10x its subscription rates.
Most of those tokens are going to easy subtasks.
Plug in Rayline, and subagents get routed to open source and on-device models instead of burning Opus-level spend on grunt work.
Quality holds. Costs drop 60-90%.
What makes Rayline different:
- Routes at the subagent/subtask level
- On-device routing via MLX (Qwen 3.6 and others)
- Built-in ML router trained for Claude Code tasks
- Cloud fallback when Anthropic has outages
- Works with OpenAI models inside Claude Code
You keep using Claude Code exactly as you do today. Rayline handles the routing underneath.
We're already routing billions of tokens per day for individual developers and publicly traded companies.
Public beta is live now. Try it at Rayline[.]ai