Model routing is quite simple in theory and very difficult in practice: today it is just barely doable, and not doable well.
What you need for routing is:
1. MODEL: A profile of its capabilities, like tool calling, vision, etc
2. ENDPOINT: A profile of latency, reliability and downtime, cost per token, etc
3. TELEMETRY: Data on how models and endpoints (note, the same model can perform differently and have different pricing on different endpoints) have performed on different tasks, historically
4. TASK: Information on the nature of the task request (does it need vision, complex tool calls, is it math or coding or prose, etc)
5. ROLES: Assign models to roles, where a role is responsible for a group of tasks.
Then you need routing strategies like cheapest, privacy, performance, coding, legal, etc. Ideally you should be able to set these per task, per application, or per user (some users have higher or lower allowances; some deal with sensitive data that shouldn't leak).
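A minimal sketch of how the five inputs and a strategy could fit together, in Python. Every name here (ModelProfile, EndpointProfile, Role, route, the model and provider names, the prices and success rates) is hypothetical and made up for illustration; this is not role-model's actual protocol or API.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class ModelProfile:  # 1. MODEL: what the model can do
    name: str
    capabilities: FrozenSet[str]  # e.g. {"tool_calling", "vision"}


@dataclass(frozen=True)
class EndpointProfile:  # 2. ENDPOINT: where the model is served, and at what cost
    name: str
    model: ModelProfile
    cost_per_1k_tokens: float
    private: bool = False  # assumption: True means sensitive data may be sent here


@dataclass(frozen=True)
class Task:  # 4. TASK: what the request needs
    kind: str  # e.g. "code", "math", "prose"
    required_capabilities: FrozenSet[str] = frozenset()
    sensitive: bool = False


@dataclass(frozen=True)
class Role:  # 5. ROLES: a role covers a group of task kinds
    name: str
    task_kinds: FrozenSet[str]
    model_names: FrozenSet[str]


# 3. TELEMETRY: historical success rate keyed by (endpoint name, task kind)
Telemetry = Dict[Tuple[str, str], float]


def eligible(task: Task, endpoints: List[EndpointProfile],
             roles: List[Role]) -> List[EndpointProfile]:
    """Hard filters: role assignment, required capabilities, privacy."""
    allowed = {m for r in roles if task.kind in r.task_kinds for m in r.model_names}
    return [
        e for e in endpoints
        if e.model.name in allowed
        and task.required_capabilities <= e.model.capabilities
        and (e.private or not task.sensitive)
    ]


def route(task: Task, endpoints: List[EndpointProfile], roles: List[Role],
          telemetry: Telemetry, strategy: str) -> EndpointProfile:
    """Pick one endpoint among the eligible ones according to the strategy."""
    candidates = eligible(task, endpoints, roles)
    if not candidates:
        raise LookupError(f"no eligible endpoint for task kind {task.kind!r}")
    if strategy == "cheapest":
        return min(candidates, key=lambda e: e.cost_per_1k_tokens)
    if strategy == "performance":
        # Endpoints with no history for this task kind score 0.0.
        return max(candidates, key=lambda e: telemetry.get((e.name, task.kind), 0.0))
    raise ValueError(f"unknown strategy {strategy!r}")


# Hypothetical demo data.
SMALL = ModelProfile("small-model", frozenset({"tool_calling"}))
BIG = ModelProfile("big-model", frozenset({"tool_calling", "vision"}))
ENDPOINTS = [
    EndpointProfile("provider-a/small", SMALL, cost_per_1k_tokens=0.2),
    EndpointProfile("provider-b/big", BIG, cost_per_1k_tokens=2.0),
    EndpointProfile("self-hosted/big", BIG, cost_per_1k_tokens=3.0, private=True),
]
ROLES = [
    Role("coder", frozenset({"code", "math"}), frozenset({"small-model", "big-model"})),
    Role("writer", frozenset({"prose"}), frozenset({"big-model"})),
]
TELEMETRY: Telemetry = {
    ("provider-a/small", "code"): 0.70,
    ("provider-b/big", "code"): 0.90,
    ("self-hosted/big", "code"): 0.88,
}
```

With this data, route(Task("code"), ENDPOINTS, ROLES, TELEMETRY, "cheapest") picks provider-a/small, while the "performance" strategy picks provider-b/big, and marking the task sensitive=True forces self-hosted/big. Note that the same model (big-model) appears on two endpoints with different prices and different telemetry, which is why the endpoint, not the model, is the unit being routed to.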
In practice, routing is difficult because different models have different chat templates, do thinking differently, use tools differently, etc., so routing between them can throw errors.
Inference providers also do not accept enough metadata for routing decisions (for example task = "code", complexity = "easy"), so you would need to run a routing model to infer it, which adds latency and also reduces accuracy.
The short version is that routing could be a simple task, if model makers and inference providers were aligned with the need for smart, precision routing.
For this, I've built role-model (https://t.co/pcbZdFy1pD), a routing protocol and a router runtime. Currently works and is in alpha but I need to clean up the UI before I do a proper release.
Dynamic model routing products have largely been snake oil so far. We’ve seen many come and go since 2022.
The story of model routing has a simple, legible quality that magnetizes capital.
@Alfred_Lin’s “Beware of Simple Narratives” speaks to the danger of this: https://t.co/STqtEeQP0z
I’m an engineer who has been working on genAI applications since 2022. The nuanced reality is very different from the simple story:
1. As @sqs points out below, frontier models are often better, faster AND cheaper—because they don’t have to retry or get stuck in reasoning loops. The gains of cost-optimized routing are often minimal. Also: people generally want the best possible output. People want to pay 20% more for 5% better.
2. Many projects take a concert of tightly bound models and prompts to complete well. You don’t want individual tasks being routed to different models, as it makes a system unpredictable and unstable. You care about the performance of the aggregate system much more than individual task performance. Dynamic task routing makes it hard to measure the system as a whole.
3. As a user, I dislike how model routing makes software feel opaque. I want to be able to get a “feel” for each model and how to best use it. I don’t want to use a system where changing one word of my prompt might cause me to get routed to a different model, getting wildly different results.
4. Foundation model APIs are already doing model routing to some extent. If there is a significant model arbitrage opportunity which can save costs, they can close the arbitrage themselves.
Contrary to what various "design experts" say, nobody in China likes super apps.
These apps do not provide more functionality to users; in fact they are not designed for users in the first place.
I've worked for one of the largest e-commerce platforms in China. We had both an international version and a domestic version of our app. Both were designed by the same teams, so the international version took on the characteristics of the Chinese app: colorful, cluttered, and so on.
This led to international users in our user research saying that our app looked like a cheap discount retailer, or even like a gambling or scam website, because of the overly colorful interface, frequent promotion banners, pop-ups and such (removing one of these increased order conversions by 30%!).
None of that was surprising, because what our UI communicated was precisely an identity of a platform that doesn't care about you as a user but strove to push you to click buttons, download things, and pay ASAP. What you may find more surprising is that when I looked at the user feedback from the Chinese version of the app, the top three complaints were:
1. cluttered, unclear interface;
2. too many promotions;
3. complex, hard-to-read information.
So, just the same as our EU/US users. What's more, users in Thailand, Japan and South Korea said the same thing. In fact, in our competitor benchmarking, users in Southeast Asia preferred the minimal interface of one of our US competitors.
So, let's once and for all lay to rest the idea that "Asian users like different interfaces." This entire explanation is based on nothing more than the circular argument that "because Taobao exists, it means users like it, which is why it's designed this way."
The reason the platform apps known in Asia as "super apps" look the way they do has nothing to do with user preferences and everything to do with business strategy:
- Apps are not designed for users
- Interface design is not decided by designers
- While product managers push certain directions, it's often not by choice
- KPIs are handed down by the business to product and design teams
- Because platforms have a captive audience, the business can design for itself without regard for the user, and will always choose to add more CTAs, more information, more buttons, and so on
https://t.co/HszRsmX3AR