Different models have different strengths. Some are broadly agreed upon (Claude Sonnet for front-end work), but there's a lot of overlap and subjectivity. Some models are simply overkill for everyday tasks. You don't need o3 to reason for ten seconds if you just want the internal temperature of a medium-rare steak.

Model providers usually give descriptions of each model’s strengths, but there is a ton of overlap, and they just aren’t that descriptive. For example:

GPT-4o: Fast, intelligent, flexible GPT model

GPT-4.1 mini: Balanced for intelligence, speed, and cost

In a perfect world you’d use the “best” model for each job, but that’s easier said than done. You can lean on benchmark data and wire up if/else logic, but benchmarks often don’t align with user preferences.

In this article, we’ll survey the leading LLM-routing paradigms: task-based, performance-based, and rule-driven. We’ll also take a close look at Arch-Router, a recent 1.5B-parameter generative approach. Let’s dive in!

Different types of LLM routers

Generally speaking, there are 4 different types of LLM routers.

1. Intent/Embedding-based routers

Create embeddings for each user message and run a semantic similarity vector search against a fixed set of topics (e.g. “billing,” “SQL,” “math”). The closest intent determines which model handles the request.

  • Examples:
    • OrchestraLLM retrieves the k most similar dialogue examples by embedding similarity, then routes by majority vote among expert models
    • Custom in-house pipelines
  • Pros & Cons:
    • ✅ Fast and easy to prototype.
    • ❌ Brittle to topic drift and multi-turn context, and requires retraining whenever you add or redefine intents.
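To make the idea concrete, here's a minimal sketch of intent-based routing. A toy keyword-overlap scorer stands in for a real embedding model (in practice you'd use something like a sentence-transformer and cosine similarity), and the intent and model names are made up for illustration:

```python
import math

# Toy stand-in for a real embedding model: score a message against each
# intent by keyword overlap, normalized by message length.
INTENT_KEYWORDS = {
    "billing": ["invoice", "charge", "refund", "payment"],
    "sql": ["query", "select", "join", "table"],
    "math": ["solve", "equation", "integral", "derivative"],
}

# Hypothetical intent -> model mapping.
INTENT_TO_MODEL = {
    "billing": "gpt-4o-mini",
    "sql": "claude-sonnet",
    "math": "o3-mini",
}

def score(message: str, keywords: list[str]) -> float:
    words = message.lower().split()
    return sum(words.count(k) for k in keywords) / math.sqrt(len(words) or 1)

def route_by_intent(message: str) -> str:
    # Pick the highest-scoring intent, then look up its model.
    best_intent = max(INTENT_KEYWORDS, key=lambda i: score(message, INTENT_KEYWORDS[i]))
    return INTENT_TO_MODEL[best_intent]
```

Swapping the scorer for real embeddings doesn't change the shape of the code, but the brittleness noted above remains: every new intent means a new entry (and, for learned embeddings, often retraining).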

2. Cost/Performance-based routers

Use benchmark or cost–accuracy data to train a router that decides whether a cheaper model can handle a query or whether it should escalate to a stronger, more expensive one. These routers focus on cutting costs by not sending every task to a large model.

  • Examples:
    • Cost-aware cascades and routers such as FrugalGPT and RouteLLM
  • Pros & Cons:
    • ✅ Optimizes spend versus quality in controlled tasks.
    • ❌ Ignores subjective criteria (tone, style, brand)
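The escalate-on-difficulty idea can be sketched in a few lines. The heuristic scorer below is a hypothetical stand-in for a learned difficulty or win-rate predictor, and the threshold and model names are placeholders:

```python
def difficulty(query: str) -> float:
    # Hypothetical stand-in for a learned predictor; real cost-based
    # routers train this on benchmark or preference data.
    long_query = len(query.split()) > 30
    has_code = "```" in query or "def " in query
    return 0.8 if (long_query or has_code) else 0.2

def pick_model(query: str, threshold: float = 0.5) -> str:
    # Cheap model by default; escalate to the strong model past the threshold.
    return "gpt-4o" if difficulty(query) > threshold else "gpt-4o-mini"
```

Note that nothing in this scheme captures the subjective criteria called out above: a query can be "easy" by this score and still demand a specific tone or brand voice.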

3. Rule-driven Routers

Hard-coded if/else logic that maps queries to models.

  • Examples:
    • Custom implementation
  • Pros & Cons
    • ✅ Ultra-low latency and full transparency
    • ❌ Maintenance nightmare at scale as use cases and models expand
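A rule-driven router is just branching logic. A hypothetical custom implementation (rules and model names are placeholders) might be no more than this:

```python
def route_by_rules(query: str) -> str:
    # Hard-coded rules; every new use case means another branch.
    q = query.lower()
    if "select" in q or "sql" in q:
        return "claude-sonnet-3.7"   # code/SQL-heavy traffic
    if len(q) > 2000:
        return "gpt-4o"              # long, complex requests
    return "gpt-4o-mini"             # cheap default
```

This is why the maintenance burden grows so quickly: every new model or use case means editing, re-testing, and re-ordering these branches by hand.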

4. Preference-aligned routers

This one is the major topic of today, as it’s what Arch-Router does. Users write route policies in plain English, each paired with a model choice. A small LLM ingests these policies along with the user message(s) and returns the policy that best matches each query.

  • Examples:
    • Arch-Router
  • Pros & Cons
    • ✅ Human-interpretable, adapts immediately to new policies without retraining, and handles multi-turn drift gracefully.
    • ⚠️ Requires a lightweight generative model and well-crafted policy descriptions to work effectively.

Introducing Arch-Router

Arch-Router is a lightweight, 1.5B-parameter model that routes user queries to user-defined models by following plain-English route policies, rather than benchmarks or if/else rules. Each policy is a simple (identifier, description) tuple, for example ("legal_review", "Analyze a contract clause…"). A separate lookup table maps each identifier to its chosen model (e.g. legal_review → GPT-4o-mini). More examples below.

Under the hood, Arch-Router is a fine-tuned Qwen 2.5 (1.5B) model. After training on a mix of clean and noisy policy–dialogue data, its routing accuracy jumps from about 20.7% off-the-shelf to over 93%.

[Table: routing accuracy results across models]

Since Arch-Router is so small and fine-tuned for the task, it only adds ~50ms of latency, on average. The next-fastest commercial router (Gemini-2.0-flash-lite) takes about 510 ± 82 ms, and Claude-sonnet-3.7 takes 1,450 ± 385 ms.

[Table: routing latency across models]

How Arch Router works

At its core, Arch-Router routes queries to a given LLM by following human-written policies. Here’s how it works, step-by-step:

Define policies

A really simple document that lays out a set of (identifier, description) tuples and the models they map to.

C = {
 ("code_gen", "Generate code snippets or boilerplate."),
 ("summarize", "Produce a concise summary of this text."),
 ("hotel_search", "Find and recommend hotels in a given city."),
 ("default", "Fallback for any other queries.")
}
T(code_gen)     = Claude-sonnet-3.7
T(summarize)    = GPT-4o
T(hotel_search) = Gemma-3
T(default)      = Qwen2.5-4B
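Expressed in Python, the same policy set C and model table T might look like this (a direct transcription of the example above):

```python
# Route policies: identifier -> plain-English description (the set C above).
POLICIES = {
    "code_gen": "Generate code snippets or boilerplate.",
    "summarize": "Produce a concise summary of this text.",
    "hotel_search": "Find and recommend hotels in a given city.",
    "default": "Fallback for any other queries.",
}

# Model table: identifier -> model (the mapping T above).
MODEL_TABLE = {
    "code_gen": "Claude-sonnet-3.7",
    "summarize": "GPT-4o",
    "hotel_search": "Gemma-3",
    "default": "Qwen2.5-4B",
}
```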

Compose the router prompt

The prompt sent to Arch-Router combines three pieces:

  • The policies
  • The conversation history
  • The new user query + some extra instructions (see prompt below)

You are a helpful assistant designed to find the best suited
route.
You are provided with route description within
<routes></routes> XML tags:


<routes>
\n{routes}\n
</routes>


<conversation>
\n{conversation}\n
</conversation>

Your task is to decide which route is
best suit with user intent on the conversation in
<conversation></conversation> XML tags.
Follow the instruction:
1. If the latest intent from user is irrelevant or user
intent is full filled, respond with other route {"route":
"other"}.
2. Analyze the route descriptions and find the best match
route for user latest intent.
3. Respond only with the route name that best matches the
user’s request, using the exact name in the <routes> block.


Based on your analysis, provide your response in the
following JSON format if you decide to match any route:


{"route":  "route_name"}

Generate and dispatch

Arch-Router ingests the composed prompt and outputs one of the policy identifiers (e.g. code_gen). That identifier is passed to the mapping function, which looks up the LLM assigned to that policy, and the request is sent there.

Since the policy lives in the prompt itself, it is really easy to add, remove, and edit routes and models.
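A minimal sketch of the dispatch step, assuming the model names from the earlier policy example and the {"route": "route_name"} JSON output format shown in the prompt:

```python
import json

# Model table from the earlier policy example.
MODEL_TABLE = {
    "code_gen": "Claude-sonnet-3.7",
    "summarize": "GPT-4o",
    "hotel_search": "Gemma-3",
    "default": "Qwen2.5-4B",
}

def dispatch(router_output: str) -> str:
    # Arch-Router replies with JSON like {"route": "code_gen"}.
    # Fall back to the default route on unknown names or malformed output.
    try:
        route = json.loads(router_output).get("route", "default")
    except json.JSONDecodeError:
        route = "default"
    return MODEL_TABLE.get(route, MODEL_TABLE["default"])
```

Editing a route is then just editing the policy text and this table; no retraining or redeploying is involved.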

Conclusion

The problem of using the right model for the right job isn’t going anywhere, and will probably only get more confusing in the future. While it isn’t possible to implement Arch-Router directly into Claude or ChatGPT, maybe in the future there will be easier ways to set up smarter routing for whole organizations.

Arch-Router seems to be the best router I’ve seen, given how flexible it is, how well it aligns with actual human preferences, and how minimal its latency hit is.

Dan Cleary
Founder