Summary
Introduce a semantic routing layer that classifies incoming requests by intent using embeddings, before applying the existing complexity-based LLM policy.
The goal is to improve routing accuracy by first understanding the nature of the task, not only its complexity.
Problem
The current routing flow can estimate request complexity, but it does not explicitly identify the task domain.
This can lead to suboptimal model selection in cases such as:
- SQL or data tasks routed to generic models
- rewrite / extraction tasks sent to overly expensive models
- tool-oriented tasks not separated from general chat
Goal
Add a new routing policy able to:
- classify a request into a semantic intent
- map that intent to a candidate model pool
- pass the narrowed pool to the existing complexity policy
- support fallback for unknown or ambiguous classifications
Proposed behavior
High-level flow:
- Receive user request
- Generate embedding for the request text
- Compare it against predefined intent embeddings
- Select the most relevant intent
- Resolve the candidate model pool for that intent
- Run the existing LLM complexity policy only within that pool
- Select final model
Initial intents
Suggested initial intent categories:
- coding
- sql_data
- tool_use
- reasoning
- general_chat
- rewrite_extract
Functional requirements
- Add a new routing policy:
semantic_intent
- Support pluggable embedding providers
- Support local and remote embedding backends
- Store intent definitions and example utterances in config
- Compute similarity between request embedding and intent embeddings
- Return:
- top intent
- top score
- second-best intent
- margin between first and second
- classification status:
confident, ambiguous, unknown
- Map each intent to a candidate model pool
- Allow fallback behavior for
ambiguous and unknown
Configuration
Example configuration:
{
"routing": {
"policy": "semantic_intent",
"embedding_provider": "openai",
"embedding_model": "text-embedding-3-small",
"absolute_threshold": 0.60,
"ambiguity_threshold": 0.08,
"fallback_policy": "llm_complexity"
},
"intents": {
"coding": {
"examples": [
"write a python function to reverse a linked list",
"fix this typescript bug",
"generate unit tests for this class"
],
"candidate_models": ["qwen-coder", "claude-sonnet", "claude-opus"]
},
"sql_data": {
"examples": [
"write a sql query to find top customers by revenue",
"fix this postgres join",
"convert this question into sql"
],
"candidate_models": ["deepseek-chat", "claude-sonnet", "claude-opus"]
}
}
}
Summary
Introduce a semantic routing layer that classifies incoming requests by intent using embeddings, before applying the existing complexity-based LLM policy.
The goal is to improve routing accuracy by first understanding the nature of the task, not only its complexity.
Problem
The current routing flow can estimate request complexity, but it does not explicitly identify the task domain.
This can lead to suboptimal model selection in cases such as:
Goal
Add a new routing policy able to:
Proposed behavior
High-level flow:
Initial intents
Suggested initial intent categories:
Functional requirements
semantic_intentconfident,ambiguous,unknownambiguousandunknownConfiguration
Example configuration:
{ "routing": { "policy": "semantic_intent", "embedding_provider": "openai", "embedding_model": "text-embedding-3-small", "absolute_threshold": 0.60, "ambiguity_threshold": 0.08, "fallback_policy": "llm_complexity" }, "intents": { "coding": { "examples": [ "write a python function to reverse a linked list", "fix this typescript bug", "generate unit tests for this class" ], "candidate_models": ["qwen-coder", "claude-sonnet", "claude-opus"] }, "sql_data": { "examples": [ "write a sql query to find top customers by revenue", "fix this postgres join", "convert this question into sql" ], "candidate_models": ["deepseek-chat", "claude-sonnet", "claude-opus"] } } }