
How task-aware routing works

Each request is classified by task type (code, planning, writing, etc.) and routed to a model optimized for that task. No manual model selection required.

The problem with hardcoding models

  • model: "gpt-4" everywhere
  • No fallback on errors
  • Manual retry logic
  • 429 rate limit errors
  • Provider outages break loops
  • High latency for simple tasks
  • Overpaying for formatting
  • Wrong model for the task
  • No visibility into costs

ModelPilot: task classification + model routing in one API.

Why routing matters

Different tasks need different models. Code → DeepSeek. Writing → Claude. Simple tasks → GPT-4o-mini.

Cost efficiency
GPT-4 for formatting is ~100x more expensive than GPT-4o-mini. Route simple tasks to cheap models, save budget for complex ones.
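The gap is easy to see with back-of-the-envelope numbers. The per-token prices below are illustrative assumptions for this sketch, not quoted rates; check each provider's pricing page for current figures:

```typescript
// Illustrative per-1M-token pricing (USD). These numbers are assumptions
// for the example only — real prices vary by provider and change over time.
const pricePerMillion: Record<string, { input: number; output: number }> = {
  "gpt-4": { input: 30.0, output: 60.0 },
  "gpt-4o-mini": { input: 0.15, output: 0.6 },
};

function cost(model: string, inputTokens: number, outputTokens: number): number {
  const p = pricePerMillion[model];
  return (inputTokens * p.input + outputTokens * p.output) / 1_000_000;
}

// A typical formatting call: 500 tokens in, 200 tokens out.
const viaGpt4 = cost("gpt-4", 500, 200); // ≈ $0.027
const viaMini = cost("gpt-4o-mini", 500, 200); // ≈ $0.0002 — over 100x cheaper
```

At these rates the same formatting step costs over 100x more on GPT-4, which is the whole argument for routing trivial steps to a small model.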
Latency optimization
Reasoning models (o1, o3) are slow. Route simple tasks to fast models like Haiku or Flash. Total loop time drops.
Task-specific models
DeepSeek and Codestral beat GPT-4 on code. Claude beats GPT-4 on writing. Use the right model for the task.

How it works

Four-step pipeline: classify, match, execute, fallback.

1. Task classification

The prompt is analyzed and classified into a task type: code generation, planning, summarization, extraction, creative writing, Q&A, and so on.
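To make the idea concrete, here is a toy keyword-based classifier. This is an illustration only — ModelPilot's actual classifier is not keyword matching, and the task labels below are just examples:

```typescript
// Toy classifier for illustration. The real classification step analyzes
// the prompt with a model; keyword rules are only a stand-in here.
type TaskType = "code" | "summarization" | "creative-writing" | "qa";

function classifyTask(prompt: string): TaskType {
  const p = prompt.toLowerCase();
  if (/\b(function|refactor|bug|compile|typescript|python)\b/.test(p)) return "code";
  if (/\b(summarize|tl;dr|shorten)\b/.test(p)) return "summarization";
  if (/\b(story|poem|blog post)\b/.test(p)) return "creative-writing";
  return "qa"; // default bucket for everything else
}
```

The point is the shape of the step: prompt in, task label out, with a default bucket so every request gets routed somewhere.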

2. Model matching

The task type is mapped to models with strong benchmark scores in that category. Code → DeepSeek/Codestral. Writing → Claude. Reasoning → o1/GPT-4.
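Conceptually this step is a lookup from task type to a ranked candidate list. The mapping below is hypothetical and the model identifiers are placeholders mirroring the names in the text:

```typescript
// Hypothetical task → candidate-model mapping, ordered by benchmark
// strength for that category. Model ids here are illustrative placeholders.
const candidates: Record<string, string[]> = {
  code: ["deepseek-chat", "codestral-latest", "gpt-4"],
  "creative-writing": ["claude-3-5-sonnet", "gpt-4"],
  reasoning: ["o1", "gpt-4"],
  summarization: ["gpt-4o-mini", "claude-3-haiku"],
};

function matchModels(task: string): string[] {
  // Fall back to a cheap general model for unrecognized task types.
  return candidates[task] ?? ["gpt-4o-mini"];
}
```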

3. Weighted selection

The final model is selected according to your router weights: cost, latency, quality, and optionally carbon impact. Weights are configurable per router.
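A minimal sketch of weighted selection, assuming each candidate carries normalized 0–1 scores per axis (the profiles and scores below are made up for the example; higher is better on every axis, so the cost score means "cheapness"):

```typescript
// Sketch of weighted scoring. Profiles and numbers are illustrative
// assumptions, not real benchmark data.
interface ModelProfile {
  name: string;
  cost: number; // cheapness, 0–1 (higher = cheaper)
  latency: number; // speed, 0–1 (higher = faster)
  quality: number; // output quality, 0–1
}

const profiles: ModelProfile[] = [
  { name: "gpt-4", cost: 0.1, latency: 0.4, quality: 0.95 },
  { name: "gpt-4o-mini", cost: 0.95, latency: 0.9, quality: 0.7 },
];

function selectModel(weights: { cost: number; latency: number; quality: number }): string {
  const score = (m: ModelProfile) =>
    weights.cost * m.cost + weights.latency * m.latency + weights.quality * m.quality;
  // Pick the candidate with the highest weighted score.
  return profiles.reduce((best, m) => (score(m) > score(best) ? m : best)).name;
}
```

Shifting the weights shifts the winner: a cost-heavy router picks the cheap model, a quality-heavy router picks the strong one.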

4. Automatic fallbacks

If a model returns an error (429, 500, timeout), the request is retried with a fallback model. The retry count and escalation chain are configurable.
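The retry-then-escalate behavior can be sketched as a loop over an escalation chain. The `call` function below stands in for a provider request, and `RetryableError` models 429/500/timeout responses; both are assumptions for the sketch:

```typescript
// Sketch of retries with an escalation chain. `call` stands in for a
// provider request; RetryableError models 429/500/timeout failures.
class RetryableError extends Error {}

async function withFallback<T>(
  chain: string[],
  call: (model: string) => Promise<T>,
  retriesPerModel = 1,
): Promise<T> {
  let lastError: unknown;
  for (const model of chain) {
    for (let attempt = 0; attempt <= retriesPerModel; attempt++) {
      try {
        return await call(model);
      } catch (err) {
        lastError = err;
        // Non-retryable errors (e.g. a malformed request) should surface
        // immediately instead of burning through the chain.
        if (!(err instanceof RetryableError)) throw err;
      }
    }
  }
  throw lastError; // every model in the chain failed
}
```

With a chain like `["primary", "fallback"]`, a 429 on the primary model is retried, then the request escalates to the fallback, so the agent loop keeps moving.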

What you get

Benefits of task-aware routing over hardcoded model selection.

Lower costs
Route simple tasks to cheap models automatically. Pay for GPT-4 only when you need GPT-4.
Lower latency
Fast models for simple tasks. Total agent loop time decreases when you stop waiting for reasoning models on formatting steps.
Better outputs
DeepSeek for code, Claude for writing, GPT-4 for reasoning. Each task gets the model that's best at it.
Higher reliability
Automatic fallbacks when providers fail. Your agent loop doesn't break on 429s or provider outages.

Start routing in 2 minutes

Create a router, change your baseURL, done. Free tier includes $5 in credits.
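For an OpenAI SDK project, the change is the client configuration. The snippet below assumes the standard OpenAI Node SDK; the endpoint URL and router id are placeholders to copy from your ModelPilot dashboard, not verified values:

```typescript
import OpenAI from "openai";

// Drop-in: keep your existing OpenAI SDK code, point baseURL at your router.
// Endpoint and router id below are placeholders — use the values from your
// ModelPilot dashboard.
const client = new OpenAI({
  apiKey: process.env.MODELPILOT_API_KEY,
  baseURL: "https://<your-modelpilot-endpoint>/v1", // placeholder
});

const completion = await client.chat.completions.create({
  model: "<your-router-id>", // the router selects the actual model per request
  messages: [{ role: "user", content: "Refactor this function to be pure." }],
});
```

Everything else in your code — streaming, tool calls, message shapes — stays as it was, since the API surface is OpenAI-compatible.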

npm install modelpilot • OpenAI SDK compatible