> [!NOTE]
> This document refers to the Microsoft Foundry (classic) portal.
Model router for Microsoft Foundry is a deployable AI chat model that is trained to select the best large language model (LLM) to respond to a given prompt in real time. By evaluating factors like query complexity, cost, and performance, it intelligently routes requests to the most suitable model. Thus, it delivers high performance while saving on compute costs where possible, all packaged as a single model deployment.
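Because model router is packaged as a single deployment, you call it like any other chat model and let it choose the underlying model per request. The following is a minimal sketch using the OpenAI Python SDK; the environment variable names, the API version, and the deployment name `model-router` are assumptions to adapt to your own resource.

```python
import os

from openai import AzureOpenAI

# Endpoint, key, API version, and deployment name are placeholders;
# use the values from your own Foundry resource and router deployment.
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-10-21",
)

response = client.chat.completions.create(
    model="model-router",  # the deployment name, not an underlying model
    messages=[{"role": "user", "content": "Summarize the plot of Hamlet in two sentences."}],
)

print(response.choices[0].message.content)
print("Routed to:", response.model)  # the underlying model that answered
```

The `model` field of the response reports which underlying model served the request, which is useful for auditing routing decisions.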
> [!TIP]
> The Microsoft Foundry (new) portal offers enhanced configuration options for model router. Switch to the Microsoft Foundry (new) documentation to see the latest features.
## Why use model router?
Model router intelligently selects the best underlying model for a given prompt to optimize costs while maintaining quality. Smaller and cheaper models are used when they're sufficient for the task, but larger and more expensive models are available for more complex tasks. Also, reasoning models are available for tasks that require complex reasoning, and non-reasoning models are used otherwise. Model router provides a single deployment and chat experience that combines the best features from all of the underlying chat models.
The latest version, 2025-11-18, adds several capabilities:

- Support for Global Standard and Data Zone Standard deployments.
- Support for new models: `grok-4`, `grok-4-fast-reasoning`, `DeepSeek-V3.1`, `gpt-oss-120b`, `Llama-4-Maverick-17B-128E-Instruct-FP8`, `gpt-4o`, `gpt-4o-mini`, `claude-haiku-4-5`, `claude-opus-4-1`, and `claude-sonnet-4-5`.
- Support for agentic scenarios, including tools, so you can now use model router in the Foundry Agent service (see the sketch after this list).
- Quick deploy, or Custom deploy with routing mode and model subset selections:
  - Routing mode: Optimize the routing logic for your needs. Supported options: Quality, Cost, Balanced (default).
  - Model subset: Select the models that make up your routing subset.
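The sketch below illustrates the tool support called out above, using the standard Chat Completions tools format. The `get_weather` function is hypothetical, and `client` is the AzureOpenAI client from the previous example.

```python
# A minimal sketch of tool calling through a model router deployment.
# `get_weather` is a hypothetical tool defined only for illustration.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="model-router",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

message = response.choices[0].message
if message.tool_calls:
    # The routed model decided to call the tool.
    print("Tool requested:", message.tool_calls[0].function.name)
    print("Arguments:", message.tool_calls[0].function.arguments)
else:
    print(message.content)
```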
## Versioning
Each version of model router is associated with a specific, fixed set of underlying models and their versions; only newer versions of model router can introduce new underlying models.
If you select Auto-update at the deployment step (see Manage models), your model router deployment automatically updates when new versions become available. When that happens, the set of underlying models changes as well, which can affect the overall performance and cost of the deployment.
## Underlying models
| Model router version | Underlying models | Underlying model version |
|---|---|---|
| 2025-11-18 | gpt-4.1<br>gpt-4.1-mini<br>gpt-4.1-nano<br>o4-mini<br>gpt-5-nano<br>gpt-5-mini<br>gpt-5<br>gpt-5-chat<br>Deepseek-v3.1<br>gpt-oss-120b<br>llama4-maverick-instruct<br>grok-4<br>grok-4-fast<br>gpt-4o<br>gpt-4o-mini<br>claude-haiku-4-5<br>claude-opus-4-1<br>claude-sonnet-4-5 | 2025-04-14<br>2025-04-14<br>2025-04-14<br>2025-04-16<br>2025-08-07<br>2025-08-07<br>2025-08-07<br>2025-08-07<br>N/A<br>N/A<br>N/A<br>N/A<br>N/A<br>2024-11-20<br>2024-07-18<br>2025-10-01<br>2025-08-05<br>2025-09-29 |
| 2025-08-07 | gpt-4.1<br>gpt-4.1-mini<br>gpt-4.1-nano<br>o4-mini<br>gpt-5<br>gpt-5-mini<br>gpt-5-nano<br>gpt-5-chat | 2025-04-14<br>2025-04-14<br>2025-04-14<br>2025-04-16<br>2025-08-07<br>2025-08-07<br>2025-08-07<br>2025-08-07 |
| 2025-05-19 | gpt-4.1<br>gpt-4.1-mini<br>gpt-4.1-nano<br>o4-mini | 2025-04-14<br>2025-04-14<br>2025-04-14<br>2025-04-16 |
## Routing mode
With the latest version, if you choose custom deployment, you can select a routing mode to optimize for quality or cost while maintaining a baseline level of performance. Setting a routing mode is optional; if you don't set one, your deployment defaults to the Balanced mode.
Use routing mode if you:
- Need to reduce spend while retaining near-maximum quality.
- Need consistent access to the highest-quality model for critical workloads.
- Want to A/B test quality vs. cost trade-offs through per-request overrides.
### Available routing modes
| Mode | Description |
|---|---|
| Balanced (default) | Considers both cost and quality dynamically. Perfect for general-purpose scenarios. |
| Quality | Prioritizes maximum accuracy. Best for complex reasoning or critical outputs. |
| Cost | Prioritizes cost savings. Ideal for high-volume, budget-sensitive workloads. |
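Routing mode is set at deployment time, so one straightforward way to compare modes is to create two model router deployments with different modes and send both the same prompts. The sketch below assumes two hypothetical deployment names, `router-quality` and `router-cost`, and reuses the `client` from the first example; per-request overrides aren't shown here.

```python
# Sketch: compare two model router deployments created with different
# routing modes (Quality vs. Cost) on the same prompts.
# The deployment names are hypothetical.
prompts = [
    "Explain the CAP theorem in one paragraph.",
    "What is 17 * 24?",
]

for deployment in ["router-quality", "router-cost"]:
    for prompt in prompts:
        response = client.chat.completions.create(
            model=deployment,
            messages=[{"role": "user", "content": prompt}],
        )
        answer = response.choices[0].message.content
        print(f"{deployment} routed to {response.model}: {answer[:60]}...")
```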
## Model subset
The latest version of model router supports model subsets: for custom deployments, you can specify which underlying models to include in routing decisions. This gives you more control over cost, compliance, and performance characteristics.
When new base models become available, they're not included in your selection unless you explicitly add them to your deployment's inclusion list.
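To check how routing behaves in practice, you can tally which underlying models a deployment picks across a batch of test prompts, for example to confirm that routing stays within the subset you configured. A minimal sketch, again reusing the `client` from the first example:

```python
from collections import Counter

# Sketch: count which underlying models the router selects across a
# batch of test prompts. The prompts are arbitrary illustrations.
test_prompts = [
    "What is the capital of France?",
    "Prove that the square root of 2 is irrational.",
    "Write a haiku about the ocean.",
]

routed = Counter()
for prompt in test_prompts:
    response = client.chat.completions.create(
        model="model-router",
        messages=[{"role": "user", "content": prompt}],
    )
    routed[response.model] += 1  # response.model names the chosen model

for model_name, count in routed.most_common():
    print(f"{model_name}: {count}")
```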
## Limitations
### Resource limitations
See the Models page for the region availability and deployment types for model router.
### Technical limitations
See Quotas and limits for rate limit information.
To work around the limits on context window and parameters, use the Model subset feature to route only to models that support the properties you need.
> [!NOTE]
> The context window limit listed on the Models page is the limit of the smallest underlying model. Other underlying models support larger context windows, so an API call with a larger context succeeds only if the prompt happens to be routed to a model with a large enough window; otherwise, the call fails. To keep your prompt within the smaller limit, you can do one of the following (a truncation sketch follows this list):
>
> - Summarize the prompt before passing it to the model.
> - Truncate the prompt to its most relevant parts.
> - Use document embeddings and have the chat model retrieve relevant sections: see Azure AI Search.
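For example, a simple guard is to truncate the prompt by token count before sending it. The sketch below uses the tiktoken package; the 16,000-token budget and the `cl100k_base` encoding are illustrative assumptions, not the actual limits of any underlying model (check the Models page for those).

```python
import tiktoken

def truncate_to_budget(text: str, max_tokens: int = 16_000) -> str:
    """Truncate text to at most max_tokens tokens.

    The budget and encoding are illustrative assumptions; check the
    Models page for the actual context window of your router version.
    """
    encoding = tiktoken.get_encoding("cl100k_base")
    tokens = encoding.encode(text)
    if len(tokens) <= max_tokens:
        return text
    return encoding.decode(tokens[:max_tokens])

prompt = truncate_to_budget(open("long_document.txt").read())
```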
Model router accepts image inputs for Vision-enabled chats (all of the underlying models can accept image input), but the routing decision is based on the text input only.
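A Vision-enabled chat passes the image as a content part alongside the text, as in the sketch below; the image URL is a placeholder, and only the text part influences routing.

```python
# Sketch: image input to a Vision-enabled chat through model router.
# The image URL is a placeholder; only the text part drives routing.
response = client.chat.completions.create(
    model="model-router",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```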
Model router doesn't process audio input.
## Billing information
Starting in November 2025, model router usage is charged for input prompts at the rate listed on the pricing page.
You can monitor the costs of your model router deployment in the Azure portal.