Foundry Models sold directly by Azure

This article lists a selection of Azure AI Foundry Models sold directly by Azure along with their capabilities, deployment types, and regions of availability, excluding deprecated and legacy models. Models sold directly by Azure include all Azure OpenAI models and specific, selected models from top providers.

Depending on the kind of project you use in Azure AI Foundry, you see a different selection of models. Specifically, if you use a Foundry project built on an Azure AI Foundry resource, you see the models that are available for standard deployment to a Foundry resource. Alternatively, if you use a hub-based project hosted by an Azure AI Foundry hub, you see models that are available for deployment to managed compute and serverless APIs. These model selections often overlap because many models support multiple deployment options.

To learn more about attributes of Foundry Models sold directly by Azure, see Explore Azure AI Foundry Models.

Note

Foundry Models sold directly by Azure also include select models from the following top model providers:

  • Black Forest Labs: FLUX.1-Kontext-pro, FLUX-1.1-pro
  • DeepSeek: DeepSeek-V3.1, DeepSeek-V3-0324, DeepSeek-R1-0528, DeepSeek-R1
  • Meta: Llama-4-Maverick-17B-128E-Instruct-FP8, Llama-3.3-70B-Instruct
  • Microsoft: MAI-DS-R1
  • Mistral: mistral-document-ai-2505
  • xAI: grok-code-fast-1, grok-3, grok-3-mini, grok-4-fast-reasoning, grok-4-fast-non-reasoning, grok-4

To learn about these models, switch to Other model collections at the top of this article.

Azure OpenAI in Azure AI Foundry models

Azure OpenAI is powered by a diverse set of models with different capabilities and price points. Model availability varies by region and cloud. For Azure Government model availability, refer to Azure OpenAI in Azure Government.

| Models | Description |
| --- | --- |
| GPT-5 series NEW | gpt-5, gpt-5-mini, gpt-5-nano, gpt-5-chat |
| gpt-oss NEW | Open-weight reasoning models. |
| codex-mini | Fine-tuned version of o4-mini. |
| GPT-4.1 series | gpt-4.1, gpt-4.1-mini, gpt-4.1-nano |
| model-router | A model that intelligently selects from a set of underlying chat models to respond to a given prompt. |
| computer-use-preview | An experimental model trained for use with the Responses API computer use tool. |
| o-series models | Reasoning models with advanced problem solving and increased focus and capability. |
| GPT-4o, GPT-4o mini, and GPT-4 Turbo | Capable Azure OpenAI models with multimodal versions, which can accept both text and images as input. |
| GPT-4 | A set of models that improve on GPT-3.5 and can understand and generate natural language and code. |
| GPT-3.5 | A set of models that improve on GPT-3 and can understand and generate natural language and code. |
| Embeddings | A set of models that can convert text into numerical vector form to facilitate text similarity. |
| Image generation | A series of models that can generate original images from natural language. |
| Video generation | A model that can generate original video scenes from text instructions. |
| Audio | A series of models for speech to text, translation, and text to speech. GPT-4o audio models support either low-latency, speech-in, speech-out conversational interactions or audio generation. |

GPT-5

Region availability

| Model | Region |
| --- | --- |
| gpt-5 (2025-08-07) | See the models table. |
| gpt-5-mini (2025-08-07) | See the models table. |
| gpt-5-nano (2025-08-07) | See the models table. |
| gpt-5-chat (2025-08-07) | See the models table. |
| gpt-5-codex (2025-09-11) | East US 2 (Global Standard) and Sweden Central (Global Standard) |

Access is granted based on Microsoft's eligibility criteria. Customers who previously applied for and received access to o3 don't need to reapply; their approved subscriptions are automatically granted access upon model release.

| Model ID | Description | Context Window | Max Output Tokens | Training Data (up to) |
| --- | --- | --- | --- | --- |
| gpt-5 (2025-08-07) | - Reasoning<br>- Chat Completions API<br>- Responses API<br>- Structured outputs<br>- Text and image processing<br>- Functions, tools, and parallel tool calling<br>- Full summary of capabilities | 400,000<br>Input: 272,000<br>Output: 128,000 | 128,000 | October 24, 2024 |
| gpt-5-mini (2025-08-07) | - Reasoning<br>- Chat Completions API<br>- Responses API<br>- Structured outputs<br>- Text and image processing<br>- Functions, tools, and parallel tool calling<br>- Full summary of capabilities | 400,000<br>Input: 272,000<br>Output: 128,000 | 128,000 | June 24, 2024 |
| gpt-5-nano (2025-08-07) | - Reasoning<br>- Chat Completions API<br>- Responses API<br>- Structured outputs<br>- Text and image processing<br>- Functions, tools, and parallel tool calling<br>- Full summary of capabilities | 400,000<br>Input: 272,000<br>Output: 128,000 | 128,000 | May 31, 2024 |
| gpt-5-chat (2025-08-07)<br>Preview | - Chat Completions API<br>- Responses API<br>- Input: Text/Image<br>- Output: Text only | 128,000 | 16,384 | October 24, 2024 |
| gpt-5-codex (2025-09-11) | - Responses API only<br>- Input: Text/Image<br>- Output: Text only<br>- Structured outputs<br>- Text and image processing<br>- Functions, tools, and parallel tool calling<br>- Full summary of capabilities<br>- Optimized for Codex CLI & Codex VS Code extension | 400,000<br>Input: 272,000<br>Output: 128,000 | 128,000 | - |
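The capability entries above can be exercised through the Responses API. The following is a minimal, illustrative sketch of how a client might build and validate a request body for a gpt-5 deployment; the deployment name is a placeholder, and the token limits come from the table above.

```python
import json

# Placeholder deployment name; substitute the name you chose when deploying gpt-5.
DEPLOYMENT = "gpt-5"

# Budgets from the table above: 400,000 total (272,000 input / 128,000 output).
MAX_OUTPUT_TOKENS = 128_000

def build_responses_request(prompt: str, max_output_tokens: int = 4_096) -> dict:
    """Build a Responses API request body for a gpt-5 deployment."""
    if max_output_tokens > MAX_OUTPUT_TOKENS:
        raise ValueError(f"gpt-5 caps output at {MAX_OUTPUT_TOKENS} tokens")
    return {
        "model": DEPLOYMENT,
        "input": prompt,
        "max_output_tokens": max_output_tokens,
    }

body = build_responses_request("Summarize the attached report in three bullets.")
print(json.dumps(body, indent=2))
```

The validation step matters because requests that ask for more output than the model supports are rejected server-side; catching the mistake client-side gives a clearer error.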

gpt-oss

Region availability

| Model | Region |
| --- | --- |
| gpt-oss-120b | All Azure OpenAI regions |

Capabilities

| Model ID | Description | Context Window | Max Output Tokens | Training Data (up to) |
| --- | --- | --- | --- | --- |
| gpt-oss-120b (Preview) | - Text in/text out only<br>- Chat Completions API<br>- Streaming<br>- Function calling<br>- Structured outputs<br>- Reasoning<br>- Available for deployment1 and via managed compute | 131,072 | 131,072 | May 31, 2024 |
| gpt-oss-20b (Preview) | - Text in/text out only<br>- Chat Completions API<br>- Streaming<br>- Function calling<br>- Structured outputs<br>- Reasoning<br>- Available via managed compute and Foundry Local | 131,072 | 131,072 | May 31, 2024 |

1 Unlike other Azure OpenAI models, gpt-oss-120b requires an Azure AI Foundry project to deploy the model.

Deploy with code

```azurecli
az cognitiveservices account deployment create \
  --name "Foundry-project-resource" \
  --resource-group "test-rg" \
  --deployment-name "gpt-oss-120b" \
  --model-name "gpt-oss-120b" \
  --model-version "1" \
  --model-format "OpenAI-OSS" \
  --sku-capacity 10 \
  --sku-name "GlobalStandard"
```
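Once the deployment succeeds, applications call it like any other Azure OpenAI chat deployment. The following sketch builds the request URL and Chat Completions body; the resource name matches the CLI command above, while the API version shown is an assumption you should replace with the version your resource supports.

```python
import json

RESOURCE = "Foundry-project-resource"   # matches --name in the CLI command above
DEPLOYMENT = "gpt-oss-120b"             # matches --deployment-name
API_VERSION = "2024-10-21"              # assumed API version; substitute your own

# Chat Completions endpoint for the deployment created above.
url = (
    f"https://{RESOURCE}.openai.azure.com/openai/deployments/"
    f"{DEPLOYMENT}/chat/completions?api-version={API_VERSION}"
)

payload = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain chain-of-thought reasoning briefly."},
    ],
    "max_tokens": 512,  # well under the model's 131,072 output cap
}
print(url)
print(json.dumps(payload))
```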

GPT-4.1 series

Region availability

| Model | Region |
| --- | --- |
| gpt-4.1 (2025-04-14) | See the models table. |
| gpt-4.1-nano (2025-04-14) | See the models table. |
| gpt-4.1-mini (2025-04-14) | See the models table. |

Capabilities

Important

A known issue affects all GPT-4.1 series models: tool or function call definitions that exceed 300,000 tokens result in failures, even though the models' 1-million-token context limit wasn't reached.

The exact errors vary based on the API call and underlying payload characteristics.

Here are the error messages for the Chat Completions API:

  • Error code: 400 - {'error': {'message': "This model's maximum context length is 300000 tokens. However, your messages resulted in 350564 tokens (100 in the messages, 350464 in the functions). Please reduce the length of the messages or functions.", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}

  • Error code: 400 - {'error': {'message': "Invalid 'tools[0].function.description': string too long. Expected a string with maximum length 1048576, but got a string with length 2778531 instead.", 'type': 'invalid_request_error', 'param': 'tools[0].function.description', 'code': 'string_above_max_length'}}

Here's the error message for the Responses API:

  • Error code: 500 - {'error': {'message': 'The server had an error processing your request. Sorry about that! You can retry your request, or contact us through an Azure support request at: https://go.microsoft.com/fwlink/?linkid=2213926 if you keep seeing this error. (Please include the request ID d2008353-291d-428f-adc1-defb5d9fb109 in your email.)', 'type': 'server_error', 'param': None, 'code': None}}
| Model ID | Description | Context window | Max output tokens | Training data (up to) |
| --- | --- | --- | --- | --- |
| gpt-4.1 (2025-04-14) | - Text and image input<br>- Text output<br>- Chat completions API<br>- Responses API<br>- Streaming<br>- Function calling<br>- Structured outputs (chat completions) | - 1,047,576<br>- 128,000 (provisioned managed deployments)<br>- 300,000 (batch deployments) | 32,768 | May 31, 2024 |
| gpt-4.1-nano (2025-04-14) | - Text and image input<br>- Text output<br>- Chat completions API<br>- Responses API<br>- Streaming<br>- Function calling<br>- Structured outputs (chat completions) | - 1,047,576<br>- 128,000 (provisioned managed deployments)<br>- 300,000 (batch deployments) | 32,768 | May 31, 2024 |
| gpt-4.1-mini (2025-04-14) | - Text and image input<br>- Text output<br>- Chat completions API<br>- Responses API<br>- Streaming<br>- Function calling<br>- Structured outputs (chat completions) | - 1,047,576<br>- 128,000 (provisioned managed deployments)<br>- 300,000 (batch deployments) | 32,768 | May 31, 2024 |
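The 300,000-token limit on tool definitions described in the known issue earlier in this section can be screened for client-side before a request is sent. The sketch below uses a rough ~4 characters-per-token heuristic rather than an exact tokenizer, so treat the estimate as approximate.

```python
import json

TOOL_DEFINITION_TOKEN_LIMIT = 300_000  # limit described in the known issue above

def approx_tokens(obj) -> int:
    """Rough token estimate: ~4 characters per token for English JSON."""
    return len(json.dumps(obj)) // 4

def check_tool_definitions(tools: list) -> None:
    """Raise before sending a request whose tool definitions would exceed the limit."""
    total = sum(approx_tokens(t) for t in tools)
    if total > TOOL_DEFINITION_TOKEN_LIMIT:
        raise ValueError(
            f"tool definitions are roughly {total} tokens, "
            f"over the {TOOL_DEFINITION_TOKEN_LIMIT} limit"
        )

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
check_tool_definitions(tools)  # passes: this definition is tiny
```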

model-router

A model that intelligently selects from a set of underlying chat models to respond to a given prompt.

Region availability

| Model | Region |
| --- | --- |
| model-router (2025-08-07) | East US 2 (Global Standard & Data Zone Standard), Sweden Central (Global Standard & Data Zone Standard) |
| model-router (2025-05-19) | East US 2 (Global Standard & Data Zone Standard), Sweden Central (Global Standard & Data Zone Standard) |

Billing for Data Zone Standard model router deployments will begin no earlier than November 1, 2025.

Capabilities

| Model ID | Description | Context window | Max output tokens | Training data (up to) |
| --- | --- | --- | --- | --- |
| model-router (2025-08-07) | A model that intelligently selects from a set of underlying models to respond to a given prompt. | 200,000 | 32,768 (GPT-4.1 series)<br>100,000 (o4-mini)<br>128,000 (gpt-5 reasoning models)<br>16,384 (gpt-5-chat) | - |
| model-router (2025-05-19) | A model that intelligently selects from a set of underlying chat models to respond to a given prompt. | 200,000 | 32,768 (GPT-4.1 series)<br>100,000 (o4-mini) | May 31, 2024 |

Only some of the underlying models support the larger context windows. An API call with a larger context therefore succeeds only if the prompt happens to be routed to one of those models; otherwise, the call fails.
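Because the router's destination can't be predicted, a conservative client can cap the requested output at the smallest limit among the underlying models. This sketch encodes the limits from the table above; the dictionary keys are labels for illustration, not API identifiers.

```python
# Output limits of the models behind model-router (2025-08-07), from the table above.
UNDERLYING_OUTPUT_LIMITS = {
    "gpt-4.1-series": 32_768,
    "o4-mini": 100_000,
    "gpt-5-reasoning": 128_000,
    "gpt-5-chat": 16_384,
}

def safe_max_output_tokens() -> int:
    """Conservative output cap that is valid no matter which model the router picks."""
    return min(UNDERLYING_OUTPUT_LIMITS.values())

print(safe_max_output_tokens())  # 16384, the gpt-5-chat limit
```

Requests built with this cap never fail on output length, at the cost of shorter responses when the router happens to pick a higher-limit model.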

computer-use-preview

An experimental model trained for use with the Responses API computer use tool.

It can be used with third-party libraries to allow the model to control mouse and keyboard input, while getting context from screenshots of the current environment.

Caution

We don't recommend using preview models in production. We'll upgrade all deployments of preview models to either future preview versions or to the latest stable, generally available version. Models that are designated preview don't follow the standard Azure OpenAI model lifecycle.

Registration is required to access computer-use-preview. Access is granted based on Microsoft's eligibility criteria. Customers who have access to other limited access models still need to request access for this model.

To request access, go to computer-use-preview limited access model application. When access is granted, you need to create a deployment for the model.

Region availability

| Model | Region |
| --- | --- |
| computer-use-preview | See the models table. |

Capabilities

| Model ID | Description | Context window | Max output tokens | Training data (up to) |
| --- | --- | --- | --- | --- |
| computer-use-preview (2025-03-11) | Specialized model for use with the Responses API computer use tool<br>- Tools<br>- Streaming<br>- Text (input/output)<br>- Image (input) | 8,192 | 1,024 | October 2023 |

o-series models

The Azure OpenAI o-series models are designed to tackle reasoning and problem-solving tasks with increased focus and capability. These models spend more time processing and understanding the user's request, making them exceptionally strong in areas like science, coding, and math, compared to previous iterations.

| Model ID | Description | Max request (tokens) | Training data (up to) |
| --- | --- | --- | --- |
| codex-mini (2025-05-16) | Fine-tuned version of o4-mini.<br>- Responses API<br>- Structured outputs<br>- Text and image processing<br>- Functions and tools<br>- Full summary of capabilities | Input: 200,000<br>Output: 100,000 | May 31, 2024 |
| o3-pro (2025-06-10) | - Responses API<br>- Structured outputs<br>- Text and image processing<br>- Functions and tools<br>- Full summary of capabilities | Input: 200,000<br>Output: 100,000 | May 31, 2024 |
| o4-mini (2025-04-16) | - New reasoning model, offering enhanced reasoning abilities.<br>- Chat Completions API<br>- Responses API<br>- Structured outputs<br>- Text and image processing<br>- Functions and tools<br>- Full summary of capabilities | Input: 200,000<br>Output: 100,000 | May 31, 2024 |
| o3 (2025-04-16) | - New reasoning model, offering enhanced reasoning abilities.<br>- Chat Completions API<br>- Responses API<br>- Structured outputs<br>- Text and image processing<br>- Functions, tools, and parallel tool calling<br>- Full summary of capabilities | Input: 200,000<br>Output: 100,000 | May 31, 2024 |
| o3-mini (2025-01-31) | - Enhanced reasoning abilities.<br>- Structured outputs<br>- Text-only processing<br>- Functions and tools | Input: 200,000<br>Output: 100,000 | October 2023 |
| o1 (2024-12-17) | - Enhanced reasoning abilities.<br>- Structured outputs<br>- Text and image processing<br>- Functions and tools | Input: 200,000<br>Output: 100,000 | October 2023 |
| o1-preview (2024-09-12) | Older preview version. | Input: 128,000<br>Output: 32,768 | October 2023 |
| o1-mini (2024-09-12) | A faster and more cost-efficient option in the o1 series, ideal for coding tasks that require speed and lower resource consumption.<br>- Global Standard deployment available by default.<br>- Standard (regional) deployments are currently only available for select customers who received access as part of the o1-preview limited access release. | Input: 128,000<br>Output: 65,536 | October 2023 |

To learn more about advanced o-series models, see Getting started with reasoning models.
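A request to an o-series deployment differs from a standard chat request in two ways: it uses max_completion_tokens rather than max_tokens, and it can set a reasoning effort level. The sketch below builds such a body under the assumption that the deployed model accepts the low/medium/high reasoning_effort values; the budget chosen is illustrative.

```python
import json

def build_reasoning_request(deployment: str, question: str, effort: str = "medium") -> dict:
    """Chat Completions body for an o-series reasoning deployment.

    reasoning_effort trades latency for deeper reasoning. o-series models use
    max_completion_tokens (not max_tokens), and hidden reasoning tokens count
    against it, so leave generous headroom.
    """
    if effort not in {"low", "medium", "high"}:
        raise ValueError("effort must be low, medium, or high")
    return {
        "model": deployment,
        "messages": [{"role": "user", "content": question}],
        "reasoning_effort": effort,
        "max_completion_tokens": 25_000,  # illustrative budget with reasoning headroom
    }

body = build_reasoning_request("o4-mini", "Prove that sqrt(2) is irrational.", "high")
print(json.dumps(body, indent=2))
```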

Region availability

| Model | Region |
| --- | --- |
| codex-mini | East US2 & Sweden Central (Global Standard). |
| o3-pro | East US2 & Sweden Central (Global Standard). |
| o4-mini | See the models table. |
| o3 | See the models table. |
| o3-mini | See the models table. |
| o1 | See the models table. |
| o1-preview | See the models table. This model is available only for customers who were granted access as part of the original limited access. |
| o1-mini | See the models table. |

GPT-4o and GPT-4 Turbo

GPT-4o integrates text and images in a single model, which enables it to handle multiple data types simultaneously. This multimodal approach enhances accuracy and responsiveness in human-computer interactions. GPT-4o matches GPT-4 Turbo in English text and coding tasks while offering superior performance in non-English language tasks and vision tasks, setting new benchmarks for AI capabilities.

How do I access the GPT-4o and GPT-4o mini models?

GPT-4o and GPT-4o mini are available for Standard and Global Standard model deployment.

You need to create or use an existing resource in a supported Standard or Global Standard region where the model is available.

When your resource is created, you can deploy the GPT-4o models. If you're performing a programmatic deployment, the model names are:

  • gpt-4o version 2024-11-20
  • gpt-4o version 2024-08-06
  • gpt-4o version 2024-05-13
  • gpt-4o-mini version 2024-07-18

GPT-4 Turbo

GPT-4 Turbo is a large multimodal model (accepting text or image inputs and generating text) that can solve difficult problems with greater accuracy than any of OpenAI's previous models. Like GPT-3.5 Turbo, and older GPT-4 models, GPT-4 Turbo is optimized for chat and works well for traditional completions tasks.

GPT-4

GPT-4 is the predecessor to GPT-4 Turbo. Both the GPT-4 and GPT-4 Turbo models have a base model name of gpt-4. You can distinguish between the GPT-4 and Turbo models by examining the model version.

GPT-4 and GPT-4 Turbo models

These models can be used only with the Chat Completions API.

See Model versions to learn about how Azure OpenAI handles model version upgrades. See Working with models to learn how to view and configure the model version settings of your GPT-4 deployments.

| Model ID | Description | Max request (tokens) | Training data (up to) |
| --- | --- | --- | --- |
| gpt-4o (2024-11-20)<br>GPT-4o (Omni) | - Structured outputs<br>- Text and image processing<br>- JSON Mode<br>- Parallel function calling<br>- Enhanced accuracy and responsiveness<br>- Parity with English text and coding tasks compared to GPT-4 Turbo with Vision<br>- Superior performance in non-English languages and in vision tasks<br>- Enhanced creative writing ability | Input: 128,000<br>Output: 16,384 | October 2023 |
| gpt-4o (2024-08-06)<br>GPT-4o (Omni) | - Structured outputs<br>- Text and image processing<br>- JSON Mode<br>- Parallel function calling<br>- Enhanced accuracy and responsiveness<br>- Parity with English text and coding tasks compared to GPT-4 Turbo with Vision<br>- Superior performance in non-English languages and in vision tasks | Input: 128,000<br>Output: 16,384 | October 2023 |
| gpt-4o-mini (2024-07-18)<br>GPT-4o mini | - Fast, inexpensive, capable model ideal for replacing GPT-3.5 Turbo series models<br>- Text and image processing<br>- JSON Mode<br>- Parallel function calling | Input: 128,000<br>Output: 16,384 | October 2023 |
| gpt-4o (2024-05-13)<br>GPT-4o (Omni) | - Text and image processing<br>- JSON Mode<br>- Parallel function calling<br>- Enhanced accuracy and responsiveness<br>- Parity with English text and coding tasks compared to GPT-4 Turbo with Vision<br>- Superior performance in non-English languages and in vision tasks | Input: 128,000<br>Output: 4,096 | October 2023 |
| gpt-4 (turbo-2024-04-09)<br>GPT-4 Turbo with Vision | New generally available model.<br>- Replacement for all previous GPT-4 preview models (vision-preview, 1106-Preview, 0125-Preview)<br>- Feature availability is currently different, depending on the method of input and the deployment type | Input: 128,000<br>Output: 4,096 | December 2023 |

Caution

We don't recommend that you use preview models in production. We'll upgrade all deployments of preview models to either future preview versions or to the latest stable, generally available version. Models that are designated preview don't follow the standard Azure OpenAI model lifecycle.

GPT-3.5

GPT-3.5 models can understand and generate natural language or code. The most capable and cost effective model in the GPT-3.5 family is GPT-3.5 Turbo, which is optimized for chat and also works well for traditional completions tasks. GPT-3.5 Turbo is available for use with the Chat Completions API. GPT-3.5 Turbo Instruct has similar capabilities to text-davinci-003 when you use the Completions API instead of the Chat Completions API. We recommend using GPT-3.5 Turbo and GPT-3.5 Turbo Instruct over legacy GPT-3.5 and GPT-3 models.

| Model ID | Description | Max request (tokens) | Training data (up to) |
| --- | --- | --- | --- |
| gpt-35-turbo (0125) NEW | - JSON Mode<br>- Parallel function calling<br>- Reproducible output (preview)<br>- Higher accuracy when it responds in requested formats<br>- Includes a fix for a bug that caused a text-encoding issue for non-English language function calls | Input: 16,385<br>Output: 4,096 | Sep 2021 |
| gpt-35-turbo (1106) | Older generally available model.<br>- JSON Mode<br>- Parallel function calling<br>- Reproducible output (preview) | Input: 16,385<br>Output: 4,096 | Sep 2021 |
| gpt-35-turbo-instruct (0914) | Completions endpoint only.<br>- Replacement for legacy completions models | 4,097 | Sep 2021 |

To learn more about how to interact with GPT-3.5 Turbo and the Chat Completions API, check out our in-depth how-to article.

Embeddings

text-embedding-3-large is the latest and most capable embedding model. You can't upgrade between embeddings models. To move from using text-embedding-ada-002 to text-embedding-3-large, you need to generate new embeddings.

  • text-embedding-3-large
  • text-embedding-3-small
  • text-embedding-ada-002

OpenAI reports that both the large and small third-generation embedding models offer better average multilanguage retrieval performance on the MIRACL benchmark, while maintaining performance for English tasks on the MTEB benchmark.

| Evaluation benchmark | text-embedding-ada-002 | text-embedding-3-small | text-embedding-3-large |
| --- | --- | --- | --- |
| MIRACL average | 31.4 | 44.0 | 54.9 |
| MTEB average | 61.0 | 62.3 | 64.6 |

The third generation embeddings models support reducing the size of the embedding via a new dimensions parameter. Typically, larger embeddings are more expensive from a compute, memory, and storage perspective. When you can adjust the number of dimensions, you gain more control over overall cost and performance. The dimensions parameter isn't supported in all versions of the OpenAI 1.x Python library. To take advantage of this parameter, we recommend that you upgrade to the latest version: pip install openai --upgrade.

OpenAI's MTEB benchmark testing found that even when the third-generation models' dimensions are reduced to fewer than the 1,536 dimensions of text-embedding-ada-002, performance remains slightly better.
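When a library version without the dimensions parameter must be used, the same effect can be approximated client-side: slice the full embedding and renormalize it to unit length so that dot products remain valid cosine similarities. A minimal sketch in pure Python (the eight-dimension vector is a stand-in; real third-generation embeddings have 1,536 or 3,072 dimensions):

```python
import math

def shorten_embedding(embedding: list, dims: int) -> list:
    """Truncate an embedding to `dims` dimensions and renormalize to unit length."""
    cut = embedding[:dims]
    norm = math.sqrt(sum(x * x for x in cut))
    return [x / norm for x in cut]

full = [0.5, -0.5, 0.25, 0.25, 0.1, -0.1, 0.3, 0.2]  # stand-in for a full embedding
short = shorten_embedding(full, 4)

# The shortened vector has unit length, so cosine similarity is a plain dot product.
length = math.sqrt(sum(x * x for x in short))
print(len(short), round(length, 6))  # 4 1.0
```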

Image generation models

The image generation models generate images from text prompts that the user provides. GPT-image-1 is in limited access preview. DALL-E 3 is generally available for use with the REST APIs. DALL-E 2 and DALL-E 3 with client SDKs are in preview.

Registration is required to access gpt-image-1. Access is granted based on Microsoft's eligibility criteria. Customers who have access to other limited access models still need to request access for this model.

To request access, go to gpt-image-1 limited access model application. When access is granted, you need to create a deployment for the model.

Region availability

| Model | Region |
| --- | --- |
| dall-e-3 | East US<br>Australia East<br>Sweden Central |
| gpt-image-1 | West US 3 (Global Standard)<br>East US 2 (Global Standard)<br>UAE North (Global Standard)<br>Poland Central (Global Standard) |

Video generation models

Sora is an AI model from OpenAI that can create realistic and imaginative video scenes from text instructions. Sora is in preview.

Region availability

| Model | Region |
| --- | --- |
| sora | East US 2 (Global Standard)<br>Sweden Central (Global Standard) |

Audio models

Audio models in Azure OpenAI are available via the realtime, completions, and audio APIs.

GPT-4o audio models

The GPT-4o audio models are part of the GPT-4o model family and support either low-latency, speech in, speech out conversational interactions or audio generation.

Caution

We don't recommend using preview models in production. We'll upgrade all deployments of preview models to either future preview versions or to the latest stable, generally available version. Models that are designated preview don't follow the standard Azure OpenAI model lifecycle.

Details about maximum request tokens and training data are available in the following table:

| Model ID | Description | Max request (tokens) | Training data (up to) |
| --- | --- | --- | --- |
| gpt-4o-mini-audio-preview (2024-12-17)<br>GPT-4o audio | Audio model for audio and text generation. | Input: 128,000<br>Output: 16,384 | September 2023 |
| gpt-4o-audio-preview (2024-12-17)<br>GPT-4o audio | Audio model for audio and text generation. | Input: 128,000<br>Output: 16,384 | September 2023 |
| gpt-4o-realtime-preview (2025-06-03)<br>GPT-4o audio | Audio model for real-time audio processing. | Input: 128,000<br>Output: 4,096 | October 2023 |
| gpt-4o-realtime-preview (2024-12-17)<br>GPT-4o audio | Audio model for real-time audio processing. | Input: 128,000<br>Output: 4,096 | October 2023 |
| gpt-4o-mini-realtime-preview (2024-12-17)<br>GPT-4o audio | Audio model for real-time audio processing. | Input: 128,000<br>Output: 4,096 | October 2023 |
| gpt-realtime (2025-08-28) (GA)<br>GPT-4o audio | Audio model for real-time audio processing. | Input: 28,672<br>Output: 4,096 | October 2023 |

To compare the availability of GPT-4o audio models across all regions, refer to the models table.

Audio API

The audio models via the /audio API can be used for speech to text, translation, and text to speech.

Speech-to-text models

| Model ID | Description | Max request (audio file size) |
| --- | --- | --- |
| whisper | General-purpose speech recognition model. | 25 MB |
| gpt-4o-transcribe | Speech-to-text model powered by GPT-4o. | 25 MB |
| gpt-4o-mini-transcribe | Speech-to-text model powered by GPT-4o mini. | 25 MB |
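The 25 MB cap applies to the uploaded audio file, so a client can reject oversized files before making a request. A small sketch using only the standard library (the temporary file stands in for a real recording):

```python
import os
import tempfile

MAX_AUDIO_BYTES = 25 * 1024 * 1024  # 25 MB limit from the table above

def check_audio_file(path: str) -> int:
    """Return the file size in bytes, raising if it exceeds the 25 MB request limit."""
    size = os.path.getsize(path)
    if size > MAX_AUDIO_BYTES:
        raise ValueError(f"{path} is {size} bytes; split or compress it before upload")
    return size

# Demonstrate with a small temporary stand-in for an audio file.
with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
    f.write(b"\x00" * 1024)
    sample = f.name

print(check_audio_file(sample) <= MAX_AUDIO_BYTES)  # True
os.remove(sample)
```

For longer recordings, the usual approach is to split the audio into chunks under the limit and transcribe each chunk separately.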

Speech translation models

| Model ID | Description | Max request (audio file size) |
| --- | --- | --- |
| whisper | General-purpose speech recognition model. | 25 MB |

Text-to-speech models (preview)

| Model ID | Description |
| --- | --- |
| tts | Text-to-speech model optimized for speed. |
| tts-hd | Text-to-speech model optimized for quality. |
| gpt-4o-mini-tts | Text-to-speech model powered by GPT-4o mini. |

You can guide the voice to speak in a specific style or tone.

For more information, see Audio models region availability later in this article.

Model summary table and region availability

Models by deployment type

Azure OpenAI provides customers with choices on the hosting structure that fits their business and usage patterns. The service offers two main types of deployment:

  • Standard: Has a global deployment option, routing traffic globally to provide higher throughput.
  • Provisioned: Also has a global deployment option, allowing customers to purchase and deploy provisioned throughput units across Azure global infrastructure.

All deployments can perform the exact same inference operations, but the billing, scale, and performance are substantially different. To learn more about Azure OpenAI deployment types, see our Deployment types guide.

Global Standard model availability

[Global Standard model availability table: per-region availability marks for gpt-5 (2025-08-07), gpt-5-mini, gpt-5-nano, gpt-5-chat, o3-pro, codex-mini, sora, model-router (2025-08-07 and 2025-05-19), o3, o4-mini, gpt-image-1, gpt-4.1, gpt-4.1-nano, gpt-4.1-mini, computer-use-preview, o3-mini, o1, o1-mini, gpt-4o (2024-05-13, 2024-08-06, 2024-11-20), gpt-4o-mini, gpt-4 (turbo-2024-04-09), text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002, the gpt-4o realtime, audio, transcribe, and tts models, gpt-5-codex, gpt-audio, gpt-realtime, and o3-deep-research across the regions australiaeast, brazilsouth, canadaeast, eastus, eastus2, francecentral, germanywestcentral, italynorth, japaneast, koreacentral, northcentralus, norwayeast, polandcentral, southafricanorth, southcentralus, southindia, spaincentral, swedencentral, switzerlandnorth, uaenorth, uksouth, westeurope, westus, and westus3.]

Note

o3-deep-research is currently only available with Azure AI Foundry Agent Service. To learn more, see the Deep Research tool guidance.

This table doesn't include fine-tuning regional availability information. Consult the fine-tuning section for this information.

Standard deployment (regional) models by endpoint

Chat completions

[Standard (regional) chat completions availability table: per-region availability marks for o1-preview, o1-mini, gpt-4o (2024-05-13, 2024-08-06, 2024-11-20), gpt-4o-mini, gpt-4 (turbo-2024-04-09), and gpt-35-turbo (1106 and 0125) across australiaeast, canadaeast, eastus, eastus2, francecentral, japaneast, northcentralus, norwayeast, southcentralus, southindia, swedencentral, switzerlandnorth, uksouth, westeurope, westus, and westus3.]

Note

o1-mini is currently available to all customers for Global Standard deployment.

Select customers were granted standard (regional) deployment access to o1-mini as part of the o1-preview limited access release. At this time, access to o1-mini standard (regional) deployments isn't being expanded.

GPT-4 and GPT-4 Turbo model availability

GPT-3.5 models

To learn about how Azure OpenAI handles model version upgrades, see Model versions. To learn how to view and configure the model version settings of your GPT-3.5 Turbo deployments, see Working with models.

Fine-tuning models

Note

gpt-35-turbo: Fine-tuning of this model is limited to a subset of regions and isn't available in every region where the base model is available.

The supported regions for fine-tuning might vary if you use Azure OpenAI models in an Azure AI Foundry project versus outside a project.

| Model ID | Standard training regions | Global training | Max request (tokens) | Training data (up to) | Modality |
| --- | --- | --- | --- | --- | --- |
| gpt-35-turbo (1106) | East US2<br>North Central US<br>Sweden Central<br>Switzerland West | - | Input: 16,385<br>Output: 4,096 | Sep 2021 | Text to text |
| gpt-35-turbo (0125) | East US2<br>North Central US<br>Sweden Central<br>Switzerland West | - | 16,385 | Sep 2021 | Text to text |
| gpt-4o-mini (2024-07-18) | North Central US<br>Sweden Central | ✓ | Input: 128,000<br>Output: 16,384<br>Training example context length: 65,536 | Oct 2023 | Text to text |
| gpt-4o (2024-08-06) | East US2<br>North Central US<br>Sweden Central | ✓ | Input: 128,000<br>Output: 16,384<br>Training example context length: 65,536 | Oct 2023 | Text and vision to text |
| gpt-4.1 (2025-04-14) | North Central US<br>Sweden Central | ✓ | Input: 128,000<br>Output: 16,384<br>Training example context length: 65,536 | May 2024 | Text and vision to text |
| gpt-4.1-mini (2025-04-14) | North Central US<br>Sweden Central | ✓ | Input: 128,000<br>Output: 16,384<br>Training example context length: 65,536 | May 2024 | Text to text |
| gpt-4.1-nano (2025-04-14) | North Central US<br>Sweden Central | ✓ | Input: 128,000<br>Output: 16,384<br>Training example context length: 32,768 | May 2024 | Text to text |
| o4-mini (2025-04-16) | East US2<br>Sweden Central | - | Input: 128,000<br>Output: 16,384<br>Training example context length: 65,536 | May 2024 | Text to text |

Note

Global training provides more affordable training per token, but doesn't offer data residency. It's currently available to Azure OpenAI resources in the following regions:

  • Australia East
  • Brazil South
  • Canada Central
  • Canada East
  • East US
  • East US2
  • France Central
  • Germany West Central
  • Italy North
  • Japan East (no vision support)
  • Korea Central
  • North Central US
  • Norway East
  • Poland Central (no 4.1-nano support)
  • Southeast Asia
  • South Africa North
  • South Central US
  • South India
  • Spain Central
  • Sweden Central
  • Switzerland West
  • Switzerland North
  • UK South
  • West Europe
  • West US
  • West US3

Assistants (preview)

For Assistants, you need a combination of a supported model and a supported region. Certain tools and capabilities require the latest models. The following models are available in the Assistants API, SDK, and Azure AI Foundry. The following table is for standard deployment. For information on provisioned throughput unit availability, see Provisioned throughput. The listed models and regions can be used with both Assistants v1 and v2. You can use Global Standard models if they're supported in the following regions.

[Assistants standard deployment availability table: per-region availability marks for gpt-4o (2024-05-13, 2024-08-06), gpt-4o-mini (2024-07-18), gpt-4 (0613, 1106-Preview, 0125-Preview, turbo-2024-04-09), gpt-4-32k (0613), gpt-35-turbo (0613, 1106, 0125), and gpt-35-turbo-16k (0613) across australiaeast, eastus, eastus2, francecentral, japaneast, norwayeast, southindia, swedencentral, uksouth, westus, and westus3.]

Model retirement

For the latest information on model retirements, refer to the model retirement guide.

Note

Foundry Models sold directly by Azure also include all Azure OpenAI models. To learn about these models, switch to the Azure OpenAI models collection at the top of this article.

Black Forest Labs models sold directly by Azure

The Black Forest Labs (BFL) collection of image generation models includes FLUX.1 Kontext [pro] for in-context generation and editing, and FLUX1.1 [pro] for text-to-image generation.

You can run these models through the BFL service provider API and through the images/generations and images/edits endpoints.
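As a sketch, the two routes can be assembled as follows. The resource name and deployment ID are placeholders, and the payload fields beyond `prompt` are the provider-specific parameters listed in the table below; check your resource for the exact values.

```python
# Sketch: build the two FLUX.1-Kontext-pro request routes and a minimal payload.
# The resource name and deployment ID below are placeholders, not real values.
import json

resource = "my-foundry-resource.openai.azure.com"  # placeholder resource name
deployment = "flux-kontext-pro"                    # placeholder deployment ID

# Image API route (OpenAI-compatible images/generations endpoint)
generations_url = (
    f"https://{resource}/openai/deployments/{deployment}/images/generations"
)

# BFL service provider route, as listed in the table below
provider_url = (
    f"https://{resource}/providers/blackforestlabs/v1/"
    f"flux-kontext-pro?api-version=preview"
)

# Minimal request body; seed and output_format are available in the
# provider-specific API only, per the capabilities column
payload = {
    "prompt": "A lighthouse at dusk, oil painting style",
    "seed": 42,
    "output_format": "png",  # PNG or JPG
}

body = json.dumps(payload)
print(generations_url)
print(provider_url)
```

Send `body` with your resource's authentication headers; the same payload shape applies to the `images/edits` route when you add an input image.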

Model Type & API endpoint Capabilities Deployment type (region availability) Project type
FLUX.1-Kontext-pro Image generation
- Image API: https://<resource-name>/openai/deployments/{deployment-id}/images/generations
and
https://<resource-name>/openai/deployments/{deployment-id}/images/edits

- BFL service provider API: <resource-name>/providers/blackforestlabs/v1/flux-kontext-pro?api-version=preview
- Input: text and image (5,000 tokens and 1 image)
- Output: One image
- Tool calling: No
- Response formats: Image (PNG and JPG)
- Key features: Character consistency, advanced editing
- Additional parameters: (In provider-specific API only) seed, aspect ratio, input_image, prompt_upsampling, safety_tolerance, output_format, webhook_url, webhook_secret
- Global standard (all regions) Foundry, Hub-based
FLUX-1.1-pro Image generation
- Image API: https://<resource-name>/openai/deployments/{deployment-id}/images/generations

- BFL service provider API: <resource-name>/providers/blackforestlabs/v1/flux-pro-1.1?api-version=preview
- Input: text (5,000 tokens)
- Output: One image
- Tool calling: No
- Response formats: Image (PNG and JPG)
- Key features: Fast inference speed, strong prompt adherence, competitive pricing, scalable generation
- Additional parameters: (In provider-specific API only) width, height, prompt_upsampling, seed, safety_tolerance, output_format, webhook_url, webhook_secret
- Global standard (all regions) Hub-based

See this model collection in Azure AI Foundry portal.

DeepSeek models sold directly by Azure

The DeepSeek family of models includes DeepSeek-R1, which uses a step-by-step training process to excel at reasoning tasks such as language, scientific reasoning, and coding.

Model Type Capabilities Deployment type (region availability) Project type
DeepSeek-V3.1 chat-completion
(with reasoning content)
- Input: text (131,072 tokens)
- Output: text (131,072 tokens)
- Languages: en and zh
- Tool calling: Yes
- Response formats: Text, JSON
- Global standard (all regions) Foundry, Hub-based
DeepSeek-R1-0528 chat-completion
(with reasoning content)
- Input: text (163,840 tokens)
- Output: text (163,840 tokens)
- Languages: en and zh
- Tool calling: No
- Response formats: Text
- Global standard (all regions)
- Global provisioned (all regions)
Foundry, Hub-based
DeepSeek-V3-0324 chat-completion - Input: text (131,072 tokens)
- Output: text (131,072 tokens)
- Languages: en and zh
- Tool calling: Yes
- Response formats: Text, JSON
- Global standard (all regions)
- Global provisioned (all regions)
Foundry, Hub-based
DeepSeek-R1 chat-completion
(with reasoning content)
- Input: text (163,840 tokens)
- Output: text (163,840 tokens)
- Languages: en and zh
- Tool calling: No
- Response formats: Text
- Global standard (all regions)
- Global provisioned (all regions)
Foundry, Hub-based

See this model collection in Azure AI Foundry portal.
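For the chat-completion models in the table above, requests follow the familiar chat-completions shape. The sketch below builds a minimal request body for DeepSeek-R1 and shows one way to separate a reasoning trace from the final answer; it assumes the reasoning content is wrapped in `<think>` tags in the response text, which is common for R1-style models but should be verified against the actual service response.

```python
# Sketch: a chat-completions payload for DeepSeek-R1, plus a helper that
# splits an R1-style completion into (reasoning, answer). The <think>-tag
# convention is an assumption; verify against real service responses.
import re

payload = {
    "model": "DeepSeek-R1",
    "messages": [
        {"role": "user", "content": "How many primes are there below 20?"}
    ],
    "max_tokens": 2048,  # output can be up to 163,840 tokens per the table
}

def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) from an R1-style completion string."""
    match = re.search(r"<think>(.*?)</think>(.*)", text, re.DOTALL)
    if match:
        return match.group(1).strip(), match.group(2).strip()
    return "", text.strip()  # no trace found: treat everything as the answer

# Illustrative completion text, not a captured model response
sample = (
    "<think>2, 3, 5, 7, 11, 13, 17, 19 -- that's 8.</think>"
    "There are 8 primes below 20."
)
reasoning, answer = split_reasoning(sample)
print(answer)  # There are 8 primes below 20.
```

DeepSeek-V3.1 and DeepSeek-V3-0324 accept the same payload shape; they also support tool calling and JSON response formats, per the table.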

Meta models sold directly by Azure

Meta Llama models and tools are a collection of pretrained and fine-tuned generative AI text and image reasoning models. Meta models range in scale to include:

  • Small language models (SLMs) like 1B and 3B Base and Instruct models for on-device and edge inferencing
  • Mid-size large language models (LLMs) like 7B, 8B, and 70B Base and Instruct models
  • High-performance models like Meta Llama 3.1-405B Instruct for synthetic data generation and distillation use cases

Model Type Capabilities Deployment type (region availability) Project type
Llama-4-Maverick-17B-128E-Instruct-FP8 chat-completion - Input: text and images (1M tokens)
- Output: text (1M tokens)
- Languages: ar, en, fr, de, hi, id, it, pt, es, tl, th, and vi
- Tool calling: No
- Response formats: Text
- Global standard (all regions) Foundry, Hub-based
Llama-3.3-70B-Instruct chat-completion - Input: text (128,000 tokens)
- Output: text (8,192 tokens)
- Languages: en, de, fr, it, pt, hi, es, and th
- Tool calling: No
- Response formats: Text
- Global standard (all regions) Foundry, Hub-based

See this model collection in Azure AI Foundry portal. You can also find several Meta models available from partners and the community.

Microsoft models sold directly by Azure

Microsoft models include various model groups such as MAI models, Phi models, healthcare AI models, and more. To see all the available Microsoft models, view the Microsoft model collection in Azure AI Foundry portal.

Model Type Capabilities Deployment type (region availability) Project type
MAI-DS-R1 chat-completion
(with reasoning content)
- Input: text (163,840 tokens)
- Output: text (163,840 tokens)
- Languages: en and zh
- Tool calling: No
- Response formats: Text
- Global standard (all regions) Foundry, Hub-based

See the Microsoft model collection in Azure AI Foundry portal. You can also find several Microsoft models available from partners and the community.

Mistral models sold directly by Azure

Model Type Capabilities Deployment type (region availability) Project type
mistral-document-ai-2505 Image-to-Text - Input: image or PDF pages (up to 30 pages, max 30 MB PDF file)
- Output: text
- Languages: en
- Tool calling: no
- Response formats: Text, JSON, Markdown
- Global standard (all regions) Foundry

See the Mistral model collection in Azure AI Foundry portal. You can also find several Mistral models available from partners and the community.
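Because mistral-document-ai-2505 caps input at 30 pages and a 30 MB PDF, a client-side pre-flight check can reject oversized files before upload. The sketch below uses a naive page-counting heuristic (counting `/Type /Page` markers in the raw bytes), which is an approximation rather than a full PDF parser.

```python
# Pre-flight validation against the mistral-document-ai-2505 input limits
# stated above: at most 30 pages and a 30 MB PDF. Page counting here is a
# naive byte-level heuristic, good enough for a sketch only.
MAX_BYTES = 30 * 1024 * 1024
MAX_PAGES = 30

def validate_pdf(data: bytes) -> list[str]:
    """Return a list of limit violations (empty list means the PDF passes)."""
    errors = []
    if len(data) > MAX_BYTES:
        errors.append(f"PDF is {len(data)} bytes; limit is {MAX_BYTES}.")
    # "/Type /Page" also matches inside "/Type /Pages" (the page-tree node),
    # so subtract those occurrences to approximate the leaf page count.
    pages = data.count(b"/Type /Page") - data.count(b"/Type /Pages")
    if pages > MAX_PAGES:
        errors.append(f"PDF has {pages} pages; limit is {MAX_PAGES}.")
    return errors

print(validate_pdf(b"%PDF-1.7 /Type /Page "))  # []
```

A production client would instead read the page count from a real PDF library, but the size and page limits above come straight from the capabilities table.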

xAI models sold directly by Azure

xAI's Grok models in Azure AI Foundry Models include a diverse set of models designed to excel in various enterprise domains with different capabilities and price points, including:

  • Grok 3, a non-reasoning model pretrained at the Colossus datacenter, is tailored for business use cases such as data extraction, coding, and text summarization, with exceptional instruction-following capabilities. It supports a 131,072 token context window, allowing it to handle extensive inputs while maintaining coherence and depth, and is adept at drawing connections across domains and languages.

  • Grok 3 Mini is a lightweight reasoning model trained to tackle agentic, coding, mathematical, and deep science problems with test-time compute. It also supports a 131,072 token context window for understanding codebases and enterprise documents, and excels at using tools to solve complex logical problems in novel environments, offering raw reasoning traces for user inspection with adjustable thinking budgets.

  • Grok Code Fast 1 is a fast and efficient reasoning model designed for agentic coding applications. It was pretrained on a coding-focused data mixture, then post-trained on demonstrations of various coding tasks and tool use, as well as demonstrations of correct refusal behaviors based on xAI's safety policy. Registration is required for access to the grok-code-fast-1 model.

  • Grok 4 Fast is an efficiency-optimized language model that delivers near-Grok 4 reasoning capabilities with significantly lower latency and cost, and can bypass reasoning entirely for ultra-fast applications. It is trained for safe and effective tool use, with built-in refusal behaviors, a fixed safety-enforcing system prompt, and input filters to prevent misuse.

  • Grok 4 is the latest reasoning model from xAI with advanced reasoning and tool-use capabilities, enabling it to achieve new state-of-the-art performance across challenging academic and industry benchmarks. Registration is required for access to the grok-4 model.

Model Type Capabilities Deployment type (region availability) Project type
grok-4 chat-completion - Input: text, image (256,000 tokens)
- Output: text (8,192 tokens)
- Languages: en
- Tool calling: yes
- Response formats: text
- Global standard (all regions) Foundry, Hub-based
grok-4-fast-reasoning chat-completion - Input: text, image (2,000,000 tokens)
- Output: text (2,000,000 tokens)
- Languages: en
- Tool calling: yes
- Response formats: text
- Global standard (all regions) Foundry, Hub-based
grok-4-fast-non-reasoning chat-completion - Input: text, image (2,000,000 tokens)
- Output: text (2,000,000 tokens)
- Languages: en
- Tool calling: yes
- Response formats: text
- Global standard (all regions) Foundry, Hub-based
grok-code-fast-1 chat-completion - Input: text (256,000 tokens)
- Output: text (8,192 tokens)
- Languages: en
- Tool calling: yes
- Response formats: text
- Global standard (all regions) Foundry, Hub-based
grok-3 chat-completion - Input: text (131,072 tokens)
- Output: text (131,072 tokens)
- Languages: en
- Tool calling: yes
- Response formats: text
- Global standard (all regions)
- Data zone standard (US)
Foundry, Hub-based
grok-3-mini chat-completion - Input: text (131,072 tokens)
- Output: text (131,072 tokens)
- Languages: en
- Tool calling: yes
- Response formats: text
- Global standard (all regions)
- Data zone standard (US)
Foundry, Hub-based

See the xAI model collection in Azure AI Foundry portal.
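Since the table above lists tool calling as supported for all Grok models, a request can attach tool definitions in the OpenAI-style `tools` schema that chat-completions deployments commonly accept. The function below (`get_order_status`) is purely illustrative, not a real API.

```python
# Sketch: a tool-calling request body for grok-3, using the OpenAI-style
# function-calling schema. The tool itself (get_order_status) is hypothetical.
import json

tools = [{
    "type": "function",
    "function": {
        "name": "get_order_status",  # hypothetical tool for illustration
        "description": "Look up the shipping status of an order by ID.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]

payload = {
    "model": "grok-3",
    "messages": [{"role": "user", "content": "Where is order 4512?"}],
    "tools": tools,
}

body = json.dumps(payload)
print(len(body) > 0)
```

When the model decides to call the tool, the response carries a tool call with JSON arguments that your application executes before returning the result in a follow-up message.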

Model region availability by deployment type

Foundry Models gives you choices for the hosting structure that fits your business and usage patterns. The service offers two main types of deployment:

  • Standard: Has a global deployment option, routing traffic globally to provide higher throughput.
  • Provisioned: Also has a global deployment option, allowing you to purchase and deploy provisioned throughput units across Azure global infrastructure.

All deployments perform the same inference operations, but the billing, scale, and performance differ. For more information about deployment types, see Deployment types in Azure AI Foundry Models.

Global Standard model availability

Region DeepSeek-R1-0528 DeepSeek-R1 DeepSeek-V3-0324 DeepSeek-V3.1 FLUX.1-Kontext-pro FLUX-1.1-pro grok-4 grok-4-fast-reasoning grok-4-fast-non-reasoning grok-code-fast-1 grok-3 grok-3-mini Llama-4-Maverick-17B-128E-Instruct-FP8 Llama-3.3-70B-Instruct MAI-DS-R1 mistral-document-ai-2505
australiaeast
brazilsouth
canadaeast
eastus
eastus2
francecentral
germanywestcentral
italynorth
japaneast
koreacentral
northcentralus
norwayeast
polandcentral
southafricanorth
southcentralus
southindia
spaincentral
swedencentral
switzerlandnorth
switzerlandwest
uaenorth
uksouth
westeurope
westus
westus3

Open and custom models

The model catalog offers a larger selection of models from a wider range of providers. For these models, you can't use the standard deployment option in Azure AI Foundry resources, where models are provided as APIs. Instead, you might need to host them on your own infrastructure by creating an AI hub and providing the underlying compute quota.

Furthermore, these models can be open-access or IP protected. In both cases, you have to deploy them in managed compute offerings in Azure AI Foundry. To get started, see How-to: Deploy to Managed compute.