How to generate chat completions with Azure AI Foundry Models

This article explains how to generate chat completions using next generation v1 Azure OpenAI APIs.

Prerequisites

To use chat completion models in your application, you need:

  • An Azure subscription.
  • An Azure OpenAI resource (or an Azure AI Foundry resource) with a chat model deployed, along with its endpoint and the deployment name.
  • An API key for the resource, or a Microsoft Entra ID identity with access to it.
  • The openai client library, plus azure-identity if you use Microsoft Entra authentication.

v1 Azure OpenAI APIs

The next generation v1 Azure OpenAI APIs let you use the OpenAI() client in the official OpenAI client libraries across languages instead of the AzureOpenAI() client. The v1 Azure OpenAI APIs add support for:

  • Ongoing access to the latest features, with no need to frequently specify new values for the api-version parameter.
  • OpenAI client support with minimal code changes to swap between OpenAI and Azure OpenAI when using key-based authentication (as sketched after this list).
  • OpenAI client support for token-based authentication and automatic token refresh without the need to take a dependency on a separate Azure OpenAI client.
  • Chat completions calls with Foundry Models from providers like DeepSeek and Grok, which support the v1 chat completions syntax.

For more information on the v1 Azure OpenAI APIs, see API evolution and the v1 OpenAPI 3.0 spec.
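
For example, the key-based swap mentioned above usually comes down to the base_url and API key that you pass; the following is a minimal sketch, with placeholder endpoint and environment variable values:

import os
from openai import OpenAI

# Against OpenAI directly (the client reads OPENAI_API_KEY by default):
openai_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# Against Azure OpenAI with the v1 API: same client type, different base_url and key.
azure_client = OpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    base_url="https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/",
)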

Generate chat completions

For Azure OpenAI in Foundry Models, use the Responses API to make chat completion calls. For other Foundry Models sold directly by Azure, such as DeepSeek and Grok models, the v1 Azure OpenAI API also allows you to make chat completion calls by using the v1 chat completions syntax.

In the following examples, you create a client for the model and then send it a basic request.

Note

Use keyless authentication with Microsoft Entra ID. If that's not possible, use an API key and store it in Azure Key Vault. You can use an environment variable for testing outside of your Azure environments. To learn more about keyless authentication, see What is Microsoft Entra authentication? and DefaultAzureCredential.
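
If you do use an API key, one way to follow this guidance is to read it from Azure Key Vault at startup instead of hard-coding it. The following is a minimal sketch using the azure-keyvault-secrets library; the vault URL and the secret name AzureOpenAIApiKey are placeholders for your own values:

from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient
from openai import OpenAI

# Authenticate to Key Vault (managed identity, Azure CLI login, and so on).
credential = DefaultAzureCredential()
secret_client = SecretClient(
    vault_url="https://YOUR-VAULT-NAME.vault.azure.net/",  # Placeholder vault URL
    credential=credential,
)

# "AzureOpenAIApiKey" is a placeholder; use the name you stored the key under.
api_key = secret_client.get_secret("AzureOpenAIApiKey").value

client = OpenAI(
    api_key=api_key,
    base_url="https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/",
)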

Use the Responses API

The following Python examples use the v1 API.

API key authentication:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    base_url="https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/",
)

response = client.responses.create(
    model="gpt-4.1-nano",  # Replace with your deployment name
    input="This is a test.",
)

print(response.model_dump_json(indent=2))

Notice the following details of the previous code:

  • Uses the OpenAI() client instead of the deprecated AzureOpenAI() client.
  • Passes the Azure OpenAI endpoint appended with /openai/v1/ as the base_url.
  • Doesn't have to provide the api-version parameter with the v1 GA API.
  • Sets the model parameter to the underlying deployment name you chose when you deployed the model. This name isn't necessarily the same as the name of the model you deployed.
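
To print only the generated text rather than the full response payload, you can use the output_text convenience property that the OpenAI Python library exposes on Responses API results:

# output_text aggregates the text content of the response's output items.
print(response.output_text)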

If the OPENAI_BASE_URL and OPENAI_API_KEY environment variables are set, you can construct the client without any arguments:

client = OpenAI()
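
The client picks up those variables automatically when they're present in the process environment; the following is a minimal sketch with placeholder values:

import os
from openai import OpenAI

# OpenAI() reads these variables when no arguments are passed.
os.environ["OPENAI_BASE_URL"] = "https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/"
os.environ["OPENAI_API_KEY"] = "YOUR-API-KEY"  # Placeholder; prefer Key Vault or keyless auth

client = OpenAI()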

Microsoft Entra authentication:

Microsoft Entra authentication is supported only for Azure OpenAI resources. Complete the following steps:

  1. Install the Azure Identity client library:

    pip install azure-identity
    
  2. Use the following code to configure the OpenAI client object, specify your deployment, and generate responses.

    from openai import OpenAI
    from azure.identity import DefaultAzureCredential, get_bearer_token_provider

    token_provider = get_bearer_token_provider(
        DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
    )

    client = OpenAI(
        base_url="https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/",
        api_key=token_provider,
    )

    response = client.responses.create(
        model="gpt-4.1-nano",  # Replace with your deployment name
        input="This is a test.",
    )

    print(response.model_dump_json(indent=2))
    

    Notice the following details of the previous code:

    • Uses the OpenAI() client instead of the deprecated AzureOpenAI() client.
    • Passes the Azure OpenAI endpoint appended with /openai/v1/ as the base_url.
    • Sets the api_key parameter to token_provider. This setting enables automatic retrieval and refresh of an authentication token instead of using a static API key.
    • Doesn't have to provide the api-version parameter with the v1 GA API.
    • Sets the model parameter to the underlying deployment name you chose when you deployed the model. This name isn't necessarily the same as the name of the model you deployed.
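
The Responses API also supports streaming with either authentication method. The following is a minimal sketch that reuses the client and deployment name from the previous example and follows the OpenAI Python library's streaming event interface:

stream = client.responses.create(
    model="gpt-4.1-nano",  # Replace with your deployment name
    input="This is a test.",
    stream=True,
)

for event in stream:
    # Text is delivered incrementally as response.output_text.delta events.
    if event.type == "response.output_text.delta":
        print(event.delta, end="")
print()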

Use the chat completions API

For Azure OpenAI in Foundry Models, use the Responses API. However, for other Foundry Models from providers like DeepSeek and Grok, the v1 API allows you to make chat completions calls, as these models support the OpenAI v1 chat completions syntax.

base_url accepts both https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/ and https://YOUR-RESOURCE-NAME.services.ai.azure.com/openai/v1/ formats.

API key authentication:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    base_url="https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/",
)

completion = client.chat.completions.create(
    model="grok-3-mini",  # Replace with your model deployment name.
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "When was Microsoft founded?"},
    ],
)

# To print only the assistant message:
# print(completion.choices[0].message)
print(completion.model_dump_json(indent=2))

Microsoft Entra authentication:

from openai import OpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)

client = OpenAI(
    base_url="https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/",
    api_key=token_provider,
)

completion = client.chat.completions.create(
    model="grok-3-mini",  # Replace with your model deployment name.
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me about the attention is all you need paper"},
    ],
)

# To print only the assistant message:
# print(completion.choices[0].message)
print(completion.model_dump_json(indent=2))
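
Chat completions are stateless, so to carry on a multi-turn conversation you append each assistant reply and the next user message to the messages list yourself. The following is a minimal sketch that reuses the client and deployment name from the previous example:

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "When was Microsoft founded?"},
]

first = client.chat.completions.create(model="grok-3-mini", messages=messages)
print(first.choices[0].message.content)

# Append the assistant's reply and the follow-up question, then call the API again.
messages.append({"role": "assistant", "content": first.choices[0].message.content})
messages.append({"role": "user", "content": "Who were its founders?"})

second = client.chat.completions.create(model="grok-3-mini", messages=messages)
print(second.choices[0].message.content)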