Import an OpenAI-compatible language model API

APPLIES TO: All API Management tiers

You can import OpenAI-compatible language model endpoints to your API Management instance as APIs. For example, you might want to manage an LLM that you self-host, or that's hosted on an inference provider other than Azure AI services. Use AI gateway policies and other capabilities in API Management to simplify integration, improve observability, and enhance control over the model endpoints.

Learn more about managing AI APIs in API Management in the AI gateway capabilities documentation.

Language model API types

API Management supports two types of language model APIs for this scenario. Choose the option that matches your model deployment. The option determines how clients call the API and how the API Management instance routes requests to the backend AI service. A client sketch follows the list.

  • OpenAI-compatible - Language model endpoints that are compatible with OpenAI's API. Examples include certain models exposed by inference providers such as Hugging Face Text Generation Inference (TGI).

    API Management configures an OpenAI-compatible chat completions endpoint.

  • Passthrough - Other language model endpoints that aren't compatible with OpenAI's API. Examples include models deployed in Amazon Bedrock or other providers.

    API Management configures wildcard operations for common HTTP verbs. Clients can append paths to the wildcard operations, and API Management passes requests to the backend.
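
To make the distinction concrete, here's a minimal client sketch. It's illustrative only: the gateway URL (`https://contoso.azure-api.net`), the `my-llm` path, the model name, and the backend-specific passthrough path are placeholder assumptions, and access is controlled with the standard API Management subscription key header.

```python
# Minimal client sketch for the two API types. All URLs, paths, keys,
# and model names below are placeholders -- substitute your own values.
import requests
from openai import OpenAI

GATEWAY = "https://contoso.azure-api.net"  # hypothetical gateway URL
KEY = "<your-subscription-key>"

# OpenAI-compatible: any OpenAI client works against the chat completions
# endpoint that API Management exposes under the path you configure.
client = OpenAI(
    base_url=f"{GATEWAY}/my-llm",  # hypothetical API path
    api_key="unused",  # access is controlled by the subscription key header
    default_headers={"Ocp-Apim-Subscription-Key": KEY},
)
completion = client.chat.completions.create(
    model="my-model",  # whatever model name your backend expects
    messages=[{"role": "user", "content": "Hello"}],
)
print(completion.choices[0].message.content)

# Passthrough: API Management exposes wildcard operations, so the client
# appends the path the backend expects and the gateway forwards the call.
resp = requests.post(
    f"{GATEWAY}/my-llm/model/my-model/invoke",  # hypothetical backend path
    headers={"Ocp-Apim-Subscription-Key": KEY},
    json={"prompt": "Hello"},
)
print(resp.status_code, resp.json())
```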

Prerequisites

  • An existing API Management instance. Create one if you haven't already.

  • A self-hosted or non-Azure-hosted language model deployment with an API endpoint, plus the key needed to access it, if one is required.

Import language model API using the portal

To import a language model API to API Management:

  1. In the Azure portal, navigate to your API Management instance.

  2. In the left menu, under APIs, select APIs > + Add API.

  3. Under Define a new API, select Language Model API.

    Screenshot of creating an OpenAI-compatible API in the portal.

  4. On the Configure API tab:

    1. Enter a Display name and optional Description for the API.
    2. Enter the URL to the LLM API endpoint.
    3. Optionally select one or more Products to associate with the API.
    4. In Path, append a path that your API Management instance uses to access the LLM API endpoints.
    5. In Type, select either Create OpenAI API or Create a passthrough API. See Language model API types for more information.
    6. In Access key, enter the authorization header name and API key used to access the LLM API, if required.
    7. Select Next.

    Screenshot of language model API configuration in the portal.

  5. On the Manage token consumption tab, optionally enter settings or accept defaults that define policies to help monitor and manage token consumption for the API, such as a token limit (llm-token-limit) and emitted token metrics (llm-emit-token-metric).

  6. On the Apply semantic caching tab, optionally enter settings or accept defaults that define policies to help optimize performance and reduce latency for the API, such as semantic cache lookup and store (llm-semantic-cache-lookup and llm-semantic-cache-store). A sketch for a quick cache check follows these steps.

  7. On the AI content safety tab, optionally enter settings or accept defaults to configure the Azure AI Content Safety service to block prompts with unsafe content.

  8. Select Review.

  9. After settings are validated, select Create.
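
If you enabled semantic caching, you can roughly verify it after the API is created by sending the same prompt twice and comparing response times; a cache hit should return faster because it skips the backend. This is a sketch under assumptions: the gateway URL, API path, subscription key, and model name are placeholders for your own values, and an OpenAI-compatible import is assumed.

```python
# Quick check that semantic caching is working: an identical second request
# should be served from cache and return noticeably faster.
# The URL, key, and model name are placeholders for your own values.
import time
import requests

URL = "https://contoso.azure-api.net/my-llm/chat/completions"  # hypothetical
HEADERS = {"Ocp-Apim-Subscription-Key": "<your-subscription-key>"}
BODY = {
    "model": "my-model",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
}

for attempt in (1, 2):
    start = time.perf_counter()
    resp = requests.post(URL, headers=HEADERS, json=BODY)
    print(f"attempt {attempt}: HTTP {resp.status_code} "
          f"in {time.perf_counter() - start:.2f}s")
```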

Test the LLM API

To ensure that your LLM API is working as expected, test it in the API Management test console. You can also call the API from code, as sketched at the end of this section.

  1. Select the API you created in the previous step.

  2. Select the Test tab.

  3. Select an operation that's compatible with the model deployment. The page displays fields for parameters and headers.

  4. Enter parameters and headers as needed. Depending on the operation, you might need to configure or update a Request body.

    Note

    In the test console, API Management automatically populates an Ocp-Apim-Subscription-Key header, and configures the subscription key of the built-in all-access subscription. This key enables access to every API in the API Management instance. Optionally display the Ocp-Apim-Subscription-Key header by selecting the "eye" icon next to the HTTP Request.

  5. Select Send.

    When the test is successful, the backend responds with a successful HTTP response code and some data. Appended to the response is token usage data to help you monitor and manage your language model token consumption.
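
You can run the same kind of request outside the test console. The following sketch assumes an OpenAI-compatible import; the gateway URL, API path, subscription key, and model name are placeholders, and the usage field follows the OpenAI chat completions response shape.

```python
# Call the imported chat completions operation and print the token usage
# reported in the response. The URL, path, key, and model are placeholders.
import requests

resp = requests.post(
    "https://contoso.azure-api.net/my-llm/chat/completions",  # hypothetical
    headers={"Ocp-Apim-Subscription-Key": "<your-subscription-key>"},
    json={
        "model": "my-model",
        "messages": [{"role": "user", "content": "Say hello."}],
    },
)
resp.raise_for_status()
data = resp.json()
print(data["choices"][0]["message"]["content"])
print("token usage:", data.get("usage"))  # prompt, completion, and total tokens
```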