Import an OpenAI-compatible language model API

APPLIES TO: All API Management tiers

You can import OpenAI-compatible language model endpoints to your API Management instance as APIs. For example, you might want to manage an LLM that you self-host, or that's hosted on an inference provider other than Azure AI services. Use AI gateway policies and other capabilities in API Management to simplify integration, improve observability, and enhance control over the model endpoints.

Learn more about managing AI APIs in API Management in the AI gateway capabilities documentation.

Language model API types

API Management supports two types of language model APIs for this scenario. Choose the option that matches your model deployment. The option determines how clients call the API and how the API Management instance routes requests to the backend AI service. A client sketch follows the list.

  • OpenAI-compatible - Language model endpoints that are compatible with OpenAI's API. Examples include certain models exposed by inference providers such as Hugging Face Text Generation Inference (TGI).

    API Management configures an OpenAI-compatible chat completions endpoint.

  • Passthrough - Other language model endpoints that aren't compatible with OpenAI's API. Examples include models deployed in Amazon Bedrock or other providers.

    API Management configures wildcard operations for common HTTP verbs. Clients can append paths to the wildcard operations, and API Management passes requests to the backend.
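
To make the distinction concrete, here's a minimal client sketch. It's illustrative only: the gateway URL (`https://contoso.azure-api.net`), the `my-llm` path, the model name, and the backend-specific passthrough path are placeholder assumptions, and access is controlled with the standard API Management subscription key header.

```python
# Minimal client sketch for the two API types. All URLs, paths, keys,
# and model names below are placeholders -- substitute your own values.
import requests
from openai import OpenAI

GATEWAY = "https://contoso.azure-api.net"  # hypothetical gateway URL
KEY = "<your-subscription-key>"

# OpenAI-compatible: any OpenAI client works against the chat completions
# endpoint that API Management exposes under the path you configure.
client = OpenAI(
    base_url=f"{GATEWAY}/my-llm",  # hypothetical API path
    api_key="unused",  # access is controlled by the subscription key header
    default_headers={"Ocp-Apim-Subscription-Key": KEY},
)
completion = client.chat.completions.create(
    model="my-model",  # whatever model name your backend expects
    messages=[{"role": "user", "content": "Hello"}],
)
print(completion.choices[0].message.content)

# Passthrough: API Management exposes wildcard operations, so the client
# appends the path the backend expects and the gateway forwards the call.
resp = requests.post(
    f"{GATEWAY}/my-llm/model/my-model/invoke",  # hypothetical backend path
    headers={"Ocp-Apim-Subscription-Key": KEY},
    json={"prompt": "Hello"},
)
print(resp.status_code, resp.json())
```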

Prerequisites

  • An existing API Management instance. Create one if you haven't already.

  • A self-hosted or non-Azure-hosted language model deployment with an API endpoint, plus the key needed to access it, if one is required.

Import language model API using the portal

To import a language model API to API Management:

  1. In the Azure portal, navigate to your API Management instance.

  2. In the left menu, under APIs, select APIs > + Add API.

  3. Under Define a new API, select Language Model API.

    Screenshot of creating an OpenAI-compatible API in the portal.

  4. On the Configure API tab:

    1. Enter a Display name and optional Description for the API.
    2. Enter the URL to the LLM API endpoint.
    3. Optionally select one or more Products to associate with the API.
    4. In Path, append a path that your API Management instance uses to access the LLM API endpoints.
    5. In Type, select either Create OpenAI API or Create a passthrough API. See Language model API types for more information.
    6. In Access key, enter the authorization header name and API key used to access the LLM API, if required.
    7. Select Next.

    Screenshot of language model API configuration in the portal.

  5. On the Manage token consumption tab, optionally enter settings or accept defaults that define policies to help monitor and manage token consumption for the API, such as a token limit (llm-token-limit) and emitted token metrics (llm-emit-token-metric).

  6. On the Apply semantic caching tab, optionally enter settings or accept defaults that define policies to help optimize performance and reduce latency for the API, such as semantic cache lookup and store (llm-semantic-cache-lookup and llm-semantic-cache-store). A sketch for a quick cache check follows these steps.

  7. On the AI content safety tab, optionally enter settings or accept defaults to configure the Azure AI Content Safety service to block prompts with unsafe content.

  8. Select Review.

  9. After settings are validated, select Create.
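
If you enabled semantic caching, you can roughly verify it after the API is created by sending the same prompt twice and comparing response times; a cache hit should return faster because it skips the backend. This is a sketch under assumptions: the gateway URL, API path, subscription key, and model name are placeholders for your own values, and an OpenAI-compatible import is assumed.

```python
# Quick check that semantic caching is working: an identical second request
# should be served from cache and return noticeably faster.
# The URL, key, and model name are placeholders for your own values.
import time
import requests

URL = "https://contoso.azure-api.net/my-llm/chat/completions"  # hypothetical
HEADERS = {"Ocp-Apim-Subscription-Key": "<your-subscription-key>"}
BODY = {
    "model": "my-model",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
}

for attempt in (1, 2):
    start = time.perf_counter()
    resp = requests.post(URL, headers=HEADERS, json=BODY)
    print(f"attempt {attempt}: HTTP {resp.status_code} "
          f"in {time.perf_counter() - start:.2f}s")
```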

Test the LLM API

To ensure that your LLM API is working as expected, test it in the API Management test console. You can also call the API from code, as sketched at the end of this section.

  1. Select the API you created in the previous step.

  2. Select the Test tab.

  3. Select an operation that's compatible with the model deployment. The page displays fields for parameters and headers.

  4. Enter parameters and headers as needed. Depending on the operation, you might need to configure or update a Request body.

    Note

    In the test console, API Management automatically populates an Ocp-Apim-Subscription-Key header, and configures the subscription key of the built-in all-access subscription. This key enables access to every API in the API Management instance. Optionally display the Ocp-Apim-Subscription-Key header by selecting the "eye" icon next to the HTTP Request.

  5. Select Send.

    When the test is successful, the backend responds with a successful HTTP response code and some data. Appended to the response is token usage data to help you monitor and manage your language model token consumption.
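
You can run the same kind of request outside the test console. The following sketch assumes an OpenAI-compatible import; the gateway URL, API path, subscription key, and model name are placeholders, and the usage field follows the OpenAI chat completions response shape.

```python
# Call the imported chat completions operation and print the token usage
# reported in the response. The URL, path, key, and model are placeholders.
import requests

resp = requests.post(
    "https://contoso.azure-api.net/my-llm/chat/completions",  # hypothetical
    headers={"Ocp-Apim-Subscription-Key": "<your-subscription-key>"},
    json={
        "model": "my-model",
        "messages": [{"role": "user", "content": "Say hello."}],
    },
)
resp.raise_for_status()
data = resp.json()
print(data["choices"][0]["message"]["content"])
print("token usage:", data.get("usage"))  # prompt, completion, and total tokens
```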