Deploy a model to an endpoint

Completed

When you develop a generative AI app, you need to integrate language models into your application. To be able to use a language model, you need to deploy the model. Let's explore how to deploy language models in the Azure AI Foundry, after first understanding why to deploy a model.

Why deploy a model?

You train a model to generate output based on some input. To get value out of your model, you need a solution that allows you to send input to the model, which the model processes, after which the output is visualized for you.

With generative AI apps, the most common type of solution is a chat application that expects a user question, which the model processes, to generate an adequate response. The response is then visualized to the user as a response to their question.

Diagram of user question being processed by model deployed to endpoint.

You can integrate a language model with a chat application by deploying the model to an endpoint. An endpoint is a specific URL where a deployed model or service can be accessed. Each model deployment typically has its own unique endpoint, which allows different applications to communicate with the model through an API (Application Programming Interface).

When a user asks a question:

  1. An API request is sent to the endpoint.
  2. The endpoint specifies the model that processes the request.
  3. The result is sent back to the app through an API response.

When you deploy a language model from the model catalog with the Azure AI Foundry, you get an endpoint, which consists of a target URI (Uniform Resource Identifier) and a unique key. For example, a target URI for a deployed GPT-3.5 model can be:

https://ai-aihubdevdemo.openai.azure.com/openai/deployments/gpt-35-turbo/chat/completions?api-version=2023-03-15-preview

The URI includes:

  • Your AI hub name, for example ai-aihubdevdemo.
  • Your deployed model name, for example gpt-35-turbo.
  • The task for the model, for example chat/completion.

To protect your deployed models, each deployment comes with a key. You're only authorized to send and receive requests to and from the target URI, if you also provide the key to authenticate.

Now that you understand why you want to deploy a model, let's explore the deployment options with Azure AI Foundry.

Deploy a language model with Azure AI Foundry

When you deploy a language model with Azure AI Foundry, you have several types available, which depend on the model you want to deploy.

You can deploy:

The associated cost depends on the type of model you deploy, which deployment option you choose, and what you are doing with the model:

Azure OpenAI Service Azure AI Foundry Models Serverless compute Managed compute
Supported models Azure OpenAI models Flagship models (including Azure OpenAI models and Models-as-a-service models) Models-as-a-service models Open and custom models
Hosting service Azure OpenAI resource Azure AI Services resource AI Project resource AI Project resource
Deployment cost - - Minimal endpoint cost Charged per minute
Inferencing cost Token-based billing Token-based billing Token-based billing -