Note
This feature is currently in public preview. This preview is provided without a service-level agreement and isn't recommended for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.
This article describes an approach or pattern for building a solution that uses Azure AI Search for knowledge retrieval, and how to integrate knowledge retrieval into a custom solution that includes Azure AI Agent.
This article supports the agentic-retrieval-pipeline-example Python sample on GitHub.
This exercise differs from the Agentic Retrieval Quickstart in how it uses Azure AI Agent to retrieve data from the index, and how it uses an agent tool for orchestration. If you want to understand the retrieval pipeline in its simplest form, begin with the quickstart.
Prerequisites
The following resources are required for this design pattern:
Azure AI Search, Basic pricing tier or higher, in a region that provides semantic ranking.
A search index that satisfies the index criteria for agentic retrieval.
A project in Azure AI Foundry, with an Azure AI Agent in a Basic setup.
Follow the steps in Create a project for Azure AI Foundry. Creating the project also creates the Azure AI Foundry resource in your Azure subscription.
Azure OpenAI with a deployment of one of the chat completion models listed below. We recommend a minimum of 100,000 token capacity for your model. You can find capacity and the rate limit in the model deployments list in the Azure AI Foundry portal. You can also deploy text embedding models if you want vectorization at query time.
Supported large language models
Use one of the following chat completion models with your AI agent:
gpt-4o
gpt-4o-mini
gpt-4.1
gpt-4.1-nano
gpt-4.1-mini
Package version requirements
Use a package version that provides preview functionality. See the requirements.txt file for more packages used in the example solution.
azure-ai-projects==1.0.0b11
azure-ai-agents==1.0.0
azure-search-documents==11.6.0b12
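Because preview packages change between releases, it can help to confirm that the installed versions match the pins above before running the sample. The following is a minimal sketch that uses only the Python standard library. The helper name `find_mismatches` and its `get_version` parameter are illustrative assumptions, not part of the sample.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the list above.
REQUIRED = {
    "azure-ai-projects": "1.0.0b11",
    "azure-ai-agents": "1.0.0",
    "azure-search-documents": "11.6.0b12",
}


def find_mismatches(required, get_version=version):
    """Return {package: installed_version_or_None} for every pin that isn't met.

    `get_version` is injectable (hypothetical parameter) so the check can be
    exercised without installing the packages.
    """
    mismatches = {}
    for name, pinned in required.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None  # package isn't installed at all
        if installed != pinned:
            mismatches[name] = installed
    return mismatches
```

Calling `find_mismatches(REQUIRED)` in your environment returns an empty dictionary when every pinned package is installed at the expected version.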
Configure access
Before you begin, make sure you have permissions to access content and operations. We recommend Microsoft Entra ID authentication and role-based access for authorization. You must be an Owner or User Access Administrator to assign roles. If roles aren't feasible, you can use key-based authentication instead.
Configure access to each resource identified in this section.
Azure AI Search provides the agentic retrieval pipeline. Configure access for yourself, your app, and your search service for downstream access to models.
- For local testing, you must have Search Service Contributor, Search Index Data Contributor, and Search Index Data Reader role assignments to create, load, and retrieve on Azure AI Search.
- For integrated operations, ensure that all clients using the retrieval pipeline (agent and tool) have Search Index Data Reader role assignments for sending retrieval requests.
Development tasks
Development tasks on the Azure AI Search side include:
- Create a knowledge agent on Azure AI Search that maps to your deployed model in Azure AI Foundry Model.
- Call the retriever and provide a query, conversation, and override parameters.
- Parse the response for the parts you want to include in your chat application. For many scenarios, just the content portion of the response is sufficient.
Components of the solution
Your custom application makes API calls to Azure AI Search and an Azure SDK.
- External data from anywhere, although we recommend data sources used for integrated indexing.
- Azure AI Search, hosting indexed data and the agentic data retrieval engine.
- Azure AI Foundry, hosting the AI agent and tool.
- Azure SDK with a Foundry project, providing programmatic access to Azure AI Foundry.
- Azure OpenAI, hosting a chat completion model used by the knowledge agent and any embedding models used by vectorizers for vector search.
Set up your environment
The canonical use case for agentic retrieval is through the Azure AI Agent service. We recommend it because it's the easiest way to create a chatbot.
An agent-to-agent solution combines Azure AI Search with Foundry projects that you use to build custom agents. An agent simplifies development by tracking conversation history and calling other tools.
You need endpoints for:
- Azure AI Search
- Azure OpenAI
- Azure AI Foundry project
You can find endpoints for Azure AI Search and Azure OpenAI in the Azure portal, in the Overview pages for each resource.
You can find the project endpoint in the Azure AI Foundry portal:
Sign in to the Azure AI Foundry portal and open your project.
In the Overview tile, find and copy the Azure AI Foundry project endpoint.
A hypothetical endpoint might look like this:
https://your-foundry-resource.services.ai.azure.com/api/projects/your-foundry-project
If you don't have an Azure OpenAI resource in your Foundry project, revisit the model deployment prerequisite. A connection to the resource is created when you deploy a model.
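One way to keep the three endpoints out of source code is to read them from environment variables and validate them up front. The sketch below assumes hypothetical variable names (`AZURE_SEARCH_ENDPOINT`, `AZURE_OPENAI_ENDPOINT`, `PROJECT_ENDPOINT`); substitute whatever names your own configuration uses.

```python
import os
from urllib.parse import urlparse

# Hypothetical variable names, chosen for illustration only.
ENDPOINT_VARS = ("AZURE_SEARCH_ENDPOINT", "AZURE_OPENAI_ENDPOINT", "PROJECT_ENDPOINT")


def load_endpoints(env=os.environ):
    """Read the three endpoints from `env` and fail early if one is missing or malformed."""
    endpoints = {}
    for name in ENDPOINT_VARS:
        value = env.get(name, "").strip()
        parsed = urlparse(value)
        if parsed.scheme != "https" or not parsed.netloc:
            raise ValueError(f"{name} must be set to an https URL, got {value!r}")
        endpoints[name] = value.rstrip("/")  # normalize a trailing slash
    return endpoints
```

Failing at startup with a clear message is usually easier to diagnose than an authentication or connection error raised later by one of the clients.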
Set up an AI project client and create an agent
Use AIProjectClient to create your AI agent.
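from azure.identity import DefaultAzureCredential
from azure.ai.projects import AIProjectClient
# Assumes project_endpoint holds the Foundry project endpoint copied earlier
credential = DefaultAzureCredential()
project_client = AIProjectClient(endpoint=project_endpoint, credential=credential)
list(project_client.agents.list_agents())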
Your agent is backed by a supported language model, and its instructions tell the agent what it's scoped to answer.
instructions = """
A Q&A agent that can answer questions about the Earth at night.
Sources have a JSON format with a ref_id that must be cited in the answer using the format [ref_id].
If you do not have the answer, respond with "I don't know".
"""
agent = project_client.agents.create_agent(
    model=agent_model,
    name=agent_name,
    instructions=instructions
)
print(f"AI agent '{agent_name}' created or updated successfully")
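The instructions ask the agent to cite sources as `[ref_id]`. If you want to check which sources an answer actually cites, a small helper can extract them. This is a sketch that assumes reference ids are non-negative integers; the function name `cited_ref_ids` is illustrative and isn't part of the sample.

```python
import re

# Assumption: ref_id values are non-negative integers, cited as [0], [1], ...
CITATION_PATTERN = re.compile(r"\[(\d+)\]")


def cited_ref_ids(answer):
    """Return the distinct ref_ids cited in `answer`, in order of first appearance."""
    seen = []
    for match in CITATION_PATTERN.finditer(answer):
        ref_id = int(match.group(1))
        if ref_id not in seen:
            seen.append(ref_id)
    return seen
```

An empty result for a non-trivial answer can be a signal that the agent responded without grounding, which you might want to log or flag.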
Add an agentic retrieval tool to AI Agent
An end-to-end pipeline needs an orchestration mechanism for coordinating calls to the retriever and knowledge agent. You can use a tool for this task. The tool calls the Azure AI Search knowledge retrieval client and the Azure AI agent, and it drives the conversations with the user.
from azure.ai.agents.models import FunctionTool, ToolSet, ListSortOrder
from azure.search.documents.agent import KnowledgeAgentRetrievalClient
from azure.search.documents.agent.models import KnowledgeAgentRetrievalRequest, KnowledgeAgentMessage, KnowledgeAgentMessageTextContent, KnowledgeAgentIndexParams
agent_client = KnowledgeAgentRetrievalClient(endpoint=endpoint, agent_name=agent_name, credential=credential)
thread = project_client.agents.threads.create()
retrieval_results = {}
# AGENTIC RETRIEVAL DEFINITION DEFERRED TO NEXT SECTION
functions = FunctionTool({ agentic_retrieval })
toolset = ToolSet()
toolset.add(functions)
project_client.agents.enable_auto_function_calls(toolset)
How to structure messages
The messages sent to the agent tool include instructions for chat history and using the results obtained from knowledge retrieval on Azure AI Search. The response is passed as a large single string with no serialization or structure.
def agentic_retrieval() -> str:
    """
    Searches a NASA e-book about images of Earth at night and other science related facts.
    The returned string is in a JSON format that contains the reference id.
    Be sure to use the same format in your agent's response
    """
    # Take the last 5 messages in the conversation
    messages = project_client.agents.list_messages(thread.id, limit=5, order=ListSortOrder.DESCENDING)
    # Reverse the order so the most recent message is last
    messages.data.reverse()
    retrieval_result = agent_client.retrieve(
        retrieval_request=KnowledgeAgentRetrievalRequest(
            messages=[KnowledgeAgentMessage(role=msg["role"], content=[KnowledgeAgentMessageTextContent(text=msg.content[0].text)]) for msg in messages.data],
            target_index_params=[KnowledgeAgentIndexParams(index_name=index_name, reranker_threshold=2.5)]
        )
    )
    # Associate the retrieval results with the last message in the conversation
    last_message = messages.data[-1]
    retrieval_results[last_message.id] = retrieval_result
    # Return the grounding response to the agent
    return retrieval_result.response[0].content[0].text
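The tool returns the grounding data as a single string. If your application needs the individual sources, for example to render citations, you can parse that string yourself. The sketch below assumes the string is a JSON array of objects that each carry a `ref_id` and a `content` field; that layout is an assumption inferred from the docstring above, so inspect a real response before relying on it.

```python
import json


def parse_grounding(grounding_text):
    """Return a list of (ref_id, content) pairs from a JSON grounding string.

    Assumes a JSON array of objects with `ref_id` and `content` keys.
    Returns an empty list when the text isn't in that shape.
    """
    try:
        items = json.loads(grounding_text)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    pairs = []
    for item in items:
        if isinstance(item, dict) and "ref_id" in item:
            pairs.append((item["ref_id"], item.get("content", "")))
    return pairs
```

Returning an empty list instead of raising keeps the chat loop running when a response has no grounding data, at the cost of hiding malformed payloads; raise instead if you prefer to fail loudly.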
How to improve data quality
Search results are consolidated into a large unified string that you can pass to a chat completion model for a grounded answer. The following indexing and relevance tuning features in Azure AI Search are available to help you generate high quality results. You can implement these features in the search index, and the improvements in search relevance are evident in the quality of the response returned during retrieval.
Scoring profiles (added to your search index) provide built-in boosting criteria. Your index must specify a default scoring profile, and that's the one used by the retrieval engine when queries include fields associated with that profile.
Semantic configuration is required, but you determine which fields are prioritized and used for ranking.
For plain text content, you can use analyzers to control tokenization during indexing.
For multimodal or image content, you can use image verbalization for LLM-generated descriptions of your images, or classic OCR and image analysis via skillsets during indexing.
Control the number of subqueries
The LLM determines the quantity of subqueries based on these factors:
- User query
- Chat history
- Semantic ranker input constraints
As the developer, the best way to control the number of subqueries is by setting defaultMaxDocsForReranker in either the knowledge agent definition or as an override on the retrieve action.
The semantic ranker processes up to 50 documents as an input, and the system creates subqueries to accommodate all of the inputs to semantic ranker. For example, if you only wanted two subqueries, you could set defaultMaxDocsForReranker to 100 to accommodate all documents in two batches.
The semantic configuration in the index determines whether the ranker input is 50 documents or fewer. If it's fewer, the query plan specifies however many subqueries are necessary to meet the defaultMaxDocsForReranker threshold.
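The relationship between defaultMaxDocsForReranker and the subquery count can be approximated with simple arithmetic. The sketch below models only the batching rule described above (one subquery per batch of ranker input). It's a simplification: the LLM also weighs the user query and chat history, so treat the result as an estimate rather than a guarantee. The function and parameter names are illustrative.

```python
import math


def estimate_subqueries(max_docs_for_reranker, ranker_input_size=50):
    """Estimate how many subqueries are needed to fill the reranker budget.

    `ranker_input_size` defaults to 50, the documented upper bound on
    semantic ranker input; lower it if your semantic configuration yields less.
    """
    if max_docs_for_reranker <= 0 or ranker_input_size <= 0:
        raise ValueError("both values must be positive")
    return math.ceil(max_docs_for_reranker / ranker_input_size)
```

With the default input size, a budget of 100 documents gives an estimate of two subqueries, which matches the example in the text.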
Control the number of threads in chat history
A knowledge agent object in Azure AI Search acquires chat history through API calls to the Azure Evaluations SDK, which maintains the thread history. You can filter this list to get a subset of the messages, for example, the last five conversation turns.
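If you hold the conversation as a plain list, trimming it before retrieval is straightforward. This sketch assumes each message is a dictionary with a `role` key and that the list is ordered oldest first; both are assumptions about your own data model rather than anything the SDK requires.

```python
def recent_messages(messages, limit=5, roles=("user", "assistant")):
    """Return the most recent `limit` messages with an allowed role, oldest first.

    `messages` is assumed to be ordered oldest to newest.
    """
    if limit <= 0:
        return []
    filtered = [m for m in messages if m.get("role") in roles]
    return filtered[-limit:]
```

A smaller window reduces the tokens sent to query planning, while a larger one gives the planner more context for follow-up questions.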
Control costs and limit operations
Look at output tokens in the activity array for insights into the query plan.
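To track token usage over time, you can total the output tokens across activity records. The sketch below assumes the activity array has been converted to a list of dictionaries and that token counts appear under an `output_tokens` key; the key name is an assumption, so check it against the shape of your actual response.

```python
def total_output_tokens(activity, field="output_tokens"):
    """Sum the `field` value across activity records, skipping records without it.

    `activity` is assumed to be a list of dictionaries; `field` is the
    (assumed) key that holds the output token count.
    """
    total = 0
    for record in activity:
        tokens = record.get(field)
        if isinstance(tokens, int):
            total += tokens
    return total
```

Records that don't involve a model call typically carry no token count, which is why missing values are skipped rather than treated as errors.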
Tips for improving performance
- Summarize message threads.
- Use gpt mini or a smaller model that performs faster.
- Set maxOutputSize in the knowledge agent to govern the size of the response, or maxRuntimeInSeconds for time-bound processing.