This article shows how to set up a data agent in Microsoft Fabric using a lakehouse as the example data source. We first create and populate a lakehouse, then create a Fabric data agent and add the lakehouse to it. If you already have a Power BI semantic model (with the necessary read/write permissions), a warehouse, or a KQL database, follow the same steps and select that source instead. Although this walkthrough uses a lakehouse, the pattern is the same for other sources—only the data source selection differs.
Important
This feature is in preview.
Prerequisites
- A paid F2 or higher Fabric capacity, or a Power BI Premium capacity (P1 or higher) with Microsoft Fabric enabled
- The Fabric data agent tenant setting is enabled.
- Cross-geo processing for AI is enabled.
- Cross-geo storing for AI is enabled.
- At least one of these: A warehouse, a lakehouse, one or more Power BI semantic models, or a KQL database with data.
- The Power BI semantic models via XMLA endpoints tenant switch is enabled (required for Power BI semantic model data sources).
Important
Ensure the standalone Copilot experience is enabled in the Power BI admin portal (Tenant settings > Copilot > Standalone Copilot experience). If it isn't enabled, you won't be able to use the data agent inside Copilot scenarios even if other Copilot tenant switches are on. For details, see Copilot in Power BI tenant settings.
Create a lakehouse with AdventureWorksLH
First, create a lakehouse and populate it with the necessary data.
If you already have an instance of AdventureWorksLH in a lakehouse (or a warehouse), you can skip this step. If not, use the following steps in a Fabric notebook to populate the lakehouse with the data.
Create a new notebook in the workspace where you want to create your Fabric data agent.
On the left side of the Explorer pane, select + Data sources. This option lets you add an existing lakehouse or create a new one. For this walkthrough, create a new lakehouse and give it a name.
In the top cell, add the following code snippet:
import pandas as pd
from tqdm.auto import tqdm

base = "https://synapseaisolutionsa.z13.web.core.windows.net/data/AdventureWorks"

# load list of tables
df_tables = pd.read_csv(f"{base}/adventureworks.csv", names=["table"])

for table in (pbar := tqdm(df_tables['table'].values)):
    pbar.set_description(f"Uploading {table} to lakehouse")

    # download
    df = pd.read_parquet(f"{base}/{table}.parquet")

    # save as lakehouse table
    spark.createDataFrame(df).write.mode('overwrite').saveAsTable(table)
Select Run all.
After a few minutes, the lakehouse populates with the necessary data.
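To confirm the load succeeded, you can list the tables now registered in the lakehouse from the same notebook. This is an optional sanity check, assuming the notebook's default lakehouse is the one you just populated:

# List the tables registered in the attached lakehouse
for tbl in spark.catalog.listTables():
    print(tbl.name)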
Caution
Notebooks that continue running (for example, due to accidental infinite loops or constant polling) can consume Fabric capacity indefinitely. After the data finishes loading, stop any active cells and end the notebook session (Notebook toolbar > Stop session) if you no longer need it. Avoid adding long-running loops without a timeout.
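If you prefer to stop the session from code rather than the toolbar, you can use the session utility available in Fabric notebooks. This is a minimal sketch; mssparkutils is preloaded in the Fabric Spark runtime, but confirm availability for your runtime version:

# Stop the current notebook session to release Fabric capacity
# (equivalent to Notebook toolbar > Stop session)
mssparkutils.session.stop()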
Create a Fabric data agent
To create a new Fabric data agent, navigate to your workspace and select the + New Item button, as shown in this screenshot:
In the All items tab, search for Fabric data agent to locate the appropriate option. Once selected, a prompt asks you to provide a name for your Fabric data agent, as shown in this screenshot:
After you enter the name, proceed with the following steps to align the Fabric data agent with your specific requirements.
Select the data
Select the lakehouse you created in the previous step, and then select Add, as shown in the following screenshot:
Once the lakehouse is added as a data source, the Explorer pane on the left side of the Fabric data agent page shows the lakehouse name. Select the lakehouse to view all available tables. Use the checkboxes to select the tables you want to make available to the AI. For this scenario, select the following tables (a snippet to preview any of them appears after the list):
dimcustomer
dimdate
dimgeography
dimproduct
dimproductcategory
dimpromotion
dimreseller
dimsalesterritory
factinternetsales
factresellersales
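If you want to inspect any of these tables before exposing them to the agent, you can preview a few rows from a notebook attached to the lakehouse. This is an optional check; dimcustomer is used here only as an example:

# Preview a few rows of one of the selected tables
spark.sql("SELECT * FROM dimcustomer LIMIT 5").show()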
Provide instructions
To add instructions, select the Data agent instructions button to open the instructions pane on the right. You can add the following instructions.
The AdventureWorksLH data source contains information from three tables:

- dimcustomer, for detailed customer demographics and contact information
- dimdate, for date-related data - for example, calendar and fiscal information
- dimgeography, for geographical details including city names and country region codes

Use this data source for queries and analyses that involve customer details, time-based events, and geographical locations.
Provide examples
To add example queries, select the Example queries button to open the example queries pane on the right. This pane provides options to add or edit example queries for all supported data sources. For each data source, you can select Add or Edit Example Queries to input the relevant examples, as shown in the following screenshot:
Add the following example queries for the lakehouse data source that you created.
Question: Calculate the average percentage increase in sales amount for repeat purchases for every zipcode. A repeat purchase is a purchase subsequent to the first purchase (the average should always be computed relative to the first purchase).
SELECT
    g.PostalCode,
    AVG((s.SalesAmount - first_purchase.SalesAmount) / first_purchase.SalesAmount * 100) AS AvgPercentageIncrease
FROM factinternetsales s
INNER JOIN dimcustomer c ON s.CustomerKey = c.CustomerKey
INNER JOIN dimgeography g ON c.GeographyKey = g.GeographyKey
INNER JOIN (
    SELECT *
    FROM (
        SELECT
            CustomerKey,
            SalesAmount,
            OrderDate,
            ROW_NUMBER() OVER (PARTITION BY CustomerKey ORDER BY OrderDate) AS RowNumber
        FROM factinternetsales
    ) AS t
    WHERE RowNumber = 1
) first_purchase ON s.CustomerKey = first_purchase.CustomerKey
WHERE s.OrderDate > first_purchase.OrderDate
GROUP BY g.PostalCode;
Question: Show the monthly total and year-to-date total sales. Order by year and month.
SELECT
    Year,
    Month,
    MonthlySales,
    SUM(MonthlySales) OVER (PARTITION BY Year ORDER BY Month ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS CumulativeTotal
FROM (
    SELECT
        YEAR(OrderDate) AS Year,
        MONTH(OrderDate) AS Month,
        SUM(SalesAmount) AS MonthlySales
    FROM factinternetsales
    GROUP BY YEAR(OrderDate), MONTH(OrderDate)
) AS t
ORDER BY Year, Month;
Note
Adding sample query/question pairs isn't currently supported for Power BI semantic model data sources.
Test and revise the Fabric data agent
Now that you've configured the Fabric data agent, added instructions, and provided example queries for the lakehouse, you can interact with it by asking questions and receiving answers. As you continue testing, you can add more examples and refine the instructions to further improve the agent's performance. Collaborate with your colleagues to gather feedback, and based on their input, ensure the example queries and instructions align with the types of questions they want to ask.
Publish the Fabric data agent
After you validate the performance of the Fabric data agent, you might decide to publish it so you can then share it with your colleagues who want to do Q&A over data. In this case, select Publish, as shown in this screenshot:
The Publish data agent box opens, as shown in this screenshot:
In this box, select Publish to publish the Fabric data agent. The published URL for the Fabric data agent appears, as shown in this screenshot:
Use the Fabric data agent in Copilot in Power BI
After you publish the Fabric data agent, you can interact with it through Copilot in Power BI. Copilot lets you consume the data agent alongside other items (for example, reports or semantic models) without needing to switch between them.
Select the Copilot button in the left navigation pane to open Copilot in Power BI. Next, select Add items for better results in the text box at the bottom to add the data agent. Select Data agents in the window that opens. You can only see the data agents that you have permission to access. Choose the data agent you want and select Confirm. This example shows how to work with a single data agent, but you can add more items, such as other data agents, reports, or semantic models. The following screenshot illustrates the steps with a single data agent:
Now that you added the data agent to the Copilot in Power BI, you can ask any questions related to your Fabric data agent, as shown in the following screenshot:
Use the Fabric data agent programmatically
You can use the Fabric data agent programmatically within a Fabric notebook. To check whether the Fabric data agent has a published URL value, select Settings, as shown in the following screenshot:
Before you publish the Fabric data agent, it doesn't have a published URL value, as shown in the following screenshot:
If you haven't published the Fabric data agent yet, you can publish it by following the instructions in the previous steps. You can then copy the published URL and use it in a Fabric notebook to query the Fabric data agent through its API. Paste the copied URL into the following code snippet, and replace the question with any query relevant to your data agent. This example uses <generic published URL value> as the URL.
Important
When calling a data agent programmatically, implement:
- A polling timeout (see example below) to avoid indefinite loops.
- Minimal polling frequency (start at 2–5 seconds and increase only if needed; see the backoff sketch after this list).
- Cleanup of created threads or resources after completion.
- Notebook session shutdown when finished to release Fabric capacity.
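For the polling frequency point, a capped exponential backoff keeps early polls responsive while reducing API calls for long-running requests. The following is a generic sketch, not part of the data agent API; poll_once is a placeholder for whatever status check you implement (such as the runs.retrieve call in the full example below):

import time

def poll_with_backoff(poll_once, timeout_seconds=300, initial_interval=2.0, max_interval=30.0):
    # poll_once is a placeholder: any callable that returns the current run status
    interval = initial_interval
    start = time.time()
    while True:
        status = poll_once()
        if status in {"completed", "failed", "cancelled", "requires_action"}:
            return status
        if time.time() - start > timeout_seconds:
            raise TimeoutError(f"Polling exceeded {timeout_seconds} seconds (last status={status})")
        time.sleep(interval)
        interval = min(interval * 2, max_interval)  # cap the backoff interval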
Note
Adjust the version pins (openai, synapseml, pandas, tqdm) to the latest validated versions for your Fabric runtime if these exact versions become outdated.
%pip install "openai==1.70.0"
%pip install "synapseml==1.0.5" # Required for synapse.ml.mlflow (update version as needed)
%pip install pandas tqdm # Skip if already available in the Fabric runtime
import typing as t
import time
import uuid
# OpenAI SDK internals
from openai import OpenAI
from openai._models import FinalRequestOptions
from openai._types import Omit
from openai._utils import is_given
# SynapseML helper for env config
from synapse.ml.mlflow import get_mlflow_env_config
base_url = "https://<generic published base URL value>"
question = "What data sources do you have access to?"
configs = get_mlflow_env_config()
# Create OpenAI Client
class FabricOpenAI(OpenAI):
    def __init__(
        self,
        api_version: str = "2024-05-01-preview",
        **kwargs: t.Any,
    ) -> None:
        self.api_version = api_version
        default_query = kwargs.pop("default_query", {})
        default_query["api-version"] = self.api_version
        super().__init__(
            api_key="",
            base_url=base_url,
            default_query=default_query,
            **kwargs,
        )

    def _prepare_options(self, options: FinalRequestOptions) -> None:
        headers: dict[str, str | Omit] = (
            {**options.headers} if is_given(options.headers) else {}
        )
        options.headers = headers
        headers["Authorization"] = f"Bearer {configs.driver_aad_token}"
        if "Accept" not in headers:
            headers["Accept"] = "application/json"
        if "ActivityId" not in headers:
            correlation_id = str(uuid.uuid4())
            headers["ActivityId"] = correlation_id
        return super()._prepare_options(options)
# Pretty printing helper
def pretty_print(messages):
    print("---Conversation---")
    for m in messages:
        print(f"{m.role}: {m.content[0].text.value}")
    print()
fabric_client = FabricOpenAI()
# Create assistant
assistant = fabric_client.beta.assistants.create(model="not used")
# Create thread
thread = fabric_client.beta.threads.create()
# Create message on thread
message = fabric_client.beta.threads.messages.create(thread_id=thread.id, role="user", content=question)
# Create run
run = fabric_client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id)
# Wait for run to complete (avoid indefinite loop)
terminal_states = {"completed", "failed", "cancelled", "requires_action"}
poll_interval = 2
timeout_seconds = 300 # Adjust based on expected workload
start_time = time.time()
while run.status not in terminal_states:
    if time.time() - start_time > timeout_seconds:
        raise TimeoutError(f"Run polling exceeded {timeout_seconds} seconds (last status={run.status})")
    run = fabric_client.beta.threads.runs.retrieve(
        thread_id=thread.id,
        run_id=run.id,
    )
    print(run.status)
    time.sleep(poll_interval)

if run.status != "completed":
    print(f"Run finished with status: {run.status}")
# Print messages
response = fabric_client.beta.threads.messages.list(thread_id=thread.id, order="asc")
pretty_print(response)
# Delete thread
fabric_client.beta.threads.delete(thread_id=thread.id)
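If you plan to ask several questions from the same notebook, you might wrap the create, poll, and cleanup flow into a small helper so the thread is always deleted, even when polling fails or times out. This is a minimal sketch built on the FabricOpenAI class and pretty_print helper defined earlier; the ask_data_agent name is illustrative, not part of the Fabric API:

def ask_data_agent(question: str, timeout_seconds: int = 300, poll_interval: float = 2.0) -> None:
    client = FabricOpenAI()
    assistant = client.beta.assistants.create(model="not used")
    thread = client.beta.threads.create()
    try:
        client.beta.threads.messages.create(thread_id=thread.id, role="user", content=question)
        run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id)
        start = time.time()
        while run.status not in {"completed", "failed", "cancelled", "requires_action"}:
            if time.time() - start > timeout_seconds:
                raise TimeoutError(f"Run polling exceeded {timeout_seconds} seconds")
            time.sleep(poll_interval)
            run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)
        pretty_print(client.beta.threads.messages.list(thread_id=thread.id, order="asc"))
    finally:
        # Always delete the thread, even on timeout or failure
        client.beta.threads.delete(thread_id=thread.id)

ask_data_agent("What data sources do you have access to?")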