Collection policy reference

2025-06-30

Microsoft Purview collection policies have many components to configure. To create an effective policy, you need to understand what the purpose of each component is and how its configuration alters the behavior of the policy. This article provides a detailed anatomy of a collection policy.

Before you begin

If you're new to collection policies, here's a list of the core articles you need as you implement them in your organization:

Collection Policies solution overview (preview)
Collection policy reference (preview) - the article that you're reading now introduces all the components of a collection policy and how each one influences the behavior of a policy
Create and Deploy collection policies (preview)

Conditions

Specify conditions to define what data to detect. Conditions are optional; however, some might be required for additional settings. If you don't add conditions, what gets detected depends on the data sources you select later:

Devices: All data is detected, even if it doesn't match your organization's classifiers
All other data sources: Only data that matches your organization's classifiers is detected.

Collection policies support four conditions:

Condition	More information
Content contains classifiers	Sensitive information types and trainable classifiers to detect. Can be scoped to all classifiers, all classifiers except selected ones, or specific classifiers. NOTE: The devices data source doesn't support trainable classifiers. Any selected trainable classifiers will be ignored by devices.
Document size equals or is greater than	Detect files with a size that is equal to or greater than a specified number of bytes, kilobytes (KB), megabytes (MB), gigabytes (GB), or terabytes (TB). This condition only applies to the devices data source.
Document is equal to or smaller than	Detect files with a size that is equal to or smaller than a specified number of bytes, kilobytes (KB), megabytes (MB), gigabytes (GB), or terabytes (TB). This condition only applies to the devices data source.
File extension is	Detect files with specified file extensions. This condition only applies to the devices data source.
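
To make the device-only conditions concrete, the following Python sketch models how the size and file extension conditions might be evaluated against a single file. It is illustrative only and is not a Purview API: `DeviceConditions`, `to_bytes`, and `matches` are hypothetical names. The sketch assumes that all configured conditions must match (logical AND) and that units are binary multiples (1 KB = 1,024 bytes). This article doesn't state either point.

```python
from dataclasses import dataclass
from pathlib import PurePath
from typing import FrozenSet, Optional

# Assumption: binary multiples. The article doesn't define the unit base.
UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def to_bytes(value: float, unit: str) -> int:
    """Convert a size such as (5, "MB") to a byte count."""
    return int(value * UNITS[unit])


@dataclass(frozen=True)
class DeviceConditions:
    """Hypothetical model of the three device-only conditions."""
    min_size_bytes: Optional[int] = None      # "Document size equals or is greater than"
    max_size_bytes: Optional[int] = None      # "Document is equal to or smaller than"
    extensions: FrozenSet[str] = frozenset()  # "File extension is"; empty means not set


def matches(cond: DeviceConditions, file_name: str, size_bytes: int) -> bool:
    """Return True when the file satisfies every configured condition."""
    if cond.min_size_bytes is not None and size_bytes < cond.min_size_bytes:
        return False
    if cond.max_size_bytes is not None and size_bytes > cond.max_size_bytes:
        return False
    if cond.extensions:
        ext = PurePath(file_name).suffix.lower().lstrip(".")
        if ext not in cond.extensions:
            return False
    return True


cond = DeviceConditions(min_size_bytes=to_bytes(1, "KB"),
                        extensions=frozenset({"docx"}))
print(matches(cond, "Report.DOCX", 2048))
print(matches(cond, "report.pdf", 4096))
```

With no conditions set, `matches` returns `True` for every file, which mirrors the default described earlier for the devices data source.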

Activities

Choose which activities to detect. Supported activities are specific to the data sources you want to include.

Tip

You can mix activities that support different data sources in a single policy, but you must add all applicable data sources to the policy to support the selected activities.

Activity	Description	Data source
Text sent to or shared with cloud or AI app	When raw text is uploaded to a cloud app, including generative AI prompts, form submissions, and messages	- Cloud apps - Generative AI
File uploaded to or shared with cloud or AI app	When a binary file is uploaded to a cloud app or generative AI service	- Cloud apps - Generative AI
Text received from cloud or AI app	When raw text is downloaded from a cloud app, including generative AI responses	- Cloud apps - Generative AI
File downloaded from cloud or AI app	When a binary file is downloaded from a cloud app or generative AI service	- Cloud apps - Generative AI
Archive created	When an archive file is created on an onboarded endpoint device	Devices
File accessed by unallowed app	When a file is accessed by a restricted app or app group on an onboarded endpoint device	Devices
File archived	When a file is added to an archive on an onboarded endpoint device	Devices
File copied to network share	When a file is copied to a network share on an onboarded endpoint device	Devices
File copied to remote desktop session	When a file is copied to a remote computer through a remote desktop session on an onboarded endpoint device	Devices
File copied to removable media	When a file is copied to removable media, such as a USB flash drive, on an onboarded endpoint device	Devices
File created	When a file is created on an onboarded endpoint device	Devices
File created on network share	When a file is created on a network share from an onboarded endpoint device	Devices
File created on removable media	When a file is created on removable media, such as a USB flash drive, from an onboarded endpoint device	Devices
File deleted	When a file is deleted from an onboarded endpoint device	Devices
File modified	When a file is modified from an onboarded endpoint device	Devices
File printed	When a file is printed from an onboarded endpoint device	Devices
File read	When a file is read from an onboarded endpoint device	Devices
File renamed	When a file is renamed from an onboarded endpoint device	Devices
File transferred by Bluetooth	When a file is transferred by Bluetooth from an onboarded endpoint device	Devices
File uploaded to cloud	When a file is uploaded to the cloud from an onboarded endpoint device	Devices
Removable media mount	When removable media, such as a USB flash drive, is mounted on an onboarded endpoint device	Devices
Removable media unmount	When removable media, such as a USB flash drive, is unmounted on an onboarded endpoint device	Devices
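
The relationship between activities and data sources can be checked before a policy is saved. The following Python sketch is a hypothetical pre-check, not a Purview API: `ACTIVITY_SOURCES` copies a few rows from the table above, and `unsupported_activities` is an invented helper. It assumes an activity is covered when at least one of its listed data sources is in the policy.

```python
from typing import Dict, Iterable, List, Set

# A subset of the activities table above: activity -> data sources that support it.
ACTIVITY_SOURCES: Dict[str, Set[str]] = {
    "Text sent to or shared with cloud or AI app": {"Cloud apps", "Generative AI"},
    "File downloaded from cloud or AI app": {"Cloud apps", "Generative AI"},
    "File copied to removable media": {"Devices"},
    "File printed": {"Devices"},
}


def unsupported_activities(activities: Iterable[str],
                           data_sources: Iterable[str]) -> List[str]:
    """Return the selected activities that no selected data source supports."""
    selected = set(data_sources)
    return [a for a in activities if not (ACTIVITY_SOURCES[a] & selected)]


missing = unsupported_activities(
    ["File printed", "Text sent to or shared with cloud or AI app"],
    ["Generative AI"],
)
print(missing)
```

In this example, `File printed` is reported because the policy doesn't include the Devices data source.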

Data sources

Data sources define where to apply the policy, and are directly correlated to the activities added to the policy.

The following data sources are supported:

Data source	More information	Supported activities
Devices (preview)	Devices onboarded to Microsoft 365 and managed by your org.	Windows devices onboarded into Microsoft 365.
Copilot experiences (preview)	Includes Copilot in Microsoft Fabric and Microsoft Security Copilot only, with support for more experiences coming soon.	- Text sent to or shared with cloud or AI app - Text received from cloud or AI app
Enterprise AI (preview)	Non-Copilot AI apps that are onboarded or connected to your org using methods like Microsoft Entra registration, Azure AI services, or Purview Data Map connectors.	- Text sent to or shared with cloud or AI app - Text received from cloud or AI app
Unmanaged cloud apps (preview)	Cloud apps sourced in the Defender for Cloud Apps catalog that aren't set up for single sign-on (SSO), allowing users to access personal data through a browser, app, add-in, or API. Policies will only detect data while it's being shared or transferred (data in motion) via browser and network detection.	Browser & Network: - Text sent to or shared with cloud or AI app Network only: - Text received from cloud or AI app - File uploaded to or shared with cloud or AI app - File downloaded from cloud or AI app
Adaptive app scopes (preview)	Groups of apps whose membership is determined based on app metadata, such as category. Currently only "All unmanaged AI apps" - all unmanaged cloud apps categorized as generative AI - is supported via browser and network detection.	Browser & Network: - Text sent to or shared with cloud or AI app Network only: - Text received from cloud or AI app - File uploaded to or shared with cloud or AI app - File downloaded from cloud or AI app

Scoping data sources to users and groups

For each data source, you can choose to scope to the following:

All users and groups (default)
Specific users and groups
All except specific users and groups

Note

Excluded users and groups take precedence over any included users or groups.
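
The precedence rule in the note can be illustrated with a short Python sketch. It is a simplified model under stated assumptions, not how Purview resolves scope internally: `Scope` and `in_scope` are hypothetical names, users and groups are plain strings, and a user is treated as excluded when the user or any of the user's groups is on the exclusion list.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Scope:
    """Hypothetical scope for one data source."""
    included: Optional[FrozenSet[str]] = None  # None models "All users and groups"
    excluded: FrozenSet[str] = frozenset()     # users or groups to exclude


def in_scope(scope: Scope, user: str, groups: Iterable[str]) -> bool:
    """Exclusions are checked first, so they override any inclusion."""
    principals = {user, *groups}
    if principals & scope.excluded:
        return False
    if scope.included is None:
        return True
    return bool(principals & scope.included)


scope = Scope(included=frozenset({"sales"}), excluded=frozenset({"alice"}))
print(in_scope(scope, "bob", ["sales"]))
print(in_scope(scope, "alice", ["sales"]))
```

Here `alice` belongs to an included group but is still out of scope, because the exclusion is evaluated before the inclusion.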

Other collection policy settings

Depending on the conditions, activities, and data sources specified, there might be other collection policy settings to configure. When one of these settings is disabled or grayed out, the current policy configuration isn't compatible with that setting.

Content capture for AI interactions

To help comply with regulatory requirements, you can decide whether to capture and store all detected prompts and responses from any generative AI data sources added to the policy. This makes it easy to discover and protect the captured content later with other Microsoft Purview policies and solutions. This capability doesn't include content in files shared with generative AI, and only applies to the following data sources:

Copilot experiences
Enterprise AI
Unmanaged cloud apps categorized as generative AI
All unmanaged AI apps adaptive app scope

Without this setting enabled, content detected in prompts and responses is limited to sensitive information only.

Note

To capture AI content, you must have the Content contains classifiers condition set to All.
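
The two prerequisites for content capture described in this section can be written as a simple check. The following Python sketch is illustrative only: `can_capture_ai_content` is a hypothetical helper, and the string labels are taken from the list above rather than from any Purview API.

```python
from typing import Iterable

# Data sources that the content capture setting applies to, as listed above.
CAPTURE_SOURCES = {
    "Copilot experiences",
    "Enterprise AI",
    "Unmanaged cloud apps categorized as generative AI",
    "All unmanaged AI apps adaptive app scope",
}


def can_capture_ai_content(classifier_scope: str,
                           data_sources: Iterable[str]) -> bool:
    """Capture needs classifiers set to "All" and an eligible AI data source."""
    has_ai_source = bool(CAPTURE_SOURCES & set(data_sources))
    return classifier_scope == "All" and has_ai_source


print(can_capture_ai_content("All", ["Enterprise AI"]))
print(can_capture_ai_content("Specific", ["Enterprise AI"]))
print(can_capture_ai_content("All", ["Devices"]))
```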

Cloud apps detection

If any unmanaged cloud app or adaptive app scopes data sources have been added to the policy, you must choose how to detect this data. You can choose:

Browser - Detect sensitive data shared with unmanaged cloud apps through the Microsoft Edge browser when on a managed work device. Currently only applies to the following AI apps: ChatGPT, DeepSeek, Google Gemini, and Microsoft Copilot. See supported browsers to confirm your version of the Microsoft Edge browser supports browser detection.
Network - Detect sensitive data shared with unmanaged cloud apps through browsers, apps, APIs, and more, with an integrated Secure Service Edge (SSE) provider and Purview network data security.
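
The detection method also determines which activities can be detected for unmanaged cloud apps and adaptive app scopes, as listed in the data sources table. The following Python sketch restates that mapping. `detectable_activities` is a hypothetical helper, not part of any Purview API.

```python
from typing import Iterable, Set

# From the data sources table: supported by both browser and network detection.
BROWSER_AND_NETWORK = {"Text sent to or shared with cloud or AI app"}

# From the data sources table: supported by network detection only.
NETWORK_ONLY = {
    "Text received from cloud or AI app",
    "File uploaded to or shared with cloud or AI app",
    "File downloaded from cloud or AI app",
}


def detectable_activities(methods: Iterable[str]) -> Set[str]:
    """Return the activities detectable with the chosen detection methods."""
    enabled = set(methods)
    result: Set[str] = set()
    if enabled & {"Browser", "Network"}:
        result |= BROWSER_AND_NETWORK
    if "Network" in enabled:
        result |= NETWORK_ONLY
    return result


print(sorted(detectable_activities(["Browser"])))
print(sorted(detectable_activities(["Browser", "Network"])))
```

Choosing only Browser detection limits the policy to text sent to or shared with the app. The other three activities require Network detection.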

Next steps

After you create a collection policy, you might need to take further steps, depending on the configured settings.

If Browser detection is enabled, you must use the Microsoft Edge management service to ensure users included in the policy can’t share data to cloud apps in other browsers, like Chrome or Firefox. See Activate your DLP policy in Microsoft Edge.
If Network detection is enabled, you must add and configure one or more Secure Access Service Edge (SASE) or Secure Service Edge (SSE) integrations in DLP settings to begin detecting network traffic. See SASE provider integrations.

Pay-as-you-go features

The following collection policy data sources and features are pay-as-you-go and require an Azure subscription to be linked before creating a policy. Learn more about pay-as-you-go billing.

Copilot experiences
Enterprise AI
Unmanaged cloud app activity detected through Purview network data security

Privacy notice for Enterprise AI and Network Data Security

Enterprise AI data sources and network data security integrations might require integration with a third-party app or provider. If you choose to enable a third-party integration, the third party has access to, and might store, some policy configuration, including user identifiers. In this case, the third party's terms, conditions, and privacy policy govern the usage and storage of this data.