Create an external ___location to connect cloud storage to Azure Databricks

This article describes how to configure an external ___location object in Unity Catalog to control access to cloud storage from Azure Databricks.

Overview of external locations

External locations associate storage credentials with cloud object storage containers. External locations are used to define managed storage locations for catalogs and schemas, and to govern access to cloud storage paths used by external tables and external volumes.

External locations can reference storage in an Azure Data Lake Storage container, an AWS S3 bucket, or a Cloudflare R2 bucket.

The diagram below represents the filesystem hierarchy of a single cloud storage bucket or container, with four external locations that share one storage credential.


Overview of external ___location creation

You can use any of the following interfaces to create an external ___location:

  1. Catalog Explorer

    This option provides a graphical UI. You can use Catalog Explorer to create external locations that reference Azure Data Lake Storage containers, S3 buckets (read-only), Cloudflare R2 buckets, and the DBFS root (legacy).

  2. SQL commands in a notebook or Databricks SQL query

  3. The Databricks CLI

  4. Terraform

This article covers options 1 and 2.
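
Although this article focuses on options 1 and 2, a minimal sketch of option 3 follows, assuming the Databricks CLI is installed and authenticated. The ___location, path, and credential names are placeholders:

    # Create an external ___location (name, URL, and storage credential name are placeholders)
    databricks external-locations create my_location \
      "abfss://my-container-name@my-storage-account.dfs.core.windows.net/path" \
      my_storage_credential

The CLI calls the same Unity Catalog API as the SQL command shown later in this article.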

Note

Storing data in the DBFS root storage ___location is a legacy practice, and Databricks recommends against it. However, if your workspace does store data in DBFS root, you can create an external ___location to govern access to that data using Unity Catalog. For details, see Create an external ___location for data in DBFS root (legacy).

For more information about the uses of external locations and the relationship between storage credentials and external locations, see Connect to cloud object storage using Unity Catalog.

Before you begin

Prerequisites:

  • You must create the Azure Data Lake Storage container, AWS S3 bucket, or Cloudflare R2 bucket that you want to use as an external ___location before you create the external ___location object in Azure Databricks.

    • Azure Data Lake Storage storage accounts that you use as external locations must have a hierarchical namespace.
    • An S3 bucket name cannot use dot notation (for example, incorrect.bucket.name.notation). For more bucket naming guidance, see the AWS bucket naming rules.

    • The bucket cannot have an S3 access control list attached to it.

Permissions requirements:

  • You must have the CREATE EXTERNAL LOCATION privilege on both the metastore and the storage credential referenced in the external ___location. Metastore admins have CREATE EXTERNAL LOCATION on the metastore by default.
  • If you are creating an external ___location for the DBFS root storage ___location, the system can create the storage credential for you, but you must be a workspace admin. For details, see Create an external ___location for data in DBFS root (legacy).
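
As a sketch, you can check your existing grants in SQL before proceeding. The credential name below is a placeholder:

    -- Privileges granted on the metastore
    SHOW GRANTS ON METASTORE;

    -- Privileges granted on the storage credential you plan to reference
    SHOW GRANTS ON STORAGE CREDENTIAL `my_credential`;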

Option 1: Create an external ___location using Catalog Explorer

You can create an external ___location manually using Catalog Explorer.

Permissions and prerequisites: see Before you begin.

To create the external ___location:

  1. Log in to a workspace that is attached to the metastore.

  2. In the sidebar, click Catalog.

  3. On the Quick access page, click the External data > button, go to the External Locations tab, and click Create ___location.

  4. Enter an External ___location name.

  5. Select the Storage type: Azure Data Lake Storage, S3 (Read-only), R2, or DBFS Root.

    Storing data in DBFS root is a legacy practice, and Databricks recommends against it. For details, see Create an external ___location for data in DBFS root (legacy).

  6. Under URL, enter or select the path to the external ___location.

    For Azure Data Lake Storage, S3, and R2, you have the following options:

    • To copy the container path from an existing DBFS mount point, click Copy from DBFS.

    • If you aren’t copying from an existing mount point, use the URL field to enter the container or bucket path that you want to use as the external ___location.

      For example, abfss://my-container-name@my-storage-account.dfs.core.windows.net/<path> or r2://my-bucket@my-account-id.r2.cloudflarestorage.com/<path>.

    For DBFS root:

    • The system populates the subpath to the DBFS root storage ___location. If you are a workspace admin, the system also creates the storage credential for you.

    See Create an external ___location for data in DBFS root (legacy).

  7. Select the storage credential that grants access to the external ___location.

    Note

    If your external ___location is for the DBFS root and you are a workspace admin, the system creates the storage credential for you, and you do not need to select one.

    If you don't have a storage credential, you can create one:

    1. In the Storage credential drop-down list, select + Create new storage credential.

    2. The credential information that you enter depends on the storage type:

      For Azure Data Lake Storage, enter the access connector ID and (optionally) the user-assigned managed identity that give access to the storage ___location. See Create a storage credential that accesses Azure Data Lake Storage.

      For Cloudflare R2 (API token), enter the Cloudflare account ID, access key ID, and secret access key. See Create a storage credential for connecting to Cloudflare R2.

      For AWS S3, enter the IAM role ARN that gives access to the storage ___location. See Create a storage credential for connecting to AWS S3 (read-only).

  8. (Optional) If you want users to have read-only access to the external ___location, click Advanced Options and select Read only. For more information, see Mark an external ___location as read-only.

    External locations that reference AWS S3 paths are inherently read-only.

  9. (Optional) If the external ___location is intended for a Hive metastore federated catalog, click Advanced options and enable Fallback mode.

    See Enable fallback mode on external locations.

  10. (Optional, for AWS S3 locations only) If the S3 bucket requires SSE encryption, you can configure an encryption algorithm to allow external tables and volumes in Unity Catalog to access data in your S3 bucket.

    For instructions, see Configure an encryption algorithm on an external ___location (AWS S3 only).

  11. (Optional) To enable the ability to subscribe to change notifications on the external ___location, click Advanced Options and select Enable file events.

    For details, see (Recommended) Enable file events for an external ___location.

  12. Click Create.

  13. (Optional) Bind the external ___location to specific workspaces.

    By default, any privileged user can use the external ___location on any workspace attached to the metastore. If you want to allow access only from specific workspaces, go to the Workspaces tab and assign workspaces. See (Optional) Assign an external ___location to specific workspaces.

  14. Go to the Permissions tab to grant permission to use the external ___location.

    Before anyone can use the external ___location, you must grant permissions:

    • To use the external ___location to add a managed storage ___location to a metastore, catalog, or schema, grant the CREATE MANAGED STORAGE privilege.

    • To create external tables or volumes, grant CREATE EXTERNAL TABLE or CREATE EXTERNAL VOLUME.

    1. Click Grant.
    2. On the Grant on <external ___location> dialog, select users, groups, or service principals in the Principals field, and select the privilege you want to grant.
    3. Click Grant.
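
If you prefer SQL, the same kinds of grants can be issued from a notebook or the SQL editor. This is a sketch; the ___location name and group name are placeholders:

    -- Allow a group to create external tables that use this ___location
    GRANT CREATE EXTERNAL TABLE ON EXTERNAL LOCATION `my_location` TO `data-engineers`;

    -- Allow a group to use the ___location as managed storage for catalogs or schemas
    GRANT CREATE MANAGED STORAGE ON EXTERNAL LOCATION `my_location` TO `data-engineers`;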

Option 2: Create an external ___location using SQL

To create an external ___location using SQL, run the following command in a notebook or the SQL query editor. Replace the placeholder values. For required permissions and prerequisites, see Before you begin.

  • <___location-name>: A name for the external ___location. If the name includes special characters, such as hyphens (-), it must be surrounded by backticks (` `). See Names.
  • <bucket-path>: The path in your cloud tenant that this external ___location grants access to. For example, abfss://my-container-name@my-storage-account.dfs.core.windows.net/<path> or r2://my-bucket@my-account-id.r2.cloudflarestorage.com/<path>.
  • <storage-credential-name>: The name of the storage credential that authorizes reading from and writing to the storage container or bucket path. If the storage credential name includes special characters, such as hyphens (-), it must be surrounded by backticks (` `).
CREATE EXTERNAL LOCATION [IF NOT EXISTS] `<___location-name>`
URL '<bucket-path>'
WITH ([STORAGE] CREDENTIAL `<storage-credential-name>`)
[COMMENT '<comment-string>'];
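
For example, a filled-in version of the command might look like the following. The container, storage account, credential, and ___location names are placeholders:

    CREATE EXTERNAL LOCATION IF NOT EXISTS `my_location`
    URL 'abfss://my-container-name@my-storage-account.dfs.core.windows.net/data'
    WITH (STORAGE CREDENTIAL `my_credential`)
    COMMENT 'External ___location for the data container';

    -- Confirm the ___location exists and inspect its properties
    DESCRIBE EXTERNAL LOCATION `my_location`;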

If you want to limit external ___location access to specific workspaces in your account, also known as workspace binding or external ___location isolation, see (Optional) Assign an external ___location to specific workspaces.

(Optional) Assign an external ___location to specific workspaces

By default, an external ___location is accessible from all of the workspaces in the metastore. This means that if a user has been granted a privilege (such as READ FILES) on that external ___location, they can exercise that privilege from any workspace attached to the metastore. If you use workspaces to isolate user data access, you might want to allow access to an external ___location only from specific workspaces. This feature is known as workspace binding or external ___location isolation.

Typical use cases for binding an external ___location to specific workspaces include:

  • Ensuring that data engineers who have the CREATE EXTERNAL TABLE privilege on an external ___location that contains production data can create external tables on that ___location only in a production workspace.
  • Ensuring that data engineers who have the READ FILES privilege on an external ___location that contains sensitive data can only use specific workspaces to access that data.

For more information about how to restrict other types of data access by workspace, see Limit catalog access to specific workspaces.

Important

Workspace bindings are referenced at the point when privileges against the external ___location are exercised. For example, if a user creates an external table by issuing the statement CREATE TABLE myCat.mySch.myTable LOCATION 'abfss://my-container-name@storage-account-name.dfs.core.windows.net/finance' from the myWorkspace workspace, the following workspace binding checks are performed in addition to regular user privilege checks:

  • Is the external ___location covering 'abfss://my-container-name@storage-account-name.dfs.core.windows.net/finance' bound to myWorkspace?
  • Is the catalog myCat bound to myWorkspace with access level Read & Write?

If the external ___location is subsequently unbound from myWorkspace, then the external table continues to function.

This feature also allows you to populate a catalog from a central workspace and make it available to other workspaces using catalog bindings, without also having to make the external ___location available in those other workspaces.

Bind an external ___location to one or more workspaces

To assign an external ___location to specific workspaces, you can use Catalog Explorer or the Databricks CLI.

Permissions required: Metastore admin, external ___location owner, or MANAGE on the external ___location.

Note

Metastore admins can see all external locations in a metastore using Catalog Explorer, and external ___location owners can see all external locations that they own in a metastore, regardless of whether the external ___location is assigned to the current workspace. External locations that are not assigned to the workspace appear grayed out.

Catalog Explorer

  1. Log in to a workspace that is linked to the metastore.

  2. In the sidebar, click Catalog.

  3. On the Quick access page, click the External data > button to go to the External Locations tab.

  4. Select the external ___location and go to the Workspaces tab.

  5. On the Workspaces tab, clear the All workspaces have access checkbox.

    If your external ___location is already bound to one or more workspaces, this checkbox is already cleared.

  6. Click Assign to workspaces and enter or find the workspaces you want to assign.

To revoke access, go to the Workspaces tab, select the workspace, and click Revoke. To allow access from all workspaces, select the All workspaces have access checkbox.

CLI

Assigning an external ___location to a workspace requires two steps, using two Databricks CLI command groups.

In the following examples, replace <profile-name> with the name of your Azure Databricks authentication configuration profile. It should include the value of a personal access token, in addition to the workspace instance name and workspace ID of the workspace where you generated the personal access token. See Azure Databricks personal access token authentication.

  1. Use the external-locations command group's update command to set the external ___location's isolation mode to ISOLATED:

    databricks external-locations update <my-___location> \
    --isolation-mode ISOLATED \
    --profile <profile-name>
    

    The default isolation mode is OPEN, which allows access from all workspaces attached to the metastore.

  2. Use the workspace-bindings command group's update-bindings command to assign the workspaces to the external ___location:

    databricks workspace-bindings update-bindings external-___location <my-___location> \
    --json '{
      "add": [{"workspace_id": <workspace-id>}...],
      "remove": [{"workspace_id": <workspace-id>}...]
    }' --profile <profile-name>
    

    Use the "add" and "remove" properties to add or remove workspace bindings.

    Note

    Read-only binding (BINDING_TYPE_READ_ONLY) is not available for external locations. Therefore, there is no need to set binding_type for external ___location bindings.

To list all workspace assignments for an external ___location, use the workspace-bindings command group's get-bindings command:

databricks workspace-bindings get-bindings external-___location <my-___location> \
--profile <profile-name>
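
You can also confirm a single external ___location's current isolation mode with the external-locations command group's get command (a sketch; the ___location name is a placeholder):

    databricks external-locations get <my-___location> \
    --profile <profile-name>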

See also Workspace Bindings in the REST API reference.

Unbind an external ___location from a workspace

Instructions for revoking workspace access to an external ___location using Catalog Explorer or the workspace-bindings CLI command group are included in Bind an external ___location to one or more workspaces.

Next steps