Attach and manage a Synapse Spark pool in Azure Machine Learning

2024-08-28

APPLIES TO: Azure CLI ml extension v2 (current) Python SDK azure-ai-ml v2 (current)

In this article, you learn how to attach a Synapse Spark pool in Azure Machine Learning. You can attach a Synapse Spark pool in one of these ways:

Using Azure Machine Learning studio UI
Using Azure Machine Learning CLI
Using Azure Machine Learning Python SDK

Prerequisites

An Azure subscription; if you don't have an Azure subscription, create a free account before you begin.
An Azure Machine Learning workspace. See Create workspace resources.
Create an Azure Synapse Analytics workspace in the Azure portal.
Create an Apache Spark pool using the Azure portal.

Attach a Synapse Spark pool in Azure Machine Learning

Azure Machine Learning offers different ways to attach and manage a Synapse Spark pool.

To attach a Synapse Spark pool from the Compute page in Azure Machine Learning studio:

In the Manage section of the left pane, select Compute.
Select Attached computes.
On the Attached computes screen, select New to see the options for attaching different types of computes.
Select Synapse Spark pool.

The Attach Synapse Spark pool panel opens on the right side of the screen. In this panel:

Enter a Name, which refers to the attached Synapse Spark Pool inside the Azure Machine Learning resource.
Select an Azure Subscription from the dropdown menu.
Select a Synapse workspace from the dropdown menu.
Select a Spark Pool from the dropdown menu.
Toggle the Assign a managed identity option to enable it.
Select a managed identity type to use with this attached Synapse Spark pool.
Select Update to complete the Synapse Spark pool attach process.

APPLIES TO: Azure CLI ml extension v2 (current)

With the Azure Machine Learning CLI, you can use intuitive YAML syntax and command-line commands to attach and manage a Synapse Spark pool.

To define an attached Synapse Spark pool using YAML syntax, the YAML file should cover these properties:

name – name of the attached Synapse Spark pool.
type – set this property to synapsespark.
resource_id – the resource ID of the Synapse Spark pool created in the Azure Synapse Analytics workspace. The Azure resource ID includes:
- the Azure subscription ID
- the resource group name
- the Azure Synapse Analytics workspace name
- the name of the Synapse Spark pool
```
name: <ATTACHED_SPARK_POOL_NAME>

type: synapsespark

resource_id: /subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Synapse/workspaces/<SYNAPSE_WORKSPACE_NAME>/bigDataPools/<SPARK_POOL_NAME>
```
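If you script the attach step, you can build the resource_id string from the four parts listed above. The sketch below is a minimal illustration. The helper name `synapse_spark_pool_resource_id` is hypothetical, and the function only formats the string. It doesn't check that the pool exists.

```python
def synapse_spark_pool_resource_id(
    subscription_id: str,
    resource_group: str,
    synapse_workspace_name: str,
    spark_pool_name: str,
) -> str:
    """Compose the Azure resource ID of a Synapse Spark pool (hypothetical helper)."""
    # Same segment order as the resource_id value in the YAML example above.
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Synapse/workspaces/{synapse_workspace_name}"
        f"/bigDataPools/{spark_pool_name}"
    )
```

For example, `synapse_spark_pool_resource_id("<SUBSCRIPTION_ID>", "<RESOURCE_GROUP>", "<SYNAPSE_WORKSPACE_NAME>", "<SPARK_POOL_NAME>")` reproduces the placeholder path shown in the YAML file.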

identity – this property defines the identity type to assign to the attached Synapse Spark pool. It can take one of these values:

system_assigned

user_assigned

This YAML specification attaches a Synapse Spark pool that uses a system-assigned identity:

```
name: <ATTACHED_SPARK_POOL_NAME>

type: synapsespark

resource_id: /subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Synapse/workspaces/<SYNAPSE_WORKSPACE_NAME>/bigDataPools/<SPARK_POOL_NAME>

identity:
  type: system_assigned
```

For the identity type user_assigned, you should also provide a list of user_assigned_identities values. Each user-assigned identity should be declared as an element of the list, by using the resource_id value of the user-assigned identity. The first user-assigned identity in the list is used to submit a job by default.

```
name: <ATTACHED_SPARK_POOL_NAME>

type: synapsespark

resource_id: /subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Synapse/workspaces/<SYNAPSE_WORKSPACE_NAME>/bigDataPools/<SPARK_POOL_NAME>

identity:
  type: user_assigned
  user_assigned_identities:
    - resource_id: /subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<AML_USER_MANAGED_ID>
```
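Indentation is the easiest thing to get wrong in these files. Nested keys must be indented under `identity:`. To generate a specification file from a script, you can use a sketch like the one below. The function `render_attach_spec` is a hypothetical helper that builds the YAML text by hand, so it needs no third-party YAML library. It accepts only the two identity values listed above.

```python
from typing import Optional, Sequence


def render_attach_spec(
    name: str,
    resource_id: str,
    identity_type: Optional[str] = None,
    user_assigned_identity_ids: Sequence[str] = (),
) -> str:
    """Render a YAML spec for `az ml compute attach` (hypothetical helper)."""
    lines = [
        f"name: {name}",
        "type: synapsespark",
        f"resource_id: {resource_id}",
    ]
    if identity_type is not None:
        if identity_type not in ("system_assigned", "user_assigned"):
            raise ValueError(f"unsupported identity type: {identity_type}")
        # Nested keys are indented under `identity:`. A flush-left `type:` would
        # become a duplicate top-level key and leave `identity` empty.
        lines.append("identity:")
        lines.append(f"  type: {identity_type}")
        if identity_type == "user_assigned":
            if not user_assigned_identity_ids:
                raise ValueError("user_assigned needs at least one identity resource ID")
            lines.append("  user_assigned_identities:")
            # Keep the caller's order: the first identity is used to submit jobs by default.
            for uai_id in user_assigned_identity_ids:
                lines.append(f"    - resource_id: {uai_id}")
    return "\n".join(lines) + "\n"
```

You could write the result to disk, for example with `pathlib.Path("synapse-attach.yaml").write_text(spec)`, and then pass that file name to the --file parameter.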

Pass any of the YAML specification files above to the az ml compute attach command with the --file parameter. To attach a Synapse Spark pool to an Azure Machine Learning workspace in a specific resource group and subscription, run the az ml compute attach command as shown here:

```
az ml compute attach --file <YAML_SPECIFICATION_FILE_NAME>.yaml --subscription <SUBSCRIPTION_ID> --resource-group <RESOURCE_GROUP> --workspace-name <AML_WORKSPACE_NAME>
```

This sample shows the expected output of the above command:

```
Class SynapseSparkCompute: This is an experimental class, and may change at any time. Please visit https://aka.ms/azuremlexperimental for more information.

{
    "auto_pause_settings": {
        "auto_pause_enabled": true,
        "delay_in_minutes": 15
    },
    "created_on": "2022-09-13 19:01:05.109840+00:00",
    "id": "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.MachineLearningServices/workspaces/<AML_WORKSPACE_NAME>/computes/<ATTACHED_SPARK_POOL_NAME>",
    "location": "eastus2",
    "name": "<ATTACHED_SPARK_POOL_NAME>",
    "node_count": 5,
    "node_family": "MemoryOptimized",
    "node_size": "Small",
    "provisioning_state": "Succeeded",
    "resourceGroup": "<RESOURCE_GROUP>",
    "resource_id": "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Synapse/workspaces/<SYNAPSE_WORKSPACE_NAME>/bigDataPools/<SPARK_POOL_NAME>",
    "scale_settings": {
        "auto_scale_enabled": false,
        "max_node_count": 0,
        "min_node_count": 0
    },
    "spark_version": "3.2",
    "type": "synapsespark"
}
```
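If you automate the attach step, you can call the CLI from Python and check the result. The sketch below assumes the az CLI with the ml extension is installed and signed in. The helper names `az_compute_attach_args`, `run_az`, and `summarize_attached_pool` are hypothetical. The parser skips any text before the first `{`, such as the experimental-class warning in the sample above, so it works whether or not that warning appears on stdout.

```python
import json
import shutil
import subprocess
from typing import Any, Dict, List, Optional


def az_compute_attach_args(
    spec_file: str, subscription_id: str, resource_group: str, workspace_name: str
) -> List[str]:
    """Build the `az ml compute attach` argument list, using the flags shown above."""
    return [
        "az", "ml", "compute", "attach",
        "--file", spec_file,
        "--subscription", subscription_id,
        "--resource-group", resource_group,
        "--workspace-name", workspace_name,
    ]


def run_az(args: List[str]) -> str:
    """Run an az command and return its stdout; raises if az is missing or fails."""
    exe = shutil.which(args[0])  # resolve the full path of the az executable
    if exe is None:
        raise FileNotFoundError("the Azure CLI (az) was not found on PATH")
    completed = subprocess.run(
        [exe, *args[1:]], capture_output=True, text=True, check=True
    )
    return completed.stdout


def summarize_attached_pool(output: str) -> Dict[str, Optional[Any]]:
    """Pull a few fields out of the JSON object that the command prints."""
    # Start at the first "{" and decode exactly one JSON object from there.
    start = output.index("{")
    compute, _ = json.JSONDecoder().raw_decode(output, start)
    identity = compute.get("identity") or {}
    pause = compute.get("auto_pause_settings") or {}
    return {
        "name": compute["name"],
        "ready": compute.get("provisioning_state") == "Succeeded",
        "spark_version": compute.get("spark_version"),
        "identity_type": identity.get("type"),
        "auto_pause_minutes": (
            pause.get("delay_in_minutes") if pause.get("auto_pause_enabled") else None
        ),
    }
```

A typical call might look like `summarize_attached_pool(run_az(az_compute_attach_args("synapse-attach.yaml", "<SUBSCRIPTION_ID>", "<RESOURCE_GROUP>", "<AML_WORKSPACE_NAME>")))`. The same parser also handles the output of az ml compute show, which prints the pool name before the JSON object.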

If an attached Synapse Spark pool with the name specified in the YAML specification file already exists in the workspace, the az ml compute attach command updates the existing pool with the information provided in the YAML specification file. You can update these values through the YAML specification file:

identity type
user-assigned identities
tags
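Before you rerun az ml compute attach against an existing pool, you might compare the old and new specifications to confirm that only the updatable values changed. The sketch below makes some assumptions. Both specifications are already loaded into dictionaries, for example with a YAML parser of your choice (not shown). The identity key covers both the identity type and the user-assigned identities. The function names are hypothetical. A changed name isn't an update at all, because a different name targets a different attached compute.

```python
from typing import Any, Dict, Set

# Top-level keys that this article lists as updatable through the YAML spec.
# "identity" holds both the identity type and the user-assigned identities.
UPDATABLE_FIELDS = {"identity", "tags"}


def changed_fields(current: Dict[str, Any], proposed: Dict[str, Any]) -> Set[str]:
    """Return the top-level keys whose values differ between two specs."""
    keys = set(current) | set(proposed)
    return {key for key in keys if current.get(key) != proposed.get(key)}


def non_updatable_changes(current: Dict[str, Any], proposed: Dict[str, Any]) -> Set[str]:
    """Return changed keys that the YAML update path isn't documented to support."""
    return changed_fields(current, proposed) - UPDATABLE_FIELDS
```

An empty result from `non_updatable_changes` means the proposed file only touches identity or tags.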

To display details of an attached Synapse Spark pool, execute the az ml compute show command. Pass the name of the attached Synapse Spark pool with the --name parameter, as shown:

```
az ml compute show --name <ATTACHED_SPARK_POOL_NAME> --subscription <SUBSCRIPTION_ID> --resource-group <RESOURCE_GROUP> --workspace-name <AML_WORKSPACE_NAME>
```

This sample shows the expected output of the above command:

```
<ATTACHED_SPARK_POOL_NAME>
{
    "auto_pause_settings": {
        "auto_pause_enabled": true,
        "delay_in_minutes": 15
    },
    "created_on": "2022-09-13 19:01:05.109840+00:00",
    "id": "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.MachineLearningServices/workspaces/<AML_WORKSPACE_NAME>/computes/<ATTACHED_SPARK_POOL_NAME>",
    "location": "eastus2",
    "name": "<ATTACHED_SPARK_POOL_NAME>",
    "node_count": 5,
    "node_family": "MemoryOptimized",
    "node_size": "Small",
    "provisioning_state": "Succeeded",
    "resourceGroup": "<RESOURCE_GROUP>",
    "resource_id": "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Synapse/workspaces/<SYNAPSE_WORKSPACE_NAME>/bigDataPools/<SPARK_POOL_NAME>",
    "scale_settings": {
        "auto_scale_enabled": false,
        "max_node_count": 0,
        "min_node_count": 0
    },
    "spark_version": "3.2",
    "type": "synapsespark"
}
```

To see a list of all computes in a workspace, including the attached Synapse Spark pools, use the az ml compute list command. Use the --workspace-name parameter to pass the name of the workspace, as shown:

```
az ml compute list --subscription <SUBSCRIPTION_ID> --resource-group <RESOURCE_GROUP> --workspace-name <AML_WORKSPACE_NAME>
```

This sample shows the expected output of the above command:

```
[
    {
        "auto_pause_settings": {
            "auto_pause_enabled": true,
            "delay_in_minutes": 15
        },
        "created_on": "2022-09-09 21:28:54.871251+00:00",
        "id": "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.MachineLearningServices/workspaces/<AML_WORKSPACE_NAME>/computes/<ATTACHED_SPARK_POOL_NAME>",
        "identity": {
            "principal_id": "<PRINCIPAL_ID>",
            "tenant_id": "<TENANT_ID>",
            "type": "system_assigned"
        },
        "location": "eastus2",
        "name": "<ATTACHED_SPARK_POOL_NAME>",
        "node_count": 5,
        "node_family": "MemoryOptimized",
        "node_size": "Small",
        "provisioning_state": "Succeeded",
        "resourceGroup": "<RESOURCE_GROUP>",
        "resource_id": "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Synapse/workspaces/<SYNAPSE_WORKSPACE_NAME>/bigDataPools/<SPARK_POOL_NAME>",
        "scale_settings": {
            "auto_scale_enabled": false,
            "max_node_count": 0,
            "min_node_count": 0
        },
        "spark_version": "3.2",
        "type": "synapsespark"
    },
    ...
]
```
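The list includes every compute type in the workspace. To pick out only the attached Synapse Spark pools, you can filter on the type field, which is "synapsespark" in the sample output. The sketch below assumes the full JSON array from az ml compute list is available as a string. The helper name `attached_synapse_pools` is hypothetical.

```python
import json
from typing import List


def attached_synapse_pools(list_output: str) -> List[str]:
    """Return the names of attached Synapse Spark pools from `az ml compute list` JSON."""
    computes = json.loads(list_output)
    # Keep only computes whose type matches the value shown in the sample output.
    return sorted(c["name"] for c in computes if c.get("type") == "synapsespark")
```

The Azure CLI's global --query argument (a JMESPath expression) should give a similar result without Python, for example `--query "[?type=='synapsespark'].name"`.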

APPLIES TO: Python SDK azure-ai-ml v2 (current)

The Azure Machine Learning Python SDK provides convenient functions for attaching and managing a Synapse Spark pool with Python code, for example in Azure Machine Learning notebooks.

To attach a Synapse Spark pool with the Python SDK, first create an instance of the azure.ai.ml.MLClient class, which provides convenient functions for interacting with Azure Machine Learning services. The following code sample uses azure.identity.DefaultAzureCredential to connect to a workspace in a resource group of the specified Azure subscription. The code sample then defines a SynapseSparkCompute with these parameters:

name - user-defined name of the new attached Synapse Spark pool.
resource_id - resource ID of the Synapse Spark pool created earlier in the Azure Synapse Analytics workspace.

An azure.ai.ml.MLClient.begin_create_or_update() function call attaches the defined Synapse Spark pool to the Azure Machine Learning workspace.

```
from azure.ai.ml import MLClient
from azure.ai.ml.entities import SynapseSparkCompute
from azure.identity import DefaultAzureCredential

subscription_id = "<SUBSCRIPTION_ID>"
resource_group = "<RESOURCE_GROUP>"
workspace_name = "<AML_WORKSPACE_NAME>"

# Connect to the Azure Machine Learning workspace
ml_client = MLClient(
    DefaultAzureCredential(), subscription_id, resource_group, workspace_name
)

synapse_name = "<ATTACHED_SPARK_POOL_NAME>"
synapse_resource = "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Synapse/workspaces/<SYNAPSE_WORKSPACE_NAME>/bigDataPools/<SPARK_POOL_NAME>"

# Define the attached compute and start the attach operation.
# begin_create_or_update() returns a poller; call .result() on it to wait for completion.
synapse_comp = SynapseSparkCompute(name=synapse_name, resource_id=synapse_resource)
ml_client.begin_create_or_update(synapse_comp)
```
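The resource_id string is long and easy to mistype. A common mistake is pasting the resource ID of a different resource, such as a user-assigned managed identity. You might validate it before you construct SynapseSparkCompute. The sketch below is an assumption-based check with a hypothetical helper, `parse_spark_pool_id`. It matches the path pattern shown in this article case-insensitively, because Azure can return segment names in different casing (for example, resourcegroups). It doesn't confirm that the pool exists.

```python
import re
from typing import Dict

# Path pattern of a Synapse Spark pool resource ID, as used throughout this article.
_SPARK_POOL_ID = re.compile(
    r"^/subscriptions/(?P<subscription_id>[^/]+)"
    r"/resourceGroups/(?P<resource_group>[^/]+)"
    r"/providers/Microsoft\.Synapse/workspaces/(?P<workspace>[^/]+)"
    r"/bigDataPools/(?P<spark_pool>[^/]+)$",
    re.IGNORECASE,
)


def parse_spark_pool_id(resource_id: str) -> Dict[str, str]:
    """Split a Synapse Spark pool resource ID into its parts, or raise ValueError."""
    match = _SPARK_POOL_ID.match(resource_id)
    if match is None:
        raise ValueError(f"not a Synapse Spark pool resource ID: {resource_id!r}")
    return match.groupdict()
```

Calling `parse_spark_pool_id(synapse_resource)` before `SynapseSparkCompute(...)` fails fast with a readable error instead of a service-side failure.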

To attach a Synapse Spark pool that uses system-assigned identity, pass IdentityConfiguration, with type set to SystemAssigned, as the identity parameter of the SynapseSparkCompute class. This code snippet attaches a Synapse Spark pool that uses system-assigned identity:

```
# import required libraries
from azure.ai.ml import MLClient
from azure.ai.ml.entities import SynapseSparkCompute, IdentityConfiguration
from azure.identity import DefaultAzureCredential

subscription_id = "<SUBSCRIPTION_ID>"
resource_group = "<RESOURCE_GROUP>"
workspace_name = "<AML_WORKSPACE_NAME>"

ml_client = MLClient(
    DefaultAzureCredential(), subscription_id, resource_group, workspace_name
)

synapse_name = "<ATTACHED_SPARK_POOL_NAME>"
synapse_resource = "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Synapse/workspaces/<SYNAPSE_WORKSPACE_NAME>/bigDataPools/<SPARK_POOL_NAME>"
synapse_identity = IdentityConfiguration(type="SystemAssigned")

synapse_comp = SynapseSparkCompute(
    name=synapse_name, resource_id=synapse_resource, identity=synapse_identity
)
ml_client.begin_create_or_update(synapse_comp)
```

A Synapse Spark pool can also use a user-assigned identity. For a user-assigned identity, you can pass a managed identity definition, using the IdentityConfiguration class, as the identity parameter of the SynapseSparkCompute class. For the managed identity definition used in this way, set the type to UserAssigned. In addition, pass a user_assigned_identities parameter. The parameter user_assigned_identities is a list of objects of the UserAssignedIdentity class. The resource_id of the user-assigned identity populates each UserAssignedIdentity class object. This code snippet attaches a Synapse Spark pool that uses a user-assigned identity:

```
# import required libraries
from azure.ai.ml import MLClient
from azure.ai.ml.entities import (
    SynapseSparkCompute,
    IdentityConfiguration,
    UserAssignedIdentity,
)
from azure.identity import DefaultAzureCredential

subscription_id = "<SUBSCRIPTION_ID>"
resource_group = "<RESOURCE_GROUP>"
workspace_name = "<AML_WORKSPACE_NAME>"

ml_client = MLClient(
    DefaultAzureCredential(), subscription_id, resource_group, workspace_name
)

synapse_name = "<ATTACHED_SPARK_POOL_NAME>"
synapse_resource = "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Synapse/workspaces/<SYNAPSE_WORKSPACE_NAME>/bigDataPools/<SPARK_POOL_NAME>"
synapse_identity = IdentityConfiguration(
    type="UserAssigned",
    user_assigned_identities=[
        UserAssignedIdentity(
            resource_id="/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<AML_USER_MANAGED_ID>"
        )
    ],
)

synapse_comp = SynapseSparkCompute(
    name=synapse_name, resource_id=synapse_resource, identity=synapse_identity
)
ml_client.begin_create_or_update(synapse_comp)
```

Note

The azure.ai.ml.MLClient.begin_create_or_update() function attaches a new Synapse Spark pool if a pool with the specified name does not already exist in the workspace. However, if a Synapse Spark pool with that name is already attached to the workspace, a call to the azure.ai.ml.MLClient.begin_create_or_update() function updates the existing attached pool with the new identity or identities.

Add role assignments in Azure Synapse Analytics

To ensure that the attached Synapse Spark pool works properly, assign it the Synapse Administrator role from the Azure Synapse Analytics studio UI. These steps show how:

Open your Synapse workspace in the Azure portal.
In the left pane, select Overview.
Select Open Synapse Studio.
In Azure Synapse Analytics studio, select Manage in the left pane.
In the Security section of the Manage pane (the second pane from the left), select Access control.
Select Add.
The Add role assignment panel opens on the right side of the screen. In this panel:
1. Select Workspace item for Scope.
2. In the Item type dropdown menu, select Apache Spark pool.
3. In the Item dropdown menu, select your Apache Spark pool.
4. In the Role dropdown menu, select Synapse Administrator.
5. In the Select user search box, start typing the name of your Azure Machine Learning workspace. A list of attached Synapse Spark pools appears. Select your Synapse Spark pool from the list.
6. Select Apply.

Update the Synapse Spark Pool

You can manage the attached Synapse Spark pool from the Azure Machine Learning studio UI. Spark pool management includes updating the managed identity associated with an attached Synapse Spark pool. You can assign a system-assigned or a user-assigned identity while updating a Synapse Spark pool. Create the user-assigned managed identity in the Azure portal before you assign it to a Synapse Spark pool.

To update managed identity for the attached Synapse Spark pool:

Open the Details page for the Synapse Spark pool in the Azure Machine Learning studio.
Select the edit icon on the right side of the Managed identity section.
To assign a managed identity for the first time, toggle Assign a managed identity to enable it.
To assign a system-assigned managed identity:
1. Select System-assigned as the Identity type.
2. Select Update.
To assign a user-assigned managed identity:
1. Select User-assigned as the Identity type.
2. Select an Azure Subscription from the dropdown menu.
3. In the Search by name box, type the first few letters of the user-assigned managed identity name. A list of matching user-assigned managed identity names appears. Select the user-assigned managed identity you want from the list. You can select multiple user-assigned managed identities and assign them to the attached Synapse Spark pool.
4. Select Update.

APPLIES TO: Azure CLI ml extension v2 (current)

To update the identity associated with an attached Synapse Spark pool, execute the az ml compute update command with appropriate parameters. To assign a system-assigned identity, set the --identity parameter in the command to SystemAssigned, as shown:

```
az ml compute update --identity SystemAssigned --subscription <SUBSCRIPTION_ID> --resource-group <RESOURCE_GROUP> --workspace-name <AML_WORKSPACE_NAME> --name <ATTACHED_SPARK_POOL_NAME>
```

This sample shows the expected output of the above command:

```
Class SynapseSparkCompute: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
{
    "auto_pause_settings": {
        "auto_pause_enabled": true,
        "delay_in_minutes": 15
    },
    "created_on": "2022-09-13 20:02:15.746490+00:00",
    "id": "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.MachineLearningServices/workspaces/<AML_WORKSPACE_NAME>/computes/<ATTACHED_SPARK_POOL_NAME>",
    "identity": {
        "principal_id": "<PRINCIPAL_ID>",
        "tenant_id": "<TENANT_ID>",
        "type": "system_assigned"
    },
    "location": "eastus2",
    "name": "<ATTACHED_SPARK_POOL_NAME>",
    "node_count": 5,
    "node_family": "MemoryOptimized",
    "node_size": "Small",
    "provisioning_state": "Succeeded",
    "resourceGroup": "<RESOURCE_GROUP>",
    "resource_id": "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Synapse/workspaces/<SYNAPSE_WORKSPACE_NAME>/bigDataPools/<SPARK_POOL_NAME>",
    "scale_settings": {
        "auto_scale_enabled": false,
        "max_node_count": 0,
        "min_node_count": 0
    },
    "spark_version": "3.2",
    "type": "synapsespark"
}
```

To assign a user-assigned identity, set the --identity parameter in the command to UserAssigned. Also use the --user-assigned-identities parameter to pass the resource ID of the user-assigned identity, as shown:

```
az ml compute update --identity UserAssigned --user-assigned-identities /subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<AML_USER_MANAGED_ID> --subscription <SUBSCRIPTION_ID> --resource-group <RESOURCE_GROUP> --workspace-name <AML_WORKSPACE_NAME> --name <ATTACHED_SPARK_POOL_NAME>
```

This sample shows the expected output of the above command:

```
Class SynapseSparkCompute: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
{
  "auto_pause_settings": {
    "auto_pause_enabled": true,
    "delay_in_minutes": 15
  },
  "created_on": "2022-09-13 20:02:15.746490+00:00",
  "id": "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.MachineLearningServices/workspaces/<AML_WORKSPACE_NAME>/computes/<ATTACHED_SPARK_POOL_NAME>",
  "identity": {
    "type": "user_assigned",
    "user_assigned_identities": [
      {
        "client_id": "<CLIENT_ID>",
        "principal_id": "<PRINCIPAL_ID>",
        "resource_id": "/subscriptions/<SUBSCRIPTION_ID>/resourcegroups/<RESOURCE_GROUP>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<AML_USER_MANAGED_ID>"
      }
    ]
  },
  "location": "eastus2",
  "name": "<ATTACHED_SPARK_POOL_NAME>",
  "node_count": 5,
  "node_family": "MemoryOptimized",
  "node_size": "Small",
  "provisioning_state": "Succeeded",
  "resourceGroup": "<RESOURCE_GROUP>",
  "resource_id": "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Synapse/workspaces/<SYNAPSE_WORKSPACE_NAME>/bigDataPools/<SPARK_POOL_NAME>",
  "scale_settings": {
    "auto_scale_enabled": false,
    "max_node_count": 0,
    "min_node_count": 0
  },
  "spark_version": "3.2",
  "type": "synapsespark"
}
```

Note

The --user-assigned-identities parameter can take a list of resource IDs to assign multiple user-assigned identities to an attached Synapse Spark pool. By default, the first user-assigned identity in the list is used to submit jobs.
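With several identities attached, it can help to check which one jobs will use by default. The sketch below reads the identity block from the parsed JSON output of az ml compute update or az ml compute show, for example from `json.loads` on the JSON part of that output. It assumes that the user_assigned_identities list in the output keeps the order in which you passed the identities. The function name `default_job_identity` is hypothetical.

```python
from typing import Any, Dict, Optional, Tuple


def default_job_identity(compute: Dict[str, Any]) -> Optional[Tuple[str, Optional[str]]]:
    """Return (identity type, resource ID of the default user-assigned identity).

    Returns None when the attached pool has no managed identity.
    """
    identity = compute.get("identity")
    if not identity:
        return None
    identity_type = identity.get("type")
    if identity_type == "user_assigned":
        uais = identity.get("user_assigned_identities") or []
        # The first user-assigned identity in the list is the default for job submission.
        return identity_type, (uais[0]["resource_id"] if uais else None)
    # A system-assigned identity has no separate identity resource ID in this output.
    return identity_type, None
```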

APPLIES TO: Python SDK azure-ai-ml v2 (current)

To use a system-assigned identity, pass IdentityConfiguration, with type set to SystemAssigned, as the identity parameter of the SynapseSparkCompute class. This code snippet updates a Synapse Spark pool to use a system-assigned identity:

```
# import required libraries
from azure.ai.ml import MLClient
from azure.ai.ml.entities import SynapseSparkCompute, IdentityConfiguration
from azure.identity import DefaultAzureCredential

subscription_id = "<SUBSCRIPTION_ID>"
resource_group = "<RESOURCE_GROUP>"
workspace_name = "<AML_WORKSPACE_NAME>"

ml_client = MLClient(
    DefaultAzureCredential(), subscription_id, resource_group, workspace_name
)

synapse_name = "<ATTACHED_SPARK_POOL_NAME>"
synapse_resource = "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Synapse/workspaces/<SYNAPSE_WORKSPACE_NAME>/bigDataPools/<SPARK_POOL_NAME>"
synapse_identity = IdentityConfiguration(type="SystemAssigned")

# Calling begin_create_or_update() on an already attached pool updates its identity
synapse_comp = SynapseSparkCompute(
    name=synapse_name, resource_id=synapse_resource, identity=synapse_identity
)
ml_client.begin_create_or_update(synapse_comp)
```

A Synapse Spark pool can also use a user-assigned identity. For a user-assigned identity, pass a managed identity definition, built with the IdentityConfiguration class, as the identity parameter of the SynapseSparkCompute class. In this managed identity definition, set the type to UserAssigned, and also pass a user_assigned_identities parameter. The user_assigned_identities parameter is a list of UserAssignedIdentity objects, and each object is populated with the resource_id of a user-assigned identity. This code snippet updates a Synapse Spark pool to use a user-assigned identity:

```
# import required libraries
from azure.ai.ml import MLClient
from azure.ai.ml.entities import (
    SynapseSparkCompute,
    IdentityConfiguration,
    UserAssignedIdentity,
)
from azure.identity import DefaultAzureCredential

subscription_id = "<SUBSCRIPTION_ID>"
resource_group = "<RESOURCE_GROUP>"
workspace_name = "<AML_WORKSPACE_NAME>"

ml_client = MLClient(
    DefaultAzureCredential(), subscription_id, resource_group, workspace_name
)

synapse_name = "<ATTACHED_SPARK_POOL_NAME>"
synapse_resource = "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Synapse/workspaces/<SYNAPSE_WORKSPACE_NAME>/bigDataPools/<SPARK_POOL_NAME>"
synapse_identity = IdentityConfiguration(
    type="UserAssigned",
    user_assigned_identities=[
        UserAssignedIdentity(
            resource_id="/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<AML_USER_MANAGED_ID>"
        )
    ],
)

synapse_comp = SynapseSparkCompute(
    name=synapse_name, resource_id=synapse_resource, identity=synapse_identity
)
ml_client.begin_create_or_update(synapse_comp)
```

Note

If a pool with the specified name does not already exist in the workspace, the azure.ai.ml.MLClient.begin_create_or_update() function will attach a new Synapse Spark pool. However, if a Synapse Spark pool, with the specified name, is already attached to the workspace, an azure.ai.ml.MLClient.begin_create_or_update() function call will update the existing attached pool, with the new identity or identities.

Detach the Synapse Spark pool

You might want to detach an attached Synapse Spark pool to clean up a workspace.

The Azure Machine Learning studio UI also provides a way to detach an attached Synapse Spark pool. To do this, follow these steps:

Open the Details page for the Synapse Spark pool in Azure Machine Learning studio.
Select Detach to detach the attached Synapse Spark pool.

APPLIES TO: Azure CLI ml extension v2 (current)

To detach an attached Synapse Spark pool, run the az ml compute detach command and pass the name of the pool with the --name parameter, as shown here:

```
az ml compute detach --name <ATTACHED_SPARK_POOL_NAME> --subscription <SUBSCRIPTION_ID> --resource-group <RESOURCE_GROUP> --workspace-name <AML_WORKSPACE_NAME>
```

This sample shows the expected output of the above command:

```
Are you sure you want to perform this operation? (y/n): y
```

APPLIES TO: Python SDK azure-ai-ml v2 (current)

To detach the pool with the Python SDK, call the MLClient.compute.begin_delete() function. Pass the name of the attached Synapse Spark pool, along with the action Detach, to the function. This code snippet detaches a Synapse Spark pool from an Azure Machine Learning workspace:

```
# import required libraries
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

subscription_id = "<SUBSCRIPTION_ID>"
resource_group = "<RESOURCE_GROUP>"
workspace_name = "<AML_WORKSPACE_NAME>"

ml_client = MLClient(
    DefaultAzureCredential(), subscription_id, resource_group, workspace_name
)

synapse_name = "<ATTACHED_SPARK_POOL_NAME>"
# action="Detach" removes the pool from the workspace only; the Synapse Spark pool itself isn't deleted
ml_client.compute.begin_delete(name=synapse_name, action="Detach")
```

Serverless Spark compute in Azure Machine Learning

Some user scenarios might require access to a serverless Spark compute resource, during an Azure Machine Learning job submission, without a need to attach a Spark pool. The Azure Synapse Analytics integration with Azure Machine Learning also provides a serverless Spark compute experience. This allows access to a Spark compute in a job, without a need to attach the compute to a workspace first. Learn more about the serverless Spark compute experience.

Next steps
