Override with target settings

This page describes how to override or join top-level settings with target settings in Databricks Asset Bundles. For information about bundle settings, see Databricks Asset Bundle configuration.

Artifact settings override

You can override the artifact settings in a top-level artifacts mapping with the artifact settings in a targets mapping, for example:

# ...
artifacts:
  <some-unique-programmatic-identifier-for-this-artifact>:
    # Artifact settings.

targets:
  <some-unique-programmatic-identifier-for-this-target>:
    artifacts:
      <the-matching-programmatic-identifier-for-this-artifact>:
        # Any more artifact settings to join with the settings from the
        # matching top-level artifacts mapping.

If any artifact setting is defined both in the top-level artifacts mapping and the targets mapping for the same artifact, then the setting in the targets mapping takes precedence over the setting in the top-level artifacts mapping.
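The precedence rule above behaves like an ordinary mapping merge in which the target's keys win. The following is an illustrative sketch only (the function name is hypothetical; it is not the Databricks CLI's actual merge implementation):

```python
# Sketch of the artifact precedence rule: target settings are overlaid
# onto the matching top-level settings, and target values win on conflict.
# Illustrative only; not the Databricks CLI's real merge logic.
def merge_artifact(top_level: dict, target: dict) -> dict:
    """Join two artifact settings mappings; keys in `target` take precedence."""
    merged = dict(top_level)
    merged.update(target)
    return merged

top = {"type": "whl", "path": "./my_package"}
override = {"path": "./my_other_package"}

print(merge_artifact(top, override))
# {'type': 'whl', 'path': './my_other_package'}
```

Here `type` survives from the top-level mapping because the target never redefines it, while `path` is replaced by the target's value.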

Example 1: Artifact settings defined only in the top-level artifacts mapping

In the following example, path is defined only in the top-level artifacts mapping, which therefore defines all of the settings for the artifact:

# ...
artifacts:
  my-artifact:
    type: whl
    path: ./my_package
# ...

When you run databricks bundle validate for this example, the resulting graph is:

{
  "...": "...",
  "artifacts": {
    "my-artifact": {
      "type": "whl",
      "path": "./my_package",
      "...": "..."
    }
  },
  "...": "..."
}

Example 2: Conflicting artifact settings defined in multiple artifact mappings

In this example, path is defined both in the top-level artifacts mapping and in the artifacts mapping in targets. The path in the artifacts mapping in targets takes precedence over the path in the top-level artifacts mapping, defining the settings for the artifact:

# ...
artifacts:
  my-artifact:
    type: whl
    path: ./my_package

targets:
  dev:
    artifacts:
      my-artifact:
        path: ./my_other_package
    # ...

When you run databricks bundle validate for this example, the resulting graph is:

{
  "...": "...",
  "artifacts": {
    "my-artifact": {
      "type": "whl",
      "path": "./my_other_package",
      "...": "..."
    }
  },
  "...": "..."
}

Cluster settings override

You can override or join the job or pipeline cluster settings for a target.

For jobs, use job_cluster_key within a job definition to identify job cluster settings in the top-level resources mapping to join with job cluster settings in a targets mapping:

# ...
resources:
  jobs:
    <some-unique-programmatic-identifier-for-this-job>:
      # ...
      job_clusters:
        - job_cluster_key: <some-unique-programmatic-identifier-for-this-key>
          new_cluster:
            # Cluster settings.

targets:
  <some-unique-programmatic-identifier-for-this-target>:
    resources:
      jobs:
        <the-matching-programmatic-identifier-for-this-job>:
          # ...
          job_clusters:
            - job_cluster_key: <the-matching-programmatic-identifier-for-this-key>
              # Any more cluster settings to join with the settings from the
              # resources mapping for the matching top-level job_cluster_key.
          # ...

If any cluster setting is defined both in the top-level resources mapping and the targets mapping for the same job_cluster_key, then the setting in the targets mapping takes precedence over the setting in the top-level resources mapping.
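Because job_clusters is a list, the join is keyed: entries with the same job_cluster_key are merged, with target values taking precedence inside new_cluster. The following sketch mimics that behavior (the function name is hypothetical, and this is not the Databricks CLI's actual implementation):

```python
# Sketch of the job cluster join rule: job_clusters entries are matched
# by job_cluster_key, and nested new_cluster settings from the target
# override the top-level ones. Illustrative only.
def merge_job_clusters(top_level: list, target: list) -> list:
    by_key = {c["job_cluster_key"]: dict(c) for c in top_level}
    for cluster in target:
        key = cluster["job_cluster_key"]
        base = by_key.setdefault(key, {"job_cluster_key": key})
        # Join the nested new_cluster settings; target values take precedence.
        base["new_cluster"] = {**base.get("new_cluster", {}),
                               **cluster.get("new_cluster", {})}
    return list(by_key.values())

top = [{"job_cluster_key": "my-cluster",
        "new_cluster": {"spark_version": "13.3.x-scala2.12"}}]
tgt = [{"job_cluster_key": "my-cluster",
        "new_cluster": {"node_type_id": "Standard_DS3_v2", "num_workers": 1}}]

print(merge_job_clusters(top, tgt))
```

With these inputs the result contains a single cluster whose new_cluster combines all three settings, matching Example 1 below.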

For Lakeflow Declarative Pipelines, use label within the cluster settings of a pipeline definition to identify cluster settings in a top-level resources mapping to join with the cluster settings in a targets mapping, for example:

# ...
resources:
  pipelines:
    <some-unique-programmatic-identifier-for-this-pipeline>:
      # ...
      clusters:
        - label: default | maintenance
          # Cluster settings.

targets:
  <some-unique-programmatic-identifier-for-this-target>:
    resources:
      pipelines:
        <the-matching-programmatic-identifier-for-this-pipeline>:
          # ...
          clusters:
            - label: default | maintenance
              # Any more cluster settings to join with the settings from the
              # resources mapping for the matching top-level label.
          # ...

If any cluster setting is defined both in the top-level resources mapping and the targets mapping for the same label, then the setting in the targets mapping takes precedence over the setting in the top-level resources mapping.

Example 1: New job cluster settings defined in multiple resource mappings and with no settings conflicts

In this example, spark_version in the top-level resources mapping is combined with node_type_id and num_workers in the resources mapping in targets to define the settings for the job_cluster_key named my-cluster:

# ...
resources:
  jobs:
    my-job:
      name: my-job
      job_clusters:
        - job_cluster_key: my-cluster
          new_cluster:
            spark_version: 13.3.x-scala2.12

targets:
  development:
    resources:
      jobs:
        my-job:
          name: my-job
          job_clusters:
            - job_cluster_key: my-cluster
              new_cluster:
                node_type_id: Standard_DS3_v2
                num_workers: 1
          # ...

When you run databricks bundle validate for this example, the resulting graph is as follows:

{
  "...": "...",
  "resources": {
    "jobs": {
      "my-job": {
        "job_clusters": [
          {
            "job_cluster_key": "my-cluster",
            "new_cluster": {
              "node_type_id": "Standard_DS3_v2",
              "num_workers": 1,
              "spark_version": "13.3.x-scala2.12"
            }
          }
        ],
        "...": "..."
      }
    }
  }
}

Example 2: Conflicting new job cluster settings defined in multiple resource mappings

In this example, spark_version and num_workers are defined both in the top-level resources mapping and in the resources mapping in targets. The spark_version and num_workers in the resources mapping in targets take precedence over those in the top-level resources mapping, defining the settings for the job_cluster_key named my-cluster:

# ...
resources:
  jobs:
    my-job:
      name: my-job
      job_clusters:
        - job_cluster_key: my-cluster
          new_cluster:
            spark_version: 13.3.x-scala2.12
            node_type_id: Standard_DS3_v2
            num_workers: 1

targets:
  development:
    resources:
      jobs:
        my-job:
          name: my-job
          job_clusters:
            - job_cluster_key: my-cluster
              new_cluster:
                spark_version: 12.2.x-scala2.12
                num_workers: 2
          # ...

When you run databricks bundle validate for this example, the resulting graph is as follows:

{
  "...": "...",
  "resources": {
    "jobs": {
      "my-job": {
        "job_clusters": [
          {
            "job_cluster_key": "my-cluster",
            "new_cluster": {
              "node_type_id": "Standard_DS3_v2",
              "num_workers": 2,
              "spark_version": "12.2.x-scala2.12"
            }
          }
        ],
        "...": "..."
      }
    }
  }
}

Example 3: Pipeline cluster settings defined in multiple resource mappings and with no settings conflicts

In this example, node_type_id in the top-level resources mapping is combined with num_workers in the resources mapping in targets to define the settings for the label named default:

# ...
resources:
  pipelines:
    my-pipeline:
      clusters:
        - label: default
          node_type_id: Standard_DS3_v2

targets:
  development:
    resources:
      pipelines:
        my-pipeline:
          clusters:
            - label: default
              num_workers: 1
          # ...

When you run databricks bundle validate for this example, the resulting graph is as follows:

{
  "...": "...",
  "resources": {
    "pipelines": {
      "my-pipeline": {
        "clusters": [
          {
            "label": "default",
            "node_type_id": "Standard_DS3_v2",
            "num_workers": 1
          }
        ],
        "...": "..."
      }
    }
  }
}

Example 4: Conflicting pipeline cluster settings defined in multiple resource mappings

In this example, num_workers is defined both in the top-level resources mapping and in the resources mapping in targets. The num_workers in the resources mapping in targets takes precedence over num_workers in the top-level resources mapping, defining the settings for the label named default:

# ...
resources:
  pipelines:
    my-pipeline:
      clusters:
        - label: default
          node_type_id: Standard_DS3_v2
          num_workers: 1

targets:
  development:
    resources:
      pipelines:
        my-pipeline:
          clusters:
            - label: default
              num_workers: 2
          # ...

When you run databricks bundle validate for this example, the resulting graph is as follows:

{
  "...": "...",
  "resources": {
    "pipelines": {
      "my-pipeline": {
        "clusters": [
          {
            "label": "default",
            "node_type_id": "Standard_DS3_v2",
            "num_workers": 2
          }
        ],
        "...": "..."
      }
    }
  }
}

Job task settings override

You can use the tasks mapping within a job definition to join the job tasks settings in a top-level resources mapping with the job task settings in a targets mapping, for example:

# ...
resources:
  jobs:
    <some-unique-programmatic-identifier-for-this-job>:
      # ...
      tasks:
        - task_key: <some-unique-programmatic-identifier-for-this-task>
          # Task settings.

targets:
  <some-unique-programmatic-identifier-for-this-target>:
    resources:
      jobs:
        <the-matching-programmatic-identifier-for-this-job>:
          # ...
          tasks:
            - task_key: <the-matching-programmatic-identifier-for-this-key>
              # Any more task settings to join with the settings from the
              # resources mapping for the matching top-level task_key.
          # ...

To join the top-level resources mapping and the targets mapping for the same task, the task mappings' task_key must be set to the same value.

If any job task setting is defined both in the top-level resources mapping and the targets mapping for the same task, then the setting in the targets mapping takes precedence over the setting in the top-level resources mapping.
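The matching requirement explains what happens when the keys differ: a task in targets with a non-matching task_key is not joined at all and instead produces a separate task. The following sketch (a hypothetical helper, not the Databricks CLI's actual logic) illustrates both outcomes:

```python
# Sketch of the task join rule: tasks are matched by task_key, and
# target settings override top-level settings for the matching task.
# A non-matching task_key yields a separate task rather than a join.
# Illustrative only.
def merge_tasks(top_level: list, target: list) -> list:
    by_key = {t["task_key"]: dict(t) for t in top_level}
    for task in target:
        base = by_key.setdefault(task["task_key"], {})
        # Join nested new_cluster settings; target values take precedence.
        merged_cluster = {**base.get("new_cluster", {}),
                          **task.get("new_cluster", {})}
        base.update(task)
        base["new_cluster"] = merged_cluster
    return list(by_key.values())

top = [{"task_key": "my-task",
        "new_cluster": {"spark_version": "13.3.x-scala2.12"}}]
matching = [{"task_key": "my-task", "new_cluster": {"num_workers": 1}}]
mismatched = [{"task_key": "other-task", "new_cluster": {"num_workers": 1}}]

print(len(merge_tasks(top, matching)))    # 1: joined into a single task
print(len(merge_tasks(top, mismatched)))  # 2: two separate tasks
```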

Example 1: Job task settings defined in multiple resource mappings and with no settings conflicts

In this example, spark_version in the top-level resources mapping is combined with node_type_id and num_workers in the resources mapping in targets to define the settings for the task_key named my-task:

# ...
resources:
  jobs:
    my-job:
      name: my-job
      tasks:
        - task_key: my-task
          new_cluster:
            spark_version: 13.3.x-scala2.12

targets:
  development:
    resources:
      jobs:
        my-job:
          name: my-job
          tasks:
            - task_key: my-task
              new_cluster:
                node_type_id: Standard_DS3_v2
                num_workers: 1
          # ...

When you run databricks bundle validate for this example, the resulting graph is as follows (ellipses indicate omitted content, for brevity):

{
  "...": "...",
  "resources": {
    "jobs": {
      "my-job": {
        "tasks": [
          {
            "new_cluster": {
              "node_type_id": "Standard_DS3_v2",
              "num_workers": 1,
              "spark_version": "13.3.x-scala2.12"
            },
            "task-key": "my-task"
          }
        ],
        "...": "..."
      }
    }
  }
}

Example 2: Conflicting job task settings defined in multiple resource mappings

In this example, spark_version and num_workers are defined both in the top-level resources mapping and in the resources mapping in targets. The spark_version and num_workers in the resources mapping in targets take precedence over those in the top-level resources mapping, defining the settings for the task_key named my-task (ellipses indicate omitted content, for brevity):

# ...
resources:
  jobs:
    my-job:
      name: my-job
      tasks:
        - task_key: my-task
          new_cluster:
            spark_version: 13.3.x-scala2.12
            node_type_id: Standard_DS3_v2
            num_workers: 1

targets:
  development:
    resources:
      jobs:
        my-job:
          name: my-job
          tasks:
            - task_key: my-task
              new_cluster:
                spark_version: 12.2.x-scala2.12
                num_workers: 2
          # ...

When you run databricks bundle validate for this example, the resulting graph is as follows:

{
  "...": "...",
  "resources": {
    "jobs": {
      "my-job": {
        "tasks": [
          {
            "new_cluster": {
              "node_type_id": "Standard_DS3_v2",
              "num_workers": 2,
              "spark_version": "12.2.x-scala2.12"
            },
            "task_key": "my-task"
          }
        ],
        "...": "..."
      }
    }
  }
}