What is the Copy job in Data Factory for Microsoft Fabric?

Data Factory in Fabric enables users to integrate data from over 100 built-in connectors, offering three key capabilities: data ingestion, data transformation, and data orchestration. Dataflow Gen2 handles data transformations, while pipelines and Airflow manage integration flows. Copy Job simplifies data ingestion with built-in patterns for batch and incremental copy, eliminating the need for pipeline creation.

Advantages of the Copy job

While the Copy activity within data pipelines handles data ingestion with bulk/batch operations, creating data pipelines in Data Factory can still be challenging for users who are new to the field, with a steeper learning curve. So, we're thrilled to introduce the Copy job, which elevates data ingestion to a more streamlined and user-friendly experience, from any source to any destination. Now, you can use the Copy job to simplify data ingestion without the need to create pipelines. Moreover, the Copy job supports various data delivery styles, including both built-in batch copy and incremental copy, offering flexibility to meet your specific needs.

Screenshot showing the Copy job and its results pane.

Some advantages of the Copy job over other data movement methods include:

  • Intuitive experience: A no-compromise experience for data copying, covering both configuration and monitoring, that makes it easier than ever.
  • Efficiency: Enable incremental copying effortlessly, reducing manual intervention. This efficiency translates to less resource utilization and faster copy durations.
  • Flexibility: While enjoying the simplicity, you also have the flexibility to control your data movement. Choose which tables and columns to copy, map the data, define read/write behavior, and set schedules that fit your needs, whether for a one-time task or recurring operation.
  • Robust performance: A serverless setup enabling data transfer with large-scale parallelism, maximizing data movement throughput for your system.

Supported connectors

You can use the Copy Job to move your data across cloud data stores or from on-premises data stores behind a firewall or within a virtual network via a gateway. The Copy job supports the following data stores as both source and destination:

Support varies by connector across use as a source or destination, full load, incremental load (Preview), the append, override, and merge update methods, and the on-premises data gateway. The supported connectors are:

  • Azure SQL DB
  • Oracle
  • On-premises SQL Server
  • Fabric Warehouse
  • Fabric Lakehouse table
  • Fabric Lakehouse file
  • Amazon S3
  • Azure Data Lake Storage Gen2
  • Azure Blob Storage
  • Azure SQL Managed Instance
  • Snowflake
  • Azure Synapse Analytics
  • Azure Data Explorer
  • Azure PostgreSQL
  • Google Cloud Storage
  • MySQL
  • Azure MySQL
  • PostgreSQL
  • SQL database in Fabric (Preview)
  • Amazon S3 compatible
  • SAP HANA
  • ODBC
  • Amazon RDS for SQL Server
  • Google BigQuery
  • Salesforce
  • Salesforce service cloud

Note

Staging copy is not yet supported by the Copy job, which means copying data from sources like Snowflake, Fabric Warehouse, and Synapse SQL Pool through the on-premises data gateway (OPDG) may fail in some cases due to this limitation. The product team is actively addressing such issues and adding more connectors. Please also share your feedback on Fabric Ideas.

Copy behavior

You can choose from the following data delivery styles.

  • Full copy mode: Each copy job run copies all data from the source to the destination at once.

  • Incremental copy mode: The initial job run copies all data, and subsequent job runs copy only the changes since the last run. When copying from a database, new or updated rows are captured and moved to your destination. When copying from a storage store, new or updated files (identified by their LastModifiedTime) are captured and moved to your destination.

    Note

    Incremental copy mode is still in Preview. Want early access to native change data capture? Sign up here.
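For file-based sources, the incremental behavior described above can be pictured as a last-modified-time filter. The sketch below is illustrative only (the function name and watermark handling are assumptions, not the product's implementation): the first run uses a zero watermark, so every file is selected, and later runs select only files modified after the previous run.

```python
import os

def incremental_file_copy(source_dir, copied_since):
    """Select files changed after the last-run watermark (epoch seconds).

    Illustrative sketch of file-level incremental copy: on the first run
    copied_since is 0, so every file qualifies; later runs pick only files
    whose LastModifiedTime is newer than the previous watermark.
    """
    selected = []
    new_watermark = copied_since
    for name in sorted(os.listdir(source_dir)):
        path = os.path.join(source_dir, name)
        mtime = os.path.getmtime(path)  # file's LastModifiedTime
        if mtime > copied_since:
            selected.append(name)
            new_watermark = max(new_watermark, mtime)
    return selected, new_watermark
```

The returned watermark would be persisted between runs, which is what lets each run capture only the delta.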

You can also choose how data is written to your destination store.

By default, the Copy job appends data to your destination, so that you won't miss any change history. But you can also change the update method to merge or overwrite. When performing a merge, you need to provide a key column; by default, the primary key is used if one exists.

  • When copying data to a storage store: New rows from the tables or files are copied to new files in the destination. If a file with the same name already exists in the target store, it's overwritten.
  • When copying data to a database: New rows from the tables or files are appended to destination tables. You can change the update method to merge or overwrite for supported data stores.
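The three update methods can be sketched in miniature as operations on a destination table. This is a conceptual illustration, not the product's code; rows are modeled as dictionaries, and the key-column requirement for merge mirrors what the Copy job asks for.

```python
def write_rows(dest, rows, mode, key=None):
    """Apply an append / overwrite / merge update method to a destination.

    dest: existing destination rows (list of dicts).
    rows: incoming rows from the source.
    mode: "append" adds all rows, "overwrite" replaces the destination,
    "merge" upserts by the given key column (required for merge).
    """
    if mode == "append":
        return dest + rows
    if mode == "overwrite":
        return list(rows)
    if mode == "merge":
        if key is None:
            raise ValueError("merge requires a key column")
        merged = {r[key]: r for r in dest}
        for r in rows:
            merged[r[key]] = r  # update an existing row or insert a new one
        return list(merged.values())
    raise ValueError(f"unknown update method: {mode}")
```

Note how merge is the only method that needs a key: it's what lets an incoming row be matched to the destination row it should update.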

Incremental column

In incremental copy mode, you need to select an incremental column for each table to identify changes. The Copy job uses this column as a watermark, comparing its value with the value from the last run in order to copy only the new or updated data. The incremental column can be a timestamp or an increasing integer.
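For database sources, the watermark comparison amounts to a filtered read. The helper below is a simplified sketch (the function name and the inlined literal are illustrative; a real implementation would bind the watermark as a query parameter): the first run reads the whole table, and subsequent runs filter on the incremental column.

```python
def incremental_query(table, incremental_column, last_watermark):
    """Build the read query for an incremental run (illustrative only).

    A None watermark means this is the initial run, so the whole table
    is read. Later runs filter on the incremental column, which should
    be a timestamp or a monotonically increasing integer.
    """
    if last_watermark is None:
        return f"SELECT * FROM {table}"
    return (
        f"SELECT * FROM {table} "
        f"WHERE {incremental_column} > '{last_watermark}'"
    )
```

After each run, the highest value seen in the incremental column becomes the watermark for the next run.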

Region availability

The Copy job has the same regional availability as data pipelines.

Pricing

The Copy job uses the same billing meter as data pipelines, Data Movement, with an identical consumption rate.