What is Data Factory in Microsoft Fabric?

Data Factory in Microsoft Fabric helps you solve one of business's toughest challenges: turning scattered data into useful insights.

Your organization's data lives in many different places: databases, files, cloud services, and legacy systems. This makes it hard to get a complete picture of your business. Data Factory connects to over 170 data sources, including multicloud environments and hybrid setups with on-premises gateways. It helps you move and transform your data at scale, turning it into formats that work well for analytics and decision-making.

Diagram of the data integration stack in Microsoft Fabric: a selection of connectors linked to analytics and data development tools in Fabric through data movement, orchestration, and transformation, all sitting on top of Fabric OneLake and woven through with AI-powered intelligence.

Whether you're a business user building your first data analytics strategy, or a developer crafting complex workstreams, you'll find the right tools to:

  • Bring your data together
  • Clean it up
  • Make it ready for analysis in your Lakehouse or Data Warehouse
  • Automate your data workflows

What's data integration?

Data integration is the process of bringing your strategic data together so you can access and analyze it. It's a key part of any business that wants to make data-driven decisions.

There are many ways to integrate your data, but one of the most common strategies is ETL. ETL stands for Extract, Transform, Load. It takes information from many different sources, transforms it into a format you can analyze, and loads it into a common destination system for analysis or reporting. When you implement an ETL process in your business's data platform, it improves data consistency, quality, and accessibility.

Here's what each phase does:

  • Extract: Reads data from your sources and moves it to a central storage ___location. Sources can be databases, files, APIs, websites, and more.
  • Transform: Cleans, enriches, and transforms your data into a format that's easy to analyze. For example, you might want to compare sales data from a SQL database with scanned, historical sales documents. After extracting the data, you need to transform the data from each source so it's in the same format, check for corrupted or duplicate records, and combine the data into a single dataset.
  • Load: Writes the transformed data to a destination system, like a data warehouse or data lake. The destination system is where you can run queries and reports on your data.
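
To make the three phases concrete, here's a minimal ETL sketch in Python using pandas. It isn't a Data Factory API; the file names, table, and columns are hypothetical placeholders that stand in for the SQL and document sources described above.

    import sqlite3
    import pandas as pd

    # Extract: read current sales from a SQL source and historical sales from a file export.
    conn = sqlite3.connect("sales.db")                                  # placeholder database
    db_sales = pd.read_sql("SELECT order_id, amount, order_date FROM sales", conn)
    file_sales = pd.read_csv("historical_sales.csv")                    # placeholder file

    # Transform: align both sources to one schema, parse dates, and drop bad or duplicate rows.
    file_sales = file_sales.rename(columns={"id": "order_id", "total": "amount", "date": "order_date"})
    combined = pd.concat([db_sales, file_sales], ignore_index=True)
    combined["order_date"] = pd.to_datetime(combined["order_date"], errors="coerce")
    combined = combined.dropna(subset=["order_id", "amount"]).drop_duplicates(subset=["order_id"])

    # Load: write the cleaned dataset where your analytics engine can query it.
    combined.to_parquet("sales_clean.parquet", index=False)

In Data Factory, the same pattern maps to a copy step for extraction, a dataflow or notebook for the transformation, and a Lakehouse or Data Warehouse as the load destination.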

ETL or ELT?

When you work with data, how you move and transform it matters, and every organization has different needs. Two common approaches are ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform). Each has strengths, depending on your needs for performance, scalability, and cost.

ETL: Transform your data before loading it into its destination. This works well when you need to clean, standardize, or enrich data as it moves. For example, use Data Factory's Dataflow Gen2 to apply transformations at scale before loading data into a warehouse or Lakehouse.

ELT: Load raw data first, then transform it where it's stored. This approach uses the power of Fabric's analytics engines, such as Spark notebooks or SQL-based tools, working directly over data in OneLake. ELT works well for handling large datasets with modern, cloud-scale compute.

Fabric Data Factory supports both. You can:

  • Build classic ETL pipelines for immediate data quality and readiness
  • Use ELT workflows to take advantage of integrated compute and storage for large-scale transformations
  • Combine both approaches in the same solution for flexibility
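
To contrast with the ETL sketch above, here's a rough sketch of the ELT pattern as it might look in a Fabric Spark notebook: raw files are landed in a Lakehouse table first, and the transformation runs afterward on the engine that sits next to the data. It assumes the notebook-provided SparkSession; the folder and table names are hypothetical.

    # Load: land the raw data as-is in a Lakehouse table.
    raw = spark.read.option("header", True).csv("Files/raw/sales/")     # hypothetical Lakehouse folder
    raw.write.mode("overwrite").saveAsTable("sales_raw")

    # Transform: reshape it afterward, where the data already lives.
    spark.sql("""
        CREATE OR REPLACE TABLE sales_by_region AS
        SELECT region, SUM(CAST(amount AS DOUBLE)) AS total_sales
        FROM sales_raw
        WHERE amount IS NOT NULL
        GROUP BY region
    """)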

Data Factory is a powerful data integration solution

Data Factory connects to your data, moves it, transforms it, and orchestrates your data movement and transformation tasks from one place. You decide what strategy works best for your business, and Data Factory provides the tools to get it done.

Connect to your data: Whether on-premises, in the cloud, or across multicloud environments, Data Factory connects to your data sources and destinations. It supports a wide range of data sources, including databases, data lakes, file systems, APIs, and more. See available connectors for a complete list of supported data sources and destinations.

Move data: Data Factory provides several methods to move data from source to destination, or to give you easy access to existing data, depending on your needs.

  • Copy job - Preferred solution for simplified data movement with native support for multiple delivery styles, including bulk copy, incremental copy, and change data capture (CDC) replication. It also offers the flexibility to handle a wide range of scenarios from many sources to many destinations — all through an intuitive, easy-to-use experience.
  • Copy activity - Moves data from one place to another at any scale, with extensive customization, support for a wide range of sources and destinations, and manual control of parallel copying for improved performance.
  • Mirroring - Create a near real-time replica of your operational database within OneLake in Microsoft Fabric to make your analytics and reporting easier.

See our data movement decision guide to help you choose the right data movement method for your scenario.
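
If it helps to picture what incremental copy means, here's a tiny, tool-agnostic sketch of the watermark pattern it's built on: each run moves only the rows that changed since the last successfully copied timestamp. This isn't the Copy job API; the database, table, and column names are hypothetical.

    import sqlite3
    import pandas as pd

    def incremental_copy(source_db: str, last_watermark: str):
        """Return rows modified since the last run, plus the new watermark."""
        conn = sqlite3.connect(source_db)                               # placeholder source
        changed = pd.read_sql(
            "SELECT * FROM orders WHERE modified_at > ?",
            conn,
            params=(last_watermark,),
        )
        new_watermark = changed["modified_at"].max() if not changed.empty else last_watermark
        return changed, new_watermark

    # Each run picks up where the previous one left off instead of recopying everything.
    rows, watermark = incremental_copy("orders.db", "2024-01-01T00:00:00")
    rows.to_parquet("orders_increment.parquet", index=False)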

Transform: Data Factory provides activities that connect your pipelines to your custom transformation scripts or to the powerful dataflows designer.

  • Pipeline activities - Fabric notebook, HDInsight activity, Spark job definition, stored procedure, SQL scripts, and more. These activities let you run custom code or scripts to transform your data.
  • Dataflow Gen2 - Transform your data using a low-code interface with over 300 transformations. You can perform joins, aggregations, data cleansing, custom transformations, and much more, as sketched below.
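
As a rough idea of what these transformations look like in code, here's a short PySpark sketch you could run from a Fabric notebook activity: a join, a filter, and an aggregation over two hypothetical Lakehouse tables. Dataflow Gen2 lets you build the equivalent steps in its visual, low-code editor instead.

    # Assumes the notebook-provided SparkSession and two hypothetical Lakehouse tables.
    orders = spark.read.table("orders_clean")
    customers = spark.read.table("customers")

    # Join, cleanse, and aggregate: the same kinds of steps a dataflow query applies visually.
    sales_by_segment = (
        orders.join(customers, on="customer_id", how="inner")
              .filter(orders["amount"] > 0)
              .groupBy("segment")
              .agg({"amount": "sum"})
              .withColumnRenamed("sum(amount)", "total_sales")
    )

    sales_by_segment.write.mode("overwrite").saveAsTable("sales_by_segment")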

Orchestrate: Data Factory lets you create pipelines that can run multiple data movements, transformations, and other activities in a single workflow.
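
As a sketch of the idea (not the pipeline designer itself), an orchestrated workflow chains activities and handles success and failure paths in one place. The function names below are hypothetical stand-ins for pipeline activities such as a copy job, a notebook, and a notification.

    def copy_sales_data():
        print("copying source data to the Lakehouse...")       # stand-in for a copy activity

    def transform_sales_data():
        print("running the transformation notebook...")        # stand-in for a notebook or dataflow activity

    def notify_team(status: str):
        print(f"pipeline finished: {status}")                  # stand-in for a Teams or Outlook activity

    # One workflow runs the movement and transformation steps in order and reports the outcome.
    try:
        copy_sales_data()
        transform_sales_data()
    except Exception:
        notify_team("failed")
        raise
    else:
        notify_team("succeeded")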

AI-powered data integration

AI shows up throughout Data Factory to help you get more done with less effort. Copilot for Data Factory lets you design, edit, and manage pipelines and dataflows using natural language. You can type plain-English prompts, and Copilot turns them into working ETL steps.

Copilot also summarizes your existing dataflow queries and pipelines, so you can quickly understand what they do. If you run into errors, Copilot explains what went wrong and suggests ways to fix it.

For details, see Copilot in Fabric in the Data Factory workload.

What do you need to get started?

What if we already use Azure Data Factory?

Data Factory in Microsoft Fabric is the next generation of Azure Data Factory, built to handle your most complex data integration challenges with a simpler approach.

See our comparison guide for the key differences between these two services, so you can make the right choice for your enterprise.

When you're ready to migrate, follow our migration guide.

For more information, and to get started with Microsoft Fabric, follow these guides: