Data and AI

This article compares core Azure data and AI services to corresponding Amazon Web Services (AWS) solutions.

For a comparison of other Azure and AWS services, see Azure for AWS professionals.

Data governance, management, and platforms

Both Microsoft Purview and the combination of AWS services described in the following table provide comprehensive data governance solutions. Use these solutions to manage, discover, classify, and secure your data assets.

AWS services Microsoft service Description
AWS Glue Data Catalog, AWS Lake Formation, Amazon Macie, AWS Identity and Access Management (IAM), AWS Config Microsoft Purview Both options provide data governance, cataloging, and compliance features. Microsoft Purview is a unified data governance solution. You can use it to discover, classify, and manage data across on-premises, multicloud, and software as a service (SaaS) environments. It also provides data lineage and compliance capabilities.

AWS delivers comparable features through several services, including AWS Glue Data Catalog for metadata management, AWS Lake Formation for data lake creation and governance, Amazon Macie for data classification and protection, AWS IAM for access control, and AWS Config for configuration management and compliance tracking.

All-in-one platform vs. AWS services

Microsoft Fabric provides an all-in-one platform that unifies the data and AI services required for modern analytics solutions. It moves data efficiently between services, provides unified governance and security, and simplifies pricing models. This approach contrasts with the AWS approach, where you often use separate services and must invest more effort in integration. Fabric provides integration across these functions within the Azure ecosystem.

Both AWS and Fabric provide capabilities for data integration, processing, analytics, machine learning, and business intelligence.

AWS service Microsoft service Description
AWS Glue Fabric data integration with Azure Data Factory AWS Glue is one of several separate AWS services that you combine to build data and analytics solutions. This approach provides flexibility but requires more effort to integrate each service into an end-to-end solution. Fabric combines capabilities within a single platform to simplify workflows, collaboration, and management.

Detailed comparison of AWS services and Fabric components

The following table compares key Fabric components and their corresponding AWS services. It helps architects and decision-makers understand how the Fabric data platform aligns or diverges from AWS offerings across data engineering, analytics, governance, and AI workloads.

AWS services | Microsoft service
AWS Glue | Data integration with Data Factory
Amazon Elastic MapReduce (EMR), AWS Glue interactive sessions | Data engineering with Apache Spark
Amazon Redshift | Data warehousing with Synapse Data Warehouse
Amazon SageMaker | Data Science (Azure Machine Learning integration)
Amazon Kinesis, Amazon Managed Service for Apache Flink | Real-time analytics (KQL database)
Amazon Quick Sight | Power BI for business intelligence
Amazon S3 | Fabric OneLake unified data lake storage
AWS Lake Formation, AWS Glue Data Catalog, Amazon Macie | Data governance (Microsoft Purview integration)
Amazon Bedrock, Amazon SageMaker JumpStart | Generative AI (Azure OpenAI in Azure AI Foundry Models)

Data integration and ETL tools

Data integration and extract, transform, and load (ETL) tools pull data from multiple sources, reshape it, and load it into a unified system for analysis.

AWS service Microsoft service Analysis
AWS Glue Azure Data Factory, Azure Data Factory in Fabric The Data Factory service, the Azure Data Factory feature in Fabric, and AWS Glue are managed ETL services that facilitate data integration across various sources.
Amazon Managed Workflows for Apache Airflow (MWAA) Apache Airflow jobs in Fabric Apache Airflow provides managed workflow orchestration for complex data pipelines. The Apache Airflow job feature in Fabric serves as the next generation of the Data Factory Workflow Orchestration Manager. You can use this feature to create and manage Apache Airflow jobs and run directed acyclic graphs (DAGs). As part of Azure Data Factory in Fabric, the Airflow job feature provides data integration, preparation, and transformation from data sources like databases, data warehouses, lakehouses, and real-time data. AWS MWAA is a managed Airflow solution.
AWS Database Migration Service (DMS) Fabric Migration Assistant for Data Warehouse These services help you migrate databases from AWS to Azure. The Fabric Migration Assistant is a built-in tool in Fabric that guides you through migrating data and metadata from source databases in AWS to Fabric Data Warehouse. It converts schemas, uses AI to resolve migration problems, and supports migration from SQL-based sources. AWS DMS focuses on migrations within the AWS environment and provides ongoing replication features for hybrid architectures.
AWS DMS Azure Database Migration Service These services help you migrate databases to the cloud with minimal downtime. The Azure service focuses on migrating to Azure databases and includes assessment and recommendation tools.

AWS DMS focuses on migrations within the AWS environment and provides ongoing replication features for hybrid architectures.
Amazon AppFlow Azure Logic Apps These services automate data flows between cloud applications and services without requiring code. Logic Apps provides integration capabilities through a wide range of connectors and a visual designer. AppFlow provides secure data transfer between specific SaaS applications and AWS services and includes built-in data transformation features.
AWS Step Functions Data Factory with Logic Apps These services provide workflow orchestration for coordinating distributed applications and microservices. Logic Apps supports both data integration and enterprise workflow automation. Step Functions orchestrates AWS services and microservices in serverless applications.
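
The Apache Airflow rows in the preceding table refer to directed acyclic graphs (DAGs). As a rough illustration of what it means to run tasks in DAG order, the following sketch uses only the Python standard library and not the Airflow API. The task names and the `run_order` helper are hypothetical and exist only for this example.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each key is a task, and each value is the set of
# tasks that must finish before that task can start.
PIPELINE = {
    "extract_orders": set(),
    "extract_customers": set(),
    "transform_join": {"extract_orders", "extract_customers"},
    "load_warehouse": {"transform_join"},
}


def run_order(dag):
    """Return one valid execution order for the tasks in `dag`.

    Raises graphlib.CycleError if the dependencies contain a cycle,
    which is why an orchestrated workflow must be acyclic.
    """
    return list(TopologicalSorter(dag).static_order())


if __name__ == "__main__":
    # Both extract tasks appear before the join, and the load runs last.
    print(run_order(PIPELINE))
```

An orchestrator such as Airflow adds scheduling, retries, and monitoring on top of this ordering idea, so treat the sketch as a mental model rather than a substitute for either service.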

Data warehousing

The following solutions store and manage large volumes of structured data optimized for querying and reporting.

AWS service Microsoft service Analysis
Amazon Redshift Fabric Data Warehouse Fabric Data Warehouse and Amazon Redshift are managed, cloud-based, and petabyte (PB)-scale data warehouses designed for high-performance analytics at scale. Fabric Data Warehouse integrates with Fabric and provides a unified platform that combines storage, analytics, governance, and AI.

Redshift uses the AWS ecosystem and focuses on data warehousing. Both services support massive parallel processing. Fabric has a lake-first architecture and deep integration across Microsoft data and AI services.
Amazon Redshift Spectrum OneLake shortcuts, Direct Lake in Power BI, and pipeline connectors in Azure Data Factory Amazon Redshift Spectrum enables querying external data in Amazon S3. In contrast, Fabric provides a lake-first approach. Use OneLake shortcuts to virtualize data from multiple sources into a single logical lake without movement. Direct Lake mode in Power BI delivers instant analytics on open Delta and Parquet files in OneLake without import. Fabric Data Factory pipelines provide native connectors to ingest, transform, and orchestrate data flows.
AWS Lake Formation OneLake, Microsoft Purview in Fabric, and Fabric permission model AWS Lake Formation provides governance and access controls on top of Amazon S3-based data lakes. In contrast, Fabric delivers these capabilities through OneLake combined with Microsoft Purview for cataloging, lineage, and data governance. You use role-based access control (RBAC) and fine-grained security to provide access across workspaces, tables, and columns.
Amazon Relational Database Service (RDS) with Amazon Redshift Federated Query Fabric SQL Database, Amazon Redshift connector in Dataflow Gen2, Fabric data pipelines, and OneLake shortcuts Amazon RDS with Amazon Redshift Federated Query enables Amazon Redshift to run SQL queries directly on live RDS data. This setup provides real-time access across operational and analytical stores.

Fabric SQL Database introduces a SaaS-native SQL engine with autoscaling, built-in governance, and integration with the Fabric platform. Fabric data pipelines support ingestion from Amazon RDS and Amazon Redshift into lakehouses or SQL databases. OneLake shortcuts virtualize external data, such as Azure Data Lake Storage Gen2 and Amazon S3, into Fabric without duplication.
Amazon RDS with Amazon Redshift Federated Query Azure SQL Database These services support querying across operational databases and data warehouses. SQL Database can integrate with Azure analytics services. In contrast, AWS requires you to combine RDS and Amazon Redshift for cross-service querying capabilities through federated queries.
Amazon Aurora with Amazon Redshift integration SQL Database in Fabric Amazon Aurora handles operational data, and Amazon Redshift performs large-scale analytics through federated queries and batch ingestion. Fabric SQL Database provides a managed, autoscaling relational engine that integrates natively with OneLake and Power BI. This setup supports unified analytics and governance.
Amazon Aurora with Amazon Redshift integration SQL Database Serverless These managed, cloud-native relational databases separate compute from storage, automatically scale resources based on demand, and ensure high availability. Both services use SQL-based engines and extend into cost-efficient solutions for transactional and analytical workloads. SQL Database Serverless automatically pauses during inactivity to optimize cost while providing the full SQL Server engine.

Data lake solutions

The following platforms store vast amounts of raw structured and unstructured data in its native format for future processing.

AWS service Microsoft service Analysis
Amazon S3 OneLake, Data Lake Storage Data Lake Storage and Amazon S3 are scalable object storage solutions designed for big data analytics. They support formats like Parquet, comma-separated values (CSV), and JSON. Data Lake Storage is optimized for Azure-native tools, while Amazon S3 integrates with AWS services.

OneLake unifies structured and unstructured data across clouds into a single, governed lake. With OneLake shortcuts, Fabric can virtualize data from Amazon S3, Data Lake Storage, and Google Cloud without duplication, which supports access and analytics. OneLake supports multicloud flexibility, zero-ETL integration, and Delta Lake.
AWS Lake Formation OneLake AWS Lake Formation manages data lakes within the AWS ecosystem. OneLake provides a SaaS-native data lake that supports all Fabric workloads, including lakehouses, warehouses, Real-Time Intelligence, and Power BI. OneLake requires no extra setup and includes built-in governance through Microsoft Purview. It also has native support for Delta Lake and shortcuts for multicloud virtualization, including Amazon S3.
Amazon Athena Fabric lakehouse Amazon Athena is a serverless query engine that enables interactive SQL analysis directly on data stored in Amazon S3. A Fabric lakehouse provides an integrated environment for both data engineering and analytics. It stores data in OneLake by using the Delta Lake format and supports Spark, T-SQL, and Python.
AWS Glue Data Catalog Microsoft Purview AWS Glue Data Catalog centralizes metadata for analytics and machine learning. It serves as a metadata store and schema registry and requires other services to manage lineage, policy, and governance.

Microsoft Purview is a unified data governance service that spans Azure, OneLake, and on-premises and multicloud environments. It catalogs data in OneLake, Data Lake Storage, and other sources. It provides data classification, lineage visualization, policy management, and glossary integration through its Unified Catalog. From a data lake perspective, Microsoft Purview delivers a governance-first approach by connecting metadata, security, and compliance in one platform.

Big data analytics

These services process and analyze large and complex datasets to uncover patterns, insights, and trends. The following table provides direct comparisons of individual big data services. Fabric is an all-in-one service for big data and analytics. It provides the following services and more.

AWS service Microsoft service Analysis
Amazon EMR Fabric Data Engineering workloads that use Spark Amazon EMR is a managed big data service that runs frameworks like Spark, Hadoop, and Hive. You must provision and tune clusters. The Fabric Data Engineering workload uses Spark to remove the need for cluster management. It provides a serverless, integrated, and governed experience within the Fabric ecosystem.
Amazon EMR Azure Databricks These services support big data processing via Spark in a managed environment. Amazon EMR runs Spark clusters and provides flexible configuration and scaling options. Azure Databricks provides an optimized Spark platform that includes collaborative notebooks and integrated workflows.
Amazon Kinesis Azure Event Hubs and Azure Stream Analytics These services provide real-time data streaming and analytics for processing and analyzing high-volume data streams.
AWS Glue with AWS Glue Studio Fabric Data Engineering workloads that use Spark AWS Glue Studio combined with Amazon Kinesis provides data integration and real-time streaming pipelines, but it requires managing data movement between services. Fabric Data Engineering workloads use Spark to deliver these capabilities directly into the Fabric platform. Batch and streaming transformations, orchestration, and governance integrate with OneLake, Purview, and Power BI. Fabric delivers a single experience for data integration and engineering, without management of separate services for ETL, streaming, and analytics.
AWS Glue with AWS Glue Studio Azure Databricks and Data Factory Both service combinations provide big data processing capabilities that include integrated data transformation and analytics.

Business intelligence and reporting

The following services provide data visualization, reporting, and dashboards to help you make informed decisions.

AWS service Microsoft service Analysis
Amazon Quick Sight Power BI Power BI and Amazon Quick Sight provide business analytics tools for data visualization and interactive dashboards.
Amazon Managed Grafana Azure Managed Grafana These services provide managed Grafana to visualize metrics, logs, and traces across multiple data sources.
AWS Data Exchange External data sharing in Fabric and OneLake shortcuts AWS Data Exchange provides a marketplace where your organization can subscribe to and consume external datasets. The service handles licensing and secure delivery. In Fabric, external collaboration is available through OneLake shortcuts and cross-tenant sharing. External data becomes available across Spark, SQL, KQL, and Power BI.
AWS Data Exchange Azure Data Share These services facilitate the secure sharing and exchange of data between organizations. AWS Data Exchange provides a marketplace model. Data Share focuses on cross-tenant data sharing.
Amazon OpenSearch Service with Kibana Fabric KQL database with Power BI Amazon OpenSearch Service with Kibana provides a managed search and analytics platform for indexing, querying, and visualizing large datasets, commonly used for log analytics and observability. Fabric delivers similar capabilities through its KQL database for real-time data exploration, combined with Power BI for interactive reporting.
Amazon OpenSearch Service with Kibana Azure AI Search, Azure Data Explorer, and dashboards These services provide real-time data exploration and interactive analytics over large volumes of data. Amazon OpenSearch uses Kibana for search and visualization. AI Search provides intelligent full-text search. Azure Data Explorer uses KQL to power high-performance, real-time analytics with interactive dashboards for visualization.

Real-time data processing

The following systems ingest and analyze data as it's generated to provide immediate insights and responses.

AWS service Microsoft service Analysis
Amazon Kinesis Fabric Real-Time Intelligence hub, Fabric eventstream with Fabric KQL database Amazon Kinesis enables real-time data streaming, ingestion, and processing across services like Amazon S3, Amazon Redshift, and AWS Lambda. Fabric provides streaming architecture with the Real-Time Intelligence hub, which supports ingestion from multiple sources, including Amazon Kinesis, Apache Kafka, Event Hubs, and Google Pub/Sub. Fabric eventstreams manage stream routing, transformation, and alerting.
Amazon Kinesis Event Hubs and Stream Analytics These services process and analyze data from Internet of Things (IoT) devices in real time. Amazon Kinesis provides streaming ingestion and processing capabilities. Azure provides modular services. Event Hubs handles data ingestion, and Stream Analytics processes the data.
Amazon Managed Streaming for Apache Kafka (Amazon MSK) Fabric eventstream with Kafka endpoints Amazon MSK is a managed Kafka service in AWS. Fabric eventstreams support Kafka endpoints for publishing and consuming data via the Kafka protocol. These eventstreams can also ingest data directly from Amazon MSK into the Fabric Real-Time Intelligence hub for downstream processing and analytics, such as with an eventhouse with Power BI. Azure provides both a managed Kafka-compatible ingestion plane (Event Hubs) and a managed Kafka cluster (Azure HDInsight). Fabric provides an end-to-end, real-time analytics hub that integrates with Kafka.
Amazon MSK Event Hubs for Kafka These services provide managed Kafka clusters for creating real-time streaming data pipelines and applications. Event Hubs for Kafka exposes a Kafka‑compatible endpoint, and existing clients can connect with minimal changes. It also supports Kafka streams in Premium and Dedicated tiers.
AWS Lambda Fabric notebooks with Fabric data pipelines for serverless data processing AWS Lambda is a serverless, event‑driven compute for running code without managing servers. For analytics‑focused, serverless-style processing in Fabric, you can use Fabric notebooks with Azure Data Factory pipelines. Notebooks run managed Spark jobs for data ingestion, cleanup, and transformation. Pipelines orchestrate and schedule those notebooks as part of end‑to‑end data workflows, which provides on‑demand compute and no cluster management inside Fabric.
AWS Lambda Azure Functions with Azure API Management for API triggers These serverless compute platforms run code in response to events and automatically manage the underlying compute resources. Azure Functions delivers the same event‑driven, autoscaling implementation model and commonly pairs with API Management and other Azure triggers. Microsoft also provides a migration guide from Lambda to Azure Functions to facilitate parity and code moves.
Amazon DynamoDB streams Fabric mirroring (Azure Cosmos DB) with Fabric eventstreams Amazon DynamoDB streams provide a real-time feed of item-level changes in Amazon DynamoDB tables, which enables event-driven processing and downstream analytics. In Fabric, mirroring Azure Cosmos DB into OneLake for analytics eliminates ETL overhead. Combine Fabric eventstreams with this setup to route real-time events and integrate with Fabric KQL databases or lakehouses.
Amazon DynamoDB streams Azure Cosmos DB change feed These services enable real-time data processing by capturing and providing a stream of data modifications.
Amazon ElastiCache with Redis streams Azure Cache for Redis with Redis streams These services provide managed Redis instances that support Redis streams for real-time data ingestion and processing.
AWS IoT Analytics Fabric eventstreams with Fabric KQL database AWS IoT Analytics is a managed service that collects, processes, and analyzes IoT device data at scale. Fabric eventstreams ingest IoT telemetry and route it to the Fabric KQL database for real-time querying and analytics.
AWS IoT Analytics Azure IoT Hub with Stream Analytics These services enable you to process and analyze data from IoT devices in real time. AWS IoT Analytics provides built-in data storage and analysis capabilities. Azure provides modular services. IoT Hub handles ingestion, and Stream Analytics processes the data.
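
The Event Hubs for Kafka row in the preceding table notes that existing Kafka clients can connect with minimal changes. In practice, the change is usually limited to connection settings. The following sketch builds librdkafka-style settings, the key names that clients such as confluent-kafka accept. Java clients express the same credentials through a JAAS setting instead. The `event_hubs_kafka_config` helper, the `contoso-streams` namespace, and the connection string are hypothetical placeholders.

```python
def event_hubs_kafka_config(namespace, connection_string):
    """Build librdkafka-style client settings for an Event Hubs Kafka endpoint.

    `namespace` is the Event Hubs namespace name, and `connection_string`
    is a namespace or event hub connection string. Both are supplied by
    the caller; nothing here contacts Azure.
    """
    return {
        # The Kafka-compatible endpoint listens on port 9093 of the namespace host.
        "bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        # Traffic is encrypted with TLS and authenticated with SASL PLAIN.
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        # The username is the literal string "$ConnectionString";
        # the password is the connection string itself.
        "sasl.username": "$ConnectionString",
        "sasl.password": connection_string,
    }


if __name__ == "__main__":
    # Placeholder values for illustration only; load real secrets from a vault.
    config = event_hubs_kafka_config("contoso-streams", "Endpoint=sb://placeholder")
    print(config["bootstrap.servers"])
```

A client migrated from Amazon MSK would typically keep its producer and consumer code and swap in settings like these, with the Kafka topic name mapping to an event hub name.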

Machine learning services

The following tools and platforms enable the development, training, and deployment of machine learning models.

AWS service Microsoft service Analysis
Amazon SageMaker Fabric Data Science workload with Machine Learning integration Amazon SageMaker is a managed platform for building, training, and deploying machine learning models at scale. Azure provides an equivalent through Machine Learning, an end-to-end managed service that supports data preparation, automated machine learning, model deployment, and machine learning operations. The Fabric Data Science workload provides model development and enrichment. It integrates with Machine Learning for training, GPU acceleration, and enterprise-grade deployment.
AWS deep learning Amazon machine images (AMIs) Data Science virtual machines (VMs) with Machine Learning AWS deep learning AMIs provide prebuilt VM images with popular deep learning frameworks, GPU drivers, and libraries to accelerate AI model development. Azure provides a similar experience through Data Science VMs, which come preconfigured with Python, R, Jupyter, and deep learning frameworks like TensorFlow and PyTorch. Combine Machine Learning with Data Science VMs to create a managed platform for training, deployment, and machine learning operations.
Amazon SageMaker Autopilot Fabric Data Science workload with Machine Learning integration Amazon SageMaker Autopilot automates the machine learning life cycle by handling data preprocessing, algorithm selection, and hyperparameter tuning with minimal manual effort. The Fabric Data Science workload provides automated machine learning-driven model development and integrates with Machine Learning for training and operationalization.
Amazon SageMaker Autopilot Automated machine learning These services provide automated machine learning for building and training models.
Amazon SageMaker Studio Fabric Data Science workload with Machine Learning integration Amazon SageMaker Studio is an integrated development environment for machine learning in AWS. It provides a single web-based interface to build, train, and deploy models. The Fabric Data Science workload combines collaborative notebooks and Spark-based environments into a unified analytics platform and integrates with Machine Learning for training and deployment.
Amazon SageMaker Studio Azure Machine Learning studio These services provide integrated development environments for machine learning. Amazon SageMaker Studio provides a unified interface for all machine learning development steps, including debugging and profiling tools.

AI services

AI services provide prebuilt, customizable AI capabilities for applications, including vision, speech, language, and decision making.

AWS service Azure service Analysis
Amazon Rekognition Azure AI Vision with Azure AI Custom Vision Amazon Rekognition is a computer vision service for image and video analysis. It provides object detection, facial recognition, and text extraction. Azure AI Vision delivers prebuilt models for image and video understanding. You can use Custom Vision to train domain-specific models with your own data.
Amazon Polly Azure AI Speech text-to-speech Amazon Polly is a text-to-speech service that converts text into lifelike speech by using neural voices across multiple languages. AI Speech text-to-speech provides high-quality neural voices, real-time streaming, and batch synthesis for applications such as voice assistants, interactive voice response (IVR) systems, and accessibility solutions. AI Speech also supports custom neural voice creation to build unique, brand-specific voices while maintaining enterprise-grade security and compliance.
Amazon Transcribe Azure AI Speech speech-to-text Amazon Transcribe provides speech-to-text with real-time transcription and custom vocabularies, commonly used for call analytics and captions. AI Speech speech-to-text provides real-time and batch transcription, speaker diarization, and custom models for domain-specific accuracy.
Amazon Translate Azure AI Translator Amazon Translate is a neural machine translation service that delivers translations across multiple languages for websites, apps, and multilingual content. Azure AI Translator provides similar capabilities with real-time and batch translation in more than 100 languages. It also includes features like transliteration, language detection, and custom glossaries for domain-specific accuracy.
Amazon Comprehend Azure AI Language Amazon Comprehend is a natural language processing (NLP) service that extracts insights from text, including sentiment, key phrases, and entities. These capabilities help analyze customer feedback and documents. Azure AI Language (text analytics) provides similar capabilities with features like sentiment analysis, key phrase extraction, named entity recognition, and custom text classification.
Amazon Lex Conversational language understanding in Azure AI Foundry These services create conversational interfaces that use natural language understanding. Azure takes a modular approach, where conversational language understanding handles intent recognition and entity extraction. Other components manage dialogue and integration. Amazon Lex provides an integrated solution for building conversational interfaces entirely within the AWS ecosystem.
Amazon Textract Azure AI Document Intelligence Amazon Textract is a machine learning service that extracts text and data from scanned documents, including tables and forms, to automate document processing. Document Intelligence provides similar functionality with optical character recognition (OCR), prebuilt models for invoices, receipts, and IDs, and the ability to train custom models for domain-specific forms. Document Intelligence supports multi-language extraction and provides layout analysis for complex documents.
Amazon OpenSearch Service AI Search Amazon OpenSearch Service is a managed search and analytics engine built on OpenSearch, an open-source project derived from Elasticsearch. It's commonly used for log analytics, full-text search, and real-time data exploration. AI Search provides similar capabilities with built-in AI enrichment, hybrid search (keyword with vector), and integration with Azure services for security and compliance. It supports scenarios like semantic search and retrieval-augmented generation (RAG).
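
The AI Search row in the preceding table mentions hybrid search, which combines a keyword ranking and a vector ranking into one result list. One common way to merge such rankings is Reciprocal Rank Fusion (RRF). The following is a generic sketch of that technique, not the implementation of either service. The `reciprocal_rank_fusion` helper, the document IDs, and the constant `k=60` (a frequently used default) are assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranked list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1. A higher fused score ranks earlier.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score, highest first; break ties by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    keyword_hits = ["doc_a", "doc_b", "doc_c"]  # hypothetical keyword ranking
    vector_hits = ["doc_b", "doc_d", "doc_a"]   # hypothetical vector ranking
    print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
    # → ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Documents that rank well in both lists, such as `doc_b` here, rise to the top, which is the behavior that makes hybrid search useful for RAG scenarios.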

Generative AI services

The following AI services create new content or data that resembles human-generated output, like text, images, or audio.

AWS service Azure service Analysis
Amazon Bedrock Azure AI Foundry These services provide foundation models to create and deploy generative AI applications.

Contributors

Microsoft maintains this article.