This article describes reliability support in Azure SQL Database, covering intra-regional resiliency via availability zones and multi-region deployments.
Reliability is a shared responsibility between you and Microsoft. You can use this guide to determine which reliability options fulfill your specific business objectives and uptime goals.
Production deployment recommendations
For most production deployments of Azure SQL Database, we recommend that you consider the following minimum configuration:
Follow the guidance in High availability and disaster recovery checklist - Azure SQL Database.
Enable zone redundancy. The following service tiers support zone redundancy:
- In the DTU-based purchasing model, the Premium service tier supports zone redundancy.
- In the vCore-based purchasing model, the General Purpose, Business Critical, and Hyperscale service tiers support zone redundancy.
Configure automated backups and use a minimum of zone-redundant storage. Test your backups and restore process regularly.
Reliability architecture overview
Azure SQL Database runs on the latest stable version of the SQL Server Database Engine on the Windows operating system, including all applicable patches.
By default, Azure SQL Database achieves redundancy by storing three copies of your data in a single datacenter in the primary region. This approach protects your data in the event of a localized failure, such as a small-scale network or power failure, and even during the following events:
- Customer-initiated management operations that result in a brief downtime.
- Service maintenance operations.
- Issues and datacenter outages with:
  - Racks, where the machines that power your service are running.
  - Physical machines that host the VM that runs the SQL Database Engine.
  - Other problems with the SQL Database Engine.
- Other potential unplanned localized outages.
Azure SQL Database uses Azure Service Fabric to manage the replication of your database.
Redundancy is implemented in different ways for different service tiers of Azure SQL Database. To learn more about how Azure SQL Database provides redundancy, see Availability through redundancy - Azure SQL Database.
Transient faults
Transient faults are short, intermittent failures in components. They occur frequently in a distributed environment like the cloud, and they're a normal part of operations. Transient faults correct themselves after a short period of time. It's important that your applications can handle transient faults, usually by retrying affected requests.
All cloud-hosted applications should follow the Azure transient fault handling guidance when they communicate with any cloud-hosted APIs, databases, and other components. For more information, see Recommendations for handling transient faults.
Azure SQL Database automatically handles critical servicing tasks, such as patching, backups, and upgrades to Windows and the SQL Database Engine. It also automatically handles unplanned events such as underlying hardware, software, or network failures. Azure SQL Database can quickly recover even in the most critical circumstances, ensuring that your data is always available. Most users don't notice that upgrades are performed continuously.
When a database is patched or fails over, the downtime isn't impactful if you employ retry logic in your application.
You can test your application's resiliency to transient faults by following the guidance in Test application fault resiliency.
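As an illustration of the retry logic described in this section, the following is a minimal Python sketch of retrying an operation with capped exponential backoff and jitter. The `TransientError` class and the `retry_transient` helper are hypothetical names introduced for this example; they aren't part of any Azure SDK or database driver. In a real application you would map the driver's own transient error conditions onto this pattern.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for errors that are safe to retry."""


def retry_transient(operation, max_attempts=5, base_delay=0.5,
                    max_delay=8.0, sleep=time.sleep):
    """Call operation(), retrying on TransientError with backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the error to the caller.
            # Exponential backoff capped at max_delay, with full jitter so
            # that many clients don't retry at the same instant.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
```

The `sleep` parameter is injectable only so that the helper can be exercised in tests without real waiting. The default values are illustrative and should be tuned to your workload.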
Availability zone support
Availability zones are physically separate groups of datacenters within each Azure region. When one zone fails, services can fail over to one of the remaining zones.
You can create a zone-redundant single database or elastic pool. Zone redundancy ensures that your database is resilient to a large set of failures, including catastrophic datacenter outages, without any changes to the application logic.
For the General Purpose service tier, zone redundancy ensures that both the stateless compute components and the stateful data storage components of Azure SQL Database are resilient to an availability zone outage.
For the Premium, Business Critical, and Hyperscale service tiers, zone redundancy places replicas of your Azure SQL Database across multiple Azure availability zones in your primary region. To eliminate a single point of failure, the control ring is also duplicated across multiple availability zones.
To view information about availability zone support for other service tiers, be sure to select the appropriate service tier at the beginning of this page.
Requirements
The Basic and Standard service tiers don't support zone redundancy.
Zone redundancy is available to databases in the Business Critical, General Purpose and Hyperscale service tiers of the vCore-based purchasing model, and only the Premium service tier of the DTU-based purchasing model.
For the General Purpose service tier:
- Zone-redundant configuration is only available when standard-series (Gen5) hardware is selected.
- When using a zone-redundant Azure SQL Database, only specific regions support custom maintenance windows. For more information, see Azure SQL Database region support for maintenance windows.
For the Premium and Business Critical service tiers:
- When using a zone-redundant Azure SQL Database, only specific regions support custom maintenance windows. For more information, see Azure SQL Database region support for maintenance windows.
For the Hyperscale service tier:
- When using a zone-redundant Azure SQL Database, only specific regions support custom maintenance windows. For more information, see Azure SQL Database region support for maintenance windows.
- You must enable zone-redundant or geo-zone-redundant backup storage.
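The tier requirements above can be captured as a small lookup, which might be useful in deployment validation scripts. This is a sketch that encodes only what this article states. The `ZONE_REDUNDANT_TIERS` table and the `supports_zone_redundancy` function are hypothetical names, not part of any Azure API, and the sketch doesn't check hardware or regional constraints.

```python
# Service tiers that support zone redundancy, as listed in this article.
ZONE_REDUNDANT_TIERS = {
    "DTU": {"Premium"},
    "vCore": {"General Purpose", "Business Critical", "Hyperscale"},
}


def supports_zone_redundancy(purchasing_model, service_tier):
    """Return True if the article lists the tier as supporting zone redundancy."""
    return service_tier in ZONE_REDUNDANT_TIERS.get(purchasing_model, set())
```

For example, `supports_zone_redundancy("DTU", "Standard")` returns `False`, because the Basic and Standard service tiers don't support zone redundancy.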
Regions supported
For the Premium, General Purpose, and Business Critical service tiers, zone redundancy is available in all Azure regions that support availability zones.
For the Hyperscale service tier, zone redundancy is available in all Azure regions that support availability zones. However, zone redundancy support for Hyperscale premium-series and premium-series memory optimized hardware is available in select Azure regions.
Considerations
Latency: Because zone-redundant databases have replicas in different datacenters with some distance between them, the increased network latency might increase transaction commit time, and thus impact the performance of some OLTP workloads. Most applications aren't sensitive to this extra latency.
master database: When a database with a zone-redundant configuration is created on a logical server, the master database associated with the server is automatically made zone-redundant as well. For more information, and to check whether your master database is zone-redundant, see Database zone redundant availability.
Cost
For the General Purpose service tier, there's an additional charge to enable zone redundancy for Azure SQL Database. For more information, see Pricing - Azure SQL Database.
The Premium and Business Critical service tiers provide multiple replicas of your database. When you enable zone redundancy, the replicas are distributed between availability zones. This means that there's no additional cost associated with enabling zone redundancy on your Azure SQL Database when it's in the Premium or Business Critical service tier.
If you enable multiple replicas of your Hyperscale service tier database, you can enable zone redundancy. When you enable zone redundancy, the replicas are distributed between availability zones. This means that there's no additional cost associated with enabling zone redundancy on your Azure SQL Database when it's in the Hyperscale service tier, assuming you have multiple replicas.
Configure availability zone support
For the General Purpose, Premium, and Business Critical service tiers:
New resources: You can configure a database to be zone-redundant when you create it. To learn how to create a single database and configure it with zone redundancy, see Quickstart: Create a single database - Azure SQL Database.
Existing resources: You can reconfigure an existing database to be zone-redundant. To learn how to reconfigure an existing database with zone redundancy, see Enable zone redundancy - Azure SQL Database.
All Azure SQL Database scaling operations, including enabling zone redundancy, are online operations and require minimal to no downtime. For more details, see Dynamically scale database resources with minimal downtime.
Disable zone redundancy: You can disable zone redundancy. This process is an online operation similar to a regular service tier objective upgrade. At the end of the process, the database is migrated from a zone-redundant ring to a single-zone ring.
For the Hyperscale service tier:
New resources: For Hyperscale databases and elastic pools, zone redundancy must be configured when the database is created. For more information, see Create a zone-redundant Hyperscale database.
Migration or disable zone redundancy: To enable or disable zone redundancy on an existing Hyperscale database or elastic pool, you need to redeploy it. The process adds a secondary replica for high availability, and places it into a different availability zone.
To learn how to redeploy an existing Hyperscale database with zone redundancy, see Redeploy a zone-redundant Hyperscale database - Azure SQL Database.
Normal operations
This section describes what to expect when databases are configured for zone redundancy and all availability zones are operational.
For the General Purpose service tier:
Traffic routing between zones. Requests are routed to a node that runs your SQL database compute layer. When zone redundancy is enabled, this node might be located in any availability zone.
Data replication between zones. Data and log files are synchronously replicated across availability zones by using zone-redundant storage. Write operations aren't considered complete until the data has been successfully replicated across all of the availability zones. This synchronous replication ensures strong consistency and zero data loss during zone failures. However, it may result in slightly higher write latency compared to locally redundant storage.
For the Premium and Business Critical service tiers:
Traffic routing between zones. Replicas are distributed across availability zones, and one of those replicas is designated as the primary replica. Requests are routed to your database's primary replica.
Data replication between zones. The primary replica constantly pushes changes to the secondary replicas sequentially to ensure that data is persisted on a sufficient number of secondary replicas before committing each transaction. This process guarantees that, if the primary replica or a readable secondary replica becomes unavailable for any reason, a fully synchronized replica is always available for failover. When zone redundancy is enabled, those replicas are located in different availability zones. However, this placement might result in slightly higher write latency because of the network latency in traversing zones.
For the Hyperscale service tier:
Traffic routing between zones. Database replicas are distributed across availability zones, and one of those replicas is designated as the primary replica. Requests are routed to your database's primary replica.
Data replication between zones. The primary database replica pushes changes through a zone-redundant log service, which replicates all changes synchronously across availability zones. Page servers are located in each availability zone and store the database's state. This process guarantees that, if the primary replica or a readable secondary replica becomes unavailable for any reason, a fully synchronized replica is always available for failover. However, synchronous replication across zones might result in slightly higher write latency compared to locally redundant storage.
Zone-down experience
This section describes what to expect when databases are configured for zone redundancy and there's an availability zone outage.
Detection and response: Azure SQL Database is responsible for detecting and responding to a failure in an availability zone. You don't need to do anything to initiate a zone failover.
Active requests: When an availability zone goes offline, any requests that are being processed in the faulty availability zone are terminated and must be retried. To make your applications resilient to these types of problems, see transient fault handling guidance.
Traffic rerouting: For the General Purpose service tier, Azure SQL Database moves the database engine to another stateless compute node that's in a different availability zone and has sufficient free capacity. After failover completes, new connections are automatically redirected to the new primary compute node.
For more information, see General Purpose service tier.
Traffic rerouting: For the Premium and Business Critical service tiers, Azure SQL Database selects a replica in another availability zone to become the primary replica. Once a secondary replica becomes the new primary replica, another secondary replica is created to ensure the cluster has a sufficient number of replicas to maintain quorum. After failover completes, new connections are automatically redirected to the new primary replica (or readable secondary replica based on the connection string).
For more information, see Premium and Business Critical service tiers.
Traffic rerouting: For the Hyperscale service tier, if the primary replica was lost due to the zone outage, Azure SQL Database promotes one of the HA replicas in another zone to be the new primary.
For more information, see Hyperscale service tier.
Expected downtime: There might be a small amount of downtime during an availability zone failover. The downtime is typically less than 30 seconds, which your application should tolerate if it's following the transient fault handling guidance.
Expected data loss: There is no data loss expected during an availability zone failover.
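To check whether an application's retry policy is likely to outlast a zone failover, you can compare the total time the policy keeps retrying against the expected downtime. The following Python sketch assumes a capped exponential backoff with no jitter, and it uses the figure of 30 seconds from the text above as the default. The function names and parameters are hypothetical and exist only for this illustration.

```python
def retry_window_seconds(max_attempts, base_delay, max_delay):
    """Total wait across all retries for capped exponential backoff (no jitter)."""
    # There are max_attempts - 1 waits, because no wait follows the last attempt.
    return sum(min(max_delay, base_delay * 2 ** i)
               for i in range(max_attempts - 1))


def covers_expected_downtime(max_attempts, base_delay, max_delay,
                             expected_downtime_s=30.0):
    """Return True if the policy keeps retrying for at least the expected downtime."""
    return retry_window_seconds(max_attempts, base_delay, max_delay) >= expected_downtime_s
```

Under these assumptions, a policy of 5 attempts starting at 0.5 seconds and capped at 8 seconds retries for only 7.5 seconds in total, so it would give up before a 30-second failover completes. A policy of 8 attempts starting at 1 second and capped at 10 seconds retries for 45 seconds.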
Zone recovery
When the availability zone recovers, Azure Service Fabric automatically creates database replicas in the recovered availability zone, removes any temporary replicas created in the other availability zones, and routes traffic to your database as normal. To avoid disruption, the primary replica doesn't automatically return to the original zone after the zone recovers.
Testing for zone failures
The Azure SQL Database platform manages traffic routing, failover, and zone recovery procedures for zone-redundant databases. Because this feature is fully managed, you don't need to initiate or validate availability zone failure processes. However, you can validate your application's handling of failures and failovers by following the process described in Test application fault resiliency.
Multi-region support
This section provides an overview of two related but separate features that can be used for multi-region geo-replication of Azure SQL Database:
Active geo-replication replicates a single database to a synchronized secondary database.
Failover groups build on top of active geo-replication by providing the ability to fail over a group of databases.
Active geo-replication creates a continuously synchronized readable secondary database (which is sometimes called geo-secondary or geo-replica) in any region for a single primary database. Active geo-replication can create secondary databases in the same region, but this doesn't provide protection against a region outage. When you use active geo-replication to achieve geo-redundancy, you locate the secondary database in a different region from the primary database.
Region support
Active geo-replication can be enabled in all Azure regions, and it doesn't require you to use Azure region pairs.
Tip
Azure SQL Database follows a safe deployment practice where Azure strives not to deploy updates to paired regions at the same time. If you configure active geo-replication to use nonpaired regions, configure different maintenance windows on the servers in each region to reduce the likelihood that both regions have any connectivity issues due to maintenance at the same time.
Requirements
When using active geo-replication, consider the following requirements:
- Both the primary and geo-secondary must have the same service tier and should have the same compute tier, compute size, and backup storage redundancy.
- Both the primary and geo-secondary should have the same IP firewall rules.
Active geo-replication is supported for databases across different Azure subscriptions.
Considerations
Active geo-replication is designed to provide failover of a single database. If you need to fail over multiple databases, consider using failover groups instead.
Because geo-replication is asynchronous, it's possible to have data loss when a failover occurs. If you need to eliminate data loss from asynchronous replication during failovers, you can configure your application to block the calling thread until the last committed transaction has been transmitted and hardened in the transaction log of the secondary database. This approach requires custom development, and it reduces the performance of your application. To learn more, see Prevent loss of critical data.
Review Active geo-replication to understand how this capability works in more detail.
Cost
Secondary databases are billed as separate databases.
If you don't use a secondary database for any read or write workloads, consider whether you can designate it as a standby replica to reduce your costs.
Configure multi-region support
Enable active geo-replication: To learn how to enable active geo-replication by using the Azure portal or other tooling, see Configure active geo-replication for Azure SQL Database.
After you enable active geo-replication, there's an initial seeding step that can take some time.
Disable active geo-replication: To learn how to disable active geo-replication on a database, see Remove secondary database.
Normal operations
This section describes what to expect when a database is configured to use active geo-replication and all regions are operational.
Traffic routing between regions: Your application must be configured to send read-write requests to the primary database. You can optionally send read-only requests to a secondary database, which helps to reduce the impact of read-only workloads on your primary database.
Data replication between regions: Replication between the primary and secondary databases happens asynchronously, which means there can be a delay between a change being applied to the primary database and when it's replicated to the secondary database.
When you perform a failover, you decide how to handle the possibility of data loss.
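The routing rule described above can be sketched in application code as follows. This is a minimal Python illustration, not a prescribed pattern. The `pick_server` function and the server names in the usage example are hypothetical.

```python
def pick_server(read_only, primary, secondary=None):
    """Choose a target server for a request under active geo-replication.

    Read-write requests always go to the primary. Read-only requests go to
    the secondary when one is configured; otherwise they fall back to the
    primary.
    """
    if read_only and secondary is not None:
        return secondary
    return primary


# Usage with hypothetical server names:
PRIMARY = "contoso-primary.database.windows.net"
SECONDARY = "contoso-secondary.database.windows.net"
reporting_target = pick_server(read_only=True, primary=PRIMARY, secondary=SECONDARY)
orders_target = pick_server(read_only=False, primary=PRIMARY, secondary=SECONDARY)
```

Because replication is asynchronous, reads served from the secondary can lag behind the primary, so this kind of routing suits workloads that tolerate slightly stale data.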
Region-down experience
This section describes what to expect when a database is configured to use active geo-replication and there's an outage in your primary region:
Detection and response: You're responsible for both detecting the outage of a database or region and triggering the failover.
Active requests: Any active requests during the failover are terminated and must be retried.
Expected data loss: If your primary database is available, you can optionally perform a failover with no data loss. The failover process synchronizes data between the primary and secondary databases before switching roles.
If your primary database isn't available, you might need to perform a forced failover, which will result in data loss. You can estimate the amount of data loss by monitoring the replication lag. For more information, see Monitor Azure SQL Database with metrics and alerts.
Expected downtime: Typically there is up to 60 seconds of downtime during a failover. Ensure that your application is handling transient faults so that it can recover from short periods of downtime. Also, the downtime you experience is affected by how quickly your application is reconfigured to connect to the new primary database.
Traffic rerouting: You're responsible for reconfiguring your application to send requests to the new primary database. If you need to have transparent failover, consider using failover groups.
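The decision described in this section, between a failover with no data loss and a forced failover, can be summarized in a short Python sketch. It assumes that your monitoring supplies two inputs: whether the primary is reachable, and the most recently observed replication lag in seconds. The `FailoverPlan` type and the `plan_failover` function are hypothetical names used only for illustration, and the lag value is a rough estimate rather than a guarantee.

```python
from dataclasses import dataclass


@dataclass
class FailoverPlan:
    mode: str                     # "failover" or "forced failover"
    estimated_data_loss_s: float  # rough estimate, in seconds of changes


def plan_failover(primary_available, replication_lag_s):
    """Pick a failover mode and estimate potential data loss."""
    if primary_available:
        # Data is synchronized before the roles switch, so no loss is expected.
        return FailoverPlan("failover", 0.0)
    # The primary can't be synchronized first; changes that haven't been
    # replicated yet, approximated here by the observed lag, might be lost.
    return FailoverPlan("forced failover", float(replication_lag_s))
```

After the failover completes, the application still needs to be pointed at the new primary database, which this sketch doesn't cover.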
Region recovery
After the primary region recovers, you can manually perform a failback to the primary region by performing another failover.
Testing for region failures
You can simulate a region outage by triggering a manual failover at any time. You can trigger a failover (no data loss) or a forced failover.
Backups
Take backups of your databases to protect against a variety of risks, including loss of data. Backups can be restored to recover from accidental data loss, corruption, or other issues. Backups are separate from zone redundancy, active geo-replication, and failover groups, and they serve a different purpose. To learn more, see What are redundancy, replication, and backup?
Azure SQL Database provides automatic backups of your databases. To learn more about the backup frequency, which can affect the amount of data loss if you need to restore from a backup, see Automated backups in Azure SQL Database.
Backup storage
You can choose to store your automated backups in locally redundant or zone-redundant storage. If you use a region that's paired, you can choose to replicate your automated backups to the paired region by using geo-redundant storage. This capability enables geo-restore of your backups into the paired region. For more information, see Automated backups in Azure SQL Database.
If you use a nonpaired region, or if you need to replicate backups to a region other than the paired region, consider exporting the database and storing the exported file in a storage account that uses blob object replication to replicate to a storage account in another region. For more information, see Export a database.
Reliability during service maintenance
When Azure SQL Database performs maintenance on your databases and elastic pools, it might automatically fail over your database to use a secondary replica. Client applications might observe brief connectivity disruptions when a failover occurs, and your client applications should follow the transient fault handling guidance to minimize the effects.
Azure SQL Database enables you to specify a maintenance window that's generally used for service upgrades and other maintenance operations. Configuring a maintenance window can help you to minimize any side effects, like automatic failovers, during your business hours. You can also receive advance notification of planned maintenance.
The gateways used for processing connections to Azure SQL Database are automatically maintained by the platform. Upgrades or maintenance operations can also cause brief connectivity disruptions that clients can retry.
To learn more, see Maintenance window in Azure SQL Database.
Service-level agreement
The service-level agreement (SLA) for Azure SQL Database describes the expected availability of the service, and the expected recovery point and recovery time for active geo-replication. It also describes the conditions that must be met to achieve those expectations. To understand those conditions, it's important that you review the Service Level Agreements (SLA) for Online Services.
When you deploy a zone-redundant database or elastic pool and use a supported service tier, the uptime SLA is higher.