This article describes reliability support in Azure Files, covering intra-regional resiliency via availability zones and multi-region deployments. It also covers cross-region protection through geo-redundant storage (GRS) options. For more information, see Azure reliability.
Reliability is a shared responsibility between you and Microsoft. You can use this guide to determine which reliability options fulfill your specific business objectives and uptime goals.
Azure Files provides fully managed file shares in the cloud that are accessible via industry-standard Server Message Block (SMB) and Network File System (NFS) protocols. Depending on the Azure region, Azure Files can support a range of redundancy configurations to enable both high availability (HA) and disaster recovery (DR) for hosted workloads:
Locally redundant storage (LRS) and zone-redundant storage (ZRS) are designed for HA and ensure data durability within a single datacenter or across availability zones.
GRS and geo-zone-redundant storage (GZRS) provide cross-region DR and replicate data to a secondary region to safeguard against regional outages.
Note
Azure Files is part of the Azure Storage platform. Some of the capabilities of Azure Files are common across many Azure Storage services. In this article, we use Azure Storage to refer to these common capabilities.
Production deployment recommendations
To learn how to deploy Azure Files to support your solution's reliability requirements and how reliability affects other aspects of your architecture, see Architecture best practices for Azure Files in the Azure Well-Architected Framework.
Reliability architecture overview
Azure Files is available in two media tiers:
The Premium tier uses solid-state drives (SSD) for high performance. This tier is recommended for workloads that require low latency.
The Standard tier uses hard disk drives (HDD). HDD file shares provide a cost-effective storage option for general purpose file shares.
For more information, see Plan to deploy Azure Files - Storage tiers.
Azure Files implements redundancy at the storage account level, and file shares inherit that redundancy configuration automatically. The service supports multiple redundancy models that differ in their approach to data protection.
Locally redundant storage (LRS) replicates the data within your storage accounts to one or more Azure availability zones located in the primary region of your choice. Although there's no option to choose your preferred availability zone, Azure may move or expand LRS accounts across zones to improve load balancing. There's no guarantee that your data will be spread across zones. For more information about availability zones, see What are Availability Zones?.
Zone-redundant storage (ZRS), geo-redundant storage (GRS), and geo-zone-redundant storage (GZRS) provide extra protections. This article describes these options in detail.
Transient faults
Transient faults are short, intermittent failures in components. They occur frequently in a distributed environment like the cloud, and they're a normal part of operations. Transient faults correct themselves after a short period of time. It's important that your applications can handle transient faults, usually by retrying affected requests.
All cloud-hosted applications should follow the Azure transient fault handling guidance when they communicate with any cloud-hosted APIs, databases, and other components. For more information, see Recommendations for handling transient faults.
To effectively manage transient faults when you use Azure Files, configure appropriate timeout values for your file operations based on file size and network conditions. Larger files require longer timeouts, while smaller operations can use shorter values to detect failures quickly.
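As a rough illustration of sizing timeouts by file size, the following sketch derives a per-operation timeout from the number of bytes to transfer. The function name, the assumed throughput, the fixed overhead, the safety factor, and the cap are all hypothetical values chosen for illustration, not Azure Files defaults. Measure your own network conditions before you pick real numbers.

```python
def operation_timeout_seconds(
    size_bytes: int,
    assumed_throughput_bytes_per_s: float = 10 * 1024 * 1024,  # assumption: 10 MiB/s
    base_seconds: float = 5.0,    # assumption: fixed overhead for small operations
    safety_factor: float = 2.0,   # assumption: headroom for slow network conditions
    max_seconds: float = 600.0,   # assumption: upper bound so failures still surface
) -> float:
    """Return a timeout that grows with file size but never exceeds max_seconds."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if assumed_throughput_bytes_per_s <= 0:
        raise ValueError("assumed_throughput_bytes_per_s must be positive")
    # Expected transfer time at the assumed throughput.
    transfer_seconds = size_bytes / assumed_throughput_bytes_per_s
    return min(max_seconds, base_seconds + safety_factor * transfer_seconds)
```

With these assumed values, a small metadata operation gets a short timeout so that failures are detected quickly, while a multi-gigabyte transfer is capped at the maximum.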
To ensure that only secure connections are established to your NFS share, we recommend that you configure a private endpoint for your storage account. A private endpoint uses Azure Private Link to assign a static IP address to your storage account from within your virtual network's private address space. A private endpoint helps to prevent connectivity interruptions from dynamic IP address changes. For more information about security for your NFS shares, see NFS file shares - Security and networking.
Availability zone support
Availability zones are physically separate groups of datacenters within each Azure region. When one zone fails, services can fail over to one of the remaining zones.
Azure Files provides robust availability zone support through ZRS configurations that automatically distribute your data across multiple availability zones within a region. Unlike LRS, ZRS guarantees that Azure synchronously replicates your file data across multiple availability zones. ZRS ensures that your data remains accessible even if one zone experiences an outage.
Region support
ZRS is supported in HDD (standard) file shares in all regions with availability zones.
ZRS is supported for SSD (premium) file shares through the FileStorage storage account kind. For a list of regions that support ZRS for SSD file share accounts, see ZRS support for SSD file shares.
Requirements
ZRS is supported by all file share types.
Cost
When you enable zone-redundant storage (ZRS), you're charged at a different rate than locally redundant storage (LRS) because of the extra replication and storage overhead.
For detailed pricing information, see Azure Files pricing.
Configure availability zone support
Create a file share with zone redundancy. To create a new file share with ZRS, see Create an Azure file share and select ZRS or GZRS as the redundancy option during account creation.
Change replication type. To convert an existing storage account to ZRS and learn about migration options and requirements, see Change redundancy configuration for Azure Files.
Disable zone redundancy. Convert ZRS accounts back to a nonzonal configuration, such as LRS, through the same redundancy configuration change process.
Normal operations
This section describes what to expect when a file storage account is configured for zone redundancy and all availability zones are operational.
Traffic routing between zones: Azure Storage with zone-redundant storage (ZRS) automatically distributes requests across storage clusters in multiple availability zones. Traffic distribution is transparent to applications and requires no client-side configuration.
Data replication between zones: All write operations to ZRS are replicated synchronously across all availability zones within the region. When you upload or modify data, the operation isn't considered complete until the data has been successfully replicated across all of the availability zones. This synchronous replication ensures strong consistency and zero data loss during zone failures.
Zone-down experience
This section describes what to expect when a file storage account is configured for zone redundancy and there's an availability zone outage.
Detection and response: Microsoft automatically detects zone failures and initiates recovery processes. No customer action is required for zone-redundant storage (ZRS) accounts.
If a zone becomes unavailable, Azure undertakes networking updates such as Domain Name System (DNS) repointing.
Notification: Azure Storage doesn't notify you when a zone is down. However, you can use Azure Resource Health to monitor the health of your storage account. You can also use Azure Service Health to understand the overall health of the Azure Storage service, including any zone failures.
Set up alerts on these services to receive notifications of zone-level problems. For more information, see Create Service Health alerts in the Azure portal and Create and configure Resource Health alerts.
Active requests: In-flight requests might be dropped during the recovery process and should be retried. Applications should implement retry logic to handle these temporary interruptions.
Expected data loss: No data loss occurs during zone failures because data is synchronously replicated across multiple zones before write operations complete.
Expected downtime: A small amount of downtime, typically a few seconds, might occur during automatic recovery as traffic is redirected to healthy zones. When you design applications for ZRS, follow practices for transient fault handling, including implementing retry policies with exponential back-off.
Traffic rerouting: Azure automatically reroutes traffic to the remaining healthy availability zones. The service maintains full functionality by using the surviving zones with no customer intervention required. No remounting of Azure file shares from the connected clients is required.
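The retry guidance for dropped in-flight requests can be sketched as a small helper that retries a failed operation with exponential back-off and jitter. This is a minimal sketch rather than an Azure SDK feature: the helper name, the default attempt count, the delays, and the choice of which exceptions count as transient are all assumptions to adapt to your client library.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,       # assumption: give up after five tries
    base_delay: float = 0.5,     # assumption: first back-off ceiling, in seconds
    max_delay: float = 30.0,     # assumption: cap on any single back-off
    retryable: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(), retrying transient failures with exponential back-off."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retryable:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # The ceiling doubles on each attempt: base, 2*base, 4*base, ...
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter spreads retries out so clients don't retry in lockstep.
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")
```

You would wrap an individual file operation, for example `retry_with_backoff(lambda: read_file(path))`, where `read_file` stands in for whatever call your application makes against the file share.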
Zone recovery
When the failed availability zone recovers, Azure Storage automatically restores normal operations across all of the availability zones. The service automatically ensures data consistency by synchronizing any operations that occurred during the outage period.
Testing for zone failures
When you use zone-redundant storage (ZRS), Azure Storage manages replication, traffic routing, and zone-down responses automatically. Because this feature is fully managed, you don't need to initiate or validate availability zone failure processes.
Multi-region support
Azure Storage, including Azure Blob Storage, Azure Files, Azure Table Storage, and Azure Queue Storage, provides a range of geo-redundancy and failover capabilities to suit different requirements.
Important
Geo-redundant storage (GRS) only works within Azure paired regions. If your storage account's region isn't paired, consider using the alternative multi-region approaches.
Replication across paired regions
Azure Storage provides several types of GRS in paired regions. Whichever type of GRS you use, data in the secondary region is always replicated by using locally redundant storage (LRS). This approach provides protection against hardware failures within the secondary region.
GRS provides support for planned and unplanned failovers to the Azure paired region when there's an outage in the primary region. GRS asynchronously replicates data from the primary region to the paired region.
Geo-zone-redundant storage (GZRS) replicates data in multiple availability zones in the primary region and into the paired region.
Important
Azure Files only supports geo-redundancy (GRS or GZRS) for standard (HDD) file shares.
Azure Files doesn't support read-access geo-redundant storage (RA-GRS) or read-access geo-zone-redundant storage (RA-GZRS). If a storage account is configured to use RA-GRS or RA-GZRS, the standard (HDD) file shares are configured and billed as GRS or GZRS.
Failover types
Azure Storage supports three types of failover for different scenarios.
Customer-managed unplanned failover: You're responsible for initiating recovery if there's a region-wide storage failure in your primary region.
Customer-managed planned failover: You're responsible for initiating recovery when storage remains operational in the primary region, but another part of your solution fails and you need to switch your whole solution over to a secondary region. You can also use a planned failover for disaster recovery drills that help you meet compliance and audit requirements.
Microsoft-managed failover: In exceptional circumstances, Microsoft might initiate failover for all geo-redundant storage (GRS) accounts in a region. However, Microsoft-managed failover is a last resort and is expected to only be performed after an extended period of outage. You shouldn't rely on Microsoft-managed failover.
GRS accounts can use any of these failover types. You don't need to preconfigure a storage account to use any of the failover types ahead of time.
Region support
Azure Storage geo-redundant configurations use Azure paired regions for secondary region replication. The secondary region is automatically determined based on your primary region selection and can't be customized. For a complete list of Azure paired regions, see Azure regions list.
If your storage account's region isn't paired, consider using the alternative multi-region approaches.
Requirements
Standard file shares only: Azure Files only supports geo-redundancy (GRS or GZRS) for standard (HDD) file shares. Premium (SSD) file shares must use LRS or ZRS. If you have premium file shares and you want to replicate the data across regions for higher resiliency, see Alternative multi-region approaches.
GRS and GZRS only: Azure Files doesn't support read-access geo-redundant storage (RA-GRS) or read-access geo-zone-redundant storage (RA-GZRS). If a storage account is configured to use RA-GRS or RA-GZRS, the standard (HDD) file shares are configured and billed as GRS or GZRS.
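The requirements above can be expressed as a small lookup that validates a planned configuration before deployment. This is an illustrative sketch only: the function and table names are hypothetical, and the check doesn't cover regional availability, such as which regions offer ZRS for SSD file shares.

```python
# Redundancy options per media tier, as described in this article.
SUPPORTED_REDUNDANCY = {
    "standard": {"LRS", "ZRS", "GRS", "GZRS"},  # HDD file shares
    "premium": {"LRS", "ZRS"},                  # SSD file shares
}

# Azure Files treats read-access options as their non-read-access equivalents.
READ_ACCESS_EQUIVALENT = {"RA-GRS": "GRS", "RA-GZRS": "GZRS"}


def effective_file_share_redundancy(media_tier: str, account_redundancy: str) -> str:
    """Return the redundancy that file shares in the account actually get."""
    tier = media_tier.lower()
    if tier not in SUPPORTED_REDUNDANCY:
        raise ValueError(f"unknown media tier: {media_tier!r}")
    requested = account_redundancy.upper()
    effective = READ_ACCESS_EQUIVALENT.get(requested, requested)
    if effective not in SUPPORTED_REDUNDANCY[tier]:
        raise ValueError(f"{requested} isn't supported for {tier} file shares")
    return effective
```

For example, a standard account configured as RA-GZRS resolves to GZRS for its file shares, while a premium account that requests GRS is rejected.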
Considerations
When you implement multi-region Azure Files, consider the following important factors:
Asynchronous replication latency: Data replication to the secondary region is asynchronous, which means that there's a lag between when data is written to the primary region and when it becomes available in the secondary region. This lag can result in potential data loss if a primary region failure occurs before recent data is replicated. The data loss is measured by the recovery point objective (RPO). You can expect the replication lag to be less than 15 minutes, but this time is an estimate and not guaranteed.
You can check the Last Sync Time property to understand how much data might be lost if your storage account has an unplanned failover.
Last Sync Time: For Azure Files, the Last Sync Time is based on the latest system snapshot in the secondary region.
The Last Sync Time calculation can time out if there are more than 100 file shares in a storage account. We recommend that you deploy 100 or fewer file shares for each storage account to avoid timeouts.
Secondary region access: The secondary region isn't accessible for reads until a failover occurs.
Feature limitations: Some Azure Files features aren't supported or have limitations when you use GRS or customer-managed failover. These limitations include specific file share types, access tiers, and management tools and operations. Review feature compatibility documentation before you implement geo-redundancy.
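To make the Last Sync Time consideration concrete, the following sketch estimates how much recently written data an unplanned failover could lose. It assumes that you already retrieved the Last Sync Time and that you track the time of your most recent write yourself. Both function names and the tolerance check are hypothetical helpers, not part of any Azure API.

```python
from datetime import datetime, timedelta, timezone


def potential_data_loss_window(
    last_sync_time: datetime, last_write_time: datetime
) -> timedelta:
    """Return the span of writes that might not yet exist in the secondary region."""
    if last_sync_time.tzinfo is None or last_write_time.tzinfo is None:
        raise ValueError("use timezone-aware datetimes, for example in UTC")
    # Writes made at or before the Last Sync Time are already replicated.
    return max(timedelta(0), last_write_time - last_sync_time)


def within_tolerance(window: timedelta, tolerated_loss: timedelta) -> bool:
    """Return True when the exposure fits your own RPO target."""
    return window <= tolerated_loss
```

For instance, if the Last Sync Time is 12:00 UTC and your application last wrote at 12:10 UTC, roughly 10 minutes of writes are at risk.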
Cost
Multi-region Azure Storage account configurations incur extra costs for cross-region replication and storage in the secondary region. Data transfer between Azure regions is charged based on standard inter-region bandwidth rates.
For detailed pricing information, see Azure Files pricing.
Configure multi-region support
Create a new geo-redundant storage (GRS) account. To create a GRS account, see Create a storage account and select GRS, read-access geo-redundant storage (RA-GRS), geo-zone-redundant storage (GZRS), or read-access geo-zone-redundant storage (RA-GZRS) during account creation.
Enable geo-redundancy on an existing file storage account. To convert an existing file storage account to GRS, see Change redundancy configuration for Azure Files.
Warning
After your account is reconfigured for geo-redundancy, it might take a significant amount of time before existing data in the primary region is fully copied to the new secondary region.
To avoid a major data loss, check the value of the Last Sync Time property before you initiate an unplanned failover. To evaluate potential data loss, compare the last sync time to the last time at which data was written to the primary region.
Disable geo-redundancy. Convert GRS accounts back to single-region configurations (LRS or ZRS) through the same redundancy configuration change process.
Normal operations
This section describes what to expect when a storage account is configured for geo-redundancy and all regions are operational.
Traffic routing between regions: Azure Files uses an active-passive approach where all read and write operations are directed to the primary region.
Data replication between regions: Write operations are first committed to the primary region by using the configured redundancy type (LRS for GRS, or ZRS for GZRS). After successful completion in the primary region, data is asynchronously replicated to the secondary region, where it's stored by using LRS.
The asynchronous nature of cross-region replication means that there's typically a lag time between when data is written to the primary region and when it's available in the secondary region. You can monitor the replication time by using the Last Sync Time property.
Region-down experience
This section describes what to expect when a storage account is configured for geo-redundancy and there's an outage in the primary region.
Customer-managed failover (unplanned): Use an unplanned failover when storage in the primary region is unavailable.
Detection and response: In the unlikely event that your storage account is unavailable in your primary region, you can consider initiating a customer-managed unplanned failover. To make this decision, consider the following factors:
Whether Azure Resource Health shows problems accessing the storage account in your primary region
Whether Microsoft advises you to perform failover to another region
Warning
An unplanned failover can result in data loss. Before you initiate a customer-managed failover, decide whether the restoration of service justifies the risk of data loss.
Notification: Azure Storage doesn't notify you when a region is down. However, you can use Azure Resource Health to monitor the health of your storage account. You can also use Azure Service Health to understand the overall health of the Azure Storage service, including any region failures.
Set up alerts on these services to receive notifications of region-level problems. For more information, see Create Service Health alerts in the Azure portal and Create and configure Resource Health alerts.
Active requests: During the failover process, both the primary and secondary storage account endpoints become temporarily unavailable for both reads and writes. Any active requests might be dropped, and client applications need to retry after the failover completes.
Expected data loss: Data loss is common during an unplanned failover because of the asynchronous replication lag, which means that recent writes might not be replicated. You can check the Last Sync Time property to understand how much data might be lost during an unplanned failover. Expected data loss is often referred to as the recovery point objective (RPO). You can typically expect the RPO to be less than 15 minutes, but that time isn't guaranteed.
Expected downtime: The amount of expected downtime is often referred to as the recovery time objective (RTO). Customer-managed failover typically completes within 60 minutes, depending on the account size and complexity.
Traffic rerouting: As the failover completes, Azure automatically updates the storage account endpoints so that applications don't need to be reconfigured. If your application keeps Domain Name System (DNS) entries cached, it might be necessary to clear the cache to ensure that the application sends traffic to the new primary region.
Post-failover configuration: After an unplanned failover completes, your storage account in the destination region uses the locally redundant storage (LRS) tier. If you need to geo-replicate it again, you need to re-enable geo-redundant storage (GRS) and wait for the data to be replicated to the new secondary region.
For more information about how to initiate customer-managed failover, see How customer-managed (unplanned) failover works and Initiate a storage account failover.
Customer-managed failover (planned): Use a planned failover when storage remains operational in the primary region, but you need to fail over your whole solution to a secondary region for another reason. For example, another Azure service might be experiencing a problem and you need to switch to using a secondary region for your whole solution. Or you might use a planned failover to conduct a disaster recovery (DR) drill for compliance and audit purposes.
Detection and response: You're responsible for deciding to fail over. You typically make this decision if you need to fail over between regions, even though your storage account is healthy. For example, you might trigger a failover when there's a major outage of another application component that you can't recover from in the primary region.
Notification: Azure Storage doesn't notify you when a region is down. However, you can use Azure Resource Health to monitor the health of your storage account. You can also use Azure Service Health to understand the overall health of the Azure Storage service, including any region failures.
Set up alerts on these services to receive notifications of region-level problems. For more information, see Create Service Health alerts in the Azure portal and Create and configure Resource Health alerts.
Active requests: During the failover process, both the primary and secondary storage account endpoints become temporarily unavailable for both reads and writes. Any active requests might be dropped, and client applications need to retry after the failover completes.
Expected data loss: No data loss is expected because the failover process completes only after all data is synchronized, which results in an RPO of zero.
Expected downtime: Failover typically completes within 60 minutes, depending on account size and complexity, so you can expect an RTO of about 60 minutes. During the failover process, both the primary and secondary storage account endpoints become temporarily unavailable for both reads and writes.
Traffic rerouting: As the failover completes, Azure automatically updates the storage account endpoints so that applications don't need to be reconfigured. If your application keeps DNS entries cached, it might be necessary to clear the cache to ensure that the application sends traffic to the new primary region.
Post-failover configuration: After a planned failover completes, your storage account in the destination region continues to be geo-replicated and remains on the GRS tier.
For more information about how to initiate customer-managed failover, see How customer-managed (planned) failover works and Initiate a storage account failover.
Microsoft-managed failover: In the rare event of a major disaster where Microsoft determines that the primary region is permanently unrecoverable, an automatic failover to the secondary region might be initiated. Microsoft handles the entire process and no customer action is required. The amount of time that elapses before failover occurs depends on the severity of the disaster and the time required to assess the situation.
Notification: Azure Storage doesn't notify you when a region is down. However, you can use Azure Resource Health to monitor the health of your storage account. You can also use Azure Service Health to understand the overall health of the Azure Storage service, including any region failures.
Set up alerts on these services to receive notifications of region-level problems. For more information, see Create Service Health alerts in the Azure portal and Create and configure Resource Health alerts.
Important
Use customer-managed failover options to develop, test, and implement your DR plans. Don't rely on Microsoft-managed failover, which might only be used in extreme circumstances. A Microsoft-managed failover is likely initiated for an entire region. It can't be initiated for individual storage accounts, subscriptions, or customers. Failover might occur at different times for different Azure services. We recommend that you use customer-managed failover.
Region recovery
The failback process differs significantly between Microsoft-managed and customer-managed failover scenarios.
Customer-managed failover (unplanned): After an unplanned failover, the storage account is configured with locally redundant storage (LRS). To fail back, you need to re-establish the geo-redundant storage (GRS) relationship and wait for the data to be replicated.
Customer-managed failover (planned): After a planned failover, the storage account remains geo-replicated. You can initiate another customer-managed failover to fail back to the original primary region. The same failover considerations apply.
Microsoft-managed failover: If Microsoft initiates a failover, it's likely that a significant disaster occurred in the primary region, and the primary region might not be recoverable. Any timelines or recovery plans depend on the extent of the regional disaster and recovery efforts. You should monitor Azure Service Health communications for details.
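The differences between the failover types can be summarized in a small lookup table, which might be useful in a DR runbook or automation script. This sketch encodes only what this article states. All type and function names are hypothetical, and values that the article doesn't specify are left as `None` rather than guessed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FailoverType(Enum):
    CUSTOMER_UNPLANNED = "customer-managed unplanned"
    CUSTOMER_PLANNED = "customer-managed planned"
    MICROSOFT_MANAGED = "Microsoft-managed"


@dataclass(frozen=True)
class FailoverProfile:
    initiated_by: str
    data_loss_possible: Optional[bool]   # None: not stated in this article
    stays_geo_redundant: Optional[bool]  # None: not stated in this article


PROFILES = {
    # Asynchronous replication lag can lose recent writes; the account becomes LRS.
    FailoverType.CUSTOMER_UNPLANNED: FailoverProfile("customer", True, False),
    # Failover completes only after data is synchronized; the account stays GRS.
    FailoverType.CUSTOMER_PLANNED: FailoverProfile("customer", False, True),
    # Last resort for an entire region; don't build a DR plan around it.
    FailoverType.MICROSOFT_MANAGED: FailoverProfile("Microsoft", None, None),
}


def choose_customer_failover(primary_storage_available: bool) -> FailoverType:
    """Pick the customer-managed failover type that matches the storage state."""
    if primary_storage_available:
        return FailoverType.CUSTOMER_PLANNED
    return FailoverType.CUSTOMER_UNPLANNED


def needs_geo_redundancy_reenabled(failover: FailoverType) -> bool:
    """Return True when GRS must be re-enabled before you can fail back."""
    return PROFILES[failover].stays_geo_redundant is False
```

Keeping unknown values as `None` makes gaps in the runbook visible instead of silently assuming a behavior.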
Testing for region failures
For GRS accounts, you can perform planned failover operations during maintenance windows to test the complete failover and failback process. Planned failover doesn't cause data loss, but it does require downtime during both failover and failback.
Alternative multi-region approaches
The cross-region failover capabilities of Azure Storage might be unsuitable for the following reasons:
Your storage account is in a nonpaired region.
Your business uptime goals aren't satisfied by the recovery time or data loss that the built-in failover options provide.
You need to fail over to a region that isn't your primary region's pair.
You need an active/active configuration across regions.
You use file share types that don't support geo-redundancy.
This section provides a high-level overview of some approaches to consider. A comprehensive overview of multi-region deployment topologies for Azure Storage is outside the scope of this article.
Consider the following common high-level approaches:
Multiple storage accounts: Azure Files can be deployed across multiple regions by using separate storage accounts in each region. This approach provides flexibility in region selection, the ability to use nonpaired regions, and more granular control over replication timing and data consistency. When you implement multiple storage accounts across regions, you need to configure cross-region data replication, implement load balancing and failover policies, and ensure data consistency across regions.
Application-level replication: Implement custom replication logic by using Azure Data Factory or AzCopy to synchronize data between file shares in different regions. This approach requires custom development and conflict resolution mechanisms.
Azure File Sync: Use Azure File Sync to replicate files to a file share in another Azure region. You can use Azure File Sync to sync between an SMB Azure file share (cloud endpoint), an on-premises Windows file server, and a mounted file share that runs on a virtual machine (VM) in another Azure region (a DR server endpoint).
This approach requires you to deploy multiple file shares and a VM to coordinate the synchronization process.
If you use this approach for multi-region file replication:
Disable cloud tiering to ensure that all data is present locally on the file server.
Provision enough storage on the Azure VM to hold the entire dataset.
Access and modify files on the server endpoint, and not in Azure, to ensure that changes replicate quickly to the secondary region.
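To illustrate what custom application-level replication involves, the following sketch performs a one-way mirror between two directory trees. It isn't Azure Data Factory, AzCopy, or Azure File Sync. It's a standard-library stand-in that assumes both file shares are already mounted as local paths, and the function name is hypothetical. It copies new or changed files only, and deliberately omits deletion handling and conflict resolution, which a real solution needs.

```python
import shutil
from pathlib import Path
from typing import List


def mirror_one_way(source_root: Path, target_root: Path) -> List[Path]:
    """Copy new or changed files from source_root to target_root.

    Returns the relative paths that were copied. Files that exist only in the
    target are left untouched.
    """
    copied: List[Path] = []
    for src in sorted(source_root.rglob("*")):
        if not src.is_file():
            continue
        relative = src.relative_to(source_root)
        dst = target_root / relative
        if dst.exists():
            src_stat, dst_stat = src.stat(), dst.stat()
            # Skip files with the same size that are not newer at the source.
            if (
                src_stat.st_size == dst_stat.st_size
                and src_stat.st_mtime <= dst_stat.st_mtime
            ):
                continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 preserves modification times
        copied.append(relative)
    return copied
```

Comparing size and modification time keeps repeated runs cheap, but it can miss edits that preserve both, so a production design might compare checksums instead.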
Backups
Azure Files backup is a native integration between Azure Files and Azure Backup that's designed to safeguard data against accidental deletion, corruption, and ransomware attacks.
Azure Files backup creates share-level snapshots stored within the same storage account. This capability enables the rapid recovery of both individual files and entire file shares. You can also use backup policies to provide long retention periods with customizable backup frequency.
You can create your snapshots and store them in two different ways:
Share-level storage: For operational and short-term recovery scenarios, you can create share-level snapshots and store them within the same storage account. Share-level snapshots enable rapid recovery of individual files or entire file shares to either the original or an alternate location.
Vaulted backup storage: By using vaulted backup, you can copy your daily snapshots to an Azure Recovery Services vault. To enhance security, this vault is isolated and air-gapped from the primary storage account.
When you use a paired Azure region and configure the vault to use GRS, the vault replicates data to the paired region. This replication supports cross-region recovery and DR workflows.
Service-level agreement
The service-level agreement (SLA) for Azure Storage describes the expected availability of the service and the conditions that must be met to achieve that availability expectation. The availability SLA you're eligible for depends on the storage tier and the replication type that you use. For more information, see SLAs for Online Services.