

Reliability in Azure App Service

Azure App Service is an HTTP-based service for hosting web applications, REST APIs, and mobile back ends. App Service integrates with Microsoft Azure to provide security, load balancing, autoscaling, and automated management for applications. This article describes reliability support in Azure App Service, covering intra-regional resiliency via availability zones and multi-region deployments.

If you are using App Service Environment, see Reliability in Azure App Service Environment for more information about reliability support in that environment.

Reliability is a shared responsibility between you and Microsoft. You can use this guide to determine which reliability options fulfill your specific business objectives and uptime goals.

Production deployment recommendations

To learn about how to deploy Azure App Service to support your solution's reliability requirements, and how reliability affects other aspects of your architecture, see Architecture best practices for Azure App Service (Web Apps) in the Azure Well-Architected Framework.

Reliability architecture overview

When you create an Azure App Service web app, you define which App Service plan the app runs on.

An App Service plan defines a set of compute resources that run your web apps. All web apps must run inside an App Service plan. You can scale an App Service plan to run on multiple virtual machine instances (workers). These instances are the compute resources that run your app code. A single App Service plan can host multiple apps, all running on the same shared set of VM instances.

App Service offers the following redundancy features:

  • Distribution across fault domains: At the platform level, without any configuration from you, Azure automatically distributes your App Service plan’s VM instances across fault domains within the Azure region. A fault domain is a group of virtual machines that share a common power source and network switch, so spreading instances across fault domains minimizes the effect of localized hardware failures.

  • Distribution across availability zones: If you enable zone redundancy on a supported App Service plan, Azure also distributes your instances across availability zones within the region, offering higher resiliency in the event of a zone outage. To learn more about zone redundancy, see Availability zone support.

  • Scaling apps: When your App Service plan runs multiple VM instances, all apps in the plan run on all instances by default. If you configure autoscaling for the plan, all apps in the plan scale out together based on the autoscale settings. To control how many plan instances run a specific app, use per-app scaling.

  • Scale units: Behind the scenes, and without any configuration from you, Azure App Service runs on platform infrastructure called scale units (also known as stamps). A scale unit includes all the components needed to host and run App Service, including compute, storage, networking, and load balancing. Azure manages scale units to ensure balanced workload distribution, perform routine maintenance, and maintain overall platform reliability.

    Certain capabilities might be applied to some scale units and not others. For example, zone redundancy might be supported by some App Service scale units but not by other scale units in the same region.
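To make the default and per-app scaling behavior concrete, here's a minimal conceptual model in Python. It isn't an Azure API. The `App` and `AppServicePlan` classes and the `max_instances` field are hypothetical names chosen for illustration, and the model ignores scale units and fault domains.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class App:
    name: str
    # Hypothetical per-app limit. None means the default: run on every plan instance.
    max_instances: Optional[int] = None


@dataclass
class AppServicePlan:
    instance_count: int
    apps: list[App] = field(default_factory=list)

    def instances_for(self, app: App) -> int:
        """Return how many plan instances run `app` in this conceptual model."""
        if app.max_instances is None:
            return self.instance_count
        # With per-app scaling, an app can't run on more instances than the plan has.
        return min(app.max_instances, self.instance_count)


plan = AppServicePlan(instance_count=4, apps=[App("api"), App("admin", max_instances=1)])
for app in plan.apps:
    print(app.name, plan.instances_for(app))  # prints "api 4", then "admin 1"
```

In this model, if autoscale raises `instance_count` to 6, `api` follows the plan onto all six instances, while `admin` stays on one.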

Transient faults

Transient faults are short, intermittent failures in components. They occur frequently in a distributed environment like the cloud, and they're a normal part of operations. Transient faults correct themselves after a short period of time. It's important that your applications handle transient faults, usually by retrying affected requests.

All cloud-hosted applications should follow the Azure transient fault handling guidance when they communicate with any cloud-hosted APIs, databases, and other components. For more information, see Recommendations for handling transient faults.
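The retry approach that the transient fault guidance recommends can be sketched as follows. This is a minimal illustration that assumes a synchronous operation. `TransientError` is a hypothetical stand-in for whatever exceptions your client library raises for transient conditions, and the delay values are placeholders rather than recommended settings.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for errors your client treats as transient (timeouts, HTTP 503, ...)."""


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.2,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying TransientError with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last transient error.
            # Double the delay ceiling on each attempt, capped at max_delay...
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # ...then sleep a random fraction of it so many clients don't retry in lockstep.
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")


# Usage: an operation that fails twice with a transient error, then succeeds.
attempts = {"count": 0}


def flaky_call() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("simulated HTTP 503")
    return "ok"


result = retry_with_backoff(flaky_call, sleep=lambda _: None)  # skip real sleeping in the demo
print(result, attempts["count"])  # prints: ok 3
```

Retry only operations that are safe to repeat (idempotent). Where a Microsoft-provided SDK already applies a retry policy, prefer it over wrapping calls in your own loop.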

Microsoft-provided SDKs usually handle transient faults. Because you host your own applications on App Service, consider how to avoid causing transient faults:

  • Deploy multiple instances in your plan. App Service performs automated updates and other forms of maintenance on instances in your plan. If an instance becomes unhealthy, the service can automatically replace that instance with a new healthy instance. During the replacement process, there can be a short period when the previous instance is unavailable and a new instance isn't ready to serve traffic. You can mitigate these effects by deploying multiple instances of your App Service plan.

  • Use deployment slots. App Service deployment slots enable zero-downtime deployments of your applications. Use deployment slots to minimize the effect of deployments and configuration changes on your users. Deployment slots also reduce the likelihood of application restarts, and each restart causes a transient fault.

  • Avoid scaling up or scaling down. Scaling up and scaling down involve changing the CPU, memory, and other resources that are allocated to each instance, and these operations can trigger an application restart. Instead, select a tier and instance size that meet your performance requirements under typical load. To handle changes in traffic volume, scale out and scale in by dynamically adding and removing instances.

Availability zone support

Availability zones are physically separate groups of datacenters within each Azure region. When one zone fails, services can fail over to one of the remaining zones.

For Premium v2-v4 tiers, App Service can be configured as zone redundant, which means that your resources are distributed across multiple availability zones. Distribution across multiple zones helps your production workloads achieve resiliency and reliability. When you configure zone redundancy on App Service plans, all apps that use the plan are made zone redundant.

Region support

Zone-redundant App Service Premium v2-v4 plans can be deployed in any region that supports availability zones.

Requirements

To enable zone redundancy, you must:

  • Use a Premium v2-v4 plan.

  • Deploy a minimum of two instances in your plan.

  • Be located on a scale unit that supports availability zones. When you create an App Service plan, the plan is assigned to a scale unit. The scale unit that you're assigned to is based on the resource group that you deploy an App Service plan to. If your scale unit doesn't support availability zones, you need to create a new plan in a new resource group.

    To learn whether or not the scale unit that your App Service plan is on supports zone redundancy, see Check for zone redundancy support for an App Service plan.
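The requirements above can be expressed as a simple pre-flight check. This sketch is hypothetical: `PlanSpec` and `zone_redundancy_problems` aren't part of any Azure SDK, and the tier strings are assumed labels for the Premium v2, v3, and v4 tiers. Whether the scale unit supports availability zones is an input that you must determine separately.

```python
from dataclasses import dataclass

# Assumed tier labels for Premium v2, v3, and v4. Check them against your plan's actual SKU tier.
ZONE_REDUNDANT_TIERS = {"PremiumV2", "PremiumV3", "PremiumV4"}


@dataclass
class PlanSpec:
    tier: str
    instance_count: int
    # Determined separately by checking zone redundancy support for the plan.
    scale_unit_supports_zones: bool


def zone_redundancy_problems(plan: PlanSpec) -> list[str]:
    """Return the unmet zone redundancy requirements. An empty list means all are met."""
    problems: list[str] = []
    if plan.tier not in ZONE_REDUNDANT_TIERS:
        problems.append(f"tier {plan.tier!r} is not Premium v2-v4")
    if plan.instance_count < 2:
        problems.append("plan needs at least two instances")
    if not plan.scale_unit_supports_zones:
        problems.append("scale unit doesn't support zones: create a new plan in a new resource group")
    return problems


print(zone_redundancy_problems(PlanSpec("PremiumV3", 3, True)))       # prints: []
print(len(zone_redundancy_problems(PlanSpec("Standard", 1, False))))  # prints: 3
```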

Instance distribution across zones

When you create a zone-redundant App Service plan, the instances of your App Service plan are distributed across the availability zones in the region. The distribution is done automatically by Azure to ensure that your apps remain available even if one zone experiences an outage.

Instance distribution in a zone-redundant deployment follows specific rules. These rules remain applicable as the app scales in and scales out:

  • Minimum instances: Your App Service plan must have a minimum of two instances for zone redundancy.

  • Maximum availability zones supported by your plan: Azure determines the number of availability zones that your plan can use, which is referred to as maximumNumberOfZones. To view the number of availability zones that your specific plan is able to use, see Check zone redundancy support for an App Service plan.

  • Instance distribution: When zone redundancy is enabled, plan instances are distributed across multiple availability zones automatically. The distribution is based on the following rules:

    • The instances distribute evenly if you specify a capacity (number of instances) greater than maximumNumberOfZones and the number of instances is divisible by maximumNumberOfZones.
    • If the number of instances isn't divisible by maximumNumberOfZones, the remaining instances are distributed across the remaining zones.
    • When the App Service platform allocates instances for a zone-redundant App Service plan, it uses the best-effort zone balancing that the underlying Azure virtual machine scale sets provide. An App Service plan is balanced if each zone has the same number of VMs as every other zone, or differs from every other zone by no more than one VM. For more information, see Zone balancing.
  • Physical zone placement: You can view the physical availability zone that's used for each of your App Service plan instances. For more information, see View physical zones for an App Service plan.
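The distribution rules above can be approximated with integer division. The following sketch is an idealized model, not the platform's actual placement logic. Real allocation is best-effort, and Azure decides which zones receive any extra instances. `distribute` and `is_balanced` are hypothetical helper names, and `max_zones` plays the role of maximumNumberOfZones.

```python
def distribute(capacity: int, max_zones: int) -> list[int]:
    """Idealized balanced placement of `capacity` instances over `max_zones` zones."""
    if capacity < 2:
        raise ValueError("a zone-redundant plan needs at least two instances")
    if max_zones < 1:
        raise ValueError("max_zones must be positive")
    base, remainder = divmod(capacity, max_zones)
    # Every zone gets `base` instances, and `remainder` zones get one extra each.
    # (The platform picks which zones get the extras. Here it's simply the first ones.)
    return [base + 1 if zone < remainder else base for zone in range(max_zones)]


def is_balanced(counts: list[int]) -> bool:
    """Balanced: no zone differs from any other zone by more than one instance."""
    return max(counts) - min(counts) <= 1


print(distribute(6, 3))  # prints: [2, 2, 2]  (divisible: even split)
print(distribute(7, 3))  # prints: [3, 2, 2]  (one remaining instance)
print(distribute(2, 3))  # prints: [1, 1, 0]  (fewer instances than zones)
```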

Considerations

For Premium v2-v4 plans, some aspects of Azure App Service might be affected during an availability zone outage, even though the application continues to serve traffic. The affected behaviors include App Service plan scaling, application creation, application configuration, and application publishing.

When you enable zone redundancy on your App Service Premium v2-v4 plan, you also improve your resiliency to updates that the App Service platform rolls out. To learn more, see Reliability during service maintenance.

For App Service plans that aren't configured as zone redundant, the underlying VM instances aren't resilient to availability zone failures. They can experience downtime during an outage in any zone in that region.

Cost

When you use App Service Premium v2-v4 plans, there's no extra cost associated with enabling availability zones as long as you have two or more instances in your App Service plan. You're charged based on your App Service plan SKU, the capacity you specify, and any instances that you scale to based on your autoscale criteria.

If you enable availability zones but specify a capacity of less than two, the platform enforces a minimum instance count of two. The platform charges you for those two instances.
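As a rough illustration of the billing rule, the following sketch computes how many instances you pay for. It's a simplified model that doesn't account for autoscale. The function names are hypothetical, and `price_per_instance` is a placeholder rather than a real price, so look up the rate for your SKU and region.

```python
def billed_instances(requested: int, zone_redundant: bool) -> int:
    """Instances you pay for in this simplified model (autoscale not modeled)."""
    if requested < 1:
        raise ValueError("requested capacity must be at least 1")
    # With zone redundancy enabled, the platform enforces a minimum of two instances.
    return max(requested, 2) if zone_redundant else requested


def estimated_cost(requested: int, zone_redundant: bool, price_per_instance: float) -> float:
    """price_per_instance is a placeholder: use the published rate for your SKU and region."""
    return billed_instances(requested, zone_redundant) * price_per_instance


print(billed_instances(1, zone_redundant=True))   # prints: 2
print(billed_instances(1, zone_redundant=False))  # prints: 1
print(billed_instances(3, zone_redundant=True))   # prints: 3
```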

Configure availability zone support

Capacity planning and management

To prepare for availability zone failure, consider over-provisioning the capacity of your App Service plan. Over-provisioning allows the solution to tolerate some degree of capacity loss and continue to function without degraded performance. For more information, see Manage capacity with over-provisioning.
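One way to size an over-provisioned plan is to require that it still meets peak demand after losing its fullest zone. The following sketch makes three assumptions: placement is balanced (zones differ by at most one instance), `zones` equals your plan's maximumNumberOfZones, and lost instances aren't backfilled, because backfilling is best-effort. The function name is hypothetical.

```python
import math


def capacity_to_survive_zone_loss(required: int, zones: int) -> int:
    """Smallest plan size that still has `required` instances after losing any one zone.

    Assumes balanced placement (zones differ by at most one instance) and no
    backfilling of lost instances, because backfilling is best-effort.
    """
    if required < 1:
        raise ValueError("required must be at least 1")
    if zones < 2:
        raise ValueError("tolerating a zone loss requires at least two zones")
    total = max(required, 2)  # zone-redundant plans run at least two instances
    # Under balanced placement, the fullest zone holds ceil(total / zones) instances.
    while total - math.ceil(total / zones) < required:
        total += 1
    return total


print(capacity_to_survive_zone_loss(6, 3))  # prints: 9
print(capacity_to_survive_zone_loss(4, 2))  # prints: 8
```

For example, suppose peak load needs six instances and the plan can use three zones. The sketch returns nine, because losing any one zone removes three instances and still leaves six.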

Normal operations

The following section describes what to expect when App Service plans are configured for zone redundancy and all availability zones are operational:

  • Traffic routing between zones: During normal operations, traffic is routed between all of your available App Service plan instances across all availability zones.

  • Data replication between zones: During normal operations, any state stored in your application's file system is stored in zone-redundant storage and synchronously replicated between availability zones.

Zone-down experience


The following section describes what to expect when App Service plans are configured for zone redundancy and one or more availability zones are unavailable:

  • Detection and response: The App Service platform automatically detects failures in an availability zone and initiates a response. No manual intervention is required to initiate a zone failover.

  • Notification: Zone failure events can be monitored through Azure Service Health and Resource Health. Set up alerts on these services to receive notifications of zone-level issues.

  • Active requests: When an availability zone is unavailable, any in-progress requests that are connected to an App Service plan instance in the faulty availability zone are terminated, and clients need to retry them.

  • Traffic rerouting: When a zone is unavailable, App Service detects the lost instances from that zone and automatically attempts to find new replacement instances. Once it finds replacements, it then distributes traffic across the new instances as needed.

    If autoscale is configured and it determines that more instances are needed, it issues a request to App Service to add those instances. Autoscale behavior operates independently of App Service platform behavior, meaning that your instance count specification doesn't need to be a multiple of two. For more information, see Scale up an app in App Service and Autoscale overview.

    Important

    There's no guarantee that requests for more instances in a zone-down scenario succeed. The backfilling of lost instances occurs on a best-effort basis. If you need guaranteed capacity when an availability zone is lost, you should create and configure your App Service plans to account for the loss of a zone. You can achieve this by over-provisioning the capacity of your App Service plan.

  • Nonruntime behaviors: Applications that are deployed in a zone-redundant App Service plan continue to run and serve traffic even if an availability zone experiences an outage. However, nonruntime behaviors might be affected during an availability zone outage. These behaviors include App Service plan scaling, application creation, application configuration, and application publishing.

Failback

When the availability zone recovers, App Service automatically creates instances in the recovered availability zone, removes any temporary instances created in the other availability zones, and routes traffic between your instances as usual.

Testing for zone failures

The App Service platform manages traffic routing, failover, and failback for zone-redundant App Service plans. Because this feature is fully managed, you don't need to initiate or validate availability zone failure processes.

Multi-region support

App Service is a single-region service. If the region becomes unavailable, your application is also unavailable.

Alternative multi-region approaches

To reduce the risk of a single-region failure affecting your application, you can deploy plans across multiple regions. The following steps help strengthen resilience:

  • Deploy your application to the plans in each region.
  • Configure load balancing and failover policies.
  • Replicate your data across regions so that you can recover your last application state.
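The failover step usually relies on health probes and a routing decision about which region receives traffic. The following sketch shows only a priority-based failover decision. It isn't the API of any specific Azure routing service. The region names, the endpoint URLs, and the `RegionalDeployment` and `choose_endpoint` names are all illustrative placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RegionalDeployment:
    region: str    # illustrative Azure region name
    endpoint: str  # hypothetical URL of the app deployed in that region
    priority: int  # lower value = more preferred
    healthy: bool  # in a real setup, derived from health probes


def choose_endpoint(deployments: list[RegionalDeployment]) -> Optional[str]:
    """Priority-based failover: pick the most-preferred healthy region, or None if none is healthy."""
    healthy = [d for d in deployments if d.healthy]
    if not healthy:
        return None
    return min(healthy, key=lambda d: d.priority).endpoint


deployments = [
    RegionalDeployment("eastus2", "https://app-eastus2.example.com", priority=1, healthy=False),
    RegionalDeployment("centralus", "https://app-centralus.example.com", priority=2, healthy=True),
]
print(choose_endpoint(deployments))  # prints: https://app-centralus.example.com
```

Routing is only one part of the approach. The secondary region also needs a deployed copy of the application and replicated data before it can take over.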


Backups

When you use Basic tier or higher, you can back up your App Service apps to a file by using the App Service backup and restore capabilities.

This feature is useful if it's hard to redeploy your code, or if you store state on disk. For most solutions, you shouldn't rely exclusively on backups. Instead, use the other capabilities described in this guide to support your resiliency requirements. However, backups protect against some risks that other approaches don't. For more information, see Back up and restore your app in App Service.

Reliability during service maintenance

Azure App Service performs regular service upgrades, as well as other forms of maintenance. To ensure that your expected capacity is available during an upgrade, the platform automatically adds extra instances of the App Service plan during the upgrade process.

Enabling zone redundancy on your App Service plan also improves your resiliency to updates that the App Service platform rolls out. An update domain is a collection of virtual machines (VMs) that are taken offline together during an update, and update domains are tied to availability zones. Deploying multiple instances in your App Service plan and enabling zone redundancy for the plan adds an extra layer of resiliency during upgrades if an instance or zone becomes unhealthy.

To learn more, see Routine planned maintenance for Azure App Service and Routine maintenance for Azure App Service, restarts, and downtime.

Service-level agreement (SLA)

The service-level agreement (SLA) for Azure services describes the expected availability of each service and the conditions that must be met to achieve that availability expectation. For more information, see SLAs for online services.

When you deploy a zone-redundant App Service plan, the uptime percentage defined in the SLA increases.