Model Serving limits and regions

This article summarizes the limitations and region availability for Mosaic AI Model Serving and supported endpoint types.

Resource and payload limits

Mosaic AI Model Serving imposes default limits to ensure reliable performance. If you have feedback on these limits, reach out to your Databricks account team.

The following table summarizes resource and payload limitations for model serving endpoints.

Feature | Granularity | Limit
Payload size | Per request | 16 MB. For endpoints serving foundation models, external models, or AI agents, the limit is 4 MB.
Request/response size | Per request | Any request/response over 1 MB will not be logged.
Queries per second (QPS) | Per workspace | 200. For higher QPS, enable route optimization.
Model execution duration | Per request | 297 seconds
CPU endpoint model memory usage | Per endpoint | 4 GB
GPU endpoint model memory usage | Per endpoint | Greater than or equal to assigned GPU memory; depends on the GPU workload size
Provisioned concurrency | Per model and per workspace | 200 concurrency. Can be increased by reaching out to your Databricks account team.
Overhead latency | Per request | Less than 50 milliseconds
Init scripts | | Init scripts are not supported.
Foundation Model APIs rate limits | Per workspace | See Foundation Model APIs rate limits and quotas for detailed information about pay-per-token and provisioned throughput limits.
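
As an illustration of how a client might respect the payload and logging limits in the table above, the following Python sketch measures a JSON request body before it is sent. The function name `check_payload`, the endpoint-kind labels, and the reading of 1 MB as 2**20 bytes are assumptions made for this example; they are not part of the Model Serving API, and the article does not state which byte convention applies.

```python
import json

# Assumption: the article does not say whether "MB" means 10**6 or 2**20 bytes.
MB = 1024 * 1024

# Payload limits from the table above, keyed by hypothetical endpoint-kind labels.
PAYLOAD_LIMIT_BYTES = {
    "custom": 16 * MB,
    "foundation_or_external_or_agent": 4 * MB,
}

# Requests or responses over this size are not logged, per the table above.
LOGGING_LIMIT_BYTES = 1 * MB


def check_payload(payload, endpoint_kind="custom"):
    """Serialize the payload as JSON and compare its size to the documented limits."""
    body = json.dumps(payload).encode("utf-8")
    size = len(body)
    return {
        "size_bytes": size,
        "within_payload_limit": size <= PAYLOAD_LIMIT_BYTES[endpoint_kind],
        "within_logging_limit": size <= LOGGING_LIMIT_BYTES,
    }


if __name__ == "__main__":
    report = check_payload({"inputs": [[1.0, 2.0, 3.0]]})
    print(report)
```

A check like this only covers the request body. Response size counts toward the logging limit as well, and it is only known after the endpoint replies.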

Networking and security limitations

  • Model Serving endpoints are protected by access control and respect networking-related ingress rules configured on the workspace, like IP allowlists and Private Link.
  • Private connectivity (such as Azure Private Link) is only supported for model serving endpoints that use provisioned throughput or endpoints that serve custom models.
  • By default, Model Serving does not support Private Link to external endpoints (such as Azure OpenAI). Support for this functionality is evaluated and implemented on a per-region basis. Reach out to your Azure Databricks account team for more information.
  • Model Serving does not provide security patches to existing model images because of the risk of destabilization to production deployments. A new model image created from a new model version will contain the latest patches. Reach out to your Databricks account team for more information.

Compliance security profile standards: CPU workloads

The following table lists the compliance security profile standards supported for the core Model Serving functionality on CPU workloads.

Note

These compliance standards require served containers to have been built within the last 30 days. Databricks automatically rebuilds outdated containers on your behalf. However, if this automated job fails, an event log message like the following appears and provides guidance on how to keep your endpoints within compliance requirements:

"Databricks couldn't complete a scheduled compliance check for model $servedModelName. This can happen if the system can't apply a required update. To resolve, try relogging your model. If the issue persists, contact support@databricks.com."

Region Location HIPAA HITRUST PCI-DSS IRAP CCCS Medium (Protected B) UK Cyber Essentials Plus
australiacentral AustraliaCentral            
australiacentral2 AustraliaCentral2            
australiaeast AustraliaEast      
australiasoutheast AustraliaSoutheast            
brazilsouth BrazilSouth      
canadacentral CanadaCentral      
canadaeast CanadaEast            
centralindia CentralIndia      
centralus CentralUS      
chinaeast2 ChinaEast2            
chinaeast3 ChinaEast3            
chinanorth2 ChinaNorth2            
chinanorth3 ChinaNorth3            
eastasia EastAsia      
eastus EastUS      
eastus2 EastUS2      
francecentral FranceCentral      
germanywestcentral GermanyWestCentral      
japaneast JapanEast      
japanwest JapanWest            
koreacentral KoreaCentral      
mexicocentral MexicoCentral            
northcentralus NorthCentralUS      
northeurope NorthEurope      
norwayeast NorwayEast            
qatarcentral QatarCentral            
southafricanorth SouthAfricaNorth            
southcentralus SouthCentralUS      
southeastasia SoutheastAsia      
southindia SouthIndia            
swedencentral SwedenCentral      
switzerlandnorth SwitzerlandNorth      
switzerlandwest SwitzerlandWest            
uaenorth UAENorth      
uksouth UKSouth    
ukwest UKWest            
westcentralus WestCentralUS            
westeurope WestEurope      
westindia WestIndia            
westus WestUS      
westus2 WestUS2      
westus3 WestUS3      
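
The 30-day container requirement described in the note above comes down to a simple date comparison. The sketch below shows that comparison in Python, assuming you record when a served container was last built. The helper name `container_is_compliant` and the `built_at` timestamp are hypothetical; the article does not describe an API that exposes the build time.

```python
from datetime import datetime, timedelta, timezone

# Maximum container age allowed by the compliance standards in this section.
MAX_CONTAINER_AGE = timedelta(days=30)


def container_is_compliant(built_at, now=None):
    """Return True if the container was built within the last 30 days.

    `built_at` is a timezone-aware datetime that you track yourself
    (hypothetical; not provided by an API named in this article).
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - built_at <= MAX_CONTAINER_AGE


if __name__ == "__main__":
    built = datetime.now(timezone.utc) - timedelta(days=45)
    if not container_is_compliant(built):
        print("Container is older than 30 days; consider relogging the model.")
```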

Foundation Model APIs limits

For detailed information about Foundation Model APIs limits, see Foundation Model APIs rate limits and quotas.

Region availability

Note

If you require an endpoint in an unsupported region, reach out to your Azure Databricks account team.

If your workspace is deployed in a region that supports model serving but is served by a control plane in an unsupported region, the workspace does not support model serving. If you attempt to use model serving in such a workspace, you see an error message stating that your workspace is not supported. Reach out to your Azure Databricks account team for more information.
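
The rule above means that both the workspace region and the control plane region must be supported. A minimal Python sketch of that check follows; the function name and the placeholder region names are hypothetical, and the set of supported regions must come from the availability pages referenced in this article.

```python
def supports_model_serving(workspace_region, control_plane_region, supported_regions):
    """Return True only if both the workspace and its control plane are in supported regions."""
    return (
        workspace_region in supported_regions
        and control_plane_region in supported_regions
    )


if __name__ == "__main__":
    # Placeholder region names for illustration only, not real availability data.
    supported = {"region-a", "region-b"}
    print(supports_model_serving("region-a", "region-b", supported))
    print(supports_model_serving("region-a", "region-c", supported))
```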

For more information on regional availability of each Model Serving feature, see Model serving features availability.

For Databricks-hosted foundation model region availability, see Foundation models hosted on Databricks.