Mainframe file replication and sync on Azure
Solution ideas
This article describes a solution idea. Your cloud architect can use this guidance to help visualize the major components for a typical implementation of this architecture. Use this article as a starting point to design a well-architected solution that aligns with your workload's specific requirements.
When you migrate an on-premises mainframe or midrange application to Azure, data transfer is a key consideration. Several modernization scenarios require you to replicate files to Azure quickly or to maintain synchronization between on-premises files and Azure files.
This article describes several ways to transfer files to Azure, convert and transform file data, and store the data on-premises and in Azure.
Architecture
Dataflow
The following dataflow corresponds to the previous diagram:
Transfer files to Azure:
The easiest way to transfer files from on-premises systems to Azure is by using File Transfer Protocol (FTP). You can host an FTP server on an Azure virtual machine (VM). A simple job control language (JCL) FTP job sends files to Azure in binary format, which is essential to preserving mainframe and midrange computational and binary data types. You can store transmitted files in on-premises disks, Azure VM file storage, or Azure Blob Storage.
You can also upload on-premises files to Blob Storage by using tools like AzCopy.
You can use the Azure Data Factory FTP or Secure File Transfer Protocol (SFTP) connector to transfer data from the mainframe system to Blob Storage. This method requires an intermediate VM on which a self-hosted integration runtime is installed.
You can also find non-Microsoft tools on Azure Marketplace to transfer files from mainframes to Azure.
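The binary-mode requirement in the FTP step above can be illustrated with a small sketch. It assumes a hypothetical 13-byte record that holds a 10-byte EBCDIC text field followed by a 3-byte packed decimal (COMP-3) field, and it simulates a text-mode transfer by translating every byte from code page 037 to ASCII. The record layout, field values, and function names are assumptions for illustration and don't come from any Azure or mainframe tool.

```python
# Illustration only: simulate text-mode versus binary-mode transfer of one record.
# The record layout and values below are hypothetical.

NAME_LENGTH = 10

# 10-byte EBCDIC (code page 037) text field followed by a packed decimal field.
name_field = "SMITH".ljust(NAME_LENGTH).encode("cp037")
packed_amount = bytes([0x12, 0x34, 0x5C])  # digits 1 2 3 4 5, sign nibble C (positive)
record = name_field + packed_amount


def text_mode_transfer(data: bytes) -> bytes:
    """Translate every byte from EBCDIC to ASCII, as a text-mode transfer would."""
    # Bytes that have no ASCII equivalent are replaced with "?".
    return data.decode("cp037").encode("ascii", errors="replace")


def binary_mode_transfer(data: bytes) -> bytes:
    """Copy the bytes unchanged, as a binary-mode transfer does."""
    return bytes(data)


text_copy = text_mode_transfer(record)
binary_copy = binary_mode_transfer(record)

# The text field survives translation, but the packed field no longer holds
# its original bytes, so the amount can't be unpacked after a text-mode transfer.
print("text field after text mode:", text_copy[:NAME_LENGTH])
print("packed field intact after text mode:", text_copy[NAME_LENGTH:] == packed_amount)
print("packed field intact after binary mode:", binary_copy[NAME_LENGTH:] == packed_amount)
```

Because the packed bytes are numeric nibbles rather than characters, any character translation in transit changes their meaning. Transferring in binary and converting field by field afterward avoids that problem.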
Orchestrate, convert, and transform data:
Azure can't read IBM Extended Binary Coded Decimal Interchange Code (EBCDIC) code page files in Azure VM disks or Blob Storage. To make these files compatible with Azure, Host Integration Server (HIS) converts them from EBCDIC to American Standard Code for Information Interchange (ASCII) format.
Copybooks define the data structure of COBOL, PL/I, and assembly language files. HIS converts these files to ASCII based on the copybook layouts.
Mainframe file data conversion can be achieved by using the Azure Logic Apps connector for IBM host files.
Before you transfer data to Azure data stores, you might need to transform the data or use it for analytics. Azure Data Factory can manage these extract-transform-load (ETL) and extract-load-transform (ELT) activities and store the data directly in Azure Data Lake Storage.
For big data integrations, Azure Databricks and Azure Synapse Analytics can perform all transformation activities quickly and efficiently by using the Apache Spark engine for in-memory computations.
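The copybook-driven conversion described above can be sketched in plain Python. This isn't how HIS or the Logic Apps connector is implemented. It's a minimal illustration that assumes fixed-length records, code page 037, and a hypothetical three-field layout (`CUST-ID`, `CUST-NAME`, `BALANCE`). The `Field` class and both functions are names invented for the example.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Field:
    name: str
    offset: int
    length: int   # length in bytes
    kind: str     # "text" (PIC X) or "comp3" (packed decimal)
    scale: int = 0


# Hypothetical layout, equivalent to this copybook fragment:
#   05 CUST-ID    PIC X(6).
#   05 CUST-NAME  PIC X(10).
#   05 BALANCE    PIC S9(5)V99 COMP-3.   (7 digits + sign = 4 bytes)
LAYOUT = [
    Field("CUST-ID", 0, 6, "text"),
    Field("CUST-NAME", 6, 10, "text"),
    Field("BALANCE", 16, 4, "comp3", scale=2),
]


def unpack_comp3(raw: bytes, scale: int) -> Decimal:
    """Decode packed decimal: two digits per byte, sign in the last nibble."""
    nibbles = []
    for byte in raw:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    *digits, sign = nibbles
    if any(d > 9 for d in digits) or sign < 0x0A:
        raise ValueError(f"invalid packed decimal: {raw.hex()}")
    value = int("".join(str(d) for d in digits))
    if sign in (0x0B, 0x0D):  # B and D mark negative values
        value = -value
    return Decimal(value).scaleb(-scale)


def convert_record(record: bytes, layout=LAYOUT, codepage: str = "cp037") -> dict:
    """Convert one fixed-length EBCDIC record to Python values, field by field."""
    result = {}
    for field in layout:
        raw = record[field.offset:field.offset + field.length]
        if field.kind == "text":
            result[field.name] = raw.decode(codepage).rstrip()
        elif field.kind == "comp3":
            result[field.name] = unpack_comp3(raw, field.scale)
        else:
            raise ValueError(f"unsupported field kind: {field.kind}")
    return result


# Build a sample record and convert it.
sample = (
    "C00042".encode("cp037")
    + "CONTOSO".ljust(10).encode("cp037")
    + bytes([0x12, 0x34, 0x56, 0x7C])
)
print(convert_record(sample))
```

The point of the sketch is that only text fields are decoded with the code page. Packed fields are interpreted numerically, which is why the conversion needs the copybook layout and can't be a blanket EBCDIC-to-ASCII translation.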
Store data:
You can store transferred data in one of several available persistent Azure storage modes, depending on your requirements.
If analytics aren't required, Azure Data Factory can store data directly in a wide range of storage options, such as Data Lake Storage and Blob Storage.
Azure hosts various databases that address different needs:
Relational databases include the SQL Server family and open-source databases like PostgreSQL and MySQL.
Nonrelational databases include Azure Cosmos DB, which is a fast, multi-model, globally distributed NoSQL database.
Review analytics and business intelligence:
Microsoft Fabric is an all-in-one analytics solution that your organization can use to study data movement, experiment with data science, and review real-time analytics and business intelligence. It provides a comprehensive suite of features, including a data lake, data engineering, and data integration.
Components
This architecture uses the following components.
Networking
This architecture uses an on-premises data gateway as bridge software to connect on-premises mainframe data to cloud services. You can install the gateway on a dedicated on-premises VM.
Data integration and transformation
This architecture outlines various Azure-native migration tools that organizations can use based on the mainframe source data and target database.
Data Provider for Host Files is a component of HIS that converts EBCDIC code page files to ASCII. The provider can read and write records offline in a local binary file. Or it can use Systems Network Architecture (SNA) or Transmission Control Protocol/Internet Protocol (TCP/IP) to read and write records in remote IBM z/OS mainframe datasets or i5/OS physical files. HIS connectors are available for BizTalk and Logic Apps.
Azure Data Factory is a hybrid data integration service that you can use to create, schedule, and orchestrate ETL and ELT workflows. In this architecture, Azure Data Factory is used to send mainframe files to Blob Storage via FTP.
Azure Databricks is an Apache Spark-based analytics platform optimized for Azure. You can use Azure Databricks to correlate incoming data and enrich it with other data stored in Azure Databricks.
Azure Synapse Analytics is a fast and flexible cloud data warehouse with a massively parallel processing architecture that you can use to scale compute and storage elastically and independently. You can use it to transform mainframe data before you load the data into an Azure database.
Logic Apps is a cloud-based service that you can use to automate workflows and integrate applications, data, and services across different environments. It provides a native IBM Host File connector that interacts with mainframe systems to read, parse, and generate host file content.
Databases
This architecture outlines the process of migrating mainframe file data to cloud storage and managed databases in Azure. It includes converting mainframe file metadata to match the target schema in Azure.
Azure SQL Database is a scalable relational cloud database service. SQL Database is evergreen and always up to date, with AI-powered and automated features that optimize performance and durability. Serverless compute and hyperscale storage options automatically scale resources on demand. With Azure Hybrid Benefit, you can use your existing on-premises SQL Server licenses in the cloud at no extra cost.
Azure SQL Managed Instance combines the broadest SQL Server database engine compatibility with all the benefits of a fully managed and evergreen platform as a service. With SQL Managed Instance, you can modernize your existing apps at scale with familiar tools, skills, and resources.
SQL Server on Azure Virtual Machines lifts and shifts your SQL Server workloads to the cloud to combine the flexibility and hybrid connectivity of Azure with SQL Server performance, security, and analytics. You can access the latest SQL Server updates and releases with total code compatibility.
Azure Database for PostgreSQL is a fully managed relational database service based on the community edition of the open-source PostgreSQL database engine.
Azure Database for MySQL is a fully managed relational database service based on the community edition of the open-source MySQL database engine.
Azure Cosmos DB is a fully managed, multi-model NoSQL database service for building and modernizing scalable, high-performance applications. Azure Cosmos DB scales throughput and storage elastically and independently across geographic regions and guarantees single-digit millisecond response times at the 99th percentile anywhere in the world.
Other data stores
Blob Storage is a cloud-based object storage solution that stores large amounts of unstructured data, such as text or binary data. You can access this data from anywhere via HTTP or HTTPS. You can use Blob Storage to expose data publicly or to store application data privately.
Data Lake Storage is a storage repository that holds a large amount of data in native, raw format. Data Lake Storage provides scaling for big data analytics workloads with terabytes and petabytes of data. The data typically comes from multiple heterogeneous sources, and can be structured, semi-structured, or unstructured.
Scenario details
Converting mainframe files from EBCDIC-encoded format to ASCII format is necessary for migrating data from mainframe systems to Azure cloud storage and databases. Mainframe applications generate and handle large amounts of data daily. This data must be accurately converted for use in other platforms.
As your organization transitions mainframe file system data, you should transform file metadata into cloud-native schemas and develop a migration strategy that includes effective file conversion techniques.
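One way to picture the metadata transformation is a mapping from copybook PIC clauses to SQL column types. The following sketch is an assumption-heavy illustration, not a Microsoft tool or a complete COBOL parser: it handles only `X` and `9` pictures with an optional sign and implied decimal point, and the type thresholds, function names, and sample table are all hypothetical choices.

```python
import re


def pic_to_sql(pic: str) -> str:
    """Map one PIC clause to a SQL column type (illustrative rules only)."""
    compact = pic.upper().replace(" ", "")
    # Expand repeat counts, for example X(3) -> XXX and 9(2) -> 99.
    expanded = re.sub(
        r"([X9])\((\d+)\)", lambda m: m.group(1) * int(m.group(2)), compact
    )
    if expanded and set(expanded) == {"X"}:
        return f"CHAR({len(expanded)})"
    match = re.fullmatch(r"S?(9*)(?:V(9*))?", expanded)
    if not match:
        raise ValueError(f"unsupported PIC clause: {pic}")
    int_digits = len(match.group(1))
    frac_digits = len(match.group(2) or "")
    total = int_digits + frac_digits
    if total == 0:
        raise ValueError(f"unsupported PIC clause: {pic}")
    if frac_digits:
        return f"DECIMAL({total},{frac_digits})"
    # Pick the smallest integer type whose range covers every value of that width.
    if total <= 4:
        return "SMALLINT"
    if total <= 9:
        return "INT"
    if total <= 18:
        return "BIGINT"
    return f"DECIMAL({total},0)"


def build_ddl(table: str, fields: list[tuple[str, str]]) -> str:
    """Render a CREATE TABLE statement from (field name, PIC clause) pairs."""
    columns = ",\n".join(
        f"    {name.replace('-', '_')} {pic_to_sql(pic)}" for name, pic in fields
    )
    return f"CREATE TABLE {table} (\n{columns}\n);"


# Hypothetical copybook fields.
customer_fields = [
    ("CUST-ID", "X(6)"),
    ("CUST-NAME", "X(10)"),
    ("ORDER-COUNT", "9(4)"),
    ("BALANCE", "S9(5)V99"),
]
print(build_ddl("customer", customer_fields))
```

Storage usage such as COMP-3 affects how bytes are decoded but not the logical type, so the sketch derives the column type from the picture alone. A real migration would also need to handle group items, OCCURS, and REDEFINES clauses.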
Potential use cases
On-premises file replication and synchronization are essential for various use cases:
Downstream or upstream dependencies, like when applications that run on a mainframe and applications that run on Azure need to exchange data via files
Parallel testing of rehosted or re-engineered applications on Azure with on-premises applications
Tightly coupled on-premises applications on systems that can't be immediately remediated or modernized
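For the synchronization use cases above, one common approach is to compare content digests of the source files and their Azure replicas and re-send only what differs. The following sketch assumes both sides can be exposed as local or mounted directories and that the replicas are unconverted binary copies. The function names and the manifest format are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def manifest_for_directory(root: Path) -> dict[str, str]:
    """Return {relative path: SHA-256 hex digest} for every file under root."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def plan_sync(source: dict[str, str], target: dict[str, str]) -> dict[str, list[str]]:
    """Compare two manifests and list the actions needed to bring target in line."""
    return {
        # Present at the source but missing from the target.
        "upload": sorted(name for name in source if name not in target),
        # Present on both sides with different content.
        "replace": sorted(
            name for name in source if name in target and source[name] != target[name]
        ),
        # Present only at the target; review before deleting.
        "orphaned": sorted(name for name in target if name not in source),
    }


# Hypothetical manifests; in practice, build them with manifest_for_directory.
source_manifest = {"PAYROLL.DAT": "aa11", "ORDERS.DAT": "bb22", "STOCK.DAT": "cc33"}
target_manifest = {"ORDERS.DAT": "bb22", "STOCK.DAT": "ffff", "OLD.DAT": "dd44"}
print(plan_sync(source_manifest, target_manifest))
```

Digest comparison only works for byte-identical replicas. Files that were converted from EBCDIC to ASCII after transfer need a field-level comparison instead.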
Contributors
Microsoft maintains this article. The following contributors wrote this article.
Principal authors:
- Nithish Aruldoss | Engineering Architect
- Ashish Khandelwal | Principal Engineering Architecture Manager
Next steps
- For more information, contact the Microsoft SQL Data Engineering team.
- Azure database migration guides