Lakehouse Federation is the query federation platform for Databricks. The term query federation describes a collection of features that enable users and systems to run queries against multiple data sources without needing to migrate all data to a unified system.
There are two types of federation: query federation and catalog federation. This page explains how they differ.
Query federation compared to catalog federation
The following table summarizes the key differences between query federation and catalog federation:
| Attribute | Query federation | Catalog federation |
|---|---|---|
| Query path | Queries from Unity Catalog are pushed down to the foreign database over JDBC. The query runs on both Databricks compute and the remote database's compute. | Queries from Unity Catalog access the foreign table directly in object storage. Catalog federation is available for platforms that support direct access to their catalog and storage services. The query runs only on Databricks compute, which makes catalog federation more cost-effective and better optimized for performance than query federation. |
| Use case | When your source supports both Lakehouse Federation and Lakeflow Connect, Databricks recommends Lakeflow Connect if performance on higher data volumes and lower latency are priorities. | |
| Overview of steps | | |
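The guidance in the table can be read as a small decision rule. The following Python sketch encodes it. `SourceCapabilities` and `choose_approach` are hypothetical names invented for illustration, not a Databricks API, and the capability flags are assumptions you would fill in from each product's documentation. The Lakeflow Connect check mirrors the recommendation above. Preferring catalog federation over query federation when both are available is an assumption based on the cost and performance note in the query path row.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceCapabilities:
    """Hypothetical capability flags for a data source (not a Databricks API)."""

    name: str
    supports_query_federation: bool    # queries pushed down over JDBC
    supports_catalog_federation: bool  # direct access to the source's catalog and storage
    supports_lakeflow_connect: bool


def choose_approach(source: SourceCapabilities, prioritize_volume_and_latency: bool) -> str:
    """Suggest an approach for a source that supports Lakehouse Federation."""
    supports_federation = (
        source.supports_query_federation or source.supports_catalog_federation
    )
    if not supports_federation:
        raise ValueError(f"{source.name} does not support Lakehouse Federation")

    # Documented rule: if the source also supports Lakeflow Connect and higher
    # data volumes and lower latency are priorities, prefer Lakeflow Connect.
    if source.supports_lakeflow_connect and prioritize_volume_and_latency:
        return "Lakeflow Connect"

    # Assumption: prefer catalog federation when available, because the query
    # runs only on Databricks compute and reads object storage directly.
    if source.supports_catalog_federation:
        return "catalog federation"
    return "query federation"


# Illustrative, made-up sources; the flags do not describe any real product.
example_db = SourceCapabilities("example_db", True, False, True)
example_catalog = SourceCapabilities("example_catalog", False, True, False)

print(choose_approach(example_db, prioritize_volume_and_latency=True))       # Lakeflow Connect
print(choose_approach(example_db, prioritize_volume_and_latency=False))      # query federation
print(choose_approach(example_catalog, prioritize_volume_and_latency=True))  # catalog federation
```

Keeping the capability flags separate from the decision function lets you update what a source supports without touching the rule itself.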
Supported data sources
Connect to the following sources using query federation:
- MySQL
- PostgreSQL
- Teradata
- Oracle
- Amazon Redshift
- Salesforce Data Cloud
- Snowflake
- Microsoft SQL Server
- Azure Synapse (SQL Data Warehouse)
- Google BigQuery
- Databricks
Connect to the following sources using catalog federation: