This page is an overview of capabilities available when you use Databricks Feature Store with Unity Catalog.
The Databricks Feature Store provides a central registry for features used in your AI and ML models. Feature tables and models are registered in Unity Catalog, providing built-in governance, lineage, and cross-workspace feature sharing and discovery. With Databricks, the entire model training workflow takes place on a single platform, including:
- Data pipelines that ingest raw data, create feature tables, train models, and perform batch inference.
- Model and feature serving endpoints that are available with a single click and provide millisecond latency.
- Data and model monitoring.
When you use features from the feature store to train a model, the model automatically tracks lineage to the features used in training. At inference time, the model automatically looks up the latest feature values, and the feature store can also compute features on demand for real-time applications. Because the feature store handles all feature computation, the features computed at inference are guaranteed to match those used during model training, which eliminates training/serving skew. It also significantly simplifies client-side code, since all feature lookups and computation are handled by the feature store.
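As a minimal sketch of this workflow: the feature table name, feature and label columns, registered model name, and the `labels_df` DataFrame below are all hypothetical stand-ins. Training against a `FeatureLookup` records lineage to the feature table, and logging the model together with the training set packages the lookup metadata so that batch and online inference can retrieve features automatically:

```python
import mlflow
from sklearn.ensemble import RandomForestClassifier
from databricks.feature_engineering import FeatureEngineeringClient, FeatureLookup

mlflow.set_registry_uri("databricks-uc")  # register the model in Unity Catalog
fe = FeatureEngineeringClient()

# Declare which features to join onto the label data (table and key are hypothetical).
feature_lookups = [
    FeatureLookup(
        table_name="ml.demo.customer_features",
        feature_names=["total_purchases_30d", "avg_basket_value"],
        lookup_key="customer_id",
    )
]

# Build the training set; lineage to the feature table is tracked automatically.
training_set = fe.create_training_set(
    df=labels_df,  # assumed: a Spark DataFrame with customer_id and churned columns
    feature_lookups=feature_lookups,
    label="churned",
    exclude_columns=["customer_id"],  # keys are for joining, not for training
)
train_df = training_set.load_df().toPandas()

model = RandomForestClassifier().fit(
    train_df.drop("churned", axis=1), train_df["churned"]
)

# Logging with the training set embeds the lookup metadata in the model,
# enabling automatic feature lookup at inference time.
fe.log_model(
    model=model,
    artifact_path="model",
    flavor=mlflow.sklearn,
    training_set=training_set,
    registered_model_name="ml.demo.churn_model",
)
```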
Note
This page covers feature engineering and serving capabilities for workspaces that are enabled for Unity Catalog. If your workspace is not enabled for Unity Catalog, see Workspace Feature Store (legacy).
Conceptual overview
For an overview of how Databricks Feature Store works and a glossary of terms, see Concepts.
Feature engineering
Feature | Description |
---|---|
Work with feature tables in Unity Catalog | Create and work with feature tables. |
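As a brief hedged sketch, creating a feature table in Unity Catalog amounts to writing a Spark DataFrame with declared primary keys; the catalog, schema, column names, and the `features_df` and `updated_features_df` DataFrames below are hypothetical:

```python
from databricks.feature_engineering import FeatureEngineeringClient

fe = FeatureEngineeringClient()

# features_df is assumed: a Spark DataFrame with one row per customer_id.
fe.create_table(
    name="ml.demo.customer_features",
    primary_keys=["customer_id"],
    df=features_df,
    description="Customer aggregates for churn models",
)

# Later refreshes upsert new feature values by primary key.
fe.write_table(
    name="ml.demo.customer_features",
    df=updated_features_df,
    mode="merge",
)
```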
Discover and share features
Feature | Description |
---|---|
Explore features in Unity Catalog | Explore and manage feature tables using Catalog Explorer and the Features UI. |
Use tags with feature tables and features in Unity Catalog | Use simple key-value pairs to categorize and manage your feature tables and features. |
Use features in training workflows
Feature | Description |
---|---|
Use features to train models | Create training datasets from feature tables and use them to train ML models. |
Point-in-time feature joins | Use point-in-time correctness to create a training dataset that reflects feature values as of the time a label observation was recorded (see the sketch after this table). |
Python API | Python API reference |
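As a hedged sketch of the point-in-time joins mentioned above, a lookup only needs a timestamp key in addition to the primary key. This assumes a hypothetical time series feature table `ml.demo.customer_features_ts` and a `labels_df` whose rows carry a `label_ts` observation timestamp:

```python
from databricks.feature_engineering import FeatureEngineeringClient, FeatureLookup

fe = FeatureEngineeringClient()

# For each label row, join the latest feature values at or before label_ts.
training_set = fe.create_training_set(
    df=labels_df,  # assumed: customer_id, label_ts, and churned columns
    feature_lookups=[
        FeatureLookup(
            table_name="ml.demo.customer_features_ts",
            feature_names=["total_purchases_30d"],
            lookup_key="customer_id",
            timestamp_lookup_key="label_ts",
        )
    ],
    label="churned",
)
```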
Serve features
Feature | Description |
---|---|
Databricks Online Feature Stores | Serve feature data to online applications and real-time machine learning models. Powered by Databricks Lakebase. |
Model Serving with automatic feature lookup | Automatically look up feature values from an online store. |
Feature Serving endpoints | Serve features to models and applications outside of Databricks. |
On-demand feature computation | Calculate feature values at the time of inference. |
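As a hedged sketch of how an application outside of Databricks might query a feature serving endpoint, the workspace host, endpoint name, token, and key column below are all hypothetical placeholders:

```python
import requests

# Hypothetical workspace URL, endpoint name, and personal access token.
host = "https://<workspace-host>"
endpoint = "customer-features-endpoint"
token = "<personal-access-token>"

# Serving endpoints accept dataframe_records payloads keyed by the lookup key.
response = requests.post(
    f"{host}/serving-endpoints/{endpoint}/invocations",
    headers={"Authorization": f"Bearer {token}"},
    json={"dataframe_records": [{"customer_id": 42}]},
)
print(response.json())  # latest feature values for customer_id 42
```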
Feature governance and lineage
Feature | Description |
---|---|
Feature governance and lineage | Use Unity Catalog to control access to feature tables and view the lineage of a feature table, model, or function. |
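Because feature tables are Unity Catalog tables, access is governed with standard Unity Catalog grants. A small sketch, assuming a Databricks notebook where `spark` is predefined and using hypothetical table and group names:

```python
# Grant read access on a feature table to a group (names are hypothetical).
spark.sql("GRANT SELECT ON TABLE ml.demo.customer_features TO `ml-engineers`")

# Review the existing grants on the table.
spark.sql("SHOW GRANTS ON TABLE ml.demo.customer_features").show()
```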
Tutorials
Tutorial | Description |
---|---|
Example notebooks to get started | Two notebooks. The basic example shows how to create a feature table, use it to train a model, and run batch scoring with automatic feature lookup; it also demonstrates the Feature Engineering UI for searching features and viewing lineage. The taxi example walks through creating features, updating them, and using them for model training and batch inference. |
Example: Deploy and query a feature serving endpoint | Tutorial and example notebook showing how to deploy and query a feature serving endpoint. |
Example: Use features with structured RAG applications | Tutorial showing how to use Databricks online tables and feature serving endpoints for retrieval augmented generation (RAG) applications. |
Requirements
- Your workspace must be enabled for Unity Catalog.
- Feature engineering in Unity Catalog requires Databricks Runtime 13.3 LTS or above.
If your workspace does not meet these requirements, see Workspace Feature Store (legacy) for how to use the legacy Workspace Feature Store.
Supported data types
Feature engineering in Unity Catalog and legacy Workspace Feature Store support the following PySpark data types:
- IntegerType
- FloatType
- BooleanType
- StringType
- DoubleType
- LongType
- TimestampType
- DateType
- ShortType
- ArrayType
- BinaryType [1]
- DecimalType [1]
- MapType [1]
- StructType [2]

[1] BinaryType, DecimalType, and MapType are supported in all versions of Feature Engineering in Unity Catalog and in Workspace Feature Store v0.3.5 or above.

[2] StructType is supported in Feature Engineering v0.6.0 or above.
The data types listed above support feature types that are common in machine learning applications. For example:
- You can store dense vectors, tensors, and embeddings as ArrayType.
- You can store sparse vectors, tensors, and embeddings as MapType.
- You can store text as StringType.
When published to online stores, ArrayType and MapType features are stored in JSON format.
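As a small hedged sketch, storing embeddings as ArrayType is just a matter of declaring the column type when the feature table is created; the table name and data below are hypothetical, and `spark` is assumed to be the active SparkSession in a Databricks notebook:

```python
from databricks.feature_engineering import FeatureEngineeringClient

# Hypothetical document embeddings keyed by doc_id.
embeddings_df = spark.createDataFrame(
    [(1, [0.12, 0.98, 0.33]), (2, [0.57, 0.04, 0.81])],
    "doc_id INT, embedding ARRAY<DOUBLE>",
)

fe = FeatureEngineeringClient()
fe.create_table(
    name="ml.demo.doc_embeddings",
    primary_keys=["doc_id"],
    df=embeddings_df,
    description="Dense document embeddings stored as ArrayType",
)
```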
The Feature Store UI displays metadata on feature data types.
More information
For more information on best practices, download The Comprehensive Guide to Feature Stores.