A unified metadata lake across all your sources, formats, cloud providers, and regions in a federated architecture.

WHAT IS GRAVITINO?

Apache Gravitino is a high-performance, geo-distributed, and federated metadata lake. With a technical data catalog and metadata lake, you can manage access and perform data governance for all your data sources (including file stores, relational databases, and event streams) while safely using multiple engines such as Spark, Trino, or Flink across multiple formats and cloud providers.

MAIN FEATURES

Unified Metadata Management

Gravitino defines unified metadata models and APIs for different kinds of metadata sources: for example, a relational metadata model for tabular data in systems such as Hive, MySQL, and PostgreSQL, and a file metadata model for unstructured data in storage systems such as HDFS and S3.
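To make the idea of a shared model concrete, here is a minimal Python sketch. It is not Gravitino's API: the class names (Column, Table, Fileset, Catalog), the list_entries method, and the example bucket path are all hypothetical, chosen only to show how a relational model and a file model can sit behind one common interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Column:
    name: str
    data_type: str


@dataclass
class Table:
    """Relational model entry: tabular data, e.g. a Hive or MySQL table."""
    name: str
    columns: List[Column]


@dataclass
class Fileset:
    """File model entry: a location holding unstructured data, e.g. on HDFS or S3."""
    name: str
    storage_location: str


@dataclass
class Catalog:
    """One metadata source, exposed through the same interface for every backend.

    All names here are hypothetical and are not taken from Gravitino.
    """
    name: str
    provider: str  # a plain label such as "hive" or "s3"
    tables: Dict[str, Table] = field(default_factory=dict)
    filesets: Dict[str, Fileset] = field(default_factory=dict)

    def list_entries(self) -> List[str]:
        # The same call works whether the catalog holds tables or filesets.
        return sorted(list(self.tables) + list(self.filesets))


hive = Catalog("sales_hive", "hive")
hive.tables["orders"] = Table(
    "orders", [Column("id", "bigint"), Column("amount", "double")]
)

files = Catalog("raw_files", "s3")
files.filesets["clickstream"] = Fileset(
    "clickstream", "s3://example-bucket/clickstream/"  # made-up location
)

catalog_listing = {c.name: c.list_entries() for c in (hive, files)}
print(catalog_listing)
```

A caller iterates over catalogs and lists their contents without branching on the provider, which is the property a unified model is meant to give.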

End-to-End Data Governance

Gravitino aims to provide a single governance layer for managing metadata end to end, including access control, auditing, discovery, and other features.

Direct Metadata Management

Unlike traditional metadata management systems, which must collect metadata actively or passively from the underlying systems, Gravitino manages these systems directly. It provides a set of connectors to connect to different metadata sources. Changes made in Gravitino are reflected directly in the underlying systems, and vice versa.
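The difference between collecting metadata and managing it directly can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Gravitino's connector interface: Connector, InMemorySource, InMemoryConnector, and MetadataLayer are hypothetical names, and a dictionary stands in for the underlying system. The point is that the layer keeps no copy of its own, so writes go straight through and reads always see the source's current state.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class Connector(ABC):
    """Hypothetical connector interface; not Gravitino's actual API."""

    @abstractmethod
    def list_tables(self) -> List[str]:
        ...

    @abstractmethod
    def create_table(self, name: str, columns: Dict[str, str]) -> None:
        ...


class InMemorySource:
    """Stands in for an underlying system such as a relational database."""

    def __init__(self) -> None:
        self.tables: Dict[str, Dict[str, str]] = {}


class InMemoryConnector(Connector):
    def __init__(self, source: InMemorySource) -> None:
        self._source = source

    def list_tables(self) -> List[str]:
        # Read through to the source on every call; nothing is cached.
        return sorted(self._source.tables)

    def create_table(self, name: str, columns: Dict[str, str]) -> None:
        if name in self._source.tables:
            raise ValueError(f"table {name!r} already exists")
        # Write through: the change lands in the source itself.
        self._source.tables[name] = dict(columns)


class MetadataLayer:
    """Holds no metadata copy; every operation is delegated to the connector."""

    def __init__(self, connector: Connector) -> None:
        self._connector = connector

    def create_table(self, name: str, columns: Dict[str, str]) -> None:
        self._connector.create_table(name, columns)

    def list_tables(self) -> List[str]:
        return self._connector.list_tables()


source = InMemorySource()
layer = MetadataLayer(InMemoryConnector(source))

# A change made through the layer appears in the underlying source...
layer.create_table("users", {"id": "bigint"})

# ...and a change made directly in the source is visible through the layer.
source.tables["events"] = {"ts": "timestamp"}

visible_tables = layer.list_tables()
print(visible_tables)
```

A collection-based design would instead copy metadata into its own store on a schedule, which leaves a window where the two can disagree.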

Geo-Distribution Support

Gravitino supports geo-distributed deployment: different Gravitino instances can be deployed in different regions or clouds and can connect to each other to exchange metadata. This gives users a global view of metadata across regions and clouds.
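As a rough illustration of how connected instances could produce a global view, consider the following Python sketch. It assumes a simple peer model that the text above does not spell out: the Instance class, its connect and global_view methods, and the region and catalog names are all hypothetical, and real deployments would talk over the network rather than through in-process references.

```python
from typing import Dict, List


class Instance:
    """Hypothetical stand-in for one regional deployment; not Gravitino's API."""

    def __init__(self, region: str, catalogs: List[str]) -> None:
        self.region = region
        self._catalogs = list(catalogs)
        self._peers: List["Instance"] = []

    def connect(self, peer: "Instance") -> None:
        # Register a peer once; an instance never peers with itself.
        if peer is not self and peer not in self._peers:
            self._peers.append(peer)

    def local_catalogs(self) -> List[str]:
        return sorted(self._catalogs)

    def global_view(self) -> Dict[str, List[str]]:
        # Start with local metadata, then ask each connected peer for its own.
        view = {self.region: self.local_catalogs()}
        for peer in self._peers:
            view[peer.region] = peer.local_catalogs()
        return view


us = Instance("us-east", ["orders_hive"])
eu = Instance("eu-west", ["customers_pg", "logs_s3"])

# Connect the two instances in both directions.
us.connect(eu)
eu.connect(us)

global_view = us.global_view()
print(global_view)
```

Either instance returns the same combined listing, so a user in one region can discover catalogs that live in another.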

Multi-Engine Support

Gravitino lets different query engines access its metadata. Currently, it supports Trino: users can query metadata and data through Trino without changing their existing SQL dialects. Support for other query engines, including Apache Spark and Apache Flink, is on the roadmap.

AI Asset Management (WIP)

The goal of Gravitino is to unify the management of both data and AI assets. Support for AI assets such as models and features is under development.