Skip to main content
Version: 0.7.0-incubating

Hudi catalog

Introduction

Apache Gravitino provides the ability to manage Apache Hudi metadata.

Requirements and limitations

info

Tested and verified with Apache Hudi 0.15.0.

Catalog

Catalog capabilities

  • Works as a catalog proxy, supporting HMS as catalog backend.
  • Only support read operations (list and load) for Hudi schemas and tables.
  • Doesn't support timeline management operations now.

Catalog properties

Property nameDescriptionDefault valueRequiredSince Version
catalog-backendCatalog backend of Gravitino Hudi catalog. Only supports hms now.(none)Yes0.7.0-incubating
uriThe URI associated with the backend. Such as thrift://127.0.0.1:9083 for HMS backend.(none)Yes0.7.0-incubating
client.pool-sizeFor HMS backend. The maximum number of Hive metastore clients in the pool for Gravitino.1No0.7.0-incubating
client.pool-cache.eviction-interval-msFor HMS backend. The cache pool eviction interval.300000No0.7.0-incubating
gravitino.bypass.Property name with this prefix passed down to the underlying backend client for use. Such as gravitino.bypass.hive.metastore.failure.retries = 3 indicate 3 times of retries upon failure of Thrift metastore calls for HMS backend.(none)No0.7.0-incubating

Catalog operations

Please refer to Manage Relational Metadata Using Gravitino for more details.

Schema

Schema capabilities

  • Only support read operations: listSchema, loadSchema, and schemaExists.

Schema properties

  • The Location is an optional property that shows the storage path to the Hudi database

Schema operations

Only support read operations: listSchema, loadSchema, and schemaExists. Please refer to Manage Relational Metadata Using Gravitino for more details.

Table

Table capabilities

  • Only support read operations: listTable, loadTable, and tableExists.

Table partitions

  • Support loading Hudi partitioned tables (Hudi only supports identity partitioning).

Table sort orders

  • Doesn't support table sort orders.

Table distributions

  • Doesn't support table distributions.

Table indexes

  • Doesn't support table indexes.

Table properties

  • For HMS backend, it will bring out all the table parameters from the HMS.

Table column types

The following table shows the mapping between Gravitino and Apache Hudi column types:

Gravitino TypeApache Hudi Type
booleanboolean
integerint
longlong
datedate
timestamptimestamp
floatfloat
doubledouble
stringstring
decimaldecimal
binarybytes
arrayarray
mapmap
structstruct

Table operations

Only support read operations: listTable, loadTable, and tableExists. Please refer to Manage Relational Metadata Using Gravitino for more details.