Hudi catalog
Introduction
Apache Gravitino provides the ability to manage Apache Hudi metadata.
Requirements and limitations
Tested and verified with Apache Hudi 0.15.0.
Catalog
Catalog capabilities
- Works as a catalog proxy, supporting 
HMSas catalog backend. - Only support read operations (list and load) for Hudi schemas and tables.
 - Doesn't support timeline management operations now.
 
Catalog properties
| Property name | Description | Default value | Required | Since Version | 
|---|---|---|---|---|
catalog-backend | Catalog backend of Gravitino Hudi catalog. Only supports hms now. | (none) | Yes | 0.7.0-incubating | 
uri | The URI associated with the backend. Such as thrift://127.0.0.1:9083 for HMS backend. | (none) | Yes | 0.7.0-incubating | 
client.pool-size | For HMS backend. The maximum number of Hive metastore clients in the pool for Gravitino. | 1 | No | 0.7.0-incubating | 
client.pool-cache.eviction-interval-ms | For HMS backend. The cache pool eviction interval. | 300000 | No | 0.7.0-incubating | 
gravitino.bypass. | Property name with this prefix passed down to the underlying backend client for use. Such as gravitino.bypass.hive.metastore.failure.retries = 3 indicate 3 times of retries upon failure of Thrift metastore calls for HMS backend. | (none) | No | 0.7.0-incubating | 
Catalog backend security
Users can use the following properties to configure the security of the catalog backend if needed. For example, if you are using a Kerberos Hive catalog backend, you must set authentication.type to Kerberos and provide authentication.kerberos.principal and authentication.kerberos.keytab-uri.
| Property name | Description | Default value | Required | Since Version | 
|---|---|---|---|---|
authentication.type | The type of authentication for hudi catalog backend. This configuration only applicable for for hms backend, and only supports kerberos, simple currently. | simple | No | 1.0.0 | 
authentication.impersonation-enable | Whether to enable impersonation for the hudi catalog | false | No | 1.0.0 | 
authentication.kerberos.principal | The principal of the Kerberos authentication | (none) | required if the value of authentication.type is kerberos. | 1.0.0 | 
authentication.kerberos.keytab-uri | The URI of The keytab for the Kerberos authentication. | (none) | required if the value of authentication.type is kerberos. | 1.0.0 | 
authentication.kerberos.check-interval-sec | The check interval of Kerberos credential for hudi catalog. | 60 | No | 1.0.0 | 
authentication.kerberos.keytab-fetch-timeout-sec | The fetch timeout of retrieving Kerberos keytab from authentication.kerberos.keytab-uri. | 60 | No | 1.0.0 | 
Property name with this prefix passed down to the underlying backend client for use. Such as gravitino.bypass.hive.metastore.kerberos.principal=XXXX、gravitino.bypass.hadoop.security.authentication=kerberos、gravitino.bypass.hive.metastore.sasl.enabled=ture And so on.
Catalog operations
Please refer to Manage Relational Metadata Using Gravitino for more details.
Schema
Schema capabilities
- Only support read operations: listSchema, loadSchema, and schemaExists.
 
Schema properties
- The 
Locationis an optional property that shows the storage path to the Hudi database 
Schema operations
Only support read operations: listSchema, loadSchema, and schemaExists. Please refer to Manage Relational Metadata Using Gravitino for more details.
Table
Table capabilities
- Only support read operations: listTable, loadTable, and tableExists.
 
Table partitions
- Support loading Hudi partitioned tables (Hudi only supports identity partitioning).
 
Table sort orders
- Doesn't support table sort orders.
 
Table distributions
- Doesn't support table distributions.
 
Table indexes
- Doesn't support table indexes.
 
Table properties
- For HMS backend, it will bring out all the table parameters from the HMS.
 
Table column types
The following table shows the mapping between Gravitino and Apache Hudi column types:
| Gravitino Type | Apache Hudi Type | 
|---|---|
boolean | boolean | 
integer | int | 
long | long | 
date | date | 
timestamp | timestamp | 
float | float | 
double | double | 
string | string | 
decimal | decimal | 
binary | bytes | 
array | array | 
map | map | 
struct | struct | 
Table operations
Only support read operations: listTable, loadTable, and tableExists. Please refer to Manage Relational Metadata Using Gravitino for more details.