Apache Hive catalog
Introduction
Apache Gravitino can use Apache Hive as a catalog for metadata management.
Requirements and limitations
- The Hive catalog requires a Hive Metastore Service (HMS), or a compatible implementation of the HMS, such as AWS Glue.
- Gravitino must have network access to the Hive metastore service using the Thrift protocol.
- The Hive catalog supports HMS versions 2.x and 3.x, and it automatically detects the HMS version.
Catalog
Catalog capabilities
The Hive catalog supports creating, updating, and deleting databases and tables in the HMS.
Catalog properties
Besides the common catalog properties, the Hive catalog has the following properties:
| Property Name | Description | Default Value | Required | Since Version |
|---|---|---|---|---|
| `metastore.uris` | The Hive metastore service URIs; separate multiple addresses with commas. For example, `thrift://127.0.0.1:9083`. | (none) | Yes | 0.2.0 |
| `client.pool-size` | The maximum number of Hive metastore clients in the pool for Gravitino. | 1 | No | 0.2.0 |
| `gravitino.bypass.` | Properties with this prefix are passed down to the underlying HMS client. For example, `gravitino.bypass.hive.metastore.failure.retries = 3` sets 3 retries upon failure of Thrift metastore calls. | (none) | No | 0.2.0 |
| `client.pool-cache.eviction-interval-ms` | The cache pool eviction interval, in milliseconds. | 300000 | No | 0.4.0 |
| `impersonation-enable` | Enable user impersonation for the Hive catalog. | false | No | 0.4.0 |
| `kerberos.principal` | The Kerberos principal for the catalog. To use Kerberos, also configure `gravitino.bypass.hadoop.security.authentication`, `gravitino.bypass.hive.metastore.kerberos.principal`, and `gravitino.bypass.hive.metastore.sasl.enabled`. | (none) | Required if you use Kerberos | 0.4.0 |
| `kerberos.keytab-uri` | The URI of the keytab for the catalog. Supported protocols are `https`, `http`, `ftp`, and `file`. | (none) | Required if you use Kerberos | 0.4.0 |
| `kerberos.check-interval-sec` | The interval, in seconds, at which the validity of the principal is checked. | 60 | No | 0.4.0 |
| `kerberos.keytab-fetch-timeout-sec` | The timeout, in seconds, for fetching the keytab. | 60 | No | 0.4.0 |
| `list-all-tables` | Lists all tables in a database, including non-Hive tables such as Iceberg and Hudi. | false | No | 0.5.1 |
| `default.catalog` | The default catalog name for the Hive3 metastore backend; this configuration is ignored when using a Hive2 metastore. | hive | No | 1.1.0 |
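
As an illustration of how the properties above fit together, the following Python sketch assembles a property map for a Hive catalog. The helper `build_hive_catalog_properties` is hypothetical and not part of any Gravitino client; only the property keys come from the table. How the map is submitted to Gravitino (REST API, Java client, or Python client) is outside the scope of this sketch.

```python
import json


def build_hive_catalog_properties(metastore_uris, pool_size=1,
                                  kerberos_principal=None,
                                  kerberos_keytab_uri=None,
                                  bypass=None):
    """Assemble Hive catalog properties (hypothetical helper, not a Gravitino API)."""
    if not metastore_uris:
        raise ValueError("metastore.uris is required")
    # Kerberos needs both the principal and the keytab URI, or neither.
    if (kerberos_principal is None) != (kerberos_keytab_uri is None):
        raise ValueError("kerberos.principal and kerberos.keytab-uri must be set together")

    props = {
        # Multiple HMS addresses are joined with commas.
        "metastore.uris": ",".join(metastore_uris),
        "client.pool-size": str(pool_size),
    }
    if kerberos_principal is not None:
        props["kerberos.principal"] = kerberos_principal
        props["kerberos.keytab-uri"] = kerberos_keytab_uri
    # Keys given here are prefixed so they are passed down to the HMS client.
    for key, value in (bypass or {}).items():
        props["gravitino.bypass." + key] = str(value)
    return props


properties = build_hive_catalog_properties(
    ["thrift://127.0.0.1:9083"],
    bypass={"hive.metastore.failure.retries": 3},
)
print(json.dumps(properties, indent=2))
```

All values are emitted as strings here because catalog properties are key-value pairs; that choice is an assumption of the sketch.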
For `list-all-tables=false`, the Hive catalog filters out:
- Iceberg tables, by table property `table_type=ICEBERG`
- Paimon tables, by table property `table_type=PAIMON`
- Hudi tables, by table property `provider=hudi`
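
The filtering rule can be pictured with a small Python sketch. This is a client-side illustration only, not the catalog's actual implementation; the function names, the sample tables, and the exact case-sensitive matching are assumptions.

```python
def is_non_hive_table(table_properties):
    """Return True if the table properties mark an Iceberg, Paimon, or Hudi table."""
    if table_properties.get("table_type") in ("ICEBERG", "PAIMON"):
        return True
    return table_properties.get("provider") == "hudi"


def visible_tables(tables, list_all_tables=False):
    """Return the sorted table names that would be listed for a database.

    `tables` maps a table name to its table properties (hypothetical input shape).
    """
    if list_all_tables:
        return sorted(tables)
    return sorted(name for name, props in tables.items()
                  if not is_non_hive_table(props))


# Hypothetical sample data for illustration.
sample = {
    "orders": {},
    "events_iceberg": {"table_type": "ICEBERG"},
    "logs_paimon": {"table_type": "PAIMON"},
    "clicks_hudi": {"provider": "hudi"},
}
print(visible_tables(sample))
print(visible_tables(sample, list_all_tables=True))
```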
When you use Gravitino with Trino, you can pass Trino Hive connector configuration using the prefix `trino.bypass.`. For example, use `trino.bypass.hive.config.resources` to pass `hive.config.resources` to the Gravitino Hive catalog in the Trino runtime.
When you use Gravitino with Spark, you can pass Spark Hive connector configuration using the prefix `spark.bypass.`. For example, use `spark.bypass.hive.exec.dynamic.partition.mode` to pass `hive.exec.dynamic.partition.mode` to the Spark Hive connector in the Spark runtime.
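
The prefix convention amounts to selecting the keys that start with an engine's prefix and stripping that prefix before handing them to the engine. The sketch below shows the idea in Python; the function `strip_bypass_prefix` and the sample values (including the `/etc/hive/hive-site.xml` path) are hypothetical, and the real connectors may apply additional rules.

```python
def strip_bypass_prefix(catalog_properties, prefix):
    """Select properties that start with `prefix` and return them without it."""
    return {
        key[len(prefix):]: value
        for key, value in catalog_properties.items()
        if key.startswith(prefix) and len(key) > len(prefix)
    }


# Hypothetical catalog properties mixing engine-specific bypass keys.
catalog_properties = {
    "metastore.uris": "thrift://127.0.0.1:9083",
    "trino.bypass.hive.config.resources": "/etc/hive/hive-site.xml",
    "spark.bypass.hive.exec.dynamic.partition.mode": "nonstrict",
}

trino_conf = strip_bypass_prefix(catalog_properties, "trino.bypass.")
spark_conf = strip_bypass_prefix(catalog_properties, "spark.bypass.")
print(trino_conf)
print(spark_conf)
```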
When you use Gravitino Hive authorization with Apache Ranger, see the Authorization Hive with Ranger properties.
Catalog operations
Refer to Manage Relational Metadata Using Gravitino for more details.
Schema
Schema capabilities
The Hive catalog supports creating, updating, and deleting databases in the HMS.
Schema properties
Schema properties supply or set metadata for the underlying Hive database. The following table lists predefined schema properties for the Hive database. Additionally, you can define your own key-value pair properties and transmit them to the underlying Hive database.
| Property name | Description | Default value | Required | Since Version |
|---|---|---|---|---|
| `location` | The directory for Hive database storage, such as `/user/hive/warehouse`. | HMS uses the value of `hive.metastore.warehouse.dir` in `hive-site.xml` by default. | No | 0.1.0 |
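
A schema property map therefore combines the optional predefined `location` key with any user-defined pairs. The Python sketch below illustrates this; the helper name, the custom keys, and the sample path are hypothetical.

```python
def build_schema_properties(location=None, custom=None):
    """Combine the predefined `location` property with user-defined pairs."""
    props = dict(custom or {})
    if location is not None:
        # When omitted, HMS falls back to hive.metastore.warehouse.dir.
        props["location"] = location
    return props


schema_properties = build_schema_properties(
    location="/user/hive/warehouse/sales.db",  # hypothetical path
    custom={"owner-team": "analytics"},        # hypothetical user-defined property
)
print(schema_properties)
```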
Schema operations
See Manage Relational Metadata Using Gravitino.
Table
Table capabilities
- The Hive catalog supports creating, updating, and deleting tables in the HMS.
- Column default values are not supported.
Table partitioning
The Hive catalog supports partitioned tables. Users can create partitioned tables in the Hive catalog with the specific partitioning attribute.
Although Gravitino supports several partitioning strategies, Apache Hive inherently only supports a single partitioning strategy (partitioned by column). Therefore, the Hive catalog only supports Identity partitioning.
The fieldName specified in the partitioning attribute must be the name of a column defined in the table.
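
The constraint on `fieldName` can be checked before a table is created. The following Python sketch builds identity partitioning entries and rejects any field that is not a table column. The dictionary shape (a `strategy` key plus a `fieldName` list) is an assumption modeled on a JSON-style partitioning attribute, and the helper name and sample columns are hypothetical.

```python
def identity_partitioning(column_names, partition_fields):
    """Build identity partitioning entries, validating each field against the columns."""
    known = set(column_names)
    partitioning = []
    for field in partition_fields:
        if field not in known:
            raise ValueError(f"partition field {field!r} is not a column of the table")
        # Hive only partitions by column, so every entry uses the identity strategy.
        partitioning.append({"strategy": "identity", "fieldName": [field]})
    return partitioning


columns = ["id", "amount", "dt"]  # hypothetical table columns
partitioning = identity_partitioning(columns, ["dt"])
print(partitioning)
```

Calling `identity_partitioning(columns, ["region"])` raises `ValueError` in this sketch, because `region` is not among the declared columns.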