Apache Gravitino Trino connector - Hive catalog
The Hive catalog allows Trino querying data stored in an Apache Hive data warehouse.
Requirements
The Hive connector requires a Hive metastore service (HMS), or a compatible implementation of the Hive metastore, such as AWS Glue.
Apache Hadoop HDFS 2.x supported.
Many distributed storage systems including HDFS, Amazon S3 or S3-compatible systems, Google Cloud Storage, Azure Storage, and IBM Cloud Object Storage can be queried with the Hive connector.
The coordinator and all workers must have network access to the Hive metastore and the storage system.
Hive metastore access with the Thrift protocol defaults to using port 9083.
Data files must be in a supported file format. Some file formats can be configured using file format configuration properties per catalog:
- ORC
- Parquet
- Avro
- RCText (RCFile using ColumnarSerDe)
- RCBinary (RCFile using LazyBinaryColumnarSerDe)
- SequenceFile
- JSON (using org.apache.hive.hcatalog.data.JsonSerDe)
- CSV (using org.apache.hadoop.hive.serde2.OpenCSVSerde)
- TextFile