Version: 1.0.0

Getting started with Apache Gravitino

There are several options for getting started with Apache Gravitino.

Installing and configuring Hive and Trino can be a little complex. If you are unfamiliar with these technologies, using Docker might be a good choice. There are pre-packaged containers for Gravitino, Apache Hive, Apache Hadoop, Trino, MySQL, PostgreSQL, and others. See installing Gravitino playground for more details.

This page guides you through the process of downloading and installing Gravitino from source.

Prepare environment
- Deploy and run Gravitino on Amazon Web Services (AWS)
- Deploy and run Gravitino on Google Cloud Platform (GCP)
- Run Gravitino on your own machine
Install Gravitino
Start Gravitino
Install Apache Hive
Interact with Apache Gravitino API

note

If you want to access the instance remotely, be sure to read Accessing Gravitino on AWS externally.

Environment preparation

AWS

To work in an AWS environment, follow these steps:

In the AWS console, launch a new instance. Select Ubuntu as the operating system and t2.xlarge as the instance type. Create a key pair named Gravitino.pem for SSH access and download it. Allow HTTP and HTTPS traffic if you want to connect to the instance remotely. Set the Elastic Block Store storage to 20GiB. Leave all other settings at their defaults. Other operating systems and instance types may work but have not been fully tested.
Start the instance and connect to it via SSH using the downloaded .pem file:
```
ssh ubuntu@<IP_address> -i ~/Downloads/Gravitino.pem
```
Note: you may need to adjust the permissions on your .pem file using chmod 400 to enable SSH connections.
Update the Ubuntu OS to ensure it's up-to-date:
```
sudo apt update
sudo apt upgrade
```
You may need to reboot the instance for all changes to take effect.
Install the Java Development Kit (JDK). Java 17 is supported.
```
sudo apt install openjdk-<version>-jdk-headless
```
Verify the Java version with:
```
java -version
```
You should see information about the OpenJDK version.

GCP

To work on the GCP platform, follow these steps:

In the Google Cloud console, launch a new instance. Select e2-standard-4 as the instance type and 20 GB for the boot disk size. Allow HTTP and HTTPS traffic if you want to connect to the instance remotely. Leave all other settings at their defaults. Other operating systems and instance types may work but have not been fully tested.
Start the instance and connect to it via the SSH-in-browser tool.
Update the Debian OS to ensure it's up-to-date:
```
sudo apt update
sudo apt upgrade
```
You may need to reboot the instance for all changes to take effect.

Install the Java Development Kit (JDK). Java 17 is supported.
```
wget -O - https://apt.corretto.aws/corretto.key | sudo gpg --dearmor -o /usr/share/keyrings/corretto-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/corretto-keyring.gpg] https://apt.corretto.aws stable main" | sudo tee /etc/apt/sources.list.d/corretto.list
sudo apt-get update
sudo apt-get install -y java-<version>-amazon-corretto-jdk
```
Verify the Java version with:
```
java -version
```
You should see information about the OpenJDK version.

Local workstation

To build and install Gravitino locally on a macOS or a Linux workstation, follow these steps:

Install the Java Development Kit (JDK). Java 17 is supported. This can be done using sdkman, for example:
```
sdk install java <version>
```
You can also use different package managers to install JDK, for example, Homebrew on macOS, apt on Ubuntu/Debian, and yum on CentOS/RedHat.
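
Whichever environment you use, it can help to confirm that the JDK on the path is Java 17 before continuing. The following is a minimal sketch, not part of Gravitino: the function names `java_major_version` and `is_java_17` are hypothetical helpers that parse the text printed by `java -version`.

```shell
# Print the major Java version found in `java -version` style text on stdin.
java_major_version() {
  awk -F'"' '/version/ {
    split($2, parts, ".")
    # Legacy releases report "1.8.0_x"; newer ones report "17.0.x".
    print (parts[1] == "1" ? parts[2] : parts[1])
    exit
  }'
}

# Return 0 when the version text on stdin reports Java 17.
is_java_17() {
  [ "$(java_major_version)" = "17" ]
}

# Usage (java -version writes to stderr, so redirect it into the pipe):
# java -version 2>&1 | is_java_17 && echo "Java 17 detected"
```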

Install Gravitino

You can install Gravitino from the binary release packages or the container images. Follow how-to-install.

Alternatively, you can build Gravitino from source. Follow how-to-build, then how-to-install.

Start Gravitino

Start Gravitino using the gravitino.sh script:
```
<path-to-gravitino>/bin/gravitino.sh start
```

Install Apache Hive

If you already have Apache Hive and Apache Hadoop in your environment, you can skip this step and use the existing services with Gravitino. Otherwise, follow the instructions to install Apache Hive.

Interact with Apache Gravitino API

After deploying the Gravitino server, you can interact with it using the RESTful APIs to create and modify metadata.
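
The server may take a moment to accept requests after it starts. The sketch below polls the metalakes endpoint used throughout this page until it responds. `wait_for_gravitino` is a hypothetical helper, and the retry count and delay are arbitrary defaults, not values recommended by Gravitino.

```shell
# Poll a URL until it answers successfully or the attempts run out.
# Arguments: URL, max attempts (default 30), delay in seconds (default 2).
wait_for_gravitino() {
  url="$1"
  attempts="${2:-30}"
  delay="${3:-2}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -s: quiet, -f: treat HTTP errors as failures, -o: discard the body
    if curl -sf --max-time 5 -o /dev/null \
        -H "Accept: application/vnd.gravitino.v1+json" "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage:
# wait_for_gravitino http://localhost:8090/api/metalakes && echo "server is up"
```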

tip

The following examples use localhost as the host name. You may need to revise it based on your environment.
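
One way to avoid editing every command is to build the URL from environment variables. This is a minimal sketch: `gravitino_url`, `GRAVITINO_HOST`, and `GRAVITINO_PORT` are hypothetical names chosen for illustration, and Gravitino itself does not read these variables.

```shell
# Build a Gravitino REST URL from overridable host and port settings.
# GRAVITINO_HOST and GRAVITINO_PORT are illustrative names, not Gravitino settings.
gravitino_url() {
  printf 'http://%s:%s%s\n' \
    "${GRAVITINO_HOST:-localhost}" "${GRAVITINO_PORT:-8090}" "$1"
}

# Usage:
# curl -X GET -H "Accept: application/vnd.gravitino.v1+json" \
#   "$(gravitino_url /api/metalakes)"
```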

Create a metalake:
```
curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
  -H "Content-Type: application/json" \
  -d '{"name":"my-metalake","comment":"Test metalake"}' \
  http://localhost:8090/api/metalakes
```

Verify that the metalake has been created, either by listing all metalakes or by requesting it by name:
```
curl -X GET -H "Accept: application/vnd.gravitino.v1+json" \
  -H "Content-Type: application/json" \
  http://localhost:8090/api/metalakes

curl -X GET -H "Accept: application/vnd.gravitino.v1+json" \
  -H "Content-Type: application/json" \
  http://localhost:8090/api/metalakes/my-metalake
```

Note that if you request a metalake that doesn't exist, you get a NoSuchMetalakeException error:
```
curl -X GET -H "Accept: application/vnd.gravitino.v1+json" \
  -H "Content-Type: application/json" \
  http://localhost:8090/api/metalakes/none
```
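
In a script, you may want to detect this error rather than read the response by eye. The sketch below searches the response text for the exception name mentioned above. `is_missing_metalake` is a hypothetical helper, and it assumes the exception name appears somewhere in the response body; the exact response format is not shown on this page.

```shell
# Succeed when the response text on stdin mentions NoSuchMetalakeException.
# Assumption: the server includes the exception name in the response body.
is_missing_metalake() {
  grep -q 'NoSuchMetalakeException'
}

# Usage against a running server:
# curl -s -H "Accept: application/vnd.gravitino.v1+json" \
#   http://localhost:8090/api/metalakes/none | is_missing_metalake \
#   && echo "metalake does not exist"
```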

Create a Hive catalog:

First, list the current catalogs to verify that no catalogs exist.
```
curl -X GET -H "Accept: application/vnd.gravitino.v1+json" \
  -H "Content-Type: application/json" \
  http://localhost:8090/api/metalakes/my-metalake/catalogs
```

Create a new Hive catalog.
```
curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
  -H "Content-Type: application/json" \
  -d '{"name":"my-catalog","comment":"Test catalog", "type":"RELATIONAL", "provider":"hive", "properties":{"metastore.uris":"thrift://localhost:9083"}}' \
  http://localhost:8090/api/metalakes/my-metalake/catalogs
```

Verify that the catalog has been created:
```
curl -X GET -H "Accept: application/vnd.gravitino.v1+json" \
  -H "Content-Type: application/json" \
  http://localhost:8090/api/metalakes/my-metalake/catalogs
```

tip

The metastore.uris property used for the Hive catalog has to be adapted to your environment.
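
To make the metastore address easy to change, the request body can be generated from arguments instead of being hard-coded. This is a minimal sketch: `hive_catalog_payload` is a hypothetical helper, and `thrift://hive-host:9083` in the usage comment is a placeholder address.

```shell
# Print the JSON body for creating a Hive catalog.
# Arguments: catalog name, Hive metastore URI.
# Values are inserted verbatim, so they must not contain quotes or backslashes.
hive_catalog_payload() {
  printf '{"name":"%s","comment":"Test catalog","type":"RELATIONAL","provider":"hive","properties":{"metastore.uris":"%s"}}\n' "$1" "$2"
}

# Usage (hive-host is a placeholder for your metastore host):
# curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
#   -H "Content-Type: application/json" \
#   -d "$(hive_catalog_payload my-catalog thrift://hive-host:9083)" \
#   http://localhost:8090/api/metalakes/my-metalake/catalogs
```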

Next steps

Delve deeper into the documentation for advanced features and configuration options.
Bookmark the Gravitino website for updates, the latest releases, new features, optimizations, and security enhancements.
Read our blogs.
Join the Gravitino community forums to connect with developers and other users, share experiences, and seek help if needed. Questions and comments are all welcome.
- Join the Gravitino Slack channel
- Explore the GitHub repository for issues or pull requests, and pick something you are interested in working on.
