Manage relational metadata using Apache Gravitino
This page describes how to manage relational metadata with Apache Gravitino. Relational metadata refers to relational catalogs, schemas, tables, and partitions. Through Gravitino, you can create, edit, and delete relational metadata via unified REST APIs or the Java client.
This document uses an Apache Hive catalog as an example to show how to manage relational metadata with Gravitino. Other relational catalogs work similarly, but they may differ, especially in catalog properties, table properties, and column types. For more details, please refer to the related doc.
Assuming:
- Gravitino has just started, and it is reachable at http://localhost:8090.
- Metalake has been created.
Catalog operations
Create a catalog
The code below is an example of creating a Hive catalog. For other relational catalogs, the code is similar, but the catalog type, provider, and properties may be different. For more details, please refer to the related doc.
For a relational catalog, you must specify the catalog type as RELATIONAL when creating a catalog.
You can create a catalog by sending a POST request to the /api/metalakes/{metalake_name}/catalogs endpoint or just use the Gravitino Java client. The following is an example of creating a catalog:
- Shell
- Java
curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
-H "Content-Type: application/json" -d '{
"name": "catalog",
"type": "RELATIONAL",
"comment": "comment",
"provider": "hive",
"properties": {
"metastore.uris": "thrift://localhost:9083"
}
}' http://localhost:8090/api/metalakes/metalake/catalogs
// Assuming you have just created a metalake named `metalake`
GravitinoClient gravitinoClient = GravitinoClient
.builder("http://127.0.0.1:8090")
.withMetalake("metalake")
.build();
Map<String, String> hiveProperties = ImmutableMap.<String, String>builder()
// You should replace the following with your own hive metastore uris that Gravitino can access
.put("metastore.uris", "thrift://localhost:9083")
.build();
Catalog catalog = gravitinoClient.createCatalog("catalog",
Type.RELATIONAL,
"hive", // provider, We support hive, jdbc-mysql, jdbc-postgresql, lakehouse-iceberg, lakehouse-paimon etc.
"This is a hive catalog",
hiveProperties); // Please change the properties according to the value of the provider.
// ...
Currently, Gravitino supports the following catalog providers:
Catalog provider | Catalog property |
---|---|
hive | Hive catalog property |
lakehouse-iceberg | Iceberg catalog property |
lakehouse-paimon | Paimon catalog property |
jdbc-mysql | MySQL catalog property |
jdbc-postgresql | PostgreSQL catalog property |
jdbc-doris | Doris catalog property |
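The create-catalog request above can also be assembled from Python using only the standard library. The following is a minimal sketch, not an official client: the helper name `build_create_catalog_request` and the constants are hypothetical, and the values are copied from the curl example. It builds the request without sending it, so it can be inspected first.

```python
import json
import urllib.request

# Assumed values copied from the curl example; adjust them for your deployment.
GRAVITINO_URL = "http://localhost:8090"
HEADERS = {
    "Accept": "application/vnd.gravitino.v1+json",
    "Content-Type": "application/json",
}


def build_create_catalog_request(metalake, name, provider, properties, comment=""):
    """Build (but do not send) the POST request that creates a relational catalog."""
    payload = {
        "name": name,
        "type": "RELATIONAL",  # required for relational catalogs
        "comment": comment,
        "provider": provider,
        "properties": properties,
    }
    return urllib.request.Request(
        url=f"{GRAVITINO_URL}/api/metalakes/{metalake}/catalogs",
        data=json.dumps(payload).encode("utf-8"),
        headers=HEADERS,
        method="POST",
    )


request = build_create_catalog_request(
    metalake="metalake",
    name="catalog",
    provider="hive",
    properties={"metastore.uris": "thrift://localhost:9083"},
    comment="comment",
)

# To send the request to a running Gravitino server, uncomment:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Keeping the construction separate from the network call makes it easy to check the URL and body before anything reaches the server.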
Load a catalog
You can load a catalog by sending a GET request to the /api/metalakes/{metalake_name}/catalogs/{catalog_name} endpoint or just use the Gravitino Java client. The following is an example of loading a catalog:
- Shell
- Java
curl -X GET -H "Accept: application/vnd.gravitino.v1+json" \
-H "Content-Type: application/json" http://localhost:8090/api/metalakes/metalake/catalogs/catalog
// ...
// Assuming you have created a metalake named `metalake` and a catalog named `catalog`
Catalog catalog = gravitinoClient.loadCatalog("catalog");
// ...
Alter a catalog
You can modify a catalog by sending a PUT request to the /api/metalakes/{metalake_name}/catalogs/{catalog_name} endpoint or just use the Gravitino Java client. The following is an example of altering a catalog:
- Shell
- Java
curl -X PUT -H "Accept: application/vnd.gravitino.v1+json" \
-H "Content-Type: application/json" -d '{
"updates": [
{
"@type": "rename",
"newName": "alter_catalog"
},
{
"@type": "setProperty",
"property": "key3",
"value": "value3"
}
]
}' http://localhost:8090/api/metalakes/metalake/catalogs/catalog
// ...
// Assuming you have created a metalake named `metalake` and a catalog named `catalog`
Catalog catalog = gravitinoClient.alterCatalog("catalog",
CatalogChange.rename("alter_catalog"), CatalogChange.updateComment("new comment"));
// ...
Currently, Gravitino supports the following changes to a catalog:
Supported modification | JSON | Java |
---|---|---|
Rename catalog | {"@type":"rename","newName":"catalog_renamed"} | CatalogChange.rename("catalog_renamed") |
Update comment | {"@type":"updateComment","newComment":"new_comment"} | CatalogChange.updateComment("new_comment") |
Set a property | {"@type":"setProperty","property":"key1","value":"value1"} | CatalogChange.setProperty("key1", "value1") |
Remove a property | {"@type":"removeProperty","property":"key1"} | CatalogChange.removeProperty("key1") |
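The JSON column of the table above maps naturally onto small builder functions. The sketch below is illustrative only: the function names are hypothetical and are not part of any Gravitino client. It produces the request body for the PUT call shown in the curl example.

```python
import json


def rename(new_name):
    """JSON form of a rename change."""
    return {"@type": "rename", "newName": new_name}


def update_comment(new_comment):
    """JSON form of an update-comment change."""
    return {"@type": "updateComment", "newComment": new_comment}


def set_property(key, value):
    """JSON form of a set-property change."""
    return {"@type": "setProperty", "property": key, "value": value}


def remove_property(key):
    """JSON form of a remove-property change."""
    return {"@type": "removeProperty", "property": key}


def build_alter_body(*changes):
    """Wrap one or more changes in the "updates" envelope used by the PUT request."""
    if not changes:
        raise ValueError("at least one change is required")
    return json.dumps({"updates": list(changes)})


# Same two changes as the curl example above.
alter_body = build_alter_body(
    rename("alter_catalog"),
    set_property("key3", "value3"),
)
```

The resulting string can be passed as the `-d` argument of the curl command or as the body of any HTTP client.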
Most catalog-altering operations are generally safe. However, if you want to change the catalog's URI, you should proceed with caution. Changing the URI may point to a different cluster, rendering the metadata stored in Gravitino unusable. For instance, if the old URI and the new URI point to different clusters that both have a database named db1, changing the URI might cause the old metadata, such as audit information, to be used when accessing db1, which is undesirable.
Therefore, do not change the catalog's URI unless you fully understand the consequences of such a modification.
Drop a catalog
You can remove a catalog by sending a DELETE request to the /api/metalakes/{metalake_name}/catalogs/{catalog_name} endpoint or just use the Gravitino Java client. The following is an example of dropping a catalog:
- Shell
- Java
curl -X DELETE -H "Accept: application/vnd.gravitino.v1+json" \
-H "Content-Type: application/json" \
http://localhost:8090/api/metalakes/metalake/catalogs/catalog
// ...
// Assuming you have created a metalake named `metalake` and a catalog named `catalog`
gravitinoClient.dropCatalog("catalog");
// ...
Dropping a catalog only removes the metadata about the catalog, and about the schemas and tables under it, from Gravitino. It doesn't remove the real data (tables and schemas) in Apache Hive.
List all catalogs in a metalake
You can list all catalogs under a metalake by sending a GET request to the /api/metalakes/{metalake_name}/catalogs endpoint or just use the Gravitino Java client. The following is an example of listing all the catalogs in a metalake:
- Shell
- Java
curl -X GET -H "Accept: application/vnd.gravitino.v1+json" \
-H "Content-Type: application/json" \
http://localhost:8090/api/metalakes/metalake/catalogs
// ...
// Assuming you have just created a metalake named `metalake`
String[] catalogNames = gravitinoClient.listCatalogs();
// ...
List all catalogs' information in a metalake
You can list all catalogs' information under a metalake by sending a GET request to the /api/metalakes/{metalake_name}/catalogs?details=true endpoint or just use the Gravitino Java client. The following is an example of listing all the catalogs' information in a metalake:
- Shell
- Java
curl -X GET -H "Accept: application/vnd.gravitino.v1+json" \
-H "Content-Type: application/json" \
http://localhost:8090/api/metalakes/metalake/catalogs?details=true
// ...
// Assuming you have just created a metalake named `metalake`
Catalog[] catalogsInfos = gravitinoClient.listCatalogsInfo();
// ...
Schema operations
Users should create a metalake and a catalog before creating a schema.
Create a schema
You can create a schema by sending a POST request to the /api/metalakes/{metalake_name}/catalogs/{catalog_name}/schemas endpoint or just use the Gravitino Java client. The following is an example of creating a schema:
- Shell
- Java
- Python
curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
-H "Content-Type: application/json" -d '{
"name": "schema",
"comment": "comment",
"properties": {
"key1": "value1"
}
}' http://localhost:8090/api/metalakes/metalake/catalogs/catalog/schemas
// Assuming you have just created a Hive catalog named `hive_catalog`
Catalog catalog = gravitinoClient.loadCatalog("hive_catalog");
SupportsSchemas supportsSchemas = catalog.asSchemas();
Map<String, String> schemaProperties = ImmutableMap.<String, String>builder()
.build();
Schema schema = supportsSchemas.createSchema("schema",
"This is a schema",
schemaProperties
);
// ...
gravitino_client: GravitinoClient = GravitinoClient(uri="http://127.0.0.1:8090", metalake_name="metalake")
catalog: Catalog = gravitino_client.load_catalog(name="hive_catalog")
catalog.as_schemas().create_schema(name="schema",
comment="This is a schema",
properties={})
Currently, Gravitino supports the following schema properties:
Catalog provider | Schema property |
---|---|
hive | Hive schema property |
lakehouse-iceberg | Iceberg schema property |
lakehouse-paimon | Paimon schema property |
jdbc-mysql | MySQL schema property |
jdbc-postgresql | PostgreSQL schema property |
jdbc-doris | Doris schema property |
Load a schema
You can load a schema by sending a GET request to the /api/metalakes/{metalake_name}/catalogs/{catalog_name}/schemas/{schema_name} endpoint or just use the Gravitino Java client. The following is an example of loading a schema:
- Shell
- Java
- Python
curl -X GET -H "Accept: application/vnd.gravitino.v1+json" \
-H "Content-Type: application/json" \
http://localhost:8090/api/metalakes/metalake/catalogs/catalog/schemas/schema
// ...
// Assuming you have just created a Hive catalog named `hive_catalog`
Catalog catalog = gravitinoClient.loadCatalog("hive_catalog");
SupportsSchemas supportsSchemas = catalog.asSchemas();
Schema schema = supportsSchemas.loadSchema("schema");
// ...
gravitino_client: GravitinoClient = GravitinoClient(uri="http://127.0.0.1:8090", metalake_name="metalake")
catalog: Catalog = gravitino_client.load_catalog(name="hive_catalog")
schema: Schema = catalog.as_schemas().load_schema(name="schema")
Alter a schema
You can change a schema by sending a PUT request to the /api/metalakes/{metalake_name}/catalogs/{catalog_name}/schemas/{schema_name} endpoint or just use the Gravitino Java client. The following is an example of modifying a schema:
- Shell
- Java
- Python
curl -X PUT -H "Accept: application/vnd.gravitino.v1+json" \
-H "Content-Type: application/json" -d '{
"updates": [
{
"@type": "removeProperty",
"property": "key2"
}, {
"@type": "setProperty",
"property": "key3",
"value": "value3"
}
]
}' http://localhost:8090/api/metalakes/metalake/catalogs/catalog/schemas/schema
// ...
// Assuming you have just created a Hive catalog named `hive_catalog`
Catalog catalog = gravitinoClient.loadCatalog("hive_catalog");
SupportsSchemas supportsSchemas = catalog.asSchemas();
Schema schema = supportsSchemas.alterSchema("schema",
SchemaChange.removeProperty("key1"),
SchemaChange.setProperty("key2", "value2"));
// ...
gravitino_client: GravitinoClient = GravitinoClient(uri="http://127.0.0.1:8090", metalake_name="metalake")
catalog: Catalog = gravitino_client.load_catalog(name="hive_catalog")
changes = (
SchemaChange.remove_property("schema_properties_key1"),
SchemaChange.set_property("schema_properties_key2", "schema_properties_new_value"),
)
schema_new: Schema = catalog.as_schemas().alter_schema("schema",
*changes)
Currently, Gravitino supports the following changes to a schema:
Supported modification | JSON | Java |
---|---|---|
Set a property | {"@type":"setProperty","property":"key1","value":"value1"} | SchemaChange.setProperty("key1", "value1") |
Remove a property | {"@type":"removeProperty","property":"key1"} | SchemaChange.removeProperty("key1") |
Drop a schema
You can remove a schema by sending a DELETE request to the /api/metalakes/{metalake_name}/catalogs/{catalog_name}/schemas/{schema_name} endpoint or just use the Gravitino Java client. The following is an example of dropping a schema:
- Shell
- Java
- Python
# cascade can be true or false
curl -X DELETE -H "Accept: application/vnd.gravitino.v1+json" \
-H "Content-Type: application/json" \
http://localhost:8090/api/metalakes/metalake/catalogs/catalog/schemas/schema?cascade=true
// ...
// Assuming you have just created a Hive catalog named `hive_catalog`
Catalog catalog = gravitinoClient.loadCatalog("hive_catalog");
SupportsSchemas supportsSchemas = catalog.asSchemas();
// cascade can be true or false
supportsSchemas.dropSchema("schema", true);
gravitino_client: GravitinoClient = GravitinoClient(uri="http://127.0.0.1:8090", metalake_name="metalake")
catalog: Catalog = gravitino_client.load_catalog(name="hive_catalog")
catalog.as_schemas().drop_schema("schema", cascade=True)
If cascade is true, Gravitino drops all tables under the schema. Otherwise, Gravitino throws an exception if there are tables under the schema.
Some catalogs may not support cascading deletion of a schema. Please refer to the related doc for more details.
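Because cascade travels as a query parameter, it helps to build the DELETE URL programmatically rather than by string concatenation. The following is a small sketch using only the Python standard library. The helper name `build_drop_schema_url` is hypothetical, and percent-encoding the path segments is a precaution taken by this sketch, not a documented requirement.

```python
from urllib.parse import quote, urlencode


def build_drop_schema_url(base_url, metalake, catalog, schema, cascade=False):
    """Return the URL for the drop-schema DELETE request, including the cascade flag."""
    path = "/api/metalakes/{}/catalogs/{}/schemas/{}".format(
        quote(metalake, safe=""),
        quote(catalog, safe=""),
        quote(schema, safe=""),
    )
    # The REST API expects a lowercase boolean, as in the curl example.
    query = urlencode({"cascade": "true" if cascade else "false"})
    return f"{base_url.rstrip('/')}{path}?{query}"


drop_url = build_drop_schema_url(
    "http://localhost:8090", "metalake", "catalog", "schema", cascade=True
)
```

Defaulting `cascade` to `False` in the helper is a deliberate choice of this sketch: a caller has to opt in before tables under the schema are dropped as well.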
List all schemas under a catalog
You can list all schemas under a catalog by sending a GET request to the /api/metalakes/{metalake_name}/catalogs/{catalog_name}/schemas endpoint or just use the Gravitino Java client. The following is an example of listing all the schemas in a catalog:
- Shell
- Java
- Python
curl -X GET -H "Accept: application/vnd.gravitino.v1+json" \
-H "Content-Type: application/json" http://localhost:8090/api/metalakes/metalake/catalogs/catalog/schemas
// ...
// Assuming you have just created a Hive catalog named `hive_catalog`
Catalog catalog = gravitinoClient.loadCatalog("hive_catalog");
SupportsSchemas supportsSchemas = catalog.asSchemas();
String[] schemas = supportsSchemas.listSchemas();
gravitino_client: GravitinoClient = GravitinoClient(uri="http://127.0.0.1:8090", metalake_name="metalake")
catalog: Catalog = gravitino_client.load_catalog(name="hive_catalog")
schema_list: List[NameIdentifier] = catalog.as_schemas().list_schemas()
Table operations
Users should create a metalake, a catalog and a schema before creating a table.
Create a table
You can create a table by sending a POST request to the /api/metalakes/{metalake_name}/catalogs/{catalog_name}/schemas/{schema_name}/tables endpoint or just use the Gravitino Java client. The following is an example of creating a table:
- Shell
- Java
curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
-H "Content-Type: application/json" -d '{
"name": "example_table",
"comment": "This is an example table",
"columns": [
{
"name": "id",
"type": "integer",
"comment": "id column comment",
"nullable": false,
"autoIncrement": true,
"defaultValue": {
"type": "literal",
"dataType": "integer",
"value": "-1"
}
},
{
"name": "name",
"type": "varchar(500)",
"comment": "name column comment",
"nullable": true,
"autoIncrement": false,
"defaultValue": {
"type": "literal",
"dataType": "null",
"value": "null"
}
},
{
"name": "StartingDate",
"type": "timestamp",
"comment": "StartingDate column comment",
"nullable": false,
"autoIncrement": false,
"defaultValue": {
"type": "function",
"funcName": "current_timestamp",
"funcArgs": []
}
},
{
"name": "info",
"type": {
"type": "struct",
"fields": [
{
"name": "position",
"type": "string",
"nullable": true,
"comment": "position field comment"
},
{
"name": "contact",
"type": {
"type": "list",
"elementType": "integer",
"containsNull": false
},
"nullable": true,
"comment": "contact field comment"
},
{
"name": "rating",
"type": {
"type": "map",
"keyType": "string",
"valueType": "integer",
"valueContainsNull": false
},
"nullable": true,
"comment": "rating field comment"
}
]
},
"comment": "info column comment",
"nullable": true
},
{
"name": "dt",
"type": "date",
"comment": "dt column comment",
"nullable": true
}
],
"partitioning": [
{
"strategy": "identity",
"fieldName": [ "dt" ]
}
],
"distribution": {
"strategy": "hash",
"number": 32,
"funcArgs": [
{
"type": "field",
"fieldName": [ "id" ]
}
]
},
"sortOrders": [
{
"sortTerm": {
"type": "field",
"fieldName": [ "name" ]
},
"direction": "asc",
"nullOrdering": "nulls_first"
}
],
"indexes": [
{
"indexType": "primary_key",
"name": "PRIMARY",
"fieldNames": [["id"]]
}
],
"properties": {
"format": "ORC"
}
}' http://localhost:8090/api/metalakes/metalake/catalogs/catalog/schemas/schema/tables
// Assuming you have just created a Hive catalog named `hive_catalog`
Catalog catalog = gravitinoClient.loadCatalog("hive_catalog");
TableCatalog tableCatalog = catalog.asTableCatalog();
// This is an example of creating a Hive table, you should refer to the related doc to get the
// table properties of other catalogs.
Map<String, String> tablePropertiesMap = ImmutableMap.<String, String>builder()
.put("format", "ORC")
// For more table properties, please refer to the related doc.
.build();
tableCatalog.createTable(
NameIdentifier.of("schema", "example_table"),
new Column[] {
Column.of("id", Types.IntegerType.get(), "id column comment", false, true, Literals.integerLiteral(-1)),
Column.of("name", Types.VarCharType.of(500), "name column comment", true, false, Literals.NULL),
Column.of("StartingDate", Types.TimestampType.withoutTimeZone(), "StartingDate column comment", false, false, Column.DEFAULT_VALUE_OF_CURRENT_TIMESTAMP),
Column.of("info", Types.StructType.of(
Field.nullableField("position", Types.StringType.get(), "Position of the user"),
Field.nullableField("contact", Types.ListType.of(Types.IntegerType.get(), false), "contact field comment"),
Field.nullableField("rating", Types.MapType.of(Types.VarCharType.of(1000), Types.IntegerType.get(), false), "rating field comment")
), "info column comment", true, false, null),
Column.of("dt", Types.DateType.get(), "dt column comment", true, false, null)
},
"This is an example table",
tablePropertiesMap,
new Transform[] {Transforms.identity("dt")},
Distributions.of(Strategy.HASH, 32, NamedReference.field("id")),
new SortOrder[] {SortOrders.ascending(NamedReference.field("name"))},
new Index[] {Indexes.of(IndexType.PRIMARY_KEY, "PRIMARY", new String[][]{{"id"}})}
);
The provided example demonstrates table creation but isn't directly executable in Gravitino, since not all catalogs fully support these capabilities.
In order to create a table, you need to provide the following information:
- Table column name and type
- Table column default value (optional)
- Table column auto-increment (optional)
- Table property (optional)
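The required and optional pieces listed above can be collected into a request body with a few lines of Python. This is a minimal sketch under stated assumptions: the helper names `column` and `build_table_payload` are hypothetical, the field names follow the JSON example above, and the check for an empty column list is a client-side convenience of this sketch rather than a documented server rule.

```python
import json


def column(name, data_type, comment=None, nullable=True, auto_increment=False):
    """Describe one column using the field names from the JSON example."""
    definition = {
        "name": name,
        "type": data_type,
        "nullable": nullable,
        "autoIncrement": auto_increment,
    }
    if comment is not None:
        definition["comment"] = comment
    return definition


def build_table_payload(name, columns, comment=None, properties=None):
    """Assemble a minimal create-table body: name and columns, plus optional extras."""
    if not columns:
        raise ValueError("column names and types must be provided")
    payload = {"name": name, "columns": list(columns)}
    if comment is not None:
        payload["comment"] = comment
    if properties:
        payload["properties"] = dict(properties)
    return payload


table_payload = build_table_payload(
    "example_table",
    [
        column("id", "integer", "id column comment", nullable=False),
        column("dt", "date", "dt column comment"),
    ],
    comment="This is an example table",
    properties={"format": "ORC"},
)
table_body = json.dumps(table_payload)
```

Partitioning, distribution, sort orders, and indexes are left out here on purpose, since support for them varies by catalog.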
Apache Gravitino table column type
Gravitino supports the following types:
Type | Java | JSON | Description |
---|---|---|---|
Boolean | Types.BooleanType.get() | boolean | Boolean type |
Byte | Types.ByteType.get() | byte | Byte type, indicates a numerical value of 1 byte |
Byte(false) | Types.ByteType.unsigned() | byte unsigned | Unsigned Byte type, indicates an unsigned numerical value of 1 byte |
Short | Types.ShortType.get() | short | Short type, indicates a numerical value of 2 bytes |
Short(false) | Types.ShortType.unsigned() | short unsigned | Unsigned Short type, indicates an unsigned numerical value of 2 bytes |
Integer | Types.IntegerType.get() | integer | Integer type, indicates a numerical value of 4 bytes |
Integer(false) | Types.IntegerType.unsigned() | integer unsigned | Unsigned Integer type, indicates an unsigned numerical value of 4 bytes |
Long | Types.LongType.get() | long | Long type, indicates a numerical value of 8 bytes |
Long(false) | Types.LongType.unsigned() | long unsigned | Unsigned Long type, indicates an unsigned numerical value of 8 bytes |
Float | Types.FloatType.get() | float | Float type, indicates a single-precision floating point number |
Double | Types.DoubleType.get() | double | Double type, indicates a double-precision floating point number |
Decimal(precision, scale) | Types.DecimalType.of(precision, scale) | decimal(p, s) | Decimal type, indicates a fixed-precision decimal number with the constraint that the precision must be in range [1, 38] and the scale must be in range [0, precision] |
String | Types.StringType.get() | string | String type |
FixedChar(length) | Types.FixedCharType.of(length) | char(l) | Char type, indicates a fixed-length string |
VarChar(length) | Types.VarCharType.of(length) | varchar(l) | Varchar type, indicates a variable-length string, the length is the maximum length of the string |
Timestamp | Types.TimestampType.withoutTimeZone() | timestamp | Timestamp type, indicates a timestamp without timezone |
TimestampWithTimezone | Types.TimestampType.withTimeZone() | timestamp_tz | Timestamp with timezone type, indicates a timestamp with timezone |
Date | Types.DateType.get() | date | Date type |
Time | Types.TimeType.withoutTimeZone() | time | Time type |
IntervalToYearMonth | Types.IntervalYearType.get() | interval_year | Interval type, indicates an interval of year and month |
IntervalToDayTime | Types.IntervalDayType.get() | interval_day | Interval type, indicates an interval of day and time |
Fixed(length) | Types.FixedType.of(length) | fixed(l) | Fixed type, indicates a fixed-length binary array |
Binary | Types.BinaryType.get() | binary | Binary type, indicates an arbitrary-length binary array |
List | Types.ListType.of(elementType, elementNullable) | {"type": "list", "containsNull": JSON Boolean, "elementType": type JSON} | List type, indicates a list of elements with the same type |
Map | Types.MapType.of(keyType, valueType, valueNullable) | {"type": "map", "keyType": type JSON, "valueType": type JSON, "valueContainsNull": JSON Boolean} | Map type, indicates a map of key-value pairs |
Struct | Types.StructType.of([Types.StructType.Field.of(name, type, nullable)]) | {"type": "struct", "fields": [JSON StructField, {"name": string, "type": type JSON, "nullable": JSON Boolean, "comment": string}]} | Struct type, indicates a struct of fields |
Union | Types.UnionType.of([type1, type2, ...]) | {"type": "union", "types": [type JSON, ...]} | Union type, indicates a union of types |
UUID | Types.UUIDType.get() | uuid | UUID type, indicates a universally unique identifier |
The related java doc is here.
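The JSON column of the type table can be produced with small helpers, which also gives a natural place to enforce the decimal constraints stated above. The sketch below is illustrative: the function names are hypothetical, and the `decimal(p, s)` spelling simply follows the table.

```python
def decimal_type(precision, scale):
    """Return the JSON string for a decimal type, enforcing the documented ranges."""
    if not 1 <= precision <= 38:
        raise ValueError("precision must be in range [1, 38]")
    if not 0 <= scale <= precision:
        raise ValueError("scale must be in range [0, precision]")
    return f"decimal({precision}, {scale})"


def varchar_type(length):
    """Return the JSON string for a variable-length string type."""
    return f"varchar({length})"


def list_type(element_type, contains_null):
    """Return the JSON object for a list type."""
    return {"type": "list", "elementType": element_type, "containsNull": contains_null}


def map_type(key_type, value_type, value_contains_null):
    """Return the JSON object for a map type."""
    return {
        "type": "map",
        "keyType": key_type,
        "valueType": value_type,
        "valueContainsNull": value_contains_null,
    }


# Nested types compose: a map from string to a list of decimals.
price_history = map_type("string", list_type(decimal_type(10, 2), False), True)
```

Validating the precision and scale on the client is optional, but it surfaces an invalid decimal definition before the request is sent.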
External type
External type is a special column type. When you need to use a data type that is not in the Gravitino type system, and you know its exact string representation in an external catalog (usually a JDBC catalog), you can use ExternalType to represent it. Similarly, if the original type cannot be resolved, it is represented by ExternalType. The following shows the data structure of an external type in JSON and Java, which makes it easy to retrieve its string value.
- Json
- Java
{
"type": "external",
"catalogString": "user-defined"
}
// The result of the following type is a string "user-defined"
String typeString = ((ExternalType) type).catalogString();
Unparsed type
Unparsed type is a special column type. It is used to address compatibility issues in type serialization and deserialization between the server and client. For instance, if a new column type is introduced on the Gravitino server that the client does not recognize, it is treated as an unparsed type on the client side. The following shows the data structure of an unparsed type in JSON and Java, which makes it easy to retrieve its value.
- Json
- Java
{
"type": "unparsed",
"unparsedType": "unknown-type"
}
// The result of the following type is a string "unknown-type"
String unparsedValue = ((UnparsedType) type).unparsedType();
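When reading column types from a REST response directly, external and unparsed types can be handled alongside ordinary ones. The following is a minimal sketch that assumes the JSON shapes shown above. The function name `type_to_string` is hypothetical, and falling back to the "type" field for other complex types is a simplification made by this sketch.

```python
def type_to_string(type_json):
    """Return a readable string for a column type taken from a REST response."""
    # Simple types such as "integer" or "varchar(500)" are plain JSON strings.
    if isinstance(type_json, str):
        return type_json
    kind = type_json.get("type")
    if kind == "external":
        # The string representation used by the external catalog.
        return type_json["catalogString"]
    if kind == "unparsed":
        # A type this client does not recognize, kept as sent by the server.
        return type_json["unparsedType"]
    # Other complex types (list, map, struct, union): report only the kind here.
    return kind


external = type_to_string({"type": "external", "catalogString": "user-defined"})
unparsed = type_to_string({"type": "unparsed", "unparsedType": "unknown-type"})
```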
Table column default value
When defining a table column, you can specify a literal or an expression as the default value. The default value typically applies to new rows that are inserted into the table by the underlying catalog.
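The two kinds of default value, a literal and an expression, use different JSON shapes in the create-table example. The sketch below builds both. The helper names are hypothetical, and sending the literal value as a string mirrors that example rather than a documented rule.

```python
def literal_default(data_type, value):
    """Default value given as a literal, e.g. the integer -1."""
    return {"type": "literal", "dataType": data_type, "value": str(value)}


def null_default():
    """Default value that is an explicit NULL literal."""
    return {"type": "literal", "dataType": "null", "value": "null"}


def function_default(func_name, *func_args):
    """Default value given as a function expression, e.g. current_timestamp."""
    return {"type": "function", "funcName": func_name, "funcArgs": list(func_args)}


# The three defaults used by the create-table example.
id_default = literal_default("integer", -1)
name_default = null_default()
starting_date_default = function_default("current_timestamp")
```

Each result can be placed under the "defaultValue" key of a column definition. Whether the underlying catalog honors it depends on the provider.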
The following table shows which catalogs support column default values in Gravitino:
Catalog provider | Supported default value |
---|---|
hive | ✘ |
lakehouse-iceberg |