Version: 1.2.1

ClickHouse catalog

Introduction

Apache Gravitino can manage ClickHouse metadata through a JDBC catalog. This document describes the capabilities and limitations of the ClickHouse catalog, as well as the supported operations and properties for catalogs, schemas, and tables.

caution

ClickHouse catalog is not included in the standard Gravitino server distribution due to the large size of the ClickHouse JDBC driver and potential licensing issues. To use the ClickHouse catalog, you can build from source code, and refer to document how-to-build for more details.

Catalog

Catalog capabilities

Item	Description
Scope	One catalog maps to one ClickHouse instance
Metadata/DDL	Supports JDBC-based metadata management and DDL
Column defaults	Supports column default values
Drivers	Requires user-provided ClickHouse JDBC driver in `${GRAVITINO_HOME}/catalogs/jdbc-clickhouse/libs`, please download the jar from link
Supported version	All the codes are tested by ClickHouse `24.8.14`, newer versions like `25.x` may also work but we did not conduct thorough tests. Report to the community if something does not work as expected.

ClickHouse server and JDBC driver compatibility matrix

ClickHouse version	Supported ClickHouse JDBC driver versions
`24.8.x` (including `24.8.14`)	`0.7.1 ~ 0.8.4`

tip

For other ClickHouse versions (not 24.8.x), the required JDBC driver version may be different. Use your staging validation results and the official ClickHouse documentation as the final reference.

Catalog properties

You can pass any JDBC pool property that Gravitino does not define by adding the gravitino.bypass. prefix (for example gravitino.bypass.maxWaitMillis). See commons-dbcp configuration for details.

When using the JDBC catalog you must provide jdbc-url, jdbc-driver, jdbc-user, and jdbc-password. Common catalog properties are listed here; ClickHouse adds no extra catalog-scoped keys.

Configuration item	Description	Default value	Required	Since Version
`jdbc-url`	JDBC URL, for example `jdbc:clickhouse://localhost:8123`	(none)	Yes	1.2.0
`jdbc-driver`	JDBC driver class, for example `com.clickhouse.jdbc.ClickHouseDriver`	(none)	Yes	1.2.0
`jdbc-user`	JDBC user name	(none)	Yes	1.2.0
`jdbc-password`	JDBC password	(none)	Yes	1.2.0
`jdbc.pool.min-size`	Minimum pool size	`2`	No	1.2.0
`jdbc.pool.max-size`	Maximum pool size	`10`	No	1.2.0
`jdbc.pool.max-wait-ms`	Max wait time for a connection	`30000`	No	1.2.0

Create a ClickHouse catalog

The following example creates a ClickHouse catalog with the required JDBC properties and optional connection pool settings. Note that the jdbc-driver class must be available in the Gravitino classpath (for example by placing the ClickHouse JDBC driver JAR in ${GRAVITINO_HOME}/catalogs/jdbc-clickhouse/libs). Description about some of the properties:

provider: must be jdbc-clickhouse for Gravitino to recognize the catalog as ClickHouse;
type: must be RELATIONAL since ClickHouse is a relational database;

Shell
Java

curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
-H "Content-Type: application/json" -d '{
  "name": "ck",
  "type": "RELATIONAL",
  "comment": "ClickHouse catalog",
  "provider": "jdbc-clickhouse",
  "properties": {
    "jdbc-url": "jdbc:clickhouse://localhost:8123",
    "jdbc-driver": "com.clickhouse.jdbc.ClickHouseDriver",
    "jdbc-user": "default",
    "jdbc-password": "password"
  }
}' http://localhost:8090/api/metalakes/metalake/catalogs

GravitinoClient client = GravitinoClient.builder("http://localhost:8090")
    .withMetalake("metalake")
    .build();

Map<String, String> ckProps = ImmutableMap.<String, String>builder()
    .put("jdbc-url", "jdbc:clickhouse://localhost:8123")
    .put("jdbc-driver", "com.clickhouse.jdbc.ClickHouseDriver")
    .put("jdbc-user", "default")
    .put("jdbc-password", "password")
    .build();

Catalog catalog =
    client.createCatalog("ck", Catalog.Type.RELATIONAL, "jdbc-clickhouse", "ClickHouse catalog", ckProps);

See Manage Relational Metadata Using Gravitino for other catalog operations.

Schema

Schema capabilities

Item	Description
Mapping	Gravitino schema maps to a ClickHouse database
Operations	Create / drop / load / list (ClickHouse supports cascade drop)
Comments	Schema comments supported
Cluster mode	Optional `ON CLUSTER` for creation when `cluster-name` is provided

Schema properties

Property Name	Description	Default Value	Required	Immutable	Since version
`on-cluster`	Use `ON CLUSTER` when creating the database	`false`	No	No	1.2.0
`cluster-name`	Cluster name used with `ON CLUSTER` (must align with table-level cluster settings)	(none)	No	No	1.2.0

warning

Cluster properties only reflect Gravitino-managed schemas. Gravitino embeds the cluster name inside the schema's COMMENT field at creation time (because SHOW CREATE DATABASE does not include ON CLUSTER for standard Atomic databases). Schemas created outside Gravitino will not have this metadata, so on-cluster and cluster-name will be absent when loaded, and DROP SCHEMA will not propagate ON CLUSTER to other cluster nodes.

Create a schema

Shell
Java

curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
-H "Content-Type: application/json" -d '{
  "name": "sales",
  "comment": "Sales database",
  "properties": {
    "on-cluster": "true",
    "cluster-name": "ck_cluster"
  }
}' http://localhost:8090/api/metalakes/metalake/catalogs/ck/schemas

Catalog catalog = client.loadCatalog("ck");
Schema schema = catalog.asTableCatalog()
    .createSchema("sales", "Sales database",
        ImmutableMap.of("on-cluster", "true", "cluster-name", "ck_cluster"));

See Manage Relational Metadata Using Gravitino for more schema operations.

Table

Table capabilities

Area	Details
Mapping	Gravitino table maps to a ClickHouse table
Engines	MergeTree family (`MergeTree` default, `ReplacingMergeTree`, `SummingMergeTree`, `AggregatingMergeTree`, `CollapsingMergeTree`, `VersionedCollapsingMergeTree`, `GraphiteMergeTree`): fully supported, data persists across restarts. Log family (`TinyLog`, `StripeLog`, `Log`): supported, data and table definition persist across restarts. `Null`: supported, table persists, data is always discarded by design. `Set`: supported, table definition persists. `Memory`: ⚠️ table definition persists but data is lost on ClickHouse restart (volatile). Distributed: cluster mode with remote database/table and sharding key. Not directly creatable via Gravitino (`Join`, `Buffer`, `View`, `KeeperMap`, `File`): require parameterized ENGINE clauses or external dependencies not supported by the CREATE TABLE API.
Ordering/Partition	MergeTree-family requires exactly one `ORDER BY` column; only single-column identity `PARTITION BY` is supported on MergeTree engines. Other engines reject `ORDER BY`/`PARTITION BY`.
Indexes	Primary key; data-skipping indexes `DATA_SKIPPING_MINMAX` and `DATA_SKIPPING_BLOOM_FILTER` (fixed granularities).
Distribution	Gravitino enforces `Distributions.NONE`; no custom distribution strategies.
Column defaults	Supported.
Unsupported	Engine change after creation; removing table properties; auto-increment columns.

Table column types

Gravitino Type	ClickHouse Type
`Byte`	`Int8`
`Unsigned Byte`	`UInt8`
`Short`	`Int16`
`Unsigned Short`	`UInt16`
`Integer`	`Int32`
`Unsigned Integer`	`UInt32`
`Long`	`Int64`
`Unsigned Long`	`UInt64`
`Float`	`Float32`
`Double`	`Float64`
`Decimal(p,s)`	`Decimal(p,s)`
`String`/`VarChar`	`String`
`FixedChar(n)`	`FixedString(n)`
`Date`	`Date`
`Timestamp[(p)]`	`DateTime` (precision defaults to `0`)
`BOOLEAN`	`Bool`
`UUID`	`UUID`

Other ClickHouse types are exposed as External Type.

Table properties

note

settings.* keys are passed to the ClickHouse SETTINGS clause verbatim.
The engine value is immutable after creation.

warning

Cluster properties only reflect Gravitino-managed objects. ClickHouse does not persist ON CLUSTER information in SHOW CREATE TABLE or SHOW CREATE DATABASE output for non-Replicated objects. Gravitino works around this by embedding the cluster name in the object's COMMENT field at creation time and reading it back on load/drop.

This means:

Gravitino-created databases and tables: on-cluster and cluster-name properties are accurate.
Databases or tables created outside Gravitino (e.g., via ClickHouse client, migration scripts, or other tools): on-cluster will be false and cluster-name will be absent, regardless of whether the object was actually created ON CLUSTER. Subsequent DROP DATABASE / DROP TABLE operations performed through Gravitino will not include ON CLUSTER, which may leave orphan objects on non-coordinating cluster nodes.

If you need Gravitino to manage an existing cluster database or table, recreate it through the Gravitino API so the cluster metadata is properly embedded.

warning

Memory engine data volatility: Tables created with engine=Memory store data in RAM only. After a ClickHouse server restart the table definition persists (Gravitino's loadTable succeeds), but all data is permanently lost. Gravitino metadata and ClickHouse remain consistent at the schema level, but users are responsible for repopulating data after restarts. Consider using TinyLog, StripeLog, or a MergeTree-family engine if data durability is required.

Property Name	Description	Default Value	Required	Reserved	Immutable	Since version
`engine`	Table engine (for example `MergeTree`, `ReplacingMergeTree`, `Distributed`, `Memory`, etc.)	`MergeTree`	No	No	Yes	1.2.0
`cluster-name`	Cluster name used with `ON CLUSTER` and Distributed engine	(none)	No*	No	No	1.2.0
`on-cluster`	Use `ON CLUSTER` when creating the table	(none)	No	No	No	1.2.0
`cluster-remote-database`	Remote database for `Distributed` engine	(none)	No**	No	No	1.2.0
`cluster-remote-table`	Remote table for `Distributed` engine	(none)	No**	No	No	1.2.0
`cluster-sharding-key`	Sharding key for `Distributed` engine (expression allowed; referenced columns must be non-null integral)	(none)	No**	No	No	1.2.0
`settings.<name>`	ClickHouse engine setting forwarded as `SETTINGS <name>=<value>`	(none)	No	No	No	1.2.0

* Required when on-cluster=true or engine=Distributed.
** Required when engine=Distributed.

Table indexes

PRIMARY_KEY
Data-skipping indexes:
- DATA_SKIPPING_MINMAX (GRANULARITY fixed to 1)
- DATA_SKIPPING_BLOOM_FILTER (GRANULARITY fixed to 3)

Partitioning, sorting, and distribution

ORDER BY: required for MergeTree-family engines and only columns identity are supported;
- Accept format: id, (id, name), (func(id), name), func(id);
- Reject format: (id + 1), (func(id) + 1), etc.
PARTITION BY: single-column identity and some functions are supported only, and only for MergeTree-family engines. For example PARTITION BY created_at or PARTITION BY toYYYYMM(created_at) are supported, but PARTITION BY (created_at + 1) are not supported. In all, the following partitioning expressions are supported:
- Identity: PARTITION BY column_name
- Functions: PARTITION BY toDate(column_name), PARTITION BY toYear(column_name), PARTITION BY toYYYYMM(column_name). Other functions are not supported.
- Not support: PARTITION BY (column_name + 1), PARTITION BY (toYear(column_name) + 1), etc. (Note: ClickHouse itself does support arbitrary partitioning expressions, but Gravitino currently only supports the above patterns for partitioning).
Distribution: fixed to Distributions.NONE. For a Distributed engine table, you can specify the sharding key and remote database/table through table properties to fulfill the same use cases. We will later consider adding more flexible distribution strategies if there is demand.

Create a table

The following example creates a MergeTree table with ORDER BY, partitioning, indexes, comments, and properties including ON CLUSTER. Note that the engine property is required for MergeTree-family tables, and that the cluster properties must align with the schema-level cluster settings if on-cluster=true.

This is a create table statement that would be executed in ClickHouse to create the corresponding table:

CREATE TABLE sales.orders ON CLUSTER ck_cluster (
  order_id Int32,
  user_id Int32,
  amount Decimal(18,2),
  created_at DateTime,
  primary key (order_id),
) ENGINE = MergeTree order BY order_id PARTITION BY created_at;

The same table can be created through the API as follows:

Shell
Java

curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
-H "Content-Type: application/json" -d '{
  "name": "orders",
  "comment": "Orders table",
  "columns": [
    {"name": "order_id", "type": "int", "nullable": false},
    {"name": "user_id", "type": "int", "nullable": false},
    {"name": "amount", "type": "decimal(18,2)", "nullable": false},
    {"name": "created_at", "type": "timestamp", "nullable": false}
  ],
  "properties": {
    "engine": "MergeTree",
    "on-cluster": "true",
    "cluster-name": "ck_cluster"
  },
  "sortOrders": [
    {"expression": "order_id", "direction": "ASCENDING"}
  ],
  "partitioning": ["created_at"],
  "indexes": [
    {"indexType": "primary_key", "name": "pk_order", "fieldNames": [["order_id"]]}
  ]
}' http://localhost:8090/api/metalakes/metalake/catalogs/ck/schemas/sales/tables

TableCatalog tableCatalog = client.loadCatalog("ck").asTableCatalog();

Column[] columns = new Column[] {
    Column.of("order_id", Types.IntegerType.get(), "Order ID", false),
    Column.of("user_id", Types.IntegerType.get(), "User ID", false),
    Column.of("amount", Types.DecimalType.of(18, 2), "Amount", false),
    Column.of("created_at", Types.TimestampType.withoutTimeZone(), "Created time", false)
};

Index[] indexes =
    new Index[] {Indexes.of(Index.IndexType.PRIMARY_KEY, "pk_order", new String[][] {{"order_id"}})};

SortOrder[] sortOrders =
    new SortOrder[] {SortOrder.builder("order_id").withDirection(SortDirection.ASCENDING).build()};

Transform[] partitions = new Transform[] {Transforms.identity("created_at")};

tableCatalog.createTable(
    NameIdentifier.of("sales", "orders"),
    columns,
    "Orders table",
    ImmutableMap.of("engine", "MergeTree", "on-cluster", "true", "cluster-name", "ck_cluster"),
    partitions,
    Distributions.NONE,
    indexes,
    sortOrders);

Table operations

Supported:

Create table with engine, ORDER BY, optional partition, indexes, comments, default values, and SETTINGS.
Add column (with nullable flag, default, comment, position).
Rename column.
Update column type/comment/default/position/nullability.
Delete columns (with IF EXISTS support).
Add data-skipping indexes; drop data-skipping indexes. Adding/dropping primary key is not supported.
Update table comment.

Unsupported:

Changing engine after creation.
Removing table properties or arbitrary ALTER TABLE ... SETTINGS.
Auto-increment columns.

See Manage Relational Metadata Using Gravitino for common JDBC semantics.

Introduction​

Catalog​

Catalog capabilities​

ClickHouse server and JDBC driver compatibility matrix​

Catalog properties​

Create a ClickHouse catalog​

Schema​

Schema capabilities​

Schema properties​

Create a schema​

Table​

Table capabilities​

Table column types​

Table properties​

Table indexes​

Partitioning, sorting, and distribution​

Create a table​

Table operations​

Introduction

Catalog

Catalog capabilities

ClickHouse server and JDBC driver compatibility matrix

Catalog properties

Create a ClickHouse catalog

Schema

Schema capabilities

Schema properties

Create a schema

Table

Table capabilities

Table column types

Table properties

Table indexes

Partitioning, sorting, and distribution

Create a table

Table operations