Connect PyIceberg to Iceberg REST
Introduction
Apache Gravitino exposes an Iceberg REST catalog (IRC) endpoint that any Iceberg-compatible client can connect to directly. This page describes how to use PyIceberg with Gravitino's IRC endpoint.
Prerequisites
- Apache Gravitino running with the Iceberg REST service enabled. See Iceberg REST catalog service for setup instructions.
- The Gravitino IRC endpoint is accessible from your Python environment. The default port is 9001.
- PyIceberg installed:
pip install pyiceberg
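Before configuring PyIceberg, it can help to confirm that the endpoint is reachable from your Python environment. The sketch below assumes the service answers the Iceberg REST specification's `GET /v1/config` route under the `/iceberg` base path used on this page. `config_url` and `check_irc_endpoint` are hypothetical helper names, not part of Gravitino or PyIceberg.

```python
import json
import urllib.request


def config_url(host: str, port: int = 9001) -> str:
    """Build the URL of the Iceberg REST config route (assumed base path: /iceberg)."""
    return f"http://{host}:{port}/iceberg/v1/config"


def check_irc_endpoint(host: str, port: int = 9001, timeout: float = 5.0) -> dict:
    """Fetch the catalog config as a dict.

    Raises urllib.error.URLError if the endpoint cannot be reached.
    """
    with urllib.request.urlopen(config_url(host, port), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `check_irc_endpoint("<gravitino-host>")` should return the catalog configuration as a dictionary if the service is up. If authentication is enabled, the request may be rejected until a token is supplied.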
Configuration
from pyiceberg.catalog import load_catalog
catalog = load_catalog(
"gravitino_irc",
**{
"type": "rest",
"uri": "http://<gravitino-host>:9001/iceberg",
}
)
Credential Vending
catalog = load_catalog(
"gravitino_irc",
**{
"type": "rest",
"uri": "http://<gravitino-host>:9001/iceberg",
"header.X-Iceberg-Access-Delegation": "vended-credentials",
}
)
OAuth2 Authentication
catalog = load_catalog(
"gravitino_irc",
**{
"type": "rest",
"uri": "http://<gravitino-host>:9001/iceberg",
"token": "<your-token>",
}
)
See How to authenticate for Gravitino authentication configuration options.
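The three configurations above differ by a single property each, so they can be assembled by one small function. This is a minimal sketch: `build_irc_properties` is a hypothetical helper, and it assumes the endpoint is served over plain HTTP at the `/iceberg` path shown on this page.

```python
from typing import Dict, Optional


def build_irc_properties(
    host: str,
    port: int = 9001,
    vended_credentials: bool = False,
    token: Optional[str] = None,
) -> Dict[str, str]:
    """Assemble the load_catalog properties used on this page (hypothetical helper)."""
    props = {"type": "rest", "uri": f"http://{host}:{port}/iceberg"}
    if vended_credentials:
        # Ask the server to return storage credentials along with table metadata.
        props["header.X-Iceberg-Access-Delegation"] = "vended-credentials"
    if token is not None:
        # Token sent to endpoints that require OAuth2 authentication.
        props["token"] = token
    return props
```

The result can be passed straight to PyIceberg, for example `load_catalog("gravitino_irc", **build_irc_properties("<gravitino-host>", vended_credentials=True))`.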
Examples
List Namespaces
print(catalog.list_namespaces())
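To see which tables each namespace holds, the namespace listing can be combined with `list_tables`. A rough sketch, assuming `catalog` is the object returned by `load_catalog` above. `list_all_tables` is a hypothetical helper, and it only walks top-level namespaces.

```python
def list_all_tables(catalog) -> dict:
    """Map each top-level namespace to the table identifiers it contains."""
    tables_by_namespace = {}
    for namespace in catalog.list_namespaces():
        # Namespaces and table identifiers are returned as tuples of strings.
        tables_by_namespace[namespace] = catalog.list_tables(namespace)
    return tables_by_namespace
```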
Load a Table
table = catalog.load_table("db.table")
print(table.schema())
Scan a Table
# to_arrow() reads the whole table into memory as a pyarrow.Table
arrow_table = table.scan().to_arrow()
print(arrow_table)
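A full scan reads every row, so on larger tables it is usually better to push a filter, a column projection, and a row limit into the scan. The sketch below uses the `row_filter`, `selected_fields`, and `limit` arguments of `table.scan()`. The column names `id` and `name` are assumptions borrowed from the schema example on this page, and `scan_preview` is a hypothetical helper.

```python
def scan_preview(table, row_filter="id >= 100", columns=("id", "name"), limit=10):
    """Return up to `limit` rows matching `row_filter`, restricted to `columns`."""
    scan = table.scan(row_filter=row_filter, selected_fields=columns, limit=limit)
    # The filter and projection are applied during the scan, before materializing.
    return scan.to_arrow()
```

For example, `print(scan_preview(table))` prints at most ten matching rows with only the two selected columns.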
Create a Namespace and Table
from pyiceberg.schema import Schema
from pyiceberg.types import NestedField, LongType, StringType

catalog.create_namespace("db")
schema = Schema(
NestedField(1, "id", LongType(), required=True),
NestedField(2, "name", StringType(), required=False),
)
catalog.create_table("db.new_table", schema=schema)
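Running the snippet above a second time fails because the namespace and table already exist. One way to make it repeatable is to check the catalog listings first. This is a sketch under two assumptions: the namespace is single-level, and no other client creates the same objects between the check and the create call (the check is not atomic). `ensure_table` is a hypothetical helper.

```python
def ensure_table(catalog, namespace, table_name, schema):
    """Create the namespace and table only if missing, then return the table."""
    # list_namespaces() returns tuples, e.g. [("db",)].
    if (namespace,) not in catalog.list_namespaces():
        catalog.create_namespace(namespace)
    identifier = f"{namespace}.{table_name}"
    # list_tables() returns tuples, e.g. [("db", "new_table")].
    if (namespace, table_name) not in catalog.list_tables(namespace):
        return catalog.create_table(identifier, schema=schema)
    return catalog.load_table(identifier)
```

With the schema defined above, `ensure_table(catalog, "db", "new_table", schema)` creates the table on the first call and loads it on later calls.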
Gravitino Connector vs. Iceberg REST
| Feature | Gravitino Engine Connector | Iceberg REST |
|---|---|---|
| Engine plugin required | Yes | No |
| Gravitino access control | Yes | Yes |
| Supported engines | Trino, Spark, Flink, Daft | Any Iceberg-compatible engine |
| Credential vending | Varies | Yes (S3, GCS, OSS, ADLS) |