Version: 1.3.0

Connect PyIceberg to Iceberg REST

Introduction

Apache Gravitino exposes an Iceberg REST catalog (IRC) endpoint that any Iceberg-compatible client can connect to directly. This page describes how to connect PyIceberg to that endpoint.

Prerequisites

  • Apache Gravitino running with the Iceberg REST service enabled. See Iceberg REST catalog service for setup instructions.
  • The Gravitino IRC endpoint is accessible from your Python environment. The default port is 9001.
  • PyIceberg installed: pip install pyiceberg. The table scan example on this page also needs PyArrow: pip install "pyiceberg[pyarrow]"
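
Before configuring PyIceberg, it can help to confirm that the endpoint is reachable. The following is a minimal sketch using only the standard library. It assumes the endpoint serves the Iceberg REST specification's /v1/config route under the /iceberg base path; the helper names config_url and fetch_catalog_config are illustrative and not part of Gravitino or PyIceberg.

```python
import json
import urllib.error
import urllib.request


def config_url(base_uri: str) -> str:
    """Return the REST config route for a catalog base URI."""
    return base_uri.rstrip("/") + "/v1/config"


def fetch_catalog_config(base_uri: str, timeout: float = 5.0):
    """Return the parsed config response, or None if the request fails.

    A failure can mean the host is unreachable, the service is not
    enabled, or the server rejected an unauthenticated request.
    """
    try:
        with urllib.request.urlopen(config_url(base_uri), timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        return None


# Usage (replace the placeholder host with your own):
# print(fetch_catalog_config("http://<gravitino-host>:9001/iceberg"))
```

A None result only says the request did not succeed; check the Gravitino server logs to find out why.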

Configuration

from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "gravitino_irc",
    **{
        "type": "rest",
        "uri": "http://<gravitino-host>:9001/iceberg",
    }
)
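
To avoid hard-coding the host in scripts, the same properties can be assembled from environment variables. This is a sketch only: the variable names GRAVITINO_HOST and GRAVITINO_IRC_PORT and the helpers catalog_properties and connect are assumptions made for illustration, not names defined by Gravitino or PyIceberg.

```python
import os


def catalog_properties(env=None) -> dict:
    """Build the REST catalog properties from environment variables.

    GRAVITINO_HOST and GRAVITINO_IRC_PORT are hypothetical names chosen
    for this example; the port falls back to the default 9001.
    """
    env = os.environ if env is None else env
    host = env.get("GRAVITINO_HOST", "localhost")
    port = env.get("GRAVITINO_IRC_PORT", "9001")
    return {"type": "rest", "uri": f"http://{host}:{port}/iceberg"}


def connect(env=None):
    """Load the catalog using the properties built above."""
    # Imported here so the property builder works without PyIceberg installed.
    from pyiceberg.catalog import load_catalog

    return load_catalog("gravitino_irc", **catalog_properties(env))
```

Calling connect() with no arguments reads the current process environment.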

Credential Vending

catalog = load_catalog(
    "gravitino_irc",
    **{
        "type": "rest",
        "uri": "http://<gravitino-host>:9001/iceberg",
        # Ask the server to return temporary storage credentials when a table is loaded.
        "header.X-Iceberg-Access-Delegation": "vended-credentials",
    }
)
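
The difference between vended and client-side credentials can be sketched as a choice between two sets of properties. The helper storage_properties below is hypothetical, and the static branch assumes S3 storage; s3.access-key-id and s3.secret-access-key are PyIceberg's S3 FileIO property names. With vending enabled, no storage keys need to live in the client configuration.

```python
def storage_properties(vended: bool, access_key=None, secret_key=None) -> dict:
    """Return the extra catalog properties for storage access.

    vended=True requests server-issued credentials via the delegation
    header; vended=False expects static S3 keys supplied by the caller.
    """
    if vended:
        return {"header.X-Iceberg-Access-Delegation": "vended-credentials"}
    if not (access_key and secret_key):
        raise ValueError("static credentials need both access_key and secret_key")
    return {
        "s3.access-key-id": access_key,
        "s3.secret-access-key": secret_key,
    }


# Usage: merge into the base properties before calling load_catalog.
base = {"type": "rest", "uri": "http://<gravitino-host>:9001/iceberg"}
properties = {**base, **storage_properties(vended=True)}
```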

OAuth2 Authentication

catalog = load_catalog(
    "gravitino_irc",
    **{
        "type": "rest",
        "uri": "http://<gravitino-host>:9001/iceberg",
        # Bearer token sent with each request to the REST endpoint.
        "token": "<your-token>",
    }
)

See How to authenticate for Gravitino authentication configuration options.
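
Tokens are usually better kept out of source files. A minimal sketch that reads the token from an environment variable follows; the variable name GRAVITINO_TOKEN and the helper auth_properties are assumptions for illustration.

```python
import os


def auth_properties(env=None) -> dict:
    """Return the token property, read from a hypothetical GRAVITINO_TOKEN variable."""
    env = os.environ if env is None else env
    token = env.get("GRAVITINO_TOKEN", "").strip()
    if not token:
        # Fail early instead of sending an empty bearer token.
        raise RuntimeError("GRAVITINO_TOKEN is not set")
    return {"token": token}


# Usage: merge the result into the properties passed to load_catalog, e.g.
# load_catalog("gravitino_irc", **{"type": "rest", "uri": "...", **auth_properties()})
```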

Examples

List Namespaces

catalog.list_namespaces()
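
list_namespaces() returns each namespace as a tuple of name parts rather than a dotted string. A small sketch for turning the result into printable names (namespace_names is an illustrative helper, not a PyIceberg function):

```python
def namespace_names(namespaces) -> list:
    """Join each namespace tuple into a dotted name and sort the result."""
    return sorted(".".join(parts) for parts in namespaces)


# Usage with a live catalog:
# for name in namespace_names(catalog.list_namespaces()):
#     print(name)
```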

Load a Table

table = catalog.load_table("db.table")
print(table.schema())
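
When only the column list is needed, the schema can be walked field by field. The sketch below relies on the fields attribute of a PyIceberg Schema and on the field_id, name, field_type, and required attributes of each field; schema_rows is a hypothetical helper name.

```python
def schema_rows(table) -> list:
    """Return (field_id, name, type, required) for each top-level column."""
    return [
        (field.field_id, field.name, str(field.field_type), field.required)
        for field in table.schema().fields
    ]


# Usage with a table loaded from the catalog:
# for row in schema_rows(catalog.load_table("db.table")):
#     print(row)
```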

Scan a Table

# to_arrow() returns a pyarrow.Table and requires PyArrow to be installed.
arrow_table = table.scan().to_arrow()
print(arrow_table)
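
A full scan reads every row and column. For a quick look at a large table, the scan can be narrowed with the row_filter, selected_fields, and limit parameters of table.scan(). The helper name preview and the example predicate are illustrative; the column names assume a table with id and name columns.

```python
def preview(table, predicate: str, columns, limit: int = 10):
    """Scan a filtered, projected slice of the table into a pyarrow.Table."""
    scan = table.scan(
        row_filter=predicate,             # e.g. "id > 100"
        selected_fields=tuple(columns),   # project only these columns
        limit=limit,                      # cap the number of rows returned
    )
    return scan.to_arrow()


# Usage with a table loaded from the catalog:
# print(preview(table, "id > 100", ["id", "name"], limit=5))
```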

Create a Namespace and Table

catalog.create_namespace("db")

from pyiceberg.schema import Schema
from pyiceberg.types import NestedField, LongType, StringType

schema = Schema(
    NestedField(1, "id", LongType(), required=True),
    NestedField(2, "name", StringType(), required=False),
)
catalog.create_table("db.new_table", schema=schema)
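
Running the creation calls a second time fails because the namespace and table already exist. A sketch of an idempotent variant follows. It assumes a PyIceberg release that provides create_namespace_if_not_exists and create_table_if_not_exists on the catalog; ensure_table is a hypothetical helper name.

```python
def ensure_table(catalog, namespace: str, table_name: str, schema):
    """Create the namespace and table only if they are missing, then return the table."""
    # Assumption: both *_if_not_exists methods are available in the installed PyIceberg.
    catalog.create_namespace_if_not_exists(namespace)
    return catalog.create_table_if_not_exists(
        f"{namespace}.{table_name}", schema=schema
    )


# Usage with the schema defined above:
# table = ensure_table(catalog, "db", "new_table", schema)
```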

Gravitino Connector vs. Iceberg REST

| Feature | Gravitino Engine Connector | Iceberg REST |
|---|---|---|
| Engine plugin required | Yes | No |
| Gravitino access control | Yes | Yes |
| Supported engines | Trino, Spark, Flink, Daft | Any Iceberg-compatible engine |
| Credential vending | Varies | Yes (S3, GCS, OSS, ADLS) |