Version: 1.3.0

AWS Glue catalog

Introduction

Apache Gravitino can use the AWS Glue Data Catalog as a metadata catalog.

Requirements

  • The Glue catalog requires network access to the AWS Glue API.
  • Gravitino uses the AWS SDK v2 to communicate with Glue.

Note: The Glue catalog is case-insensitive for schema and table names. AWS Glue folds database and table names to lowercase on storage.
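
Because names are folded to lowercase, two names that differ only in case refer to the same Glue object. The following minimal sketch checks a list of intended names for such collisions before creation. The helper name is hypothetical, and Python's `str.lower()` only approximates Glue's folding (it may differ for non-ASCII names).

```python
from collections import defaultdict


def find_case_collisions(names):
    """Group names that become identical once folded to lowercase.

    Returns a dict mapping the folded name to the original spellings,
    limited to groups with more than one spelling.
    """
    groups = defaultdict(list)
    for name in names:
        # Approximation of Glue's lowercase folding (assumption).
        groups[name.lower()].append(name)
    return {folded: spellings for folded, spellings in groups.items() if len(spellings) > 1}


if __name__ == "__main__":
    print(find_case_collisions(["Orders", "orders", "users"]))
```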

Catalog

Catalog Capabilities

The Glue catalog supports creating, updating, and deleting databases and tables in the AWS Glue Data Catalog.

  • Supports all table types stored in Glue (Hive, Iceberg, Delta, Parquet, and others) by default.
  • Supports Hive-format table partitioning, bucketing, and sort orders.
  • Does not support views. Glue views (tables with TableType=VIRTUAL_VIEW) are filtered out.

Catalog Properties

Besides the common catalog properties, the Glue catalog has the following properties:

| Property Name | Description | Default Value | Required | Immutable | Since Version |
|---|---|---|---|---|---|
| aws-region | AWS region for the Glue Data Catalog (e.g. us-east-1). | (none) | Yes | Yes | 1.3.0 |
| aws-glue-catalog-id | The 12-digit AWS account ID that owns the Glue catalog. When omitted, defaults to the caller's AWS account ID. | (none) | No | Yes | 1.3.0 |
| aws-access-key-id | AWS access key ID for static credential authentication. When omitted, the default credential chain is used. | (none) | No | No | 1.3.0 |
| aws-secret-access-key | AWS secret access key paired with aws-access-key-id. When omitted, the default credential chain is used. | (none) | No | No | 1.3.0 |
| aws-glue-endpoint | Custom Glue endpoint URL for VPC endpoints or LocalStack testing (e.g. http://localhost:4566). | (none) | No | No | 1.3.0 |
| warehouse | Base storage path used as the warehouse when no explicit location is specified at table creation time (e.g. s3://my-bucket/warehouse). Table location is derived as warehouse/database/table. | (none) | Yes | No | 1.3.0 |
| default-table-format | Default format for tables created via Gravitino's createTable() API. Accepted values: iceberg, hive. | hive | No | No | 1.3.0 |
| table-format-filter | Comma-separated list of table formats exposed by listTables() and loadTable(). Accepted values: all, hive, iceberg, delta, parquet. Use to restrict visible table types. | all | No | No | 1.3.0 |

Note: Authentication priority: static credentials (aws-access-key-id + aws-secret-access-key) take precedence over the default credential chain (environment variables, instance profile, container credentials).
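
As a rough illustration of how these properties fit together, the following Python sketch assembles a catalog-creation request body and reports which credential source would apply. Only the property keys come from the table above. The provider name `glue`, the catalog name, the request body layout, and the rule that both static keys must be supplied together are assumptions for illustration and should be checked against the Gravitino REST reference.

```python
import json

# Properties marked as required in the table above.
REQUIRED_PROPERTIES = ("aws-region", "warehouse")


def credential_source(props):
    """Return 'static' when both static keys are set, else 'default-chain'."""
    has_key = "aws-access-key-id" in props
    has_secret = "aws-secret-access-key" in props
    if has_key != has_secret:
        # Assumption: the two keys are only meaningful as a pair.
        raise ValueError("aws-access-key-id and aws-secret-access-key must be set together")
    return "static" if has_key else "default-chain"


def build_catalog_request(name, props):
    """Build a hypothetical catalog-creation body after basic validation."""
    missing = [key for key in REQUIRED_PROPERTIES if key not in props]
    if missing:
        raise ValueError(f"missing required properties: {missing}")
    credential_source(props)  # validates the key pair
    return {
        "name": name,
        "type": "RELATIONAL",
        "provider": "glue",  # assumed provider name
        "properties": dict(props),
    }


if __name__ == "__main__":
    request = build_catalog_request(
        "glue_catalog",  # hypothetical catalog name
        {"aws-region": "us-east-1", "warehouse": "s3://my-bucket/warehouse"},
    )
    print(json.dumps(request, indent=2))
    print(credential_source(request["properties"]))
```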

Catalog Operations

Refer to Manage Relational Metadata Using Gravitino for more details.

Note: Sensitive catalog properties such as aws-access-key-id and aws-secret-access-key are hidden from the load catalog response since Gravitino 1.3.0. Use the credential vending API to retrieve them at runtime.

Schema

Schema Capabilities

The Glue catalog supports creating, updating, and deleting databases in the AWS Glue Data Catalog.

Schema Properties

The Glue catalog defines no predefined schema properties beyond comment. Additional key-value properties pass through to the underlying Glue database.

Schema Operations

See Manage Relational Metadata Using Gravitino.

Table

Table Capabilities

  • The Glue catalog supports creating, updating, and deleting tables in the AWS Glue Data Catalog.
  • All entries in the Glue Table.parameters() pass through Gravitino intact, so downstream tools can correctly identify the table format.
  • Does not support column default values.
  • Does not support NOT NULL constraints on columns.
  • Does not support table indexes.

Table Partitioning

The Glue catalog supports partitioned tables. Create partitioned tables in the Glue catalog by specifying the partitioning attribute.

The supported partitioning strategies depend on the table format:

  • Hive-format tables: Only Identity partitioning is supported, because the native Glue partition model is Hive-style key=value.
  • Iceberg-format tables: All Iceberg partition transforms are supported: identity, year, month, day, hour, bucket, and truncate.

Caution: The fieldName specified in the partitioning attribute must be the name of a column defined in the table.
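
The partitioning rules above can be sketched as a small pre-flight check. This is not Gravitino's validation code. The function name and the `(strategy, field)` tuple shape are hypothetical, and only the allowed transforms and the column-name rule come from the text.

```python
# Transforms allowed per table format, as listed above.
HIVE_TRANSFORMS = {"identity"}
ICEBERG_TRANSFORMS = {"identity", "year", "month", "day", "hour", "bucket", "truncate"}


def validate_partitioning(table_format, columns, partitioning):
    """Return a list of error strings; an empty list means the spec looks valid.

    table_format: "hive" or "iceberg" (case-insensitive)
    columns:      iterable of column names defined in the table
    partitioning: list of (strategy, field_name) tuples (hypothetical shape)
    """
    allowed = ICEBERG_TRANSFORMS if table_format.lower() == "iceberg" else HIVE_TRANSFORMS
    known = set(columns)
    errors = []
    for strategy, field in partitioning:
        if strategy not in allowed:
            errors.append(f"{strategy!r} is not supported for {table_format} tables")
        if field not in known:
            errors.append(f"partition field {field!r} is not a table column")
    return errors


if __name__ == "__main__":
    print(validate_partitioning("hive", ["id", "dt"], [("bucket", "id")]))
```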

Table Sort Orders and Distributions

The Glue catalog supports bucketed, sorted tables. Create them by setting the distribution and sortOrders attributes. Gravitino supports several distribution strategies, but AWS Glue supports only one (clustered by column), so the Glue catalog supports only Hash distribution.

Caution: The fieldName specified in the distribution and sortOrders attributes must be the name of a column defined in the table.
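
A minimal sketch of assembling the distribution and sortOrders attributes for a bucketed, sorted table follows. The JSON field names (`strategy`, `number`, `funcArgs`, `sortTerm`, `direction`, `nullOrdering`) are assumptions modeled on Gravitino's relational table API and should be verified against the REST reference. The helper itself is hypothetical.

```python
def build_bucketing(columns, bucket_columns, num_buckets, sort_columns):
    """Build hypothetical distribution/sortOrders attributes.

    Raises ValueError when a referenced field is not a table column,
    mirroring the caution above.
    """
    known = set(columns)
    unknown = [c for c in list(bucket_columns) + list(sort_columns) if c not in known]
    if unknown:
        raise ValueError(f"fields not defined in the table: {unknown}")

    distribution = {
        "strategy": "hash",  # the only strategy the Glue catalog supports
        "number": num_buckets,
        "funcArgs": [{"type": "field", "fieldName": [c]} for c in bucket_columns],
    }
    sort_orders = [
        {
            "sortTerm": {"type": "field", "fieldName": [c]},
            "direction": "asc",            # assumed value
            "nullOrdering": "nulls_first",  # assumed key and value
        }
        for c in sort_columns
    ]
    return {"distribution": distribution, "sortOrders": sort_orders}


if __name__ == "__main__":
    print(build_bucketing(["id", "name"], ["id"], 4, ["name"]))
```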

Table Column Types

The Glue catalog supports all data types defined in the Hive Language Manual. The following table lists the data types mapped from the Glue catalog to Gravitino.

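| Glue Data Type | Gravitino Data Type | Since Version |
|---|---|---|
| boolean | boolean | 1.3.0 |
| tinyint | byte | 1.3.0 |
| smallint | short | 1.3.0 |
| int / integer | integer | 1.3.0 |
| bigint | long | 1.3.0 |
| float | float | 1.3.0 |
| double | double | 1.3.0 |
| decimal | decimal | 1.3.0 |
| string | string | 1.3.0 |
| char | char | 1.3.0 |
| varchar | varchar | 1.3.0 |
| timestamp | timestamp | 1.3.0 |
| date | date | 1.3.0 |
| interval_year_month | interval_year | 1.3.0 |
| interval_day_time | interval_day | 1.3.0 |
| binary | binary | 1.3.0 |
| array | list | 1.3.0 |
| map | map | 1.3.0 |
| struct | struct | 1.3.0 |
| uniontype | union | 1.3.0 |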

Info: Data types not listed above map to Gravitino External Type, which represents an unresolvable data type from the Glue catalog.
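
The mapping above, including the fallback to External Type, can be expressed as a simple lookup. This is an illustrative sketch rather than the catalog's converter: it resolves only the outermost type name and ignores parameters and nested element types. The function name is hypothetical.

```python
import re

# Glue type name -> Gravitino type name, copied from the table above.
GLUE_TO_GRAVITINO = {
    "boolean": "boolean", "tinyint": "byte", "smallint": "short",
    "int": "integer", "integer": "integer", "bigint": "long",
    "float": "float", "double": "double", "decimal": "decimal",
    "string": "string", "char": "char", "varchar": "varchar",
    "timestamp": "timestamp", "date": "date",
    "interval_year_month": "interval_year", "interval_day_time": "interval_day",
    "binary": "binary", "array": "list", "map": "map",
    "struct": "struct", "uniontype": "union",
}


def gravitino_type_name(glue_type):
    """Map a Glue type string to its Gravitino type name.

    Strips parameters such as "(10,2)" or "<int>" before the lookup and
    returns "external" for anything not in the table.
    """
    base = re.split(r"[(<]", glue_type.strip().lower(), maxsplit=1)[0].strip()
    return GLUE_TO_GRAVITINO.get(base, "external")


if __name__ == "__main__":
    for t in ("BIGINT", "decimal(10,2)", "array<int>", "geometry"):
        print(t, "->", gravitino_type_name(t))
```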

Table Properties

The following table lists predefined properties for Glue tables. Additional key-value properties pass through to the underlying Glue table.

Note: Reserved fields cannot be passed to the Gravitino server. Immutable fields cannot be modified once set.

| Property Name | Description | Default Value | Required | Reserved | Immutable | Since Version |
|---|---|---|---|---|---|---|
| location | The location for table storage, such as s3://bucket/prefix/test_table. Derived from warehouse/database/table when not specified. | (derived from warehouse) | No | No | No | 1.3.0 |
| format | The table file format (parquet, orc, textfile, etc.). When set, input-format, output-format, and serde-lib are derived automatically. Used primarily when creating Hive-format tables via Trino. | (none) | No | No | Yes | 1.3.0 |
| input-format | The input format class for the table, such as org.apache.hadoop.hive.ql.io.orc.OrcInputFormat. | org.apache.hadoop.mapred.TextInputFormat | No | No | Yes | 1.3.0 |
| output-format | The output format class for the table, such as org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat. | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | No | No | Yes | 1.3.0 |
| serde-lib | The serde library class for the table, such as org.apache.hadoop.hive.ql.io.orc.OrcSerde. | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | No | No | Yes | 1.3.0 |
| serde-name | The name of the serde. | (none) | No | No | No | 1.3.0 |
| serde.parameter. | The prefix of the serde parameter, such as "serde.parameter.orc.create.index" = "true", indicating ORC serde lib to create row indexes. | (none) | No | No | No | 1.3.0 |
| table-format | Table format stored in Glue Table.parameters(). Use ICEBERG to create an Iceberg table. Common values: ICEBERG, HIVE. | (none) | No | No | No | 1.3.0 |
| metadata_location | Iceberg table metadata file location stored in Glue Table.parameters(). When set during createTable(), registers an existing Iceberg table rather than creating a new one. | (none) | No | No | No | 1.3.0 |
| comment | Used to store a table comment. | (none) | No | Yes | No | 1.3.0 |

Note: All entries in the Glue Table.parameters() pass through Gravitino's API layer intact. This passthrough ensures table_type=ICEBERG, metadata_location=s3://..., spark.sql.sources.provider=delta, and any other format indicators survive Gravitino's metadata proxy layer.
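
To illustrate why the passthrough matters, the sketch below shows how a downstream tool might infer a table's format from the preserved parameters and apply a filter shaped like table-format-filter. This is not Gravitino's detection logic. The function names are hypothetical, the fallback to hive is an assumption, and the parquet filter value is not modeled because its detection rule is not described here.

```python
def detect_table_format(parameters):
    """Infer a table format from Glue table parameters (illustrative only)."""
    params = {str(k).lower(): str(v).lower() for k, v in parameters.items()}
    if params.get("table_type") == "iceberg":
        return "iceberg"
    if params.get("spark.sql.sources.provider") == "delta":
        return "delta"
    return "hive"  # assumed fallback when no indicator is present


def is_visible(parameters, table_format_filter="all"):
    """Apply a comma-separated filter such as "hive, iceberg" or "all"."""
    allowed = {item.strip().lower() for item in table_format_filter.split(",")}
    return "all" in allowed or detect_table_format(parameters) in allowed


if __name__ == "__main__":
    iceberg_params = {"table_type": "ICEBERG", "metadata_location": "s3://..."}
    print(detect_table_format(iceberg_params))
    print(is_visible(iceberg_params, "hive"))
```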

Table Operations

Refer to Manage Relational Metadata Using Gravitino for more details.

Alter Operations

Gravitino defines a unified set of metadata operation interfaces. The following table maps Glue alter operations to Gravitino table update requests.

Alter table

| Glue Alter Operation | Gravitino Table Update Request | Since Version |
|---|---|---|
| Alter Table Properties | Set a table property | 1.3.0 |
| Alter Table Comment | Update comment | 1.3.0 |
| Remove Properties | Remove a table property | 1.3.0 |

Caution: Hive-format table rename is not supported. AWS Glue does not provide a native rename API for tables; renaming would require recreating the table. Iceberg-format table rename is supported.

Alter column

| Glue Alter Operation | Gravitino Table Update Request | Since Version |
|---|---|---|
| Change Column Name | Rename a column | 1.3.0 |
| Change Column Type | Update the type of a column | 1.3.0 |
| Change Column Position | Update the position of a column | 1.3.0 |
| Change Column Comment | Update the column comment | 1.3.0 |

Alter partition

The Glue catalog supports partition operations via SupportsPartitions for Hive-format identity-partitioned tables:

  • listPartitions() / listPartitionNames()
  • getPartition(partitionName)
  • addPartition(partition)
  • dropPartition(partitionName)

Caution: Only IdentityPartition is supported because the Glue partition model is Hive-style key=value.
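
Hive-style partition names are `key=value` pairs joined by `/`. The following sketch builds and parses such names as they might be passed to getPartition(partitionName) or dropPartition(partitionName). The helper names are hypothetical, and escaping of special characters in values is deliberately not handled.

```python
def partition_name(values):
    """Join ordered (key -> value) pairs into a Hive-style partition name."""
    return "/".join(f"{key}={value}" for key, value in values.items())


def parse_partition_name(name):
    """Split a Hive-style partition name into a list of (key, value) pairs."""
    pairs = []
    for part in name.split("/"):
        key, sep, value = part.partition("=")
        if not sep or not key:
            raise ValueError(f"not a key=value segment: {part!r}")
        pairs.append((key, value))
    return pairs


if __name__ == "__main__":
    name = partition_name({"dt": "2024-01-01", "country": "us"})
    print(name)
    print(parse_partition_name(name))
```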

Iceberg Tables

The Glue catalog supports creating and managing Iceberg-format tables through the Apache Iceberg SDK's GlueCatalog. When an Iceberg table is created, Gravitino writes the metadata.json file to S3 and registers the table in Glue with the correct metadata_location parameter, making it usable by Trino (Lakehouse connector), Spark, and other Iceberg-native query engines.

Create an Iceberg Table

Set table-format=ICEBERG in the table properties, or configure default-table-format=iceberg on the catalog to make all tables Iceberg by default.

The warehouse catalog property must be configured. The table location is derived as warehouse/database/table when no explicit location is specified.
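
The location rule can be sketched as follows. The function name is hypothetical, and the handling of a trailing slash on the warehouse path is an assumption; only the warehouse/database/table layout comes from the text.

```python
def derive_table_location(warehouse, database, table, explicit_location=None):
    """Return the explicit location if given, else warehouse/database/table."""
    if explicit_location:
        return explicit_location
    # Assumption: a trailing slash on the warehouse path is ignored.
    return "/".join([warehouse.rstrip("/"), database, table])


if __name__ == "__main__":
    print(derive_table_location("s3://my-bucket/warehouse", "sales", "orders"))
```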

Register an Existing Iceberg Table

To register an Iceberg table that already exists in S3, set metadata_location to the path of the existing metadata.json file during createTable(). In this mode, Gravitino registers the table in Glue without creating new metadata.
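
The difference between creating and registering comes down to whether metadata_location is present in the table properties. A minimal sketch, with a hypothetical helper name: setting table-format in both modes and checking for a `.json` suffix are assumptions, not documented requirements.

```python
def iceberg_table_properties(metadata_location=None, extra=None):
    """Build table properties for creating or registering an Iceberg table.

    Without metadata_location the table is created; with it, an existing
    table is registered instead.
    """
    props = dict(extra or {})
    props["table-format"] = "ICEBERG"  # assumed to apply in both modes
    if metadata_location is not None:
        if not metadata_location.endswith(".json"):
            # Assumption: the path points at a metadata.json file.
            raise ValueError("metadata_location should point to a metadata.json file")
        props["metadata_location"] = metadata_location
    return props


if __name__ == "__main__":
    print(iceberg_table_properties())
    print(iceberg_table_properties("s3://my-bucket/warehouse/db/t/metadata/v1.metadata.json"))
```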

Iceberg Column Types

Iceberg tables use the Iceberg type system, which differs from Hive types. The following table lists the Gravitino types supported for Iceberg tables and how they map to Iceberg types:

| Gravitino Data Type | Iceberg Data Type | Notes |
|---|---|---|
| boolean | boolean | |
| byte | int | Widened to 32-bit integer |
| short | int | Widened to 32-bit integer |
| integer | int | |
| long | long | |
| float | float | |
| double | double | |
| decimal(p, s) | decimal(p, s) | |
| string | string | |
| varchar | string | Iceberg has no length-bounded varchar type |
| char | string | Iceberg has no fixed-length char type |
| date | date | |
| time(6) | time | Only microsecond precision (6) is supported |
| timestamp(6) | timestamp | Only microsecond precision (6) is supported |
| timestamptz(6) | timestamptz | Only microsecond precision (6) is supported |
| binary | binary | |
| fixed(n) | binary | Mapped to variable-length binary |
| uuid | uuid | |
| list | list | |
| map | map | |
| struct | struct | |
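
The Iceberg mapping, including the microsecond-precision restriction, can be sketched as a lookup with a precision check. This is illustrative only: the function name is hypothetical, nested element types are not resolved, and accepting a temporal type written without a precision is an assumption.

```python
import re

# Gravitino type name -> Iceberg type name, copied from the table above.
SIMPLE_TYPES = {
    "boolean": "boolean", "byte": "int", "short": "int", "integer": "int",
    "long": "long", "float": "float", "double": "double", "string": "string",
    "varchar": "string", "char": "string", "date": "date", "binary": "binary",
    "fixed": "binary", "uuid": "uuid", "list": "list", "map": "map",
    "struct": "struct",
}
TEMPORAL_TYPES = {"time", "timestamp", "timestamptz"}


def iceberg_type(gravitino_type):
    """Map a Gravitino type string such as "timestamp(6)" to an Iceberg type."""
    match = re.fullmatch(r"\s*([a-z_]+)\s*(?:\((.*)\))?\s*", gravitino_type.lower())
    if not match:
        raise ValueError(f"cannot parse type: {gravitino_type!r}")
    base, args = match.group(1), match.group(2)

    if base == "decimal":
        if not args:
            raise ValueError("decimal requires precision and scale")
        precision, scale = (int(x) for x in args.split(","))
        return f"decimal({precision}, {scale})"
    if base in TEMPORAL_TYPES:
        # Assumption: an omitted precision is accepted as microseconds.
        if args is not None and args.strip() != "6":
            raise ValueError("only microsecond precision (6) is supported")
        return base
    if base in SIMPLE_TYPES:
        return SIMPLE_TYPES[base]
    raise ValueError(f"type not supported for Iceberg tables: {gravitino_type!r}")


if __name__ == "__main__":
    for t in ("byte", "varchar(20)", "timestamp(6)", "decimal(10,2)"):
        print(t, "->", iceberg_type(t))
```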

Iceberg Table Alter Operations

For Iceberg tables, the following alter operations are supported:

| Operation | Gravitino Table Update Request |
|---|---|
| Add column | Add a column |
| Delete column | Delete a column |
| Rename column | Rename a column |
| Update column type | Update the type of a column |
| Update column comment | Update the column comment |
| Update column nullability | Update column nullability |
| Set table property | Set a table property |
| Remove table property | Remove a table property |

Caution: Schema changes and property changes are committed in two separate Iceberg transactions. If the schema commit succeeds but the property commit fails, the table is left in a partially altered state.

Caution: Nested column operations (add, delete, rename, type update) are not supported for Iceberg tables via this catalog.
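
The two cautions above can be made concrete with a small planning sketch that splits a batch of changes into the two commit groups and rejects the nested column operations listed as unsupported. The change dictionaries (`op`, `field`) are a hypothetical shape for illustration, not the Gravitino REST request format.

```python
# Operations from the table above, grouped by the transaction they fall into.
SCHEMA_OPS = {
    "add_column", "delete_column", "rename_column",
    "update_column_type", "update_column_comment", "update_column_nullability",
}
PROPERTY_OPS = {"set_property", "remove_property"}
# Column operations documented as unsupported on nested fields.
NO_NESTED_OPS = {"add_column", "delete_column", "rename_column", "update_column_type"}


def plan_iceberg_alter(changes):
    """Split changes into (schema_changes, property_changes).

    Each change is a dict like {"op": "add_column", "field": ["price"]}
    (hypothetical shape). The two returned lists correspond to the two
    separate commits, so a failure in the second leaves the first applied.
    """
    schema_changes, property_changes = [], []
    for change in changes:
        op = change["op"]
        if op in SCHEMA_OPS:
            if op in NO_NESTED_OPS and len(change.get("field", [])) > 1:
                raise ValueError(f"nested column operation not supported: {change}")
            schema_changes.append(change)
        elif op in PROPERTY_OPS:
            property_changes.append(change)
        else:
            raise ValueError(f"unsupported operation: {op}")
    return schema_changes, property_changes


if __name__ == "__main__":
    schema, props = plan_iceberg_alter([
        {"op": "add_column", "field": ["price"]},
        {"op": "set_property", "key": "owner", "value": "data-team"},
    ])
    print(len(schema), "schema change(s),", len(props), "property change(s)")
```

Submitting schema changes and property changes as separate requests makes a partial failure easier to detect and retry.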

Security

AWS IAM Permissions

The IAM policy attached to the credential used by the Glue catalog must cover both Glue metadata access and S3 data access:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "GlueMetadataAccess",
      "Effect": "Allow",
      "Action": [
        "glue:GetCatalog",
        "glue:GetDatabase", "glue:GetDatabases",
        "glue:CreateDatabase", "glue:UpdateDatabase", "glue:DeleteDatabase",
        "glue:GetTable", "glue:GetTables",
        "glue:CreateTable", "glue:UpdateTable", "glue:DeleteTable",
        "glue:GetPartition", "glue:GetPartitions",
        "glue:CreatePartition", "glue:DeletePartition"
      ],
      "Resource": [
        "arn:aws:glue:<region>:<account-id>:catalog",
        "arn:aws:glue:<region>:<account-id>:database/*",
        "arn:aws:glue:<region>:<account-id>:table/*/*"
      ]
    },
    {
      "Sid": "S3DataAccess",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject", "s3:PutObject", "s3:DeleteObject", "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::<warehouse-bucket>",
        "arn:aws:s3:::<warehouse-bucket>/*"
      ]
    }
  ]
}
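
To fill in the placeholders consistently, the policy can be generated from the region, account ID, and warehouse bucket. The sketch below reproduces the policy above with those values substituted. The function name and the example values are hypothetical, and the 12-digit check reflects the account ID format described for aws-glue-catalog-id.

```python
import json

GLUE_ACTIONS = [
    "glue:GetCatalog",
    "glue:GetDatabase", "glue:GetDatabases",
    "glue:CreateDatabase", "glue:UpdateDatabase", "glue:DeleteDatabase",
    "glue:GetTable", "glue:GetTables",
    "glue:CreateTable", "glue:UpdateTable", "glue:DeleteTable",
    "glue:GetPartition", "glue:GetPartitions",
    "glue:CreatePartition", "glue:DeletePartition",
]
S3_ACTIONS = ["s3:GetObject", "s3:PutObject", "s3:DeleteObject", "s3:ListBucket"]


def build_policy(region, account_id, warehouse_bucket):
    """Return the IAM policy document as a dict with placeholders filled in."""
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError("account_id must be a 12-digit string")
    glue_prefix = f"arn:aws:glue:{region}:{account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "GlueMetadataAccess",
                "Effect": "Allow",
                "Action": GLUE_ACTIONS,
                "Resource": [
                    f"{glue_prefix}:catalog",
                    f"{glue_prefix}:database/*",
                    f"{glue_prefix}:table/*/*",
                ],
            },
            {
                "Sid": "S3DataAccess",
                "Effect": "Allow",
                "Action": S3_ACTIONS,
                "Resource": [
                    f"arn:aws:s3:::{warehouse_bucket}",
                    f"arn:aws:s3:::{warehouse_bucket}/*",
                ],
            },
        ],
    }


if __name__ == "__main__":
    # Example values are placeholders, not real identifiers.
    print(json.dumps(build_policy("us-east-1", "123456789012", "my-bucket"), indent=2))
```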