Version: 1.2.0

Table Maintenance Service (Optimizer)

What is this service

The Table Maintenance Service (Optimizer) automates table maintenance by connecting:

  • Statistics and metrics collection
  • Rule evaluation and strategy recommendation
  • Job-template-based execution

The CLI commands and configuration keys use the name optimizer, for example gravitino-optimizer.sh and gravitino.optimizer.*.

Alpha status and current limitations

The Table Maintenance Service is currently in the alpha stage.

Current limitations:

  • It is operated through the optimizer CLI workflow.
  • The built-in maintenance strategy focuses on Iceberg table compaction.
  • Compaction support is currently limited to Iceberg tables with identity partition transforms.
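
To illustrate the identity-transform limitation, here is a minimal Python sketch of a pre-check a caller might run before requesting compaction. It is not the optimizer's own logic: the function name and the list-of-strings input are hypothetical, and the only grounded fact is that Iceberg partition transforms include identity alongside others such as bucket, truncate, year, month, day, and hour.

```python
from typing import Iterable


def uses_only_identity_transforms(transforms: Iterable[str]) -> bool:
    """Return True when every partition field uses the identity transform.

    `transforms` is a hypothetical input: one transform name per partition
    field, e.g. ["identity", "bucket[16]"]. How the optimizer itself inspects
    a partition spec is not described in this document.
    """
    # Note: an empty list returns True here only because all() of nothing is
    # True. Whether unpartitioned tables are supported is not stated above.
    return all(t.strip().lower() == "identity" for t in transforms)


print(uses_only_identity_transforms(["identity", "identity"]))    # True
print(uses_only_identity_transforms(["identity", "bucket[16]"]))  # False
```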

Extensibility and roadmap

Although the built-in capability is intentionally narrow in alpha, the framework is designed for extension:

  • Integrate external systems by implementing custom providers and adapters.
  • Add new strategies and handlers beyond built-in compaction.
  • Plug in custom metrics, evaluators, and job submitters for different environments.

See Optimizer Extension Guide for extension points.

Future versions will continue improving the out-of-the-box experience and evolve toward a more ready-to-use maintenance service.

Architecture overview

The optimizer workflow consists of six parts:

  1. Metadata objects: catalog/schema/table in a metalake.
  2. Statistics and metrics: table/partition signals used for decision making.
  3. Policies: strategy intent, for example system_iceberg_compaction.
  4. Job templates: executable contracts, for example built-in Spark templates.
  5. Job executor: local or custom backend that runs submitted jobs.
  6. Status and logs: REST job state plus local staging logs.

Optimizer architecture and workflow

The following diagram shows the end-to-end interactions between CLI, Gravitino server, Spark jobs, JDBC metrics repository, and the Recommender/Updater/Monitor modules.

Typical data flow:

  1. Collect statistics and metrics for target tables.
  2. Evaluate rules and produce candidate actions.
  3. Submit jobs using a concrete template and jobConf.
  4. Track status and verify results on table metadata and logs.
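
The four steps above can be sketched as a small Python loop. This is an illustration of the control flow only, under stated assumptions: the class names, the callables, the small-file threshold, and the template name are all hypothetical stand-ins, not Gravitino APIs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class TableStats:
    table: str
    small_file_count: int  # hypothetical signal used for the decision


@dataclass
class Action:
    table: str
    template: str              # job template to use (placeholder name below)
    job_conf: Dict[str, str]   # concrete per-run configuration


def run_cycle(
    tables: List[str],
    collect: Callable[[str], TableStats],
    evaluate: Callable[[TableStats], Optional[Action]],
    submit: Callable[[Action], str],
    track: Callable[[str], str],
) -> Dict[str, str]:
    """Collect, evaluate, submit, and track once for each table."""
    results: Dict[str, str] = {}
    for table in tables:
        stats = collect(table)      # 1. collect statistics and metrics
        action = evaluate(stats)    # 2. evaluate rules, maybe produce an action
        if action is None:
            results[table] = "skipped"
            continue
        job_id = submit(action)     # 3. submit with a template and jobConf
        results[table] = track(job_id)  # 4. track the job status
    return results


def demo() -> Dict[str, str]:
    """Run the loop with in-memory stubs instead of real services."""
    counts = {"db.events": 120, "db.users": 3}

    def collect(table: str) -> TableStats:
        return TableStats(table, counts[table])

    def evaluate(stats: TableStats) -> Optional[Action]:
        if stats.small_file_count < 100:  # hypothetical threshold
            return None
        conf = {"table_identifier": stats.table}
        return Action(stats.table, "hypothetical-compaction-template", conf)

    def submit(action: Action) -> str:
        return f"job-{action.table}"

    def track(job_id: str) -> str:
        return "succeeded"

    return run_cycle(["db.events", "db.users"], collect, evaluate, submit, track)


print(demo())
```

Passing the four stages in as callables keeps the loop independent of any particular statistics source or job backend, which mirrors the pluggable design described earlier.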

Execution modes

| Mode | Main entry | Best for | Output |
| --- | --- | --- | --- |
| Built-in maintenance workflow | Gravitino REST + built-in templates | Server-side operational runs | Submitted Spark jobs and updated metadata |
| Optimizer CLI local calculator | gravitino-optimizer.sh | Local file-driven testing and batch scripts | Statistics/metrics updates and optional submissions |

Use the built-in maintenance workflow when you want policy-driven server execution. Use the CLI local calculator when you want to feed JSONL input directly.

Lifecycle

1. Collect

Generate or ingest table and partition statistics/metrics.

2. Evaluate

Apply policies and rules to decide whether maintenance should run.

3. Submit

Pick a job template and submit the job with a concrete jobConf.

4. Observe

Check REST job status and validate resulting statistics, metrics, or rewritten data files.

Configuration model

| Layer | Scope | Typical keys |
| --- | --- | --- |
| Gravitino server config | Runtime for job manager and executor | gravitino.job.executor, gravitino.job.statusPullIntervalInMs, gravitino.jobExecutor.local.sparkHome |
| Job submission jobConf | Per job run | catalog_name, table_identifier, spark_*, template-specific args |
| Optimizer CLI config | CLI commands | gravitino.optimizer.* in conf/gravitino-optimizer.conf |
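
As an illustration of the per-run jobConf layer, the following Python sketch assembles a flat key/value map from the keys named above. It makes several assumptions: the helper name is hypothetical, treating catalog_name and table_identifier as mandatory is a choice made for this sketch, and spark_example_option is a placeholder rather than a real template key.

```python
from typing import Dict, Mapping, Optional


def build_job_conf(
    catalog_name: str,
    table_identifier: str,
    spark_options: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Assemble a per-run jobConf as a flat string-to-string map."""
    # Assumption for this sketch: both identifying keys must be non-empty.
    if not catalog_name or not table_identifier:
        raise ValueError("catalog_name and table_identifier must be non-empty")
    job_conf = {
        "catalog_name": catalog_name,
        "table_identifier": table_identifier,
    }
    for key, value in (spark_options or {}).items():
        # Only accept keys that follow the spark_* naming pattern.
        if not key.startswith("spark_"):
            raise ValueError(f"expected a spark_* key, got {key!r}")
        job_conf[key] = str(value)
    return job_conf


# Placeholder values; real key names depend on the job template in use.
conf = build_job_conf("my_catalog", "db.events", {"spark_example_option": "value"})
print(conf)
```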

Terminology mapping

| Term | Example value | Used in |
| --- | --- | --- |
| Policy name | iceberg_compaction_default | Policy identity and CLI --strategy-name |
| Policy type | system_iceberg_compaction | REST policy creation field policyType |
| Strategy type | iceberg-data-compaction | Policy content field strategy.type and strategy handler config key |

For strategy submission, --strategy-name must be set to the policy name, not the policy type or the strategy type.
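
Because the three identifiers look alike, a small guard can catch the mix-up before a submission is attempted. The Python sketch below uses the example values from the table. The class and function names are hypothetical and are not part of the optimizer CLI.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyNames:
    policy_name: str    # policy identity, the value for --strategy-name
    policy_type: str    # REST policy creation field policyType
    strategy_type: str  # policy content field strategy.type


EXAMPLE = PolicyNames(
    policy_name="iceberg_compaction_default",
    policy_type="system_iceberg_compaction",
    strategy_type="iceberg-data-compaction",
)


def check_strategy_name(value: str, names: PolicyNames) -> str:
    """Return `value` if it is the policy name; raise on a common mix-up."""
    if value in (names.policy_type, names.strategy_type):
        raise ValueError(
            f"{value!r} is a policy or strategy type; "
            f"use the policy name {names.policy_name!r} instead"
        )
    if value != names.policy_name:
        raise ValueError(f"{value!r} does not match policy name {names.policy_name!r}")
    return value


print(check_strategy_name("iceberg_compaction_default", EXAMPLE))
```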

Prerequisites and verification

Quick start prerequisites and success checks are documented in Optimizer Quick Start and Verification.