Version: 1.2.0

Optimizer CLI Reference

Use --help to list all commands, or --help --type <command> for command-specific help.

By default, the optimizer CLI loads conf/gravitino-optimizer.conf from the current working directory. Use --conf-path only when you need a custom config file.

Command quick reference

| Command (--type) | Required options | Optional options | Purpose |
|---|---|---|---|
| submit-strategy-jobs | --identifiers, --strategy-name | --dry-run, --limit | Recommend and optionally submit jobs |
| update-statistics | --calculator-name | --identifiers, --statistics-payload, --file-path | Calculate and persist statistics |
| append-metrics | --calculator-name | --identifiers, --statistics-payload, --file-path | Calculate and append metrics |
| monitor-metrics | --identifiers, --action-time | --range-seconds, --partition-path | Evaluate rules with before/after metrics |
| list-table-metrics | --identifiers | --partition-path | Query stored table or partition metrics |
| list-job-metrics | --identifiers | None | Query stored job metrics |
| submit-update-stats-job | --identifiers | --dry-run, --update-mode, --updater-options, --spark-conf | Submit built-in Iceberg update stats/metrics Spark jobs |

Option field meanings

| Option | Meaning | Used by |
|---|---|---|
| --identifiers | Comma-separated identifiers. Table format supports catalog.schema.table (or schema.table when a default catalog is configured). | Most commands |
| --strategy-name | Policy name to evaluate, for example iceberg_compaction_default. | submit-strategy-jobs |
| --dry-run | Preview mode. Prints recommendations or job configs without submitting jobs. | submit-strategy-jobs, submit-update-stats-job |
| --limit | Maximum number of strategy jobs to process. Must be > 0. | submit-strategy-jobs |
| --calculator-name | Statistics/metrics calculator implementation name (for example local-stats-calculator). | update-statistics, append-metrics |
| --statistics-payload | Inline JSON Lines content as input. Mutually exclusive with --file-path. | update-statistics, append-metrics |
| --file-path | Path to a JSON Lines input file. Mutually exclusive with --statistics-payload. | update-statistics, append-metrics |
| --action-time | Action timestamp in epoch seconds used as the evaluation anchor. | monitor-metrics |
| --range-seconds | Time window (seconds) for monitor evaluation. Default is 86400 (24h). | monitor-metrics |
| --partition-path | Partition path JSON array, for example '[{"dt":"2026-01-01"}]'. Requires exactly one identifier. | monitor-metrics, list-table-metrics |
| --update-mode | Controls what the built-in update job updates: stats, metrics, or all (default). | submit-update-stats-job |
| --updater-options | Flat JSON map passed to the updater logic. For stats/all, include gravitino_uri and metalake. | submit-update-stats-job |
| --spark-conf | Flat JSON map of Spark and Iceberg catalog configs used by the job. | submit-update-stats-job |

Global option:

  • --conf-path: Optional custom config file path. If omitted, CLI uses conf/gravitino-optimizer.conf.
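If you keep configuration outside the working directory, the file can be written anywhere and passed explicitly. A minimal sketch (the path, the config key value, and the follow-up command are illustrative; the key itself is the one referenced in the identifier rules below):

```shell
# Write a minimal custom config (path and contents are illustrative)
CONF=/tmp/gravitino-optimizer.conf
printf 'gravitino.optimizer.gravitinoDefaultCatalog = rest_catalog\n' > "$CONF"

# Sanity-check the file before pointing the CLI at it
grep -c 'gravitinoDefaultCatalog' "$CONF"

# Then run any command with --conf-path instead of relying on the default lookup:
# ./bin/gravitino-optimizer.sh --conf-path "$CONF" --type list-job-metrics --identifiers job-1
```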

Input format for local-stats-calculator

local-stats-calculator reads JSON Lines (one JSON object per line).

Reserved fields

  • stats-type: table, partition, or job
  • identifier: object identifier
  • partition-path: only for partition data, for example {"dt":"2026-01-01"}
  • timestamp: optional epoch seconds (record-level default timestamp for metric points)

All other fields are treated as metric or statistic values.

Supported examples by scope

Use JSON Lines (one JSON object per line). The following examples focus on table, partition, and job scopes with multiple metric/statistic fields:

{"stats-type":"table","identifier":"catalog.db.t1","timestamp":1735689600,"row_count":100}
{"stats-type":"table","identifier":"catalog.db.t1","row_count":100,"total_file_size":1048576}
{"stats-type":"table","identifier":"catalog.db.t1","timestamp":1735689660,"row_count":120,"file_count":24,"avg_file_size":10485.76}
{"stats-type":"partition","identifier":"catalog.db.t1","timestamp":1735689720,"partition-path":{"dt":"2026-01-01"},"row_count":20}
{"stats-type":"partition","identifier":"catalog.db.t1","partition-path":{"dt":"2026-01-01","region":"us"},"row_count":12,"file_count":3}
{"stats-type":"job","identifier":"job-1","timestamp":1735689800,"duration_ms":12500,"rewritten_files":18}
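For batch input, records like these can be written to a file with a heredoc and then passed via --file-path. A sketch (the file name and the three records are illustrative, taken from the examples above):

```shell
# Write three JSON Lines records (one object per line, no trailing commas)
cat > ./table-stats.jsonl <<'EOF'
{"stats-type":"table","identifier":"catalog.db.t1","timestamp":1735689600,"row_count":100}
{"stats-type":"partition","identifier":"catalog.db.t1","partition-path":{"dt":"2026-01-01"},"row_count":20}
{"stats-type":"job","identifier":"job-1","timestamp":1735689800,"duration_ms":12500}
EOF

# Every record must carry the reserved stats-type field
grep -c '"stats-type"' ./table-stats.jsonl
```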

Identifier rules

  • Table and partition records: catalog.schema.table
  • If gravitino.optimizer.gravitinoDefaultCatalog is set, schema.table is also accepted
  • Job records: parsed as a regular Gravitino NameIdentifier

CLI workflow examples

Update statistics in batch

Calculate and persist table or partition statistics from JSONL input.

./bin/gravitino-optimizer.sh \
--type update-statistics \
--calculator-name local-stats-calculator \
--file-path ./table-stats.jsonl
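For a single record, the same calculation can be driven inline with --statistics-payload instead of --file-path (the two options are mutually exclusive). A sketch; the record is illustrative:

```shell
# One JSON Lines record; single quotes outside protect the double quotes inside
PAYLOAD='{"stats-type":"table","identifier":"catalog.db.t1","row_count":100}'

# Quick sanity check that the payload carries the reserved field
echo "$PAYLOAD" | grep -c '"stats-type"'

# Then submit it inline:
# ./bin/gravitino-optimizer.sh \
#   --type update-statistics \
#   --calculator-name local-stats-calculator \
#   --statistics-payload "$PAYLOAD"
```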

Append metrics in batch

Calculate and append table or job metrics from JSONL input.

./bin/gravitino-optimizer.sh \
--type append-metrics \
--calculator-name local-stats-calculator \
--file-path ./table-stats.jsonl

Dry-run strategy submission

Preview recommendations without actually submitting jobs.

./bin/gravitino-optimizer.sh \
--type submit-strategy-jobs \
--identifiers rest_catalog.db.t1 \
--strategy-name iceberg_compaction_default \
--dry-run \
--limit 10

Submit strategy jobs

Submit jobs for identifiers that match the given policy name.

./bin/gravitino-optimizer.sh \
--type submit-strategy-jobs \
--identifiers rest_catalog.db.t1 \
--strategy-name iceberg_compaction_default \
--limit 10

Monitor metrics

Evaluate monitor rules around an action time.

./bin/gravitino-optimizer.sh \
--type monitor-metrics \
--identifiers catalog.db.sales \
--action-time 1735689600 \
--range-seconds 86400

You can configure evaluator rules in gravitino-optimizer.conf:

gravitino.optimizer.monitor.gravitinoMetricsEvaluator.rules = table:row_count:avg:le,job:duration:latest:le

Each rule has the form scope:metricName:aggregation:comparison:

  • scope: table or job (table rules also apply to partition scope)
  • metricName: exact name of the stored metric to evaluate
  • aggregation: max|min|avg|latest
  • comparison: lt|le|gt|ge|eq|ne
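To see how one rule decomposes, the string can be split on ":" in the shell. Illustrative only, using the first rule from the config example above:

```shell
# Split one rule into its four fields
RULE='table:row_count:avg:le'
IFS=':' read -r SCOPE METRIC AGG CMP <<EOF
$RULE
EOF

echo "scope=$SCOPE metric=$METRIC aggregation=$AGG comparison=$CMP"
```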

When metrics are produced by submit-update-stats-job --update-mode metrics, metric names are often custom-* (for example custom-data-file-mse). Use list-table-metrics first and configure rules with the exact metric names returned by your environment.
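Metric names can be pulled out of list-table-metrics output with standard text tools. A sketch, assuming output shaped like the MetricsResult line shown in the Output guide; the sample line below is hard-coded for illustration, with a made-up value:

```shell
# Sample output line (in practice, pipe the CLI output instead of echoing)
SAMPLE='MetricsResult{scopeType=TABLE, identifier=rest_catalog.db.t1, metrics={custom-data-file-mse=[{timestamp=1735689600, value=0.42}]}}'

# Extract metric names of the custom-* family to use verbatim in monitor rules
echo "$SAMPLE" | grep -o 'custom-[a-z-]*'
```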

Submit built-in update stats jobs

Submit built-in Iceberg update stats/metrics Spark jobs directly.

./bin/gravitino-optimizer.sh \
--type submit-update-stats-job \
--identifiers rest_catalog.db.t1 \
--update-mode all \
--updater-options '{"gravitino_uri":"http://localhost:8090","metalake":"test"}' \
--spark-conf '{"spark.sql.catalog.rest_catalog.type":"rest","spark.sql.catalog.rest_catalog.uri":"http://localhost:9001/iceberg","spark.hadoop.fs.defaultFS":"file:///"}'

Notes:

  • --identifiers supports catalog.schema.table or schema.table (when default catalog is configured).
  • --update-mode supports stats|metrics|all (default all).
  • For stats or all, --updater-options must include gravitino_uri and metalake.
  • If --updater-options includes external JDBC metrics settings (gravitino.optimizer.jdbcMetrics.*), ensure the JDBC driver JAR is available to Spark runtime classpath (for example via spark.jars in --spark-conf).
  • --spark-conf and --updater-options are flat JSON maps.
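Because both maps must be flat, a quick brace count can catch accidental nesting before submission. A rough heuristic only (it assumes braces appear solely as object delimiters, not inside config values):

```shell
SPARK_CONF='{"spark.sql.catalog.rest_catalog.type":"rest","spark.hadoop.fs.defaultFS":"file:///"}'

# A flat map has exactly one opening brace; nested objects would add more
echo "$SPARK_CONF" | grep -o '{' | wc -l
```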

List table metrics

Query stored metrics at table scope.

./bin/gravitino-optimizer.sh \
--type list-table-metrics \
--identifiers catalog.db.sales

For partition scope, provide a partition path JSON array:

./bin/gravitino-optimizer.sh \
--type list-table-metrics \
--identifiers catalog.db.sales \
--partition-path '[{"dt":"2026-01-01"}]'

List job metrics

Query stored metrics at job scope.

./bin/gravitino-optimizer.sh \
--type list-job-metrics \
--identifiers catalog.db.optimizer_job

Output guide

  • SUMMARY: ...: summary for update-statistics and append-metrics
  • DRY-RUN: ...: recommendation preview without job submission
  • SUBMIT: ...: strategy job or built-in update-stats job submitted successfully
  • SUMMARY: submit-update-stats-job ...: summary for built-in update-stats submission
  • MetricsResult{...}: returned by list commands
  • EvaluationResult{...}: returned by monitor command

Examples:

SUMMARY: statistics totalRecords=3 tableRecords=2 partitionRecords=1 jobRecords=0
DRY-RUN: strategy=iceberg-data-compaction identifier=rest_catalog.db.t1 score=95 jobTemplate=builtin-iceberg-rewrite-data-files jobOptions={catalog_name=rest_catalog, table_identifier=db.t1}
SUBMIT: strategy=iceberg-data-compaction identifier=rest_catalog.db.t1 score=95 jobTemplate=builtin-iceberg-rewrite-data-files jobOptions={catalog_name=rest_catalog, table_identifier=db.t1} jobId=1f54c6d3-4e27-4cc8-bdfa-b05ecf59a4c2
DRY-RUN: identifier=rest_catalog.db.t1 jobTemplate=builtin-iceberg-update-stats jobConfig={catalog_name=rest_catalog, table_identifier=db.t1, update_mode=all, updater_options={"gravitino_uri":"http://localhost:8090","metalake":"test"}, spark_conf={"spark.master":"local[2]","spark.hadoop.fs.defaultFS":"file:///"}}
SUMMARY: submit-update-stats-job total=1 submitted=1 dryRun=false
MetricsResult{scopeType=TABLE, identifier=rest_catalog.db.t1, partitionPath=<table-or-job-scope>, metrics={row_count=[{timestamp=1735689600, value=100}]}}
EvaluationResult{scopeType=TABLE, identifier=rest_catalog.db.t1, partitionPath=<table-or-job-scope>, evaluation=true, evaluatorName=gravitino-metrics-evaluator, actionTimeSeconds=1735689600, rangeSeconds=86400, beforeMetrics={row_count=[MetricSample{timestampSeconds=1735686000, value=120}]}, afterMetrics={row_count=[MetricSample{timestampSeconds=1735689600, value=100}]}}
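When scripting around the CLI, the job id can be captured from a SUBMIT line with grep. A sketch using the sample SUBMIT line above (in practice, pipe real CLI output instead of echoing a stored string):

```shell
# Sample SUBMIT output line, hard-coded for illustration
SUBMIT_LINE='SUBMIT: strategy=iceberg-data-compaction identifier=rest_catalog.db.t1 score=95 jobTemplate=builtin-iceberg-rewrite-data-files jobOptions={catalog_name=rest_catalog, table_identifier=db.t1} jobId=1f54c6d3-4e27-4cc8-bdfa-b05ecf59a4c2'

# Capture the job id for follow-up queries such as list-job-metrics
JOB_ID=$(echo "$SUBMIT_LINE" | grep -o 'jobId=[0-9a-f-]*' | cut -d= -f2)
echo "$JOB_ID"
```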