# Optimizer CLI Reference

Use `--help` to list all commands, or `--help --type <command>` for command-specific help.

By default, the optimizer CLI loads `conf/gravitino-optimizer.conf` from the current working
directory. Use `--conf-path` only when you need a custom config file.
## Command quick reference

| Command (`--type`) | Required options | Optional options | Purpose |
|---|---|---|---|
| `submit-strategy-jobs` | `--identifiers`, `--strategy-name` | `--dry-run`, `--limit` | Recommend and optionally submit jobs |
| `update-statistics` | `--calculator-name` | `--identifiers`, `--statistics-payload`, `--file-path` | Calculate and persist statistics |
| `append-metrics` | `--calculator-name` | `--identifiers`, `--statistics-payload`, `--file-path` | Calculate and append metrics |
| `monitor-metrics` | `--identifiers`, `--action-time` | `--range-seconds`, `--partition-path` | Evaluate rules with before/after metrics |
| `list-table-metrics` | `--identifiers` | `--partition-path` | Query stored table or partition metrics |
| `list-job-metrics` | `--identifiers` | None | Query stored job metrics |
| `submit-update-stats-job` | `--identifiers` | `--dry-run`, `--update-mode`, `--updater-options`, `--spark-conf` | Submit built-in Iceberg update stats/metrics Spark jobs |
## Option field meanings

| Option | Meaning | Used by |
|---|---|---|
| `--identifiers` | Comma-separated identifiers. Table format supports `catalog.schema.table` (or `schema.table` when a default catalog is configured). | Most commands |
| `--strategy-name` | Policy name to evaluate, for example `iceberg_compaction_default`. | `submit-strategy-jobs` |
| `--dry-run` | Preview mode. Prints recommendations or job configs without submitting jobs. | `submit-strategy-jobs`, `submit-update-stats-job` |
| `--limit` | Maximum number of strategy jobs to process. Must be > 0. | `submit-strategy-jobs` |
| `--calculator-name` | Statistics/metrics calculator implementation name (for example `local-stats-calculator`). | `update-statistics`, `append-metrics` |
| `--statistics-payload` | Inline JSON Lines content as input. Mutually exclusive with `--file-path`. | `update-statistics`, `append-metrics` |
| `--file-path` | Path to a JSON Lines input file. Mutually exclusive with `--statistics-payload`. | `update-statistics`, `append-metrics` |
| `--action-time` | Action timestamp in epoch seconds, used as the evaluation anchor. | `monitor-metrics` |
| `--range-seconds` | Time window (seconds) for monitor evaluation. Default is 86400 (24h). | `monitor-metrics` |
| `--partition-path` | Partition path JSON array, for example `'[{"dt":"2026-01-01"}]'`. Requires exactly one identifier. | `monitor-metrics`, `list-table-metrics` |
| `--update-mode` | Controls what the built-in update job updates: `stats`, `metrics`, or `all` (default). | `submit-update-stats-job` |
| `--updater-options` | Flat JSON map passed to the updater logic. For `stats`/`all`, include `gravitino_uri` and `metalake`. | `submit-update-stats-job` |
| `--spark-conf` | Flat JSON map of Spark and Iceberg catalog configs used by the job. | `submit-update-stats-job` |

Global option:

- `--conf-path`: optional custom config file path. If omitted, the CLI uses `conf/gravitino-optimizer.conf`.
## Input format for `local-stats-calculator`

`local-stats-calculator` reads JSON Lines (one JSON object per line).

### Reserved fields

- `stats-type`: `table`, `partition`, or `job`
- `identifier`: object identifier
- `partition-path`: only for partition data, for example `{"dt":"2026-01-01"}`
- `timestamp`: optional epoch seconds (record-level default timestamp for metric points)

All other fields are treated as metric or statistic values.
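As an illustration of this layout, a small Python helper (hypothetical, not part of the CLI) can build a valid JSON Lines input from plain dictionaries:

```python
import json

def to_jsonl(records):
    """Serialize records (one dict per line) to JSON Lines text.

    Reserved keys: stats-type, identifier, partition-path, timestamp;
    every other key is treated as a metric or statistic value.
    """
    return "\n".join(json.dumps(r, separators=(",", ":")) for r in records)

records = [
    {"stats-type": "table", "identifier": "catalog.db.t1",
     "timestamp": 1735689600, "row_count": 100},
    {"stats-type": "partition", "identifier": "catalog.db.t1",
     "partition-path": {"dt": "2026-01-01"}, "row_count": 20},
]
print(to_jsonl(records))
```

Write the output to a file (for example `./table-stats.jsonl`) and pass it with `--file-path`.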
### Supported examples by scope

The following JSON Lines examples (one JSON object per line) cover table, partition, and job scopes, each carrying one or more metric/statistic fields:

```
{"stats-type":"table","identifier":"catalog.db.t1","timestamp":1735689600,"row_count":100}
{"stats-type":"table","identifier":"catalog.db.t1","row_count":100,"total_file_size":1048576}
{"stats-type":"table","identifier":"catalog.db.t1","timestamp":1735689660,"row_count":120,"file_count":24,"avg_file_size":10485.76}
{"stats-type":"partition","identifier":"catalog.db.t1","timestamp":1735689720,"partition-path":{"dt":"2026-01-01"},"row_count":20}
{"stats-type":"partition","identifier":"catalog.db.t1","partition-path":{"dt":"2026-01-01","region":"us"},"row_count":12,"file_count":3}
{"stats-type":"job","identifier":"job-1","timestamp":1735689800,"duration_ms":12500,"rewritten_files":18}
```
### Identifier rules

- Table and partition records: `catalog.schema.table`. If `gravitino.optimizer.gravitinoDefaultCatalog` is set, `schema.table` is also accepted.
- Job records: parsed as a regular Gravitino `NameIdentifier`.
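The table/partition rule can be sketched as a small resolver (illustrative only; the CLI's actual parsing lives in the optimizer itself):

```python
def parse_table_identifier(ident, default_catalog=None):
    """Resolve a table identifier into (catalog, schema, table).

    Accepts catalog.schema.table always, and schema.table only when a
    default catalog (gravitino.optimizer.gravitinoDefaultCatalog) is set.
    """
    parts = ident.split(".")
    if len(parts) == 3:
        return tuple(parts)
    if len(parts) == 2 and default_catalog:
        return (default_catalog, parts[0], parts[1])
    raise ValueError(f"cannot resolve identifier: {ident!r}")
```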
## CLI workflow examples

### Update statistics in batch

Calculate and persist table or partition statistics from JSONL input.

```shell
./bin/gravitino-optimizer.sh \
--type update-statistics \
--calculator-name local-stats-calculator \
--file-path ./table-stats.jsonl
```
### Append metrics in batch

Calculate and append table or job metrics from JSONL input.

```shell
./bin/gravitino-optimizer.sh \
--type append-metrics \
--calculator-name local-stats-calculator \
--file-path ./table-stats.jsonl
```
### Dry-run strategy submission

Preview recommendations without actually submitting jobs.

```shell
./bin/gravitino-optimizer.sh \
--type submit-strategy-jobs \
--identifiers rest_catalog.db.t1 \
--strategy-name iceberg_compaction_default \
--dry-run \
--limit 10
```
### Submit strategy jobs

Submit jobs for identifiers that match the given policy name.

```shell
./bin/gravitino-optimizer.sh \
--type submit-strategy-jobs \
--identifiers rest_catalog.db.t1 \
--strategy-name iceberg_compaction_default \
--limit 10
```
### Monitor metrics

Evaluate monitor rules around an action time.

```shell
./bin/gravitino-optimizer.sh \
--type monitor-metrics \
--identifiers catalog.db.sales \
--action-time 1735689600 \
--range-seconds 86400
```
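`--action-time` is plain epoch seconds in UTC; for example, the value `1735689600` used throughout this page corresponds to `2025-01-01T00:00:00Z`, which can be computed like so:

```python
from datetime import datetime, timezone

# Epoch seconds for a UTC timestamp, suitable as a --action-time value.
action_time = int(datetime(2025, 1, 1, tzinfo=timezone.utc).timestamp())
print(action_time)  # 1735689600
```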
You can configure evaluator rules in `gravitino-optimizer.conf`:

```
gravitino.optimizer.monitor.gravitinoMetricsEvaluator.rules = table:row_count:avg:le,job:duration:latest:le
```

The rule format is `scope:metricName:aggregation:comparison`:

- `scope`: `table` or `job` (`table` rules also apply to partition scope)
- `aggregation`: `max|min|avg|latest`
- `comparison`: `lt|le|gt|ge|eq|ne`
When metrics are produced by `submit-update-stats-job --update-mode metrics`, metric names are
often `custom-*` (for example `custom-data-file-mse`). Use `list-table-metrics` first and
configure rules with the exact metric names returned by your environment.
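The rule string splits cleanly on `:` into four fields. The sketch below (not the optimizer's actual parser) shows the expected structure and value sets:

```python
VALID_SCOPES = {"table", "job"}          # table rules also cover partition scope
VALID_AGGREGATIONS = {"max", "min", "avg", "latest"}
VALID_COMPARISONS = {"lt", "le", "gt", "ge", "eq", "ne"}

def parse_rule(rule):
    """Split a 'scope:metricName:aggregation:comparison' rule string."""
    scope, metric, aggregation, comparison = rule.split(":")
    if scope not in VALID_SCOPES:
        raise ValueError(f"unknown scope: {scope}")
    if aggregation not in VALID_AGGREGATIONS:
        raise ValueError(f"unknown aggregation: {aggregation}")
    if comparison not in VALID_COMPARISONS:
        raise ValueError(f"unknown comparison: {comparison}")
    return {"scope": scope, "metric": metric,
            "aggregation": aggregation, "comparison": comparison}

# The rules value from the config example above, split on commas.
rules = "table:row_count:avg:le,job:duration:latest:le"
parsed = [parse_rule(r) for r in rules.split(",")]
```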
### Submit built-in update stats jobs

Submit built-in Iceberg update stats/metrics Spark jobs directly.

```shell
./bin/gravitino-optimizer.sh \
--type submit-update-stats-job \
--identifiers rest_catalog.db.t1 \
--update-mode all \
--updater-options '{"gravitino_uri":"http://localhost:8090","metalake":"test"}' \
--spark-conf '{"spark.sql.catalog.rest_catalog.type":"rest","spark.sql.catalog.rest_catalog.uri":"http://localhost:9001/iceberg","spark.hadoop.fs.defaultFS":"file:///"}'
```
Notes:

- `--identifiers` supports `catalog.schema.table` or `schema.table` (when a default catalog is configured).
- `--update-mode` supports `stats|metrics|all` (default `all`).
- For `stats` or `all`, `--updater-options` must include `gravitino_uri` and `metalake`.
- If `--updater-options` includes external JDBC metrics settings (`gravitino.optimizer.jdbcMetrics.*`), ensure the JDBC driver JAR is available on the Spark runtime classpath (for example via `spark.jars` in `--spark-conf`).
- `--spark-conf` and `--updater-options` are flat JSON maps.
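Because both option values must be flat JSON maps, it can help to build and validate them before shelling out. This helper is purely illustrative, not part of the CLI:

```python
import json

def flat_json(options):
    """Serialize a dict for --updater-options / --spark-conf.

    Both options expect flat JSON maps, so nested values are rejected.
    """
    for key, value in options.items():
        if isinstance(value, (dict, list)):
            raise ValueError(f"value for {key!r} must be flat, not nested")
    return json.dumps(options)

updater_options = flat_json({
    "gravitino_uri": "http://localhost:8090",  # required for stats/all
    "metalake": "test",                        # required for stats/all
})
```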
### List table metrics

Query stored metrics at table scope.

```shell
./bin/gravitino-optimizer.sh \
--type list-table-metrics \
--identifiers catalog.db.sales
```
For partition scope, provide a partition path JSON array:

```shell
./bin/gravitino-optimizer.sh \
--type list-table-metrics \
--identifiers catalog.db.sales \
--partition-path '[{"dt":"2026-01-01"}]'
```
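`--partition-path` takes a JSON array of partition maps; serializing it programmatically (shown here in Python as an illustration) avoids shell-quoting mistakes:

```python
import json

# One JSON object per partition spec; a multi-level partition adds more
# keys to the same object, e.g. {"dt": "2026-01-01", "region": "us"}.
partition_path = json.dumps([{"dt": "2026-01-01"}])
print(partition_path)
```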
### List job metrics

Query stored metrics at job scope.

```shell
./bin/gravitino-optimizer.sh \
--type list-job-metrics \
--identifiers catalog.db.optimizer_job
```
## Output guide

- `SUMMARY: ...`: summary for `update-statistics` and `append-metrics`
- `DRY-RUN: ...`: recommendation preview without job submission
- `SUBMIT: ...`: strategy job or built-in update-stats job submitted successfully
- `SUMMARY: submit-update-stats-job ...`: summary for built-in update-stats submission
- `MetricsResult{...}`: returned by list commands
- `EvaluationResult{...}`: returned by the monitor command

Examples:

```
SUMMARY: statistics totalRecords=3 tableRecords=2 partitionRecords=1 jobRecords=0
DRY-RUN: strategy=iceberg-data-compaction identifier=rest_catalog.db.t1 score=95 jobTemplate=builtin-iceberg-rewrite-data-files jobOptions={catalog_name=rest_catalog, table_identifier=db.t1}
SUBMIT: strategy=iceberg-data-compaction identifier=rest_catalog.db.t1 score=95 jobTemplate=builtin-iceberg-rewrite-data-files jobOptions={catalog_name=rest_catalog, table_identifier=db.t1} jobId=1f54c6d3-4e27-4cc8-bdfa-b05ecf59a4c2
DRY-RUN: identifier=rest_catalog.db.t1 jobTemplate=builtin-iceberg-update-stats jobConfig={catalog_name=rest_catalog, table_identifier=db.t1, update_mode=all, updater_options={"gravitino_uri":"http://localhost:8090","metalake":"test"}, spark_conf={"spark.master":"local[2]","spark.hadoop.fs.defaultFS":"file:///"}}
SUMMARY: submit-update-stats-job total=1 submitted=1 dryRun=false
MetricsResult{scopeType=TABLE, identifier=rest_catalog.db.t1, partitionPath=<table-or-job-scope>, metrics={row_count=[{timestamp=1735689600, value=100}]}}
EvaluationResult{scopeType=TABLE, identifier=rest_catalog.db.t1, partitionPath=<table-or-job-scope>, evaluation=true, evaluatorName=gravitino-metrics-evaluator, actionTimeSeconds=1735689600, rangeSeconds=86400, beforeMetrics={row_count=[MetricSample{timestampSeconds=1735686000, value=120}]}, afterMetrics={row_count=[MetricSample{timestampSeconds=1735689600, value=100}]}}
```
EvaluationResult{scopeType=TABLE, identifier=rest_catalog.db.t1, partitionPath=<table-or-job-scope>, evaluation=true, evaluatorName=gravitino-metrics-evaluator, actionTimeSeconds=1735689600, rangeSeconds=86400, beforeMetrics={row_count=[MetricSample{timestampSeconds=1735686000, value=120}]}, afterMetrics={row_count=[MetricSample{timestampSeconds=1735689600, value=100}]}}