
Optimizer Configuration

Configuration layers

Use these layers together:

Layer                    | Scope                                     | Typical keys
Gravitino server config  | Runtime for the job manager and executor  | gravitino.job.executor, gravitino.job.statusPullIntervalInMs, gravitino.jobExecutor.local.sparkHome
Job submission jobConf   | Per-job run                               | catalog_name, table_identifier, spark_*, template-specific args
Optimizer CLI config     | CLI commands                              | gravitino.optimizer.* in conf/gravitino-optimizer.conf

Server-side configuration

Set server-level runtime behavior in gravitino.conf.

gravitino.job.executor=local
gravitino.job.statusPullIntervalInMs=300000
gravitino.jobExecutor.local.sparkHome=/path/to/spark

For local demo environments, you can reduce gravitino.job.statusPullIntervalInMs (for example 10000) to get faster status updates. Restart Gravitino after changing this value.
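For example, a demo-only override in gravitino.conf:

# Demo-only: poll job status every 10 seconds instead of every 5 minutes.
gravitino.job.statusPullIntervalInMs=10000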

Built-in update stats jobConf

Use the builtin-iceberg-update-stats job template with at least these keys:

{
  "catalog_name": "rest_catalog",
  "table_identifier": "db.t1",
  "update_mode": "all",
  "updater_options": "{\"gravitino_uri\":\"http://localhost:8090\",\"metalake\":\"test\",\"statistics_updater\":\"gravitino-statistics-updater\",\"metrics_updater\":\"gravitino-metrics-updater\"}",
  "spark_conf": "{\"spark.master\":\"local[2]\",\"spark.hadoop.fs.defaultFS\":\"file:///\"}",
  "spark_master": "local[2]",
  "spark_executor_instances": "1",
  "spark_executor_cores": "1",
  "spark_executor_memory": "1g",
  "spark_driver_memory": "1g",
  "catalog_type": "rest",
  "catalog_uri": "http://localhost:9001/iceberg",
  "warehouse_location": ""
}
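Note that updater_options and spark_conf are JSON documents embedded as strings inside the outer JSON, which is why their quotes are escaped. Decoded, the updater_options value above reads:

{
  "gravitino_uri": "http://localhost:8090",
  "metalake": "test",
  "statistics_updater": "gravitino-statistics-updater",
  "metrics_updater": "gravitino-metrics-updater"
}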

warehouse_location can be empty for local filesystem testing. Set it to your warehouse URI for HDFS or cloud object storage environments.
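For example (illustrative values; the namenode host and bucket name are placeholders, not defaults):

"warehouse_location": "hdfs://namenode:8020/user/iceberg/warehouse"
"warehouse_location": "s3a://my-bucket/iceberg/warehouse"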

Strategy submission configuration

The submit-strategy-jobs command reads the optimizer CLI config from conf/gravitino-optimizer.conf. This is a minimal working example:

gravitino.optimizer.gravitinoUri = http://localhost:8090
gravitino.optimizer.gravitinoMetalake = test
gravitino.optimizer.gravitinoDefaultCatalog = rest_catalog
gravitino.optimizer.recommender.statisticsProvider = gravitino-statistics-provider
gravitino.optimizer.recommender.strategyProvider = gravitino-strategy-provider
gravitino.optimizer.recommender.tableMetaProvider = gravitino-table-metadata-provider
gravitino.optimizer.recommender.jobSubmitter = gravitino-job-submitter
gravitino.optimizer.strategyHandler.iceberg-data-compaction.className = org.apache.gravitino.maintenance.optimizer.recommender.handler.compaction.CompactionStrategyHandler
gravitino.optimizer.jobSubmitterConfig.catalog_name = rest_catalog
gravitino.optimizer.jobSubmitterConfig.spark_master = local[2]
gravitino.optimizer.jobSubmitterConfig.spark_executor_instances = 1
gravitino.optimizer.jobSubmitterConfig.spark_executor_cores = 1
gravitino.optimizer.jobSubmitterConfig.spark_executor_memory = 1g
gravitino.optimizer.jobSubmitterConfig.spark_driver_memory = 1g
gravitino.optimizer.jobSubmitterConfig.catalog_type = rest
gravitino.optimizer.jobSubmitterConfig.catalog_uri = http://localhost:9001/iceberg
# Leave empty for local filesystem; set to your warehouse URI for cloud/HDFS storage.
gravitino.optimizer.jobSubmitterConfig.warehouse_location =
gravitino.optimizer.jobSubmitterConfig.spark_conf = {"spark.master":"local[2]","spark.hadoop.fs.defaultFS":"file:///"}

The --strategy-name argument must be the policy name, for example iceberg_compaction_default.
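A sketch of the invocation, assuming the optimizer CLI ships as a launcher script under bin/ in your distribution (the script name here is a placeholder; only submit-strategy-jobs and --strategy-name come from this guide):

# Placeholder launcher; adjust to your installation.
./bin/gravitino-optimizer.sh submit-strategy-jobs --strategy-name iceberg_compaction_default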

Local filesystem note

If your environment is local and not HDFS-based, set:

spark.hadoop.fs.defaultFS=file:///

Without this, Spark jobs may try hdfs://localhost:9000 and fail.
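In jobConf and jobSubmitterConfig, this property belongs inside the spark_conf JSON string, as in the examples above:

"spark_conf": "{\"spark.master\":\"local[2]\",\"spark.hadoop.fs.defaultFS\":\"file:///\"}"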

Verification checklist

Confirm the following after configuration and submission:

  • Job templates exist: builtin-iceberg-update-stats, builtin-iceberg-rewrite-data-files.
  • Policies are attached to target tables.
  • submit-strategy-jobs prints SUBMIT lines.
  • Rewrite logs show Rewritten data files: <N> where N > 0 for non-empty tables (a quick check is sketched below).
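One way to check the last item, assuming you know where your Spark driver logs are written (the directory below is a placeholder):

# Placeholder path; point this at your Spark driver log directory.
grep -R "Rewritten data files" /path/to/spark/logs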