Version: 1.2.0

Iceberg compaction policy

Overview

system_iceberg_compaction is a built-in policy type used by the optimizer to generate compaction strategies and job contexts for Iceberg tables.

This policy supports CATALOG, SCHEMA, and TABLE metadata objects.

Policy content

The typed content for system_iceberg_compaction supports the following fields:

| Field | Required | Default | Description |
|---|---|---|---|
| minDataFileMse | No | 405323966463344 | Minimum threshold for metric custom-data-file-mse. Must be >= 0. |
| minDeleteFileNumber | No | 1 | Minimum threshold for metric custom-delete-file-number. Must be >= 0. |
| dataFileMseWeight | No | 1 | Score weight of custom-data-file-mse. Must be >= 0. |
| deleteFileNumberWeight | No | 100 | Score weight of custom-delete-file-number. Must be >= 0. |
| maxPartitionNum | No | 50 | Maximum number of partitions selected by the optimizer. Must be > 0. |
| rewriteOptions | No | {} | Additional rewrite options, expanded as job.options.* rules. |

Generated rules and properties

The policy content is converted to:

  • Properties:
    • strategy.type=iceberg-data-compaction
    • job.template-name=builtin-iceberg-rewrite-data-files
  • Rules:
    • trigger-expr=custom-data-file-mse >= minDataFileMse || custom-delete-file-number >= minDeleteFileNumber
    • score-expr=custom-data-file-mse * dataFileMseWeight + custom-delete-file-number * deleteFileNumberWeight
    • max-partition-num=<maxPartitionNum>
    • job.options.<key>=<value> for each rewrite option
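
The last rule in the list is a plain key prefixing step. As a rough illustration only (the helper name expand_rewrite_options is hypothetical and is not part of Gravitino), the expansion of rewriteOptions into job.options.* rules can be sketched in shell:

```shell
# Hypothetical helper, for illustration only: prints one job.options.* rule
# for each key=value argument, mirroring the expansion described above.
expand_rewrite_options() {
  for pair in "$@"; do
    key=${pair%%=*}    # text before the first '='
    value=${pair#*=}   # text after the first '='
    printf 'job.options.%s=%s\n' "$key" "$value"
  done
}

# Example input: the rewrite options recommended later on this page.
expand_rewrite_options \
  target-file-size-bytes=134217728 \
  min-input-files=5 \
  delete-file-threshold=1
```

Each option passed in comes out as one job.options.<key>=<value> line; nothing else about the option is changed in this sketch.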

Parameter tuning guide

Metric unit and threshold formula

custom-data-file-mse is expected to be in byte^2.

Use the target file size and a tolerance ratio to set minDataFileMse:

minDataFileMse = (target-file-size-bytes * ratio)^2

Recommended ratio range: 0.1 to 0.2.

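The default threshold is derived from:

  • target-file-size-bytes = 134217728 (128 MiB)
  • ratio = 0.15
  • minDataFileMse = 405323966463344 (the formula yields 405323966463344.64; the default keeps the whole-number part)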
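
To derive a threshold for a different target file size or ratio, the formula can be evaluated with awk. This is a minimal sketch: the function name min_data_file_mse is hypothetical, and truncating the result to a whole number is an assumption chosen because it reproduces the documented default.

```shell
# Hypothetical helper: minDataFileMse = (target-file-size-bytes * ratio)^2.
# awk computes in double precision, so very large results are approximate.
min_data_file_mse() {
  awk -v target="$1" -v ratio="$2" 'BEGIN {
    deviation = target * ratio                    # tolerated deviation in bytes
    printf "%.0f\n", int(deviation * deviation)   # byte^2, fraction dropped
  }'
}

min_data_file_mse 134217728 0.15   # 128 MiB target, ratio 0.15
min_data_file_mse 134217728 0.1    # same target, stricter ratio
```

A smaller ratio produces a smaller threshold, so compaction triggers at a lower file-size deviation.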

Trigger behavior

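Both comparisons in the trigger expression use >=, so a metric that equals its threshold triggers compaction. The two conditions are joined with ||, so satisfying either one is sufficient.

  • Set minDeleteFileNumber = 1 to trigger when at least one delete file exists.
  • Set minDeleteFileNumber to a value greater than 1 to reduce compaction frequency for delete files.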
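
The trigger logic can be mimicked with two integer comparisons. This is only a sketch of the documented expression, not the optimizer's implementation; the function name should_trigger and the sample metric values are hypothetical.

```shell
# Hypothetical helper mirroring the documented trigger expression:
#   custom-data-file-mse >= minDataFileMse
#     || custom-delete-file-number >= minDeleteFileNumber
# Arguments: mse, delete file count, minDataFileMse, minDeleteFileNumber.
should_trigger() {
  mse=$1
  delete_files=$2
  min_mse=$3
  min_delete_files=$4
  [ "$mse" -ge "$min_mse" ] || [ "$delete_files" -ge "$min_delete_files" ]
}

# One delete file is enough with the default minDeleteFileNumber = 1,
# even though the data-file MSE is zero.
if should_trigger 0 1 405323966463344 1; then
  echo "trigger"
else
  echo "skip"
fi
```

Raising the fourth argument to 2 makes the same partition print "skip", which is the frequency reduction described above.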

Score weights

Score is computed as:

custom-data-file-mse * dataFileMseWeight + custom-delete-file-number * deleteFileNumberWeight

  • Keep dataFileMseWeight = 1 as baseline.
  • Increase deleteFileNumberWeight if you want partitions with more delete files to be prioritized.
  • Keep both weights non-negative.

Default values:

  • minDataFileMse = 405323966463344 (computed from 128 MiB and ratio 0.15)
  • minDeleteFileNumber = 1
  • dataFileMseWeight = 1
  • deleteFileNumberWeight = 100
  • maxPartitionNum = 50
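
To get a feel for how the two score terms compare, the score formula can be evaluated for sample partitions. This is a sketch assuming integer metrics and integer weights; the function name partition_score and the metric values are made up for illustration.

```shell
# Hypothetical helper implementing the documented score expression:
#   custom-data-file-mse * dataFileMseWeight
#     + custom-delete-file-number * deleteFileNumberWeight
# Arguments: mse, delete file count, dataFileMseWeight, deleteFileNumberWeight.
partition_score() {
  mse=$1
  delete_files=$2
  mse_weight=$3
  delete_weight=$4
  echo $(( mse * mse_weight + delete_files * delete_weight ))
}

# Default weights: dataFileMseWeight = 1, deleteFileNumberWeight = 100.
partition_score 500000000000000 0 1 100   # badly sized data files, no delete files
partition_score 0 3 1 100                 # well-sized data files, three delete files
```

Because custom-data-file-mse is in byte^2, the first sample scores many orders of magnitude higher than the second, which is worth keeping in mind when choosing deleteFileNumberWeight.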

Recommended rewriteOptions:

  • target-file-size-bytes = 134217728
  • min-input-files = 5
  • delete-file-threshold = 1

Create policy examples

curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "iceberg_compaction_default",
    "comment": "Built-in iceberg compaction policy",
    "policyType": "system_iceberg_compaction",
    "enabled": true,
    "content": {}
  }' \
  http://localhost:8090/api/metalakes/test/policies
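
The example above relies entirely on defaults by sending an empty content object. The following sketch shows what a request with explicit values might look like. It assumes the fields from the policy content table are placed directly inside content, and that rewriteOptions values are passed as strings; neither detail is confirmed by the example above. The policy name iceberg_compaction_tuned is hypothetical.

```shell
# Server address and metalake name are taken from the example above.
POLICY_URL="http://localhost:8090/api/metalakes/test/policies"

# Assumed request shape: documented fields placed inside "content".
build_payload() {
  cat <<'EOF'
{
  "name": "iceberg_compaction_tuned",
  "comment": "Iceberg compaction policy with explicit settings",
  "policyType": "system_iceberg_compaction",
  "enabled": true,
  "content": {
    "minDataFileMse": 405323966463344,
    "minDeleteFileNumber": 1,
    "dataFileMseWeight": 1,
    "deleteFileNumberWeight": 100,
    "maxPartitionNum": 50,
    "rewriteOptions": {
      "target-file-size-bytes": "134217728",
      "min-input-files": "5",
      "delete-file-threshold": "1"
    }
  }
}
EOF
}

# Sends the payload; curl reads the request body from stdin via "-d @-".
create_policy() {
  build_payload | curl -X POST \
    -H "Accept: application/vnd.gravitino.v1+json" \
    -H "Content-Type: application/json" \
    -d @- "$POLICY_URL"
}

build_payload   # print the payload for review; run create_policy to send it
```

Printing the payload first makes it easy to check the values before anything is sent to the server.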

Attach policy to metadata objects

After the policy is created, associate it with a catalog, schema, or table through standard policy association APIs. The optimizer will read the generated rules and properties to evaluate strategy triggering and job submission context.