Manage Statistics
Introduction
Starting from version 1.0.0, Gravitino supports statistics for tables and partitions.
This document provides a brief introduction using both the Gravitino Java client and the REST API. If you want to know more about the statistics system in Gravitino, refer to the Javadoc and the REST API documentation.
Gravitino currently supports only custom statistics, whose names must start with custom-.
Built-in statistics will be supported in the future.
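The naming rule above can be checked on the client side before any statistics are sent. The following is a minimal sketch, not part of the Gravitino API: the class StatisticNameCheck and its methods are hypothetical, and the requirement that something follows the prefix is an assumption.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical client-side helper; not part of the Gravitino API. */
class StatisticNameCheck {
    // Prefix required for custom statistic names, as stated in this document.
    static final String CUSTOM_PREFIX = "custom-";

    /** Returns true if the name starts with the prefix and is longer than it (assumption). */
    static boolean isCustomName(String name) {
        return name != null
            && name.startsWith(CUSTOM_PREFIX)
            && name.length() > CUSTOM_PREFIX.length();
    }

    /** Collects the names that do not follow the custom- naming rule. */
    static List<String> invalidNames(List<String> names) {
        return names.stream()
            .filter(n -> !isCustomName(n))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = List.of("custom-tableLastModifiedTime", "rowCount");
        System.out.println(invalidNames(names)); // prints [rowCount]
    }
}
```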
Query engines can use statistics for cost-based optimization (CBO). Metadata action systems can also use statistics to trigger jobs such as compaction and data archiving.
After you create statistics, you can create policies based on them. Users can analyze the statistics
and policies to decide the next action. For example,
you can create a statistic named custom-tableLastModifiedTime to record the last modified time of a table.
Then you can create a policy that checks whether the table has gone unmodified for a long time and, if so, archives the table data to
cold storage.
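The archiving decision described above could look roughly like the sketch below. It is illustrative only: the class ArchiveCheck, the yyyyMMdd value format (matching the "20250128" value used later in this document), and the 90-day threshold are all assumptions, and the statistic value is passed in as a plain string rather than fetched from Gravitino.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

/** Hypothetical policy check; not part of the Gravitino API. */
class ArchiveCheck {
    // Assumed format of the custom-tableLastModifiedTime value, e.g. "20250128".
    private static final DateTimeFormatter FORMAT = DateTimeFormatter.BASIC_ISO_DATE;

    /** Returns true if the table has been idle for more than maxIdleDays as of today. */
    static boolean shouldArchive(String lastModified, LocalDate today, long maxIdleDays) {
        LocalDate modified = LocalDate.parse(lastModified, FORMAT);
        return ChronoUnit.DAYS.between(modified, today) > maxIdleDays;
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2025, 6, 1);
        // 2025-01-28 is 124 days before 2025-06-01, which exceeds 90 days.
        System.out.println(shouldArchive("20250128", today, 90)); // prints true
    }
}
```

Passing the current date in as a parameter keeps the check deterministic, which makes it easier to test than calling LocalDate.now() inside the method.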
Gravitino doesn't compute statistics; you need to compute them yourself and update them in Gravitino. Gravitino also can't judge whether statistics have expired, so make sure they are kept up to date.
Metadata Object Statistic Operations
Update Statistics of Metadata Objects
Update the statistics of a metadata object by providing the statistic keys and values. Currently, only table statistics can be updated.
The REST API request path is /api/metalakes/{metalake}/objects/{metadataObjectType}/{metadataObjectName}/statistics.
- Shell
curl -X PUT -H "Accept: application/vnd.gravitino.v1+json" \
-H "Content-Type: application/json" -d '{
  "updates" : {
    "custom-tableLastModifiedTime": "20250128"
  }
}' http://localhost:8090/api/metalakes/metalake/objects/table/catalog.schema.table/statistics
- Java
Table table = ...
// Maps comes from Guava (com.google.common.collect).
Map<String, StatisticValue<?>> updateStatistics = Maps.newHashMap();
updateStatistics.put("custom-k1", StatisticValues.stringValue("v1"));
updateStatistics.put("custom-k2", StatisticValues.stringValue("v2"));
table.updateStatistics(updateStatistics);
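For callers that use neither curl nor the Gravitino Java client, the same PUT request can be assembled with the JDK's built-in java.net.http API. This is a sketch under assumptions: the class UpdateStatisticsRequest is hypothetical, the request is built but never sent, and the path segments are assumed to need no URL escaping.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical helper that builds (but does not send) the update request. */
class UpdateStatisticsRequest {
    static HttpRequest build(
            String baseUrl, String metalake, String objectType, String objectName, String jsonBody) {
        // Path layout taken from the request path documented above.
        URI uri = URI.create(String.format(
            "%s/api/metalakes/%s/objects/%s/%s/statistics",
            baseUrl, metalake, objectType, objectName));
        return HttpRequest.newBuilder(uri)
            .header("Accept", "application/vnd.gravitino.v1+json")
            .header("Content-Type", "application/json")
            .PUT(HttpRequest.BodyPublishers.ofString(jsonBody))
            .build();
    }

    public static void main(String[] args) {
        String body = "{\"updates\":{\"custom-tableLastModifiedTime\":\"20250128\"}}";
        HttpRequest request =
            build("http://localhost:8090", "metalake", "table", "catalog.schema.table", body);
        System.out.println(request.method() + " " + request.uri());
    }
}
```

To actually send it, the request could be passed to java.net.http.HttpClient.send together with a body handler such as HttpResponse.BodyHandlers.ofString().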
List Statistics of Metadata Objects
List all the statistics of a metadata object. Currently, only table statistics can be listed.
The REST API request path is /api/metalakes/{metalake}/objects/{metadataObjectType}/{metadataObjectName}/statistics.
- Shell
curl -X GET -H "Accept: application/vnd.gravitino.v1+json" \
-H "Content-Type: application/json" \
http://localhost:8090/api/metalakes/metalake/objects/table/catalog.schema.table/statistics
- Java
Table table = ...
table.listStatistics();
Drop Statistics of Metadata Objects
Drop the statistics of a metadata object by providing the statistic keys. Currently, only table statistics can be dropped.
The REST API request path is /api/metalakes/{metalake}/objects/{metadataObjectType}/{metadataObjectName}/statistics.
- Shell
curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
-H "Content-Type: application/json" -d '{
  "names":["custom-k1"]
}' http://localhost:8090/api/metalakes/metalake/objects/table/catalog.schema.table/statistics
- Java
Table table = ...
// Lists comes from Guava (com.google.common.collect).
List<String> statisticsToDrop = Lists.newArrayList("custom-k1");
table.dropStatistics(statisticsToDrop);
Partition Statistics Operations
Update Statistics of Partitions
Update the statistics of a partition by providing the statistic keys and values. If a statistic already exists, it is updated; otherwise, a new statistic is created.
The REST API request path is /api/metalakes/{metalake}/objects/table/{metadataObjectName}/statistics/partitions.
- Shell
curl -X PUT -H "Accept: application/vnd.gravitino.v1+json" \
-H "Content-Type: application/json" -d '{
  "updates":[{
    "partitionName" : "p0",
    "statistics" : {
      "custom-k1" : "v1"
    }
  }]
}' http://localhost:8090/api/metalakes/metalake/objects/table/catalog.schema.table/statistics/partitions
- Java
Table table = ...
List<PartitionStatisticsUpdate> statisticsToUpdate = Lists.newArrayList();
Map<String, StatisticValue<?>> stats = Maps.newHashMap();
stats.put("custom-k1", StatisticValues.stringValue("v1"));
stats.put("custom-k2", StatisticValues.stringValue("v2"));
statisticsToUpdate.add(PartitionStatisticsModification.update("p1", stats));
table.updatePartitionStatistics(statisticsToUpdate);
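The update-or-create behavior described for partition statistics can be illustrated with a small in-memory model. This sketch only mirrors the documented semantics; the class PartitionStatsModel is hypothetical, values are simplified to strings, and it says nothing about how Gravitino stores statistics internally.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory model of the documented update semantics. */
class PartitionStatsModel {
    // Partition name -> (statistic name -> value).
    private final Map<String, Map<String, String>> stats = new HashMap<>();

    /** Overwrites statistics that already exist and creates the ones that do not. */
    void update(String partition, Map<String, String> updates) {
        stats.computeIfAbsent(partition, p -> new HashMap<>()).putAll(updates);
    }

    /** Returns the statistics of a partition, or an empty map if none were set. */
    Map<String, String> get(String partition) {
        return stats.getOrDefault(partition, Map.of());
    }

    public static void main(String[] args) {
        PartitionStatsModel model = new PartitionStatsModel();
        model.update("p0", Map.of("custom-k1", "v1"));
        // custom-k1 is overwritten, custom-k2 is newly created.
        model.update("p0", Map.of("custom-k1", "v2", "custom-k2", "x"));
        System.out.println(model.get("p0").get("custom-k1")); // prints v2
    }
}
```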
List Statistics of Partitions
List the statistics of the partitions in a given range.
Specify the range with the from and to parameters,
and use the fromInclusive and toInclusive parameters to control whether each bound is inclusive.
The REST API request path is /api/metalakes/{metalake}/objects/table/{metadataObjectName}/statistics/partitions.
- Shell
curl -X GET -H "Accept: application/vnd.gravitino.v1+json" \
-H "Content-Type: application/json" \
'http://localhost:8090/api/metalakes/metalake/objects/table/catalog.schema.table/statistics/partitions?from=p0&to=p1&fromInclusive=true&toInclusive=false'
- Java
Table table = ...
PartitionRange range = PartitionRange.downTo("p0", PartitionRange.BoundType.CLOSED);
table.listPartitionStatistics(range);
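When calling the REST endpoint from code, the four range parameters can be combined into a query string as in the sketch below. The class PartitionRangeQuery is hypothetical; only the parameter names from, to, fromInclusive, and toInclusive come from this document, and URL-encoding the partition names is a precaution assumed here rather than a documented requirement.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that builds the query string for listing partition statistics. */
class PartitionRangeQuery {
    static String build(String from, boolean fromInclusive, String to, boolean toInclusive) {
        return "from=" + encode(from)
            + "&to=" + encode(to)
            + "&fromInclusive=" + fromInclusive
            + "&toInclusive=" + toInclusive;
    }

    // Encodes a partition name so characters such as '=' or '/' are safe in a query string.
    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Lower bound p0 inclusive, upper bound p1 exclusive, as in the curl example.
        System.out.println(build("p0", true, "p1", false));
        // prints from=p0&to=p1&fromInclusive=true&toInclusive=false
    }
}
```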