Manage statistics in Gravitino
Introduction
Starting from version 1.0.0, Gravitino supports statistics for tables and partitions.
This document gives a brief introduction to statistics using both the Gravitino Java client and the REST API. For more details about the statistics system in Gravitino, refer to the Javadoc and the REST API documentation.
Currently, Gravitino supports only custom statistics, whose names must start with the prefix custom-.
Built-in statistics will be supported in a future release.
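Because every statistic name must carry the custom- prefix, it can help to check names on the client side before sending them to Gravitino. The sketch below is a hypothetical helper, not part of the Gravitino Java client. The class and method names are illustrative assumptions, and the check covers only the prefix rule stated above. Gravitino may apply further validation of its own.

```java
import java.util.List;

/**
 * Hypothetical helper, not a Gravitino API: checks a statistic name against
 * the documented rule that custom statistic names start with "custom-".
 */
class StatisticNames {
    // Prefix required for custom statistics, as stated in the documentation.
    static final String CUSTOM_PREFIX = "custom-";

    /** Returns true if the name is non-null and starts with the custom- prefix. */
    static boolean isValidCustomName(String name) {
        return name != null && name.startsWith(CUSTOM_PREFIX);
    }

    public static void main(String[] args) {
        // The first name follows the rule; the second lacks the prefix.
        for (String name : List.of("custom-tableLastModifiedTime", "tableLastModifiedTime")) {
            System.out.println(name + " -> " + isValidCustomName(name));
        }
    }
}
```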
Query engines can use statistics for cost-based optimization (CBO). Statistics can also be used by metadata action systems to trigger jobs such as compaction and data archiving.
You can create statistics and then define policies based on them. Users can analyze the statistics together with the
policies to decide on the next action. For example, you can create a statistic named custom-tableLastModifiedTime to
record the last time a table was modified, and then create a policy that checks whether the table has gone unmodified
for a long time and, if so, archives its data to cold storage.
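The decision such an archiving policy makes can be sketched in plain Java. This is a minimal sketch under stated assumptions, not how Gravitino evaluates policies. It assumes the custom-tableLastModifiedTime value is stored as epoch milliseconds and uses a hypothetical 90-day idle threshold. The class name ArchivePolicySketch and the method shouldArchive are illustrative only.

```java
import java.time.Duration;
import java.time.Instant;

/**
 * Hypothetical sketch, not a Gravitino API: decides whether a table is a
 * cold-storage archive candidate from its custom-tableLastModifiedTime value.
 */
class ArchivePolicySketch {
    // Statistic name from the example above.
    static final String LAST_MODIFIED_STAT = "custom-tableLastModifiedTime";

    /**
     * Returns true if the table has not been modified for at least idleThreshold.
     * Assumption: the statistic value is the last modified time in epoch milliseconds.
     */
    static boolean shouldArchive(long lastModifiedEpochMillis, Instant now, Duration idleThreshold) {
        Instant lastModified = Instant.ofEpochMilli(lastModifiedEpochMillis);
        Duration idle = Duration.between(lastModified, now);
        // A non-negative comparison means the idle period reached or exceeded the threshold.
        return idle.compareTo(idleThreshold) >= 0;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2025-06-01T00:00:00Z");
        long lastModified = Instant.parse("2025-01-01T00:00:00Z").toEpochMilli();
        // Hypothetical policy threshold: 90 days without modification.
        Duration threshold = Duration.ofDays(90);
        System.out.println(LAST_MODIFIED_STAT + " idle >= 90 days: "
            + shouldArchive(lastModified, now, threshold));
    }
}
```

Keeping the check as a pure function of the statistic value and the current time makes the policy easy to test. It also mirrors the split described in this document: the statistic records a fact, and the policy interprets it.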
Currently, Gravitino does not compute statistics itself: you need to compute them and update them in Gravitino. Gravitino also cannot tell whether statistics have expired, so you are responsible for keeping them up to date.