Apache Gravitino 0.8.0 - strengthen the AI support for Apache Gravitino™ (incubating)
Apache Gravitino 0.8.0 is the third major release since the project entered the ASF. In this release, the community delivers several exciting features, including a model catalog, FUSE for Fileset, credential vending for Fileset, Flink Iceberg and Paimon connectors, a Spark Paimon connector, and security enforcement.
This release blog briefly introduces the significant new features and improvements. Please keep reading to learn more about what the community has worked on.
Model Catalog
In addition to table and messaging metadata, Gravitino 0.8 adds model metadata management. A model can have multiple versions, and users can choose the version that works best for them. Version 0.8 provides the basic functionality; future releases will add more features, such as model tagging and better integration with machine learning workflows, to help users manage models and extract more value from their data and models.
- Support model versioning metadata #4783.
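To make the idea of one model with several versions concrete, here is a minimal conceptual sketch in plain Python. It is not Gravitino's client API: the `Model` and `ModelVersion` classes, the `link_version` and `best_version` methods, and the `accuracy` property are all hypothetical names chosen for illustration, and the assumption is simply that each version carries a storage URI plus free-form properties.

```python
from dataclasses import dataclass, field


@dataclass
class ModelVersion:
    """One registered version of a model (hypothetical structure)."""
    version: int
    uri: str                                   # where the model artifact lives
    properties: dict = field(default_factory=dict)


@dataclass
class Model:
    """A named model that accumulates versions over time (hypothetical)."""
    name: str
    versions: list = field(default_factory=list)

    def link_version(self, uri, **properties):
        # Version numbers are assigned sequentially, starting at 0.
        version = ModelVersion(len(self.versions), uri, dict(properties))
        self.versions.append(version)
        return version

    def best_version(self, metric):
        # Only versions that recorded the metric are candidates;
        # returns None when no version has it.
        candidates = [v for v in self.versions if metric in v.properties]
        return max(candidates, key=lambda v: v.properties[metric], default=None)
```

With this shape, choosing the best version is just a query over version metadata, for example `model.best_version("accuracy")` after linking a few versions that each record an accuracy value.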
Credential vending
Credential vending is a fundamental capability in cloud environments. In version 0.7, credential vending was supported for the Iceberg REST server. In version 0.8, we extend it to the Gravitino server and integrate it with Fileset, so that Fileset can be used more securely and conveniently. The Gravitino server centrally manages the security keys and issues a temporary token that is valid only for the Fileset a request needs to access. This is more secure and removes the need for users to supply information such as an access key and secret key (AK/SK).
In addition to GCS and S3, version 0.8 has built-in support for OSS and ADLS credential vending, and other storage systems can be supported in a pluggable manner.
- Support credential vending for fileset client #5677.
- Support credential vending for Gravitino #4398.
- Support Aliyun OSS credential provider #5625.
- Support ADLS credential provider #5624.
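The flow described above (a long-lived secret stays on the server, and the client receives only a short-lived token scoped to one fileset) can be sketched as follows. This is a toy model under stated assumptions, not Gravitino's implementation: `CredentialVendor`, `TemporaryCredential`, the HMAC-based token, and the example bucket paths are all hypothetical. A real deployment would obtain temporary credentials from the cloud provider's token service instead of signing a string locally.

```python
import hashlib
import hmac
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TemporaryCredential:
    """Short-lived token limited to one storage prefix (hypothetical)."""
    token: str
    scope: str          # storage path prefix the token is valid for
    expires_at: float   # Unix timestamp after which the token is rejected

    def allows(self, path, now=None):
        now = time.time() if now is None else now
        return now < self.expires_at and path.startswith(self.scope)


class CredentialVendor:
    """Holds the long-lived secret and hands out scoped tokens (hypothetical)."""

    def __init__(self, secret_key, ttl_seconds=3600):
        self._secret_key = secret_key.encode()   # never returned to clients
        self._ttl_seconds = ttl_seconds

    def vend(self, fileset_location, now=None):
        now = time.time() if now is None else now
        # A trailing slash keeps ".../sales" from also matching ".../sales-archive".
        scope = fileset_location.rstrip("/") + "/"
        expires_at = now + self._ttl_seconds
        message = f"{scope}|{expires_at}".encode()
        token = hmac.new(self._secret_key, message, hashlib.sha256).hexdigest()
        return TemporaryCredential(token, scope, expires_at)
```

The design point the sketch tries to capture is that the client only ever sees a token bound to a scope and an expiry, so leaking it exposes one fileset for a limited time instead of the whole storage account.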
FUSE for Fileset
With Fileset now widely used in AI scenarios, improving usability and lowering the cost of adoption has become a major concern. In AI scenarios, users prefer to access remote data as if it were on a local disk. FUSE for Fileset is designed for exactly this: it lets users access data managed by Fileset as though it were stored on a local disk. The current release provides basic, alpha-level functionality that allows access to S3 data managed by Fileset. Subsequent versions will add metadata caching and support for more storage systems. FUSE for Fileset is written in Rust for performance reasons, and everyone is welcome to join the development.
- Implement GVFS fuse to access Gravitino fileset in the POSIX Protocol #5504.
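The practical effect of a FUSE mount is that ordinary file APIs work unchanged on remote data. The sketch below uses only the Python standard library and assumes a fileset has already been mounted at some local directory. The mount path in the usage note is hypothetical, and the helper names `list_file_sizes` and `read_head` are invented for illustration; nothing here calls a Gravitino API.

```python
from pathlib import Path


def list_file_sizes(mount_point):
    """Map each file under the mount point to its size in bytes.

    Keys are POSIX-style paths relative to the mount point.
    """
    root = Path(mount_point)
    return {
        path.relative_to(root).as_posix(): path.stat().st_size
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def read_head(path, num_bytes=1024):
    """Read the first bytes of a file with a plain open() call."""
    with open(path, "rb") as handle:
        return handle.read(num_bytes)
```

Given a mount at a hypothetical location such as `/mnt/gvfs/my_fileset`, a training script could call `list_file_sizes("/mnt/gvfs/my_fileset")` exactly as it would for a local dataset directory, with no storage-specific SDK in the code path.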