@Evolving public interface Fileset extends Auditable
An interface representing a fileset under a schema Namespace. A fileset is a virtual concept of a file or directory that is managed by Apache Gravitino. Users can create a fileset object to manage non-tabular data on FS-like storage; the typical use case is managing training data for AI workloads. The major difference compared to a relational table is that a fileset is schema-free: its main property is the storage location of the underlying data.
Fileset defines the basic properties of a fileset object. A catalog implementation with FilesetCatalog should implement this interface.
Modifier and Type | Interface and Description |
---|---|
static class | Fileset.Type: An enum representing the type of the fileset object. |
Modifier and Type | Field and Description |
---|---|
static java.lang.String | LOCATION_NAME_UNKNOWN: The reserved location name indicating that the location name is unknown. |
static java.lang.String | PROPERTY_CATALOG_PLACEHOLDER: The reserved property name for the catalog name placeholder. When a fileset is created, every {{catalog}} placeholder is replaced by the catalog name. |
static java.lang.String | PROPERTY_DEFAULT_LOCATION_NAME: The property name for the default location name of the fileset. |
static java.lang.String | PROPERTY_FILESET_PLACEHOLDER: The reserved property name for the fileset name placeholder. When a fileset is created, every {{fileset}} placeholder is replaced by the fileset name. |
static java.lang.String | PROPERTY_LOCATION_PLACEHOLDER_PREFIX: The prefix of fileset placeholder properties. |
static java.lang.String | PROPERTY_MULTIPLE_LOCATIONS_PREFIX: The prefix of the location name in properties at the catalog/schema level. |
static java.lang.String | PROPERTY_SCHEMA_PLACEHOLDER: The reserved property name for the schema name placeholder. When a fileset is created, every {{schema}} placeholder is replaced by the schema name. |
Modifier and Type | Method and Description |
---|---|
default java.lang.String | comment() |
java.lang.String | name() |
default java.util.Map<java.lang.String,java.lang.String> | properties() |
default java.lang.String | storageLocation(): Get the unnamed storage location of the file or directory managed by this fileset object. |
default java.util.Map<java.lang.String,java.lang.String> | storageLocations(): Get the storage location names and the corresponding paths of the files or directories managed by this fileset object. |
default SupportsCredentials | supportsCredentials() |
default SupportsRoles | supportsRoles() |
default SupportsTags | supportsTags() |
Fileset.Type | type() |
static final java.lang.String PROPERTY_MULTIPLE_LOCATIONS_PREFIX
static final java.lang.String PROPERTY_LOCATION_PLACEHOLDER_PREFIX
static final java.lang.String PROPERTY_CATALOG_PLACEHOLDER
static final java.lang.String PROPERTY_SCHEMA_PLACEHOLDER
static final java.lang.String PROPERTY_FILESET_PLACEHOLDER
static final java.lang.String PROPERTY_DEFAULT_LOCATION_NAME
static final java.lang.String LOCATION_NAME_UNKNOWN
java.lang.String name()
@Nullable default java.lang.String comment()
Fileset.Type type()
default java.lang.String storageLocation()
The returned storageLocation can be either the one specified when the fileset object was created (using the storageLocation or storageLocations field), or the one specified at the catalog/schema level (using the property "location" or properties with the prefix "location-") if the fileset object is created under that catalog/schema.
The storageLocation at each level can contain placeholders of the form {{name}}, which are replaced by the corresponding fileset property values when the fileset object is created. The placeholder property key in the fileset object is "placeholder-" followed by the placeholder name. For example, if the storageLocation is "file:///path/{{schema}}-{{fileset}}-{{version}}" and the fileset object "catalog1.schema1.fileset1" has the property "placeholder-version" set to "v1", then the storageLocation will be "file:///path/schema1-fileset1-v1".
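The substitution rule above can be illustrated with a minimal Java sketch. It assumes that the reserved placeholders {{catalog}}, {{schema}} and {{fileset}} take their values from the fileset identifier and that any other {{name}} is read from the fileset property "placeholder-<name>". The class PlaceholderSketch and its method resolve are hypothetical and are not Gravitino's implementation; failing on an undefined placeholder is a choice made only for this sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating {{name}} substitution; not Gravitino's actual code.
final class PlaceholderSketch {
  // Prefix of user-defined placeholder properties, as in "placeholder-version".
  static final String PLACEHOLDER_PREFIX = "placeholder-";
  // Matches {{name}} and captures "name".
  private static final Pattern PLACEHOLDER = Pattern.compile("\\{\\{([^{}]+)\\}\\}");

  static String resolve(String location, String catalog, String schema, String fileset,
      Map<String, String> filesetProperties) {
    Map<String, String> values = new HashMap<>();
    // User-defined values come from "placeholder-<name>" fileset properties.
    for (Map.Entry<String, String> e : filesetProperties.entrySet()) {
      if (e.getKey().startsWith(PLACEHOLDER_PREFIX)) {
        values.put(e.getKey().substring(PLACEHOLDER_PREFIX.length()), e.getValue());
      }
    }
    // Reserved placeholders come from the fileset identifier (they win in this sketch).
    values.put("catalog", catalog);
    values.put("schema", schema);
    values.put("fileset", fileset);

    Matcher m = PLACEHOLDER.matcher(location);
    StringBuffer out = new StringBuffer();
    while (m.find()) {
      String value = values.get(m.group(1));
      if (value == null) {
        // Assumption of this sketch: an undefined placeholder is an error.
        throw new IllegalArgumentException("No value for placeholder {{" + m.group(1) + "}}");
      }
      // quoteReplacement keeps '$' and '\' in values from being read as group references.
      m.appendReplacement(out, Matcher.quoteReplacement(value));
    }
    m.appendTail(out);
    return out.toString();
  }

  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    props.put("placeholder-version", "v1");
    // Prints file:///path/schema1-fileset1-v1
    System.out.println(resolve("file:///path/{{schema}}-{{fileset}}-{{version}}",
        "catalog1", "schema1", "fileset1", props));
  }
}
```

Using a regular expression with appendReplacement keeps the scan to a single pass over the location string, so a substituted value is never re-scanned for further placeholders.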
For a managed fileset, the storageLocation can be:
1) The one specified when creating the fileset object, with its placeholders replaced by the placeholder values specified in the fileset properties.
2) When the catalog property "location" is specified but the schema property "location" is not, the storageLocation is:
a. "{catalog location}/schemaName/filesetName" if {catalog location} does not contain any placeholder.
b. "{catalog location}" with its placeholders replaced by the placeholder values specified in the fileset properties, otherwise.
3) When the catalog property "location" is not specified but the schema property "location" is, the storageLocation is:
a. "{schema location}/filesetName" if {schema location} does not contain any placeholder.
b. "{schema location}" with its placeholders replaced by the placeholder values specified in the fileset properties, otherwise.
4) When both the catalog property "location" and the schema property "location" are specified, the storageLocation is:
a. "{schema location}/filesetName" if {schema location} does not contain any placeholder.
b. "{schema location}" with its placeholders replaced by the placeholder values specified in the fileset properties, otherwise.
5) null, when none of the catalog property "location", the schema property "location", the fileset's storageLocation field, or the "unknown" entry in its storageLocations field is specified.
For an external fileset, the storageLocation can be:
1) The one specified when creating the fileset object, with its placeholders replaced by the placeholder values specified in the fileset properties.
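The precedence among rules 1 to 5 for a managed fileset can be sketched in Java as follows. This is a minimal sketch under stated assumptions: a fileset-level location takes precedence over the schema and catalog locations, locations carry no trailing slash, and the names ManagedLocationSketch, fill and resolve are hypothetical rather than part of Gravitino's API. For an external fileset, only the first branch (rule 1) would apply.

```java
import java.util.Collections;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of storageLocation() for a managed fileset; not Gravitino's actual code.
final class ManagedLocationSketch {
  // Matches {{name}} and captures "name".
  private static final Pattern PLACEHOLDER = Pattern.compile("\\{\\{([^{}]+)\\}\\}");

  // Replaces {{catalog}}, {{schema}}, {{fileset}} with the identifier parts and any
  // other {{name}} with the fileset property "placeholder-<name>".
  static String fill(String location, String catalog, String schema, String fileset,
      Map<String, String> props) {
    Matcher m = PLACEHOLDER.matcher(location);
    StringBuffer out = new StringBuffer();
    while (m.find()) {
      String name = m.group(1);
      String value;
      switch (name) {
        case "catalog": value = catalog; break;
        case "schema": value = schema; break;
        case "fileset": value = fileset; break;
        default: value = props.get("placeholder-" + name);
      }
      if (value == null) {
        throw new IllegalArgumentException("No value for placeholder {{" + name + "}}");
      }
      m.appendReplacement(out, Matcher.quoteReplacement(value));
    }
    m.appendTail(out);
    return out.toString();
  }

  // Returns the resolved location, or null when nothing is specified (rule 5).
  static String resolve(String filesetLocation, String catalogLocation, String schemaLocation,
      String catalog, String schema, String fileset, Map<String, String> props) {
    if (filesetLocation != null) {
      // Rule 1 (assumed to take precedence): only placeholders are substituted.
      return fill(filesetLocation, catalog, schema, fileset, props);
    }
    if (schemaLocation != null) {
      // Rules 3 and 4: a schema location overrides the catalog location.
      return PLACEHOLDER.matcher(schemaLocation).find()
          ? fill(schemaLocation, catalog, schema, fileset, props)
          : schemaLocation + "/" + fileset;
    }
    if (catalogLocation != null) {
      // Rule 2: only the catalog location is set.
      return PLACEHOLDER.matcher(catalogLocation).find()
          ? fill(catalogLocation, catalog, schema, fileset, props)
          : catalogLocation + "/" + schema + "/" + fileset;
    }
    return null; // Rule 5
  }

  public static void main(String[] args) {
    Map<String, String> props = Collections.singletonMap("placeholder-version", "v1");
    // Rule 2a: prints hdfs://nn/warehouse/schema1/fileset1
    System.out.println(resolve(null, "hdfs://nn/warehouse", null,
        "catalog1", "schema1", "fileset1", props));
    // Rule 4b: prints s3://bucket/fileset1-v1
    System.out.println(resolve(null, "hdfs://nn/warehouse", "s3://bucket/{{fileset}}-{{version}}",
        "catalog1", "schema1", "fileset1", props));
  }
}
```

Checking the schema location before the catalog location collapses rules 3 and 4 into one branch, since both produce the same result whenever a schema location exists.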
default java.util.Map<java.lang.String,java.lang.String> storageLocations()
Each location path in the returned map can be either the one specified when creating the fileset object, or the one specified at the catalog/schema level if the fileset object is created under that catalog/schema.
The location name "unknown" is reserved for the unnamed storage location of the fileset (the one returned by storageLocation()). It can be specified at the catalog/schema level by the property "location", or at the fileset level by the field "storageLocation". Other location names can be specified at the fileset level as key-value pairs in the field "storageLocations", and at the catalog/schema level by "location-{name}" properties.
The storageLocation at each level can contain placeholders of the form {{name}}, which are replaced by the corresponding fileset property values when the fileset object is created. The placeholder property key in the fileset object is "placeholder-" followed by the placeholder name. For example, if the storageLocation is "file:///path/{{schema}}-{{fileset}}-{{version}}" and the fileset object "catalog1.schema1.fileset1" has the property "placeholder-version" set to "v1", then the storageLocation will be "file:///path/schema1-fileset1-v1".
For a managed fileset, the storageLocation can be:
1) The one specified when creating the fileset object, with its placeholders replaced by the placeholder values specified in the fileset properties.
2) When the catalog property "location" is specified but the schema property "location" is not, the storageLocation is:
a. "{catalog location}/schemaName/filesetName" if {catalog location} does not contain any placeholder.
b. "{catalog location}" with its placeholders replaced by the placeholder values specified in the fileset properties, otherwise.
3) When the catalog property "location" is not specified but the schema property "location" is, the storageLocation is:
a. "{schema location}/filesetName" if {schema location} does not contain any placeholder.
b. "{schema location}" with its placeholders replaced by the placeholder values specified in the fileset properties, otherwise.
4) When both the catalog property "location" and the schema property "location" are specified, the storageLocation is:
a. "{schema location}/filesetName" if {schema location} does not contain any placeholder.
b. "{schema location}" with its placeholders replaced by the placeholder values specified in the fileset properties, otherwise.
5) If no location is specified at the catalog level, at the schema level, or in the fileset's storageLocation or storageLocations fields, the configuration is illegal.
For an external fileset, the storageLocation can be:
1) The one specified when creating the fileset object, with its placeholders replaced by the placeholder values specified in the fileset properties.
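Assembling the full map of named locations can be sketched in Java as well. This minimal sketch assumes that each location name is resolved independently with the same precedence as storageLocation() (fileset level, then schema, then catalog), that every "location" or "location-{name}" property at the catalog or schema level contributes an entry, and that locations have no trailing slash. StorageLocationsSketch and its methods are hypothetical names, not Gravitino's API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of assembling storageLocations(); not Gravitino's actual code.
final class StorageLocationsSketch {
  static final String UNKNOWN = "unknown";           // reserved name of the unnamed location
  static final String LOCATION_PREFIX = "location-"; // catalog/schema property prefix
  static final String PLACEHOLDER_PREFIX = "placeholder-";
  private static final Pattern PLACEHOLDER = Pattern.compile("\\{\\{([^{}]+)\\}\\}");

  // "location" -> "unknown", "location-<name>" -> "<name>"; other properties are ignored.
  static Map<String, String> namedLocations(Map<String, String> levelProps) {
    Map<String, String> named = new TreeMap<>();
    for (Map.Entry<String, String> e : levelProps.entrySet()) {
      if (e.getKey().equals("location")) {
        named.put(UNKNOWN, e.getValue());
      } else if (e.getKey().startsWith(LOCATION_PREFIX)) {
        named.put(e.getKey().substring(LOCATION_PREFIX.length()), e.getValue());
      }
    }
    return named;
  }

  // Replaces each {{name}} with values.get(name); fails on an undefined placeholder.
  static String fill(String location, Map<String, String> values) {
    Matcher m = PLACEHOLDER.matcher(location);
    StringBuffer out = new StringBuffer();
    while (m.find()) {
      String v = values.get(m.group(1));
      if (v == null) {
        throw new IllegalArgumentException("No value for placeholder {{" + m.group(1) + "}}");
      }
      m.appendReplacement(out, Matcher.quoteReplacement(v));
    }
    m.appendTail(out);
    return out.toString();
  }

  static Map<String, String> resolve(String filesetLocation, Map<String, String> filesetLocations,
      Map<String, String> catalogProps, Map<String, String> schemaProps,
      String catalog, String schema, String fileset, Map<String, String> filesetProps) {
    // Placeholder values: "placeholder-<name>" properties plus the reserved names.
    Map<String, String> values = new HashMap<>();
    for (Map.Entry<String, String> e : filesetProps.entrySet()) {
      if (e.getKey().startsWith(PLACEHOLDER_PREFIX)) {
        values.put(e.getKey().substring(PLACEHOLDER_PREFIX.length()), e.getValue());
      }
    }
    values.put("catalog", catalog);
    values.put("schema", schema);
    values.put("fileset", fileset);

    // Fileset-level names: the storageLocations entries plus storageLocation as "unknown".
    Map<String, String> explicit = new TreeMap<>(filesetLocations);
    if (filesetLocation != null) {
      explicit.put(UNKNOWN, filesetLocation);
    }
    Map<String, String> fromSchema = namedLocations(schemaProps);
    Map<String, String> fromCatalog = namedLocations(catalogProps);
    TreeSet<String> names = new TreeSet<>(explicit.keySet());
    names.addAll(fromSchema.keySet());
    names.addAll(fromCatalog.keySet());

    Map<String, String> result = new TreeMap<>();
    for (String name : names) {
      String loc;
      if (explicit.containsKey(name)) {
        loc = fill(explicit.get(name), values); // fileset level wins (assumption)
      } else if (fromSchema.containsKey(name)) {
        loc = fromSchema.get(name);
        loc = PLACEHOLDER.matcher(loc).find() ? fill(loc, values) : loc + "/" + fileset;
      } else {
        loc = fromCatalog.get(name);
        loc = PLACEHOLDER.matcher(loc).find()
            ? fill(loc, values) : loc + "/" + schema + "/" + fileset;
      }
      result.put(name, loc);
    }
    if (result.isEmpty()) { // Rule 5: no location at any level is illegal.
      throw new IllegalArgumentException("No storage location specified at any level");
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> catalogProps = new HashMap<>();
    catalogProps.put("location", "hdfs://nn/warehouse");
    catalogProps.put("location-s3", "s3://bucket/{{schema}}/{{fileset}}");
    Map<String, String> empty = Collections.emptyMap();
    // Prints {s3=s3://bucket/schema1/fileset1, unknown=hdfs://nn/warehouse/schema1/fileset1}
    System.out.println(resolve(null, empty, catalogProps, empty,
        "catalog1", "schema1", "fileset1", empty));
  }
}
```

A TreeMap is used only to make the printed order deterministic; the interface itself returns a plain java.util.Map with no ordering guarantee.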
default java.util.Map<java.lang.String,java.lang.String> properties()
default SupportsTags supportsTags()
Returns: the SupportsTags if the fileset supports tag operations.
Throws: java.lang.UnsupportedOperationException - If the fileset does not support tag operations.
default SupportsRoles supportsRoles()
Returns: the SupportsRoles if the fileset supports role operations.
Throws: java.lang.UnsupportedOperationException - If the fileset does not support role operations.
default SupportsCredentials supportsCredentials()
Returns: the SupportsCredentials if the fileset supports credential operations.
Throws: java.lang.UnsupportedOperationException - If the fileset does not support credential operations.