gravitino.client.fileset_catalog.FilesetCatalog
- class gravitino.client.fileset_catalog.FilesetCatalog(namespace: Namespace, name: str | None = None, catalog_type: Type = Type.UNSUPPORTED, provider: str | None = None, comment: str | None = None, properties: Dict[str, str] | None = None, audit: AuditDTO | None = None, rest_client: HTTPClient | None = None)
Bases: BaseSchemaCatalog, SupportsCredentials
A fileset catalog is a catalog implementation that supports fileset-like metadata operations, for example listing, creating, updating and deleting schemas and filesets. A fileset catalog belongs to a metalake.
- __init__(namespace: Namespace, name: str | None = None, catalog_type: Type = Type.UNSUPPORTED, provider: str | None = None, comment: str | None = None, properties: Dict[str, str] | None = None, audit: AuditDTO | None = None, rest_client: HTTPClient | None = None)
Methods
__init__(namespace[, name, catalog_type, ...])
alter_fileset(ident, *changes): Update a fileset metadata in the catalog.
alter_schema(schema_name, *changes): Alter the schema with specified identifier by applying the changes.
as_fileset_catalog()
as_model_catalog()
as_schemas(): Return the SupportsSchemas if the catalog supports schema operations.
as_table_catalog()
as_topic_catalog()
audit_info()
builder([name, catalog_type, provider, ...])
check_fileset_name_identifier(ident)
check_fileset_namespace(namespace)
comment(): The comment of the catalog.
create_fileset(ident, comment, fileset_type, ...): Create a fileset metadata in the catalog.
create_schema([schema_name, comment, properties]): Create a new schema with specified identifier, comment and metadata.
drop_fileset(ident): Drop a fileset from the catalog.
drop_schema(schema_name, cascade): Drop the schema with specified identifier.
format_file_location_request_path(namespace, ...)
format_fileset_request_path(namespace)
format_schema_request_path(ns)
get_credential(credential_type): Retrieves Credential object based on the specified credential type.
get_credentials(): Retrieves a List of Credential objects.
get_file_location(ident, sub_path): Get the actual location of a file or directory based on the storage location of Fileset and the sub path.
list_filesets(namespace): List the filesets in a schema namespace from the catalog.
list_schemas(): List all the schemas under the given catalog namespace.
load_fileset(ident): Load fileset metadata by NameIdentifier from the catalog.
load_schema(schema_name): Load the schema with specified identifier.
name()
properties(): The properties of the catalog.
provider()
schema_exists(schema_name): Check if a schema exists.
support_credentials()
to_fileset_update_request(change)
to_schema_update_request(change)
type()
validate()
Attributes
PROPERTY_PACKAGE: A reserved property to specify the package location of the catalog.
rest_client
- PROPERTY_PACKAGE = 'package'
A reserved property to specify the package location of the catalog. The “package” is a path string pointing to the folder where all the catalog-related dependencies are located. Gravitino loads the dependencies under the “package” to create the catalog.
The “package” property is not needed if the catalog is a built-in one; Gravitino will search the proper location using “provider” to load the dependencies. The “package” property is needed only when the folder is in a different location.
- class Type(value)
Bases: Enum
The type of the catalog.
- FILESET = ('fileset', False)
Catalog Type for Fileset System (including HDFS, S3, etc.), like path/to/file
- MESSAGING = ('messaging', False)
Catalog Type for Message Queue, like kafka://topic
- MODEL = ('model', True)
Catalog Type for ML model
- RELATIONAL = ('relational', False)
Catalog Type for Relational Data Structure, like db.table, catalog.db.table.
- UNSUPPORTED = ('unsupported', False)
Catalog Type for test only.
- property supports_managed_catalog
A flag to indicate if the catalog type supports managed catalogs. A managed catalog is a Gravitino concept: Gravitino manages the lifecycle of the catalog and its subsidiaries. If the catalog type supports managed catalogs, users can create a managed catalog of this type without specifying the catalog provider, and Gravitino will use the type as the provider. If the catalog type does not support managed catalogs, users need to specify the provider to create the catalog.
- property type_name
The name of the catalog type.
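Each member value listed above is a pair whose second element lines up with supports_managed_catalog (only MODEL is True). A minimal sketch of how an Enum with tuple values can expose both properties, using a hypothetical stand-in class named CatalogTypeSketch rather than the Gravitino source:

```python
from enum import Enum


class CatalogTypeSketch(Enum):
    """Hypothetical stand-in that mirrors the documented member values."""

    FILESET = ("fileset", False)
    MESSAGING = ("messaging", False)
    MODEL = ("model", True)
    RELATIONAL = ("relational", False)
    UNSUPPORTED = ("unsupported", False)

    def __init__(self, type_name: str, supports_managed_catalog: bool):
        # Enum unpacks each tuple value into these constructor arguments.
        self._type_name = type_name
        self._supports_managed_catalog = supports_managed_catalog

    @property
    def type_name(self) -> str:
        return self._type_name

    @property
    def supports_managed_catalog(self) -> bool:
        return self._supports_managed_catalog


# Collect the names of the types that can be created without a provider.
managed = [t.type_name for t in CatalogTypeSketch if t.supports_managed_catalog]
```

With the values documented above, only the model type ends up in `managed`.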
- alter_fileset(ident: NameIdentifier, *changes) -> Fileset
Update the metadata of a fileset in the catalog.
- Args:
ident: A fileset identifier, which should be in “schema.fileset” format.
changes: The changes to apply to the fileset.
- Raises:
IllegalArgumentException If the changes are invalid.
NoSuchFilesetException If the fileset does not exist.
- Returns:
The updated fileset metadata.
- alter_schema(schema_name: str, *changes: SchemaChange) -> Schema
Alter the schema with specified identifier by applying the changes.
- Args:
schema_name: The name of the schema.
changes: The metadata changes to apply.
- Raises:
NoSuchSchemaException if the schema with specified identifier does not exist.
- Returns:
The altered Schema.
- as_fileset_catalog()
- Raises:
UnsupportedOperationException if the catalog does not support fileset operations.
- Returns:
The FilesetCatalog if the catalog supports fileset operations.
- as_model_catalog() -> ModelCatalog
- Returns:
The ModelCatalog if the catalog supports model operations.
- Raises:
UnsupportedOperationException if the catalog does not support model operations.
- as_schemas()
Return the SupportsSchemas if the catalog supports schema operations.
- Raises:
UnsupportedOperationException if the catalog does not support schema operations.
- Returns:
The SupportsSchemas if the catalog supports schema operations.
- as_table_catalog() -> TableCatalog
- Raises:
UnsupportedOperationException if the catalog does not support table operations.
- Returns:
The TableCatalog if the catalog supports table operations.
- as_topic_catalog() -> TopicCatalog
- Returns:
The TopicCatalog if the catalog supports topic operations.
- Raises:
UnsupportedOperationException if the catalog does not support topic operations.
- comment() -> str
The comment of the catalog. Note: this method will return None if the comment is not set for this catalog.
- Returns:
The comment of the catalog.
- create_fileset(ident: NameIdentifier, comment: str, fileset_type: Type, storage_location: str, properties: Dict[str, str]) -> Fileset
Create a fileset metadata in the catalog.
If the type of the fileset object is “MANAGED”, the underlying storage_location can be None, and Gravitino will manage the storage location based on the location of the schema.
If the type of the fileset object is “EXTERNAL”, the underlying storage_location must be set.
- Args:
ident: A fileset identifier, which should be in “schema.fileset” format.
comment: The comment of the fileset.
fileset_type: The type of the fileset.
storage_location: The storage location of the fileset.
properties: The properties of the fileset.
- Raises:
NoSuchSchemaException If the schema does not exist.
FilesetAlreadyExistsException If the fileset already exists.
- Returns:
The created fileset metadata.
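The MANAGED/EXTERNAL rule above can be checked on the caller's side before the request is sent. A minimal sketch, assuming a hypothetical helper named check_storage_location that takes the fileset type as a plain string; it is not part of the Gravitino client and does not replace the server-side validation:

```python
from typing import Optional


def check_storage_location(fileset_type: str, storage_location: Optional[str]) -> None:
    """Hypothetical pre-flight check mirroring the rule described above."""
    kind = fileset_type.upper()
    if kind not in ("MANAGED", "EXTERNAL"):
        raise ValueError(f"unknown fileset type: {fileset_type!r}")
    # An EXTERNAL fileset must point at an explicit location; a MANAGED one
    # may omit it and let Gravitino derive it from the schema location.
    if kind == "EXTERNAL" and not storage_location:
        raise ValueError("an EXTERNAL fileset requires a storage location")
```

Calling the helper right before create_fileset turns a missing location into a local error instead of a failed request.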
- create_schema(schema_name: str | None = None, comment: str | None = None, properties: Dict[str, str] | None = None) -> Schema
Create a new schema with specified identifier, comment and metadata.
- Args:
schema_name: The name of the schema.
comment: The comment of the schema.
properties: The properties of the schema.
- Raises:
NoSuchCatalogException if the catalog with specified namespace does not exist.
SchemaAlreadyExistsException if the schema with specified identifier already exists.
- Returns:
The created Schema.
- drop_fileset(ident: NameIdentifier) -> bool
Drop a fileset from the catalog.
If the fileset type is managed, the underlying files will be deleted; otherwise, only the metadata will be dropped.
- Args:
ident: A fileset identifier, which should be in “schema.fileset” format.
- Returns:
True if the fileset is dropped, False if the fileset did not exist.
- drop_schema(schema_name: str, cascade: bool) -> bool
Drop the schema with specified identifier.
- Args:
schema_name: The name of the schema.
cascade: Whether to drop all the tables under the schema.
- Raises:
NonEmptySchemaException if the schema is not empty and cascade is false.
- Returns:
True if the schema is dropped successfully, False otherwise.
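Because drop_schema raises NonEmptySchemaException when the schema still has content and cascade is false, one option is to drop the filesets explicitly first. A minimal sketch, assuming a hypothetical helper named drop_schema_with_filesets and assuming that the identifiers returned by list_filesets can be passed back to drop_fileset unchanged:

```python
def drop_schema_with_filesets(catalog, schema_namespace, schema_name: str) -> bool:
    """Hypothetical helper: drop each listed fileset, then the schema.

    `catalog` is any object exposing the documented list_filesets,
    drop_fileset and drop_schema methods. `schema_namespace` is whatever
    namespace object list_filesets expects for that schema.
    """
    for ident in catalog.list_filesets(schema_namespace):
        # For managed filesets this also deletes the underlying files.
        catalog.drop_fileset(ident)
    # cascade=False: the schema is expected to be empty at this point.
    return catalog.drop_schema(schema_name, False)
```

Keeping cascade false here means a schema that unexpectedly still has content surfaces as an exception rather than being removed silently.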
- get_credential(credential_type: str) -> Credential
Retrieves a Credential object based on the specified credential type.
- Args:
credential_type: The type of the credential, such as s3-token or s3-secret-key, as defined in the specific credentials.
- Returns:
A Credential object with the specified credential type.
- Raises:
NoSuchCredentialException If the specific credential cannot be found.
IllegalStateException If multiple credentials are found.
- get_credentials() -> List[Credential]
Retrieves a List of Credential objects.
- Returns:
A List of Credential objects. In most cases the list contains only one credential. If an object such as a Fileset has multiple locations on different storages (for example HDFS and S3), the list will contain multiple credentials. The list may be empty if you request a credential for a catalog but the credential provider cannot generate one for the catalog; for example, the S3 token credential provider only generates credentials for specific objects such as a Fileset or Table. There is at most one credential per credential type.
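The relationship between get_credentials and get_credential (exactly one match, otherwise an error) can be illustrated with a small selection function. This is a sketch only: pick_credential and its type_of accessor argument are hypothetical, and the built-in LookupError and RuntimeError stand in for NoSuchCredentialException and IllegalStateException.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def pick_credential(
    credentials: Sequence[T],
    credential_type: str,
    type_of: Callable[[T], str],
) -> T:
    """Return the single credential of the requested type (hypothetical helper)."""
    matches = [c for c in credentials if type_of(c) == credential_type]
    if not matches:
        # Stand-in for NoSuchCredentialException.
        raise LookupError(f"no credential of type {credential_type!r}")
    if len(matches) > 1:
        # Stand-in for IllegalStateException.
        raise RuntimeError(f"multiple credentials of type {credential_type!r}")
    return matches[0]
```

The accessor is passed in explicitly because this page does not document how a Credential object reports its type.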
- get_file_location(ident: NameIdentifier, sub_path: str) -> str
Get the actual location of a file or directory based on the storage location of Fileset and the sub path.
- Args:
ident: A fileset identifier, which should be in “schema.fileset” format.
sub_path: The sub path of the file or directory.
- Returns:
The actual location of the file or directory.
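As a mental model for what "storage location plus sub path" means, the following sketch joins the two strings with a single separator. The function join_location is hypothetical and deliberately naive; the real resolution is performed through the Gravitino API and may differ (for example in how it treats scheme-only roots or special characters).

```python
def join_location(storage_location: str, sub_path: str) -> str:
    """Naive illustration: append sub_path to storage_location with one '/'."""
    base = storage_location.rstrip("/")
    rest = sub_path.lstrip("/")
    # An empty sub path refers to the fileset's storage location itself.
    return f"{base}/{rest}" if rest else base
```

The example locations used to exercise this sketch are made up and do not refer to any real deployment.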
- list_filesets(namespace: Namespace) -> List[NameIdentifier]
List the filesets in a schema namespace from the catalog.
- Args:
namespace: A schema namespace. This namespace should have 1 level, which is the schema name.
- Raises:
NoSuchSchemaException If the schema does not exist.
- Returns:
A list of NameIdentifier of filesets under the given namespace.
- list_schemas() -> List[str]
List all the schemas under the given catalog namespace.
- Raises:
NoSuchCatalogException if the catalog with specified namespace does not exist.
- Returns:
A list of schema names under the given catalog namespace.
- load_fileset(ident: NameIdentifier) -> Fileset
Load fileset metadata by NameIdentifier from the catalog.
- Args:
ident: A fileset identifier, which should be in “schema.fileset” format.
- Raises:
NoSuchFilesetException If the fileset does not exist.
- Returns:
The fileset metadata.
- load_schema(schema_name: str) -> Schema
Load the schema with specified identifier.
- Args:
schema_name: The name of the schema.
- Raises:
NoSuchSchemaException if the schema with specified identifier does not exist.
- Returns:
The Schema with specified identifier.
- name() -> str
- Returns:
The name of the catalog.
- properties() -> Dict[str, str]
The properties of the catalog. Note: this method will return None if the properties are not set.
- Returns:
The properties of the catalog.
- provider() -> str
- Returns:
The provider of the catalog.
- schema_exists(schema_name: str) -> bool
Check if a schema exists.
If an entity such as a table or view exists, its parent namespaces must also exist. For example, if table a.b.t exists, this method invoked as schema_exists(a.b) must return True.
- Args:
schema_name: The name of the schema.
- Returns:
True if the schema exists, False otherwise.
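A common use of schema_exists is a "create if missing" step before working with filesets. A minimal sketch, assuming a hypothetical helper named ensure_schema that relies only on the schema_exists, create_schema and load_schema methods documented on this page:

```python
from typing import Dict, Optional


def ensure_schema(
    catalog,
    schema_name: str,
    comment: Optional[str] = None,
    properties: Optional[Dict[str, str]] = None,
):
    """Hypothetical helper: create the schema if it is missing, then load it."""
    if not catalog.schema_exists(schema_name):
        # Another client may create the schema between the check and this
        # call; create_schema would then raise SchemaAlreadyExistsException.
        catalog.create_schema(schema_name, comment, properties)
    return catalog.load_schema(schema_name)
```

The check-then-create sequence is not atomic, so callers that share a catalog with other writers may want to handle SchemaAlreadyExistsException as well.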