HDFS (Hadoop Distributed File System) is an open-source distributed file system.
It is a key component of the Apache Hadoop ecosystem.
HDFS is designed as a distributed storage layer for large-scale datasets.
It features high reliability, fault tolerance, and high-throughput access to data.
The JDBC connection address specified in the catalog configuration.
It usually includes components such as the database type, host, port, and database name.
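As a hypothetical illustration, a JDBC connection address in a catalog's configuration might look like the following; the key name and values here are examples, not authoritative settings.

```properties
# Illustrative only: the key name and values are placeholders.
# General shape: jdbc:<database-type>://<host>:<port>/<database-name>
jdbc-url = jdbc:mysql://db.example.com:3306/metastore_db
```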
A virtual machine that enables a computer to run Java applications.
The JVM implements an abstract machine, independent of the underlying hardware, so the same compiled bytecode can run on any platform.
The process of adding monitoring and management capabilities to the JVM.
Instrumentation is mainly used to collect performance metrics.
Metrics related to the performance and behavior of the Java Virtual Machine.
Valuable examples include memory usage, garbage collection activity, and buffer pool statistics.
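As a minimal sketch, the JDK's built-in management beans expose several of these metrics directly; the snippet below reads heap usage and per-collector garbage collection statistics.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class JvmMetricsExample {
    public static void main(String[] args) {
        // Heap usage from the standard MemoryMXBean.
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        System.out.printf("Heap used: %d of %d bytes%n", heap.getUsed(), heap.getMax());

        // Collection counts and accumulated time per garbage collector.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```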
Representational State Transfer (REST) Application Programming Interface.
A set of rules and conventions for building and interacting with Web services using standard HTTP methods.
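As a minimal sketch with Java's built-in HttpClient (Java 11+), a REST call is an ordinary HTTP request; the URL below is a placeholder.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RestGetExample {
    public static void main(String[] args) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://api.example.com/v1/items")) // placeholder URL
                .GET() // one of the standard HTTP methods REST builds on
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + ": " + response.body());
    }
}
```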
An open-source software platform initially created by Datastrato.
It is designed for high-performance, geo-distributed, and federated metadata lakes.
Gravitino manages metadata directly across different sources, types, and regions,
providing unified metadata access for data and AI assets.
The configuration file for the Gravitino server, located in the conf directory.
It follows the standard properties file format and contains settings for the Gravitino server.
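The file uses ordinary key = value lines. For illustration only (check the shipped file for the authoritative keys and defaults), entries take this shape:

```properties
# Illustrative entries; key names and values may differ between releases.
gravitino.server.webserver.host = 0.0.0.0
gravitino.server.webserver.httpPort = 8090
```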
The Iceberg Hive catalog is a metadata service designed for the Apache Iceberg table format.
It allows external systems to interact with Iceberg metadata using a Hive metastore Thrift client.
The Iceberg JDBC catalog is a metadata service designed for the Apache Iceberg table format.
It enables external systems to interact with an Iceberg metadata service using JDBC.
The Iceberg REST Catalog is a metadata service designed for the Apache Iceberg table format.
It enables external systems to interact with an Iceberg metadata service using a REST API.
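As a sketch, a client can probe such a service with a plain HTTP call. The host, port, and path prefix below are placeholders; the /v1/config endpoint comes from the Iceberg REST catalog's OpenAPI specification.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class IcebergRestConfigProbe {
    public static void main(String[] args) throws Exception {
        // Placeholder address; /v1/config is defined by the Iceberg REST spec.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:9001/iceberg/v1/config"))
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // JSON catalog configuration
    }
}
```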
A lightweight, standalone package that includes everything needed to run the software.
A container bundles an application with its dependencies and runtime for distribution.
A read-only template that packages everything needed to run the software.
A Docker image typically comprises the code, runtime, libraries, and system tools.
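A minimal, illustrative Dockerfile shows how an image captures those pieces; the base image and jar name are placeholders.

```dockerfile
# Illustrative only; the base image and application name are placeholders.
# Start from a base image that provides a Java runtime.
FROM eclipse-temurin:17-jre
# Add the application code (a pre-built jar in this sketch).
COPY app.jar /opt/app/app.jar
# Command executed when a container is started from this image.
CMD ["java", "-jar", "/opt/app/app.jar"]
```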
A continuous integration and continuous deployment (CI/CD) service provided by GitHub.
GitHub Actions automate the build, test, and deployment workflows.
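A minimal workflow file sketches the idea; the steps and versions below are illustrative, not a prescribed pipeline.

```yaml
# Illustrative workflow, e.g. .github/workflows/build.yml
name: build
on: [push, pull_request]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4      # fetch the repository
      - uses: actions/setup-java@v4    # install a JDK
        with:
          distribution: temurin
          java-version: '17'
      - run: ./gradlew build           # compile and run the tests
```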
Lakehouse is a modern data management architecture that combines elements of data lakes and data warehouses.
It aims to provide a unified platform for storing, managing, and analyzing both raw, unstructured data
(as in a data lake) and curated, structured data (as in a data warehouse).
The top-level container for metadata.
Typically, a metalake is a tenant-like mapping to an organization or a company.
All the catalogs, users, and roles are associated with one metalake.
A standard protocol for authorization that allows third-party applications to obtain limited access to a user's resources.
The application never needs to see or store the user's credentials.
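As a sketch of one common flow, the client credentials grant (RFC 6749, section 4.4), an application exchanges its own credentials for an access token; the endpoint and client values below are placeholders.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class OAuthTokenExample {
    public static void main(String[] args) throws Exception {
        // Placeholder endpoint and credentials; the form body follows the
        // OAuth 2.0 client credentials grant (RFC 6749, section 4.4).
        String form = "grant_type=client_credentials"
                + "&client_id=my-client&client_secret=my-secret";
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://auth.example.com/oauth/token"))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(form))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // JSON containing an access token
    }
}
```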
Configurable settings and attributes associated with catalogs, schemas, and tables.
The property settings influence the behavior and storage of the corresponding entities.
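For illustration, such properties are usually plain key-value pairs; the keys below are hypothetical, not a defined schema.

```properties
# Hypothetical table properties; the keys are examples only.
location = hdfs://namenode:8020/warehouse/sales
format = parquet
```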
A method developed by Google for serializing structured data, similar to XML or JSON.
It is often used for efficient and extensible communication between systems.
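A minimal .proto definition illustrates the format; the message and fields are invented for this sketch.

```protobuf
// Illustrative proto3 schema; the message and fields are examples.
syntax = "proto3";

message User {
  string name = 1; // field numbers identify fields on the wire,
  int32 id = 2;    // which keeps the binary encoding compact and extensible
}
```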
A token, in the context of computing and security, is a small piece of data that represents an identity, a permission, or a session.
Tokens play a crucial role in various domains, including authentication and authorization.
A type of software testing where individual components or functions of a program are tested.
Unit tests help to ensure that the component or function works as expected in isolation.
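A minimal JUnit 5 sketch makes the idea concrete; the function under test is invented for the example.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

class AdditionTest {
    // Hypothetical function under test; only the testing pattern matters here.
    static int add(int a, int b) {
        return a + b;
    }

    @Test
    void addReturnsSumOfOperands() {
        // Exercise the function in isolation, with no external dependencies.
        assertEquals(5, add(2, 3));
    }
}
```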