Version: 0.8.0-incubating

Apache Gravitino Glossary

API

  • Application Programming Interface, defining the methods and protocols for interacting with a server.

AWS

  • Amazon Web Services, a cloud computing platform provided by Amazon.

AWS Glue

  • A managed AWS service whose Data Catalog acts as a compatible implementation of the Hive Metastore Service (HMS).

GPG/GnuPG

  • GNU Privacy Guard, or GnuPG, is an open-source implementation of the OpenPGP standard. It is usually used for encrypting and signing files and emails.

HDFS

  • Hadoop Distributed File System, an open-source distributed file system and a key component of the Apache Hadoop ecosystem. HDFS is designed to store large-scale datasets across a cluster of machines, with high reliability, fault tolerance, and high throughput.

HTTP port

  • The port number on which a server listens for incoming connections.

IP address

  • Internet Protocol address, a numerical label assigned to each device in a computer network.

JDBC

  • Java Database Connectivity, an API for connecting Java applications to relational databases.

JDBC URI

  • The JDBC connection address specified in the catalog configuration. It usually includes components such as the database type, host, port, and database name.
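
As a rough illustration of the components listed above, the following sketch splits a URL-style JDBC URI into database type, host, port, and database name. The function name `split_jdbc_uri` and the sample address are hypothetical, and the approach only covers URIs of the form `jdbc:<type>://<host>:<port>/<database>`. Some drivers use other layouts that this sketch does not handle.

```python
from urllib.parse import urlsplit


def split_jdbc_uri(jdbc_uri: str) -> dict:
    """Split a URL-style JDBC URI into its main components (illustrative helper)."""
    prefix = "jdbc:"
    if not jdbc_uri.startswith(prefix):
        raise ValueError(f"not a JDBC URI: {jdbc_uri!r}")
    # Drop the "jdbc:" prefix so the remainder parses like an ordinary URL.
    parts = urlsplit(jdbc_uri[len(prefix):])
    return {
        "database_type": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


# Hypothetical connection address, for illustration only.
print(split_jdbc_uri("jdbc:mysql://localhost:3306/metadata_db"))
```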

JDK

  • Java Development Kit, the software development kit for the Java programming language. A JDK provides tools for compiling, debugging, and running Java applications.

JMX

  • Java Management Extensions, a Java technology that provides tools for managing and monitoring Java applications.

JSON

  • JavaScript Object Notation, a lightweight data interchange format.

JSON Web Token

  • See JWT.

JVM

  • Java Virtual Machine, a virtual machine that enables a computer to run Java applications. A JVM implements an abstract machine that is independent of the underlying hardware.

JVM instrumentation

  • The process of adding monitoring and management capabilities to the JVM. The purpose of instrumentation is mainly for the collection of performance metrics.

JVM metrics

  • Metrics related to the performance and behavior of the Java Virtual Machine. Some valuable metrics are memory usage, garbage collection, and buffer pool metrics.

JWT

  • JSON Web Token, a compact, URL-safe way of representing claims to be transferred between two parties.
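
To make the "compact, URL-safe" form concrete, here is a minimal sketch that builds and reads a token using only the Python standard library. It assumes the HS256 (HMAC-SHA256) signing algorithm and a shared secret. The helper names `make_token` and `read_claims` are hypothetical, and the sketch skips checks such as expiry that a maintained JWT library would perform.

```python
import base64
import hashlib
import hmac
import json


def b64url_encode(data: bytes) -> str:
    # URL-safe Base64 with the trailing "=" padding removed.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    # Restore the padding that the compact form omits.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def make_token(claims: dict, secret: bytes) -> str:
    """Build a token of the form header.payload.signature (illustrative)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{b64url_encode(signature)}"


def read_claims(token: str, secret: bytes) -> dict:
    """Check the signature, then return the decoded claims (illustrative)."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    if not hmac.compare_digest(expected, b64url_decode(signature_b64)):
        raise ValueError("signature mismatch")
    return json.loads(b64url_decode(payload_b64))


# Hypothetical claims and secret, for illustration only.
token = make_token({"sub": "example-user"}, b"example-secret")
print(read_claims(token, b"example-secret"))
```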

KEYS file

  • A file containing the public keys of the people who signed releases. It is needed to verify release signatures.

PGP signature

  • A digital signature generated using Pretty Good Privacy (PGP) or a compatible OpenPGP implementation such as GnuPG. The signature is typically used to validate the authenticity and integrity of a file.

REST

  • Representational State Transfer, a set of architectural principles for designing networked applications.

REST API

  • Representational State Transfer (REST) Application Programming Interface. A set of rules and conventions for building and interacting with Web services using standard HTTP methods.

SHA256 checksum

  • The hash value produced by applying the SHA-256 cryptographic hash function to a file, used to verify the integrity of that file.

SHA256 checksum file

  • A file containing the SHA256 hash value of another file, used for verification purposes.
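
A verification step of this kind can be sketched as follows. The sketch assumes the checksum file starts with the hexadecimal digest, as in the common `<digest>  <file name>` layout. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hexadecimal SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum_file(artifact: Path, checksum_file: Path) -> bool:
    """Compare an artifact against the digest stored in its checksum file."""
    # Assumption: the first whitespace-separated field is the hex digest.
    expected = checksum_file.read_text().split()[0].lower()
    return sha256_of(artifact) == expected
```

A call such as `matches_checksum_file(Path("package.tar.gz"), Path("package.tar.gz.sha256"))` returns `True` only when the computed digest equals the recorded one. Both file names here are placeholders.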

SQL

  • Structured Query Language, a language used to manage and manipulate relational databases.

SSH

  • Secure Shell, a cryptographic network protocol used for secure communication over a computer network.

URI

  • Uniform Resource Identifier, a string that identifies a name or a resource on the internet.

YAML

  • YAML Ain't Markup Language, a human-readable file format often used for structured data.

Amazon Elastic Block Store (EBS)

  • A scalable block storage service provided by Amazon Web Services (AWS).

Apache Gravitino

  • An open-source software platform initially created by Datastrato. It is designed for high-performance, geo-distributed, and federated metadata lakes. Gravitino can manage metadata directly in different sources, types, and regions, providing data and AI assets with unified metadata access.

Apache Gravitino configuration file (gravitino.conf)

  • The configuration file for the Gravitino server, located in the conf directory. It follows the standard properties file format and contains settings for the Gravitino server.
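
As a simplified sketch of the properties file format, the parser below reads `key = value` lines and skips blank lines and comments. It handles only the `=` separator and ignores line continuations and escape sequences, which the full format also allows. The sample keys are illustrative assumptions and not a verified list of Gravitino settings.

```python
def parse_properties(text: str) -> dict:
    """Parse simple key = value lines from properties-style text (illustrative)."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines, which start with "#" or "!".
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


# Illustrative entries; the key names are assumptions for this example.
sample = """
# Web server settings
gravitino.server.webserver.host = 127.0.0.1
gravitino.server.webserver.httpPort = 8090
"""

print(parse_properties(sample))
```

All values are returned as strings, so a caller would convert numeric settings such as a port explicitly.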

Apache Hadoop

  • An open-source distributed storage and processing framework.

Apache Hive

  • An open-source data warehousing software project. It provides a SQL-like query language for managing and querying large datasets.

Apache Iceberg

  • An open-source, versioned table format for large-scale data processing.

Apache Iceberg Hive catalog

  • The Iceberg Hive catalog is a metadata service designed for the Apache Iceberg table format. It allows external systems to interact with Iceberg metadata using a Hive Metastore Thrift client.

Apache Iceberg JDBC catalog

  • The Iceberg JDBC catalog is a metadata service designed for the Apache Iceberg table format. It enables external systems to interact with an Iceberg metadata service using JDBC.

Apache Iceberg REST catalog

  • The Iceberg REST catalog is a metadata service designed for the Apache Iceberg table format. It enables external systems to interact with the Iceberg metadata service using a REST API.

Apache License version 2

  • A permissive, open-source software license written by The Apache Software Foundation.

Authentication mechanism

  • The method used to verify the identity of users and clients accessing a server.

Binary distribution package

  • A software package containing the compiled executables for distribution and deployment.

Catalog

  • A collection of metadata from a specific metadata source.

Catalog provider

  • The specific system or technology used to store and manage metadata catalogs.

Columns

  • The individual fields or attributes of a table. Each column has properties like name, data type, comment, and nullability.
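
The properties named above can be pictured as a small record type. This is only a sketch: the class name `Column`, its field names, and the sample values are hypothetical and do not reflect the Gravitino API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Column:
    """Illustrative record holding the column properties listed above."""
    name: str
    data_type: str
    comment: Optional[str] = None
    nullable: bool = True


# Hypothetical columns of an example table.
columns = [
    Column(name="id", data_type="bigint", comment="Primary identifier", nullable=False),
    Column(name="note", data_type="string"),
]
print([column.name for column in columns if column.nullable])
```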

Continuous integration (CI)

  • The practice of automatically building and testing code changes when they are committed to version control.

Dependencies

  • External libraries or modules required by a project for its compilation and features.

Distribution

  • A packaged and deployable version of the software.

Docker

  • A platform for developing, shipping, and running applications in containers.

Docker container

  • A running instance of a Docker image. A container runs an application together with its dependencies and runtime, isolated from the host system.

Docker Hub

  • A cloud-based registry service for Docker images. Users can publish, browse, and download containerized software using this service.

Docker image

  • A lightweight, standalone package that includes everything needed to run the software. A Docker image typically comprises the code, runtime, libraries, and system tools.

Dockerfile

  • A configuration file for building a Docker image. A Dockerfile contains instructions to build a standard image for distributing the software.

Dropwizard metrics

  • A Java library for measuring the performance of applications and providing support for various metric types.

Environment variables

  • Variables used to customize the runtime configuration for a process.

Geo-distributed

  • The distribution of data or services across multiple geographic locations.

Git

  • A distributed version control system used for tracking software artifacts.

GitHub

  • A web-based platform for version control and community collaboration using Git.

GitHub Actions

  • A continuous integration and continuous deployment (CI/CD) service provided by GitHub. GitHub Actions automates build, test, and deployment workflows.

GitHub labels

  • Labels assigned to GitHub issues or pull requests for organization or workflow automation.

GitHub pull request

  • A proposed change to a GitHub repository submitted by a user.

GitHub repository

  • The location where GitHub stores a project's source code and related files.

GitHub workflow

  • A series of automated steps triggered by specific events on a GitHub repository.

Gradle

  • An automation tool for building, testing, and deploying projects.

Gradlew

  • A Gradle wrapper script used to execute Gradle commands.

Hashes

  • Cryptographic hash values generated from some data. A typical use case is to verify the integrity of a file.

Headless

  • A system without a local console.

Identity fields

  • Fields in tables that define the identity of the records. In the scope of a table, the identity fields are used as the unique identifier of a row.

Integration tests

  • Tests that ensure software correctness and compatibility when integrating components into a larger system.

Java Database Connectivity (JDBC)

  • See JDBC.

Java Development Kits (JDKs)

  • See JDK.

Java Management Extensions

  • See JMX.

Java Toolchain

  • A Gradle feature for detecting and managing JDK versions.

Java Virtual Machine

  • See JVM.

Key pair

  • A pair of cryptographic keys, including a public key used for verification and a private key used for signing.

Lakehouse

  • Lakehouse is a modern data management architecture that combines elements of data lakes and data warehouses. It aims to provide a unified platform for storing, managing, and analyzing both raw unstructured data (similar to data lakes) and curated structured data.

Manifest

  • A list of files and their associated metadata that collectively define the structure and content of a release or distribution.

Merge operation

  • A process in Iceberg that involves combining changes from multiple snapshots into a new snapshot.

Metalake

  • The top-level container for metadata. Typically, a metalake is a tenant-like mapping to an organization or a company. All the catalogs, users, and roles are associated with one metalake.

Metastore

  • A central repository that stores metadata for a data warehouse.

Module

  • A distinct and separable part of a project.

Open authorization / OAuth

  • A standard protocol for authorization that allows third-party applications to access resources on behalf of a user. The application doesn't need to access the user credentials.

OrbStack

  • An alternative to Docker on macOS that can be used when running Gravitino integration tests.

Private key

  • A confidential key used for signing, decryption, or other operations that should remain confidential.

Properties

  • Configurable settings and attributes associated with catalogs, schemas, and tables. The property settings influence the behavior and storage of the corresponding entities.

Protocol buffers (protobuf)

  • A method developed by Google for serializing structured data, similar to XML or JSON. It is often used for efficient and extensible communication between systems.

Public key

  • An openly shared key used for verification, encryption, or other operations intended for public knowledge.

Representational State Transfer

  • See REST.

RocksDB

  • An open-source, embeddable key-value storage engine.

Schema

  • A logical container for organizing tables in a database.

Secure Shell

  • See SSH.

Security group

  • A virtual firewall that controls the inbound and outbound traffic of a cloud instance.

Serde

  • A serialization/deserialization library. It can transform data between a tabular format and a format suitable for storage or transmission.

Snapshot

  • A point-in-time capture of the state of an Iceberg table, representing a specific version of the table.

Sort order

  • The arrangement of data within a Hive table, specified by expression or direction.

Spotless

  • A tool or process used to enforce code formatting standards and apply automatic formatting to code.

Structured Query Language

  • See SQL.

Table

  • A structured set of data elements stored in columns and rows.

Thrift

  • An RPC framework and network protocol used for communication with the Hive Metastore Service (HMS).

Token

  • In computing and security, a token is a small piece of data that stands in for an identity, a credential, or a permission. Tokens play a crucial role in authentication and authorization.

Trino

  • A distributed SQL query engine for big data processing.

Trino connector

  • A connector module for integrating Gravitino with Trino.

Ubuntu

  • A Linux distribution based on Debian, widely used for cloud computing and servers.

Unit test

  • A type of software testing where individual components or functions of a program are tested. Unit tests help to ensure that the component or function works as expected in isolation.

Verification

  • The process of confirming the authenticity and integrity of a release. This is usually done by checking its signature and associated hash values.

Web UI

  • A graphical interface accessible through a web browser.