Version: 1.2.0

Spark connector - User-defined functions

Overview

The Apache Gravitino Spark connector supports loading user-defined functions (UDFs) registered in the Gravitino function registry. Once a function is registered in Gravitino, Spark can discover and invoke it through standard Spark SQL syntax — no additional CREATE FUNCTION statement is needed.

note

Currently, only Java implementations with RuntimeType.SPARK are supported in the Spark connector. SQL and Python implementations registered in Gravitino cannot yet be invoked directly from Spark. Support for additional languages is planned for future releases.

Prerequisites

Before using Gravitino UDFs in Spark, ensure the following:

The Spark connector is configured and the catalog is accessible (see Spark connector setup).
The function has been registered in Gravitino with at least one definition that includes a Java implementation targeting RuntimeType.SPARK (see Register a function).
The JAR containing the UDF class is available on the Spark classpath (e.g. via --jars or spark.jars configuration).

Java UDF requirements

The Java class specified in className of the function implementation must implement Spark's org.apache.spark.sql.connector.catalog.functions.UnboundFunction interface. For details on implementing custom Spark functions, refer to the Spark DataSource V2 Functions documentation.

Key points:

The class must have a public no-arg constructor.
The class must be on the Spark driver and executor classpath.
Only functions with RuntimeType.SPARK are visible to the Spark connector; implementations targeting other runtimes (e.g. TRINO) are filtered out.

Calling functions in Spark SQL

Use the fully qualified three-part name catalog.schema.function_name to call a Gravitino-registered function:

-- Call a scalar function
SELECT my_catalog.my_schema.add_one(42);

-- Use in a query
SELECT id, my_catalog.my_schema.add_one(value) AS incremented
FROM my_catalog.my_schema.my_table;

tip

You can simplify the syntax by setting the default catalog and schema first:

USE my_catalog;
USE my_schema;
SELECT add_one(42);

Discovering functions

The Spark connector only exposes functions that have at least one Java implementation with RuntimeType.SPARK. Functions with only non-Spark implementations (e.g. TRINO) are not listed or loadable.

-- List all available functions in a schema (includes Gravitino UDFs with Spark runtime)
SHOW FUNCTIONS IN my_catalog.my_schema;

Overview​

Prerequisites​

Java UDF requirements​

Calling functions in Spark SQL​

Discovering functions​

Overview

Prerequisites

Java UDF requirements

Calling functions in Spark SQL

Discovering functions