Version: 1.2.0

Manage user-defined functions using Gravitino

This page explains how to manage user-defined functions (UDFs) in Apache Gravitino. Gravitino provides a centralized function registry that lets you define a custom function once and share it across multiple compute engines, such as Spark and Trino.

A function in Gravitino is characterized by:

  • Name: The function identifier within a schema.
  • Function type: SCALAR (row-by-row operations), AGGREGATE (group operations), or TABLE (set-returning operations).
  • Deterministic: Whether the function always returns the same result for the same input.
  • Definitions: One or more overloads, each with a specific parameter list, return type (or return columns for table functions), and one or more implementations for different runtimes (e.g. Spark, Trino).

Each implementation is written in one language (SQL, Java, or Python) and targets one runtime. A definition can have at most one implementation per runtime. For example, you cannot have two implementations that both target SPARK in the same definition. To replace an existing implementation, use updateImpl instead of addImpl.

Language | Key fields | Description
SQL | sql | An inline SQL expression.
Java | className | Fully qualified Java class name.
Python | handler, codeBlock | Python handler entry point and optional inline code.
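To make the table and the one-implementation-per-runtime rule concrete, here is a minimal Python sketch. It assembles implementation objects with the key fields listed above and rejects a definition in which one runtime appears twice. The helper names are hypothetical. The `"PYTHON"` language value is an assumption, because only `"SQL"` and `"JAVA"` appear in the examples on this page. This is client-side checking only, and Gravitino itself remains the authority on validation.

```python
from collections import Counter


def sql_impl(runtime, sql):
    # SQL implementations carry an inline expression in the "sql" field.
    return {"language": "SQL", "runtime": runtime, "sql": sql}


def java_impl(runtime, class_name, jars=()):
    # Java implementations name a fully qualified class; jars are optional resources.
    impl = {"language": "JAVA", "runtime": runtime, "className": class_name}
    if jars:
        impl["resources"] = {"jars": list(jars)}
    return impl


def python_impl(runtime, handler, code_block=None):
    # Assumption: "PYTHON" is the language value; codeBlock is optional inline code.
    impl = {"language": "PYTHON", "runtime": runtime, "handler": handler}
    if code_block is not None:
        impl["codeBlock"] = code_block
    return impl


def check_one_impl_per_runtime(definition):
    # Count implementations per runtime and reject any runtime that appears twice.
    counts = Counter(impl["runtime"] for impl in definition["impls"])
    duplicates = sorted(rt for rt, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f"duplicate runtimes in one definition: {duplicates}")


# Hypothetical definition: one TRINO and one SPARK implementation, which is allowed.
definition = {
    "parameters": [{"name": "x", "dataType": "integer"}],
    "returnType": "integer",
    "impls": [
        sql_impl("TRINO", "x + 1"),
        java_impl("SPARK", "com.example.AddOneFunction", ["hdfs:///path/to/udf.jar"]),
    ],
}
check_one_impl_per_runtime(definition)  # passes: TRINO and SPARK are distinct
```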

To use function management, make sure that:

  • The Gravitino server is running and reachable, for example at http://localhost:8090.
  • A metalake has been created.
  • A catalog has been created within the metalake.
  • A schema has been created within the catalog.
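Each prerequisite corresponds to one segment of the REST paths used on this page. The following Python sketch builds those paths. `functions_url` is a hypothetical helper, and the base address is assumed to be the local server above. Percent-encoding each segment is a defensive choice, so that names containing spaces or slashes cannot change the path structure.

```python
from urllib.parse import quote

# Assumption: the server address used throughout this page.
GRAVITINO_URI = "http://localhost:8090"


def functions_url(metalake, catalog, schema, function=None, base=GRAVITINO_URI):
    # /api/metalakes/{metalake}/catalogs/{catalog}/schemas/{schema}/functions[/{function}]
    segments = ["api", "metalakes", metalake, "catalogs", catalog,
                "schemas", schema, "functions"]
    if function is not None:
        segments.append(function)
    # Percent-encode every segment so unusual names cannot break the path.
    return base.rstrip("/") + "/" + "/".join(quote(s, safe="") for s in segments)


print(functions_url("example", "my_catalog", "my_schema"))
print(functions_url("example", "my_catalog", "my_schema", "add_one"))
```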

Function operations

Register a function

You can register a function by sending a POST request to the /api/metalakes/{metalake_name}/catalogs/{catalog_name}/schemas/{schema_name}/functions endpoint or by using the Gravitino Java/Python client. The following is an example of registering a scalar function:

curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
-H "Content-Type: application/json" -d '{
  "name": "add_one",
  "functionType": "SCALAR",
  "deterministic": true,
  "comment": "A scalar function that adds one to the input",
  "definitions": [
    {
      "parameters": [
        {"name": "x", "dataType": "integer"}
      ],
      "returnType": "integer",
      "impls": [
        {
          "language": "SQL",
          "runtime": "TRINO",
          "sql": "x + 1"
        }
      ]
    }
  ]
}' http://localhost:8090/api/metalakes/example/catalogs/my_catalog/schemas/my_schema/functions
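If you prefer not to shell out to curl, the same request can be built with Python's standard library. This is a sketch that uses `urllib.request`, not the Gravitino Python client. `build_register_request` is a hypothetical helper, and the server address is assumed to be the local one from the example. The request is only constructed here. Sending it requires a running server.

```python
import json
import urllib.request

GRAVITINO_URI = "http://localhost:8090"  # assumption: the local server from the example
ACCEPT = "application/vnd.gravitino.v1+json"


def build_register_request(metalake, catalog, schema, function):
    # POST the function JSON to the schema's functions collection.
    url = (f"{GRAVITINO_URI}/api/metalakes/{metalake}/catalogs/{catalog}"
           f"/schemas/{schema}/functions")
    return urllib.request.Request(
        url,
        data=json.dumps(function).encode("utf-8"),
        headers={"Accept": ACCEPT, "Content-Type": "application/json"},
        method="POST",
    )


# The same payload as the curl example above, as a Python dict.
add_one = {
    "name": "add_one",
    "functionType": "SCALAR",
    "deterministic": True,
    "comment": "A scalar function that adds one to the input",
    "definitions": [{
        "parameters": [{"name": "x", "dataType": "integer"}],
        "returnType": "integer",
        "impls": [{"language": "SQL", "runtime": "TRINO", "sql": "x + 1"}],
    }],
}

req = build_register_request("example", "my_catalog", "my_schema", add_one)
# Sending requires a running server:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.read().decode("utf-8"))
```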

For table-valued functions, set functionType to TABLE and use returnColumns instead of returnType in each function definition:

curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
-H "Content-Type: application/json" -d '{
  "name": "generate_series",
  "functionType": "TABLE",
  "deterministic": true,
  "comment": "A table function that generates a series of integers",
  "definitions": [
    {
      "parameters": [
        {"name": "start_val", "dataType": "integer"},
        {"name": "end_val", "dataType": "integer"}
      ],
      "returnColumns": [
        {"name": "value", "dataType": "integer", "comment": "The generated integer value"}
      ],
      "impls": [
        {
          "language": "JAVA",
          "runtime": "SPARK",
          "className": "com.example.GenerateSeriesFunction",
          "resources": {
            "jars": ["hdfs:///path/to/udtf.jar"]
          }
        }
      ]
    }
  ]
}' http://localhost:8090/api/metalakes/example/catalogs/my_catalog/schemas/my_schema/functions
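The returnType/returnColumns distinction can also be checked on the client before a payload is sent. The sketch below uses a hypothetical `check_return_shape` helper and assumes the two fields are mutually exclusive, as the wording "instead of" suggests. SCALAR and AGGREGATE definitions must use returnType, and TABLE definitions must use returnColumns.

```python
def check_return_shape(function):
    # TABLE functions describe their output with returnColumns; SCALAR and
    # AGGREGATE functions use returnType. Assumption: never both at once.
    is_table = function["functionType"] == "TABLE"
    expected = "returnColumns" if is_table else "returnType"
    unexpected = "returnType" if is_table else "returnColumns"
    for i, definition in enumerate(function["definitions"]):
        if expected not in definition or unexpected in definition:
            raise ValueError(
                f"definition {i} of a {function['functionType']} function "
                f"must set {expected} and not {unexpected}"
            )


# The generate_series payload from the example above (impls omitted for brevity).
generate_series = {
    "name": "generate_series",
    "functionType": "TABLE",
    "definitions": [{
        "parameters": [{"name": "start_val", "dataType": "integer"},
                       {"name": "end_val", "dataType": "integer"}],
        "returnColumns": [{"name": "value", "dataType": "integer"}],
    }],
}
check_return_shape(generate_series)  # passes: TABLE with returnColumns only
```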

Get a function

You can get a function by sending a GET request to the /api/metalakes/{metalake_name}/catalogs/{catalog_name}/schemas/{schema_name}/functions/{function_name} endpoint or by using the Gravitino Java/Python client. The following is an example of getting a function:

curl -X GET -H "Accept: application/vnd.gravitino.v1+json" \
-H "Content-Type: application/json" \
http://localhost:8090/api/metalakes/example/catalogs/my_catalog/schemas/my_schema/functions/add_one

List functions

You can list all the functions in a schema by sending a GET request to the /api/metalakes/{metalake_name}/catalogs/{catalog_name}/schemas/{schema_name}/functions endpoint or by using the Gravitino Java/Python client. The following is an example of listing all the functions in a schema:

curl -X GET -H "Accept: application/vnd.gravitino.v1+json" \
-H "Content-Type: application/json" \
http://localhost:8090/api/metalakes/example/catalogs/my_catalog/schemas/my_schema/functions

You can also list functions with full details by setting the details query parameter to true. The response then contains the complete function objects instead of only their identifiers.

curl -X GET -H "Accept: application/vnd.gravitino.v1+json" \
-H "Content-Type: application/json" \
"http://localhost:8090/api/metalakes/example/catalogs/my_catalog/schemas/my_schema/functions?details=true"

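Both listing variants can be built from one helper, with the details flag added as a query parameter. This is a sketch that uses Python's standard library rather than the Gravitino client. `build_list_request` is hypothetical, and the server address is assumed. Quoting the URL in the curl command keeps the shell from interpreting `?`. In Python, `urlencode` produces the query string.

```python
import urllib.request
from urllib.parse import urlencode

GRAVITINO_URI = "http://localhost:8090"  # assumption: the local server from the examples
ACCEPT = "application/vnd.gravitino.v1+json"


def build_list_request(metalake, catalog, schema, details=False):
    # GET on the functions collection lists the schema's functions.
    url = (f"{GRAVITINO_URI}/api/metalakes/{metalake}/catalogs/{catalog}"
           f"/schemas/{schema}/functions")
    if details:
        # details=true requests full function objects instead of identifiers.
        url += "?" + urlencode({"details": "true"})
    return urllib.request.Request(url, headers={"Accept": ACCEPT}, method="GET")


names_only = build_list_request("example", "my_catalog", "my_schema")
full = build_list_request("example", "my_catalog", "my_schema", details=True)
```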
Alter a function

You can modify a function by sending a PUT request to the /api/metalakes/{metalake_name}/catalogs/{catalog_name}/schemas/{schema_name}/functions/{function_name} endpoint or by using the Gravitino Java/Python client. The following is an example of updating a function's comment:

curl -X PUT -H "Accept: application/vnd.gravitino.v1+json" \
-H "Content-Type: application/json" -d '{
  "updates": [
    {
      "@type": "updateComment",
      "newComment": "An improved scalar function that adds one"
    }
  ]
}' http://localhost:8090/api/metalakes/example/catalogs/my_catalog/schemas/my_schema/functions/add_one
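The alter request wraps each change in the updates array, and the @type field selects the kind of change. Below is a standard-library Python sketch of the same PUT request. `update_comment` and `build_alter_request` are hypothetical helpers, and the server address is assumed. Because updates is a JSON array, the helper accepts a list of changes, although this example sends only one.

```python
import json
import urllib.request

GRAVITINO_URI = "http://localhost:8090"  # assumption: the local server from the examples
ACCEPT = "application/vnd.gravitino.v1+json"


def update_comment(new_comment):
    # One entry of the "updates" array; "@type" selects the kind of change.
    return {"@type": "updateComment", "newComment": new_comment}


def build_alter_request(metalake, catalog, schema, function, updates):
    # PUT the list of changes to the function's own path.
    url = (f"{GRAVITINO_URI}/api/metalakes/{metalake}/catalogs/{catalog}"
           f"/schemas/{schema}/functions/{function}")
    body = json.dumps({"updates": list(updates)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Accept": ACCEPT, "Content-Type": "application/json"},
        method="PUT",
    )


req = build_alter_request(
    "example", "my_catalog", "my_schema", "add_one",
    [update_comment("An improved scalar function that adds one")],
)
```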

Supported modifications

The following operations are supported for altering a function:

Operation | JSON example | Java method | Python method
Update comment | {"@type":"updateComment","newComment":"new comment"} | FunctionChange.updateComment("new comment") | FunctionChange.update_comment("new comment")
Add definition | {"@type":"addDefinition","definition":{...}} | FunctionChange.addDefinition(definition) | FunctionChange.add_definition(definition)
Remove definition | {"@type":"removeDefinition","parameters":[{"name":"x","dataType":"integer"}]} | FunctionChange.removeDefinition(params) | FunctionChange.remove_definition(params)
Add implementation | {"@type":"addImpl","parameters":[...],"implementation":{...}} | FunctionChange.addImpl(params, impl) | FunctionChange.add_impl(params, impl)
Update implementation | {"@type":"updateImpl","parameters":[...],"runtime":"SPARK","implementation":{...}} | FunctionChange.updateImpl(params, runtime, impl) | FunctionChange.update_impl(params, runtime, impl)
Remove implementation | {"@type":"removeImpl","parameters":[{"name":"x","dataType":"integer"}],"runtime":"SPARK"} | FunctionChange.removeImpl(params, RuntimeType.SPARK) | FunctionChange.remove_impl(params, RuntimeType.SPARK)
Note: When using addImpl, the runtime of the new implementation must not already exist in the target definition. Use updateImpl to replace an existing implementation for a given runtime.
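The addImpl/updateImpl rule can be applied on the client by inspecting the target definition first. The sketch below uses a hypothetical `impl_change` helper. It emits either an addImpl or an updateImpl entry in the JSON shapes from the table, depending on whether the new implementation's runtime is already taken. It assumes the definition dict uses the same parameters and impls keys as the registration payload.

```python
def impl_change(definition, implementation):
    # The definition is identified by its parameter list, as in the table above.
    params = definition["parameters"]
    runtime = implementation["runtime"]
    taken = {impl["runtime"] for impl in definition["impls"]}
    if runtime in taken:
        # That runtime already has an implementation: replace it.
        return {"@type": "updateImpl", "parameters": params,
                "runtime": runtime, "implementation": implementation}
    # The runtime is free: add a new implementation.
    return {"@type": "addImpl", "parameters": params,
            "implementation": implementation}


# The add_one definition as registered earlier: one SQL implementation for TRINO.
add_one_definition = {
    "parameters": [{"name": "x", "dataType": "integer"}],
    "returnType": "integer",
    "impls": [{"language": "SQL", "runtime": "TRINO", "sql": "x + 1"}],
}
java_trino = {"language": "JAVA", "runtime": "TRINO",
              "className": "com.example.AddOneFunction"}
change = impl_change(add_one_definition, java_trino)  # updateImpl: TRINO is taken
```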

The following example adds a Java implementation for the SPARK runtime to the add_one definition registered above, which so far has only a TRINO implementation:

curl -X PUT -H "Accept: application/vnd.gravitino.v1+json" \
-H "Content-Type: application/json" -d '{
  "updates": [
    {
      "@type": "addImpl",
      "parameters": [
        {"name": "x", "dataType": "integer"}
      ],
      "implementation": {
        "language": "JAVA",
        "runtime": "SPARK",
        "className": "com.example.AddOneFunction",
        "resources": {
          "jars": ["hdfs:///path/to/udf.jar"]
        }
      }
    }
  ]
}' http://localhost:8090/api/metalakes/example/catalogs/my_catalog/schemas/my_schema/functions/add_one

Drop a function

You can drop a function by sending a DELETE request to the /api/metalakes/{metalake_name}/catalogs/{catalog_name}/schemas/{schema_name}/functions/{function_name} endpoint or by using the Gravitino Java/Python client. The following is an example of dropping a function:

curl -X DELETE -H "Accept: application/vnd.gravitino.v1+json" \
-H "Content-Type: application/json" \
http://localhost:8090/api/metalakes/example/catalogs/my_catalog/schemas/my_schema/functions/add_one

Advanced examples

Register a function with multiple overloads

A function can have multiple definitions (overloads) with different parameter lists. Each definition has its own return type and implementations.

curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
-H "Content-Type: application/json" -d '{
  "name": "add",
  "functionType": "SCALAR",
  "deterministic": true,
  "comment": "An overloaded add function",
  "definitions": [
    {
      "parameters": [
        {"name": "x", "dataType": "integer"},
        {"name": "y", "dataType": "integer"}
      ],
      "returnType": "integer",
      "impls": [
        {
          "language": "SQL",
          "runtime": "TRINO",
          "sql": "x + y"
        }
      ]
    },
    {
      "parameters": [
        {"name": "x", "dataType": "double"},
        {"name": "y", "dataType": "double"}
      ],
      "returnType": "double",
      "impls": [
        {
          "language": "SQL",
          "runtime": "TRINO",
          "sql": "x + y"
        }
      ]
    }
  ]
}' http://localhost:8090/api/metalakes/example/catalogs/my_catalog/schemas/my_schema/functions
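Because overloads are distinguished by their parameter lists, a client can check a payload for accidental duplicates before registering it. The sketch below uses a hypothetical `overload_signatures` helper. It assumes that a signature is the ordered list of parameter data types and that parameter names do not matter. This is an assumption about how overloads are told apart, not a documented Gravitino rule.

```python
def overload_signatures(function):
    # Assumption: overloads are told apart by the ordered parameter data types;
    # parameter names are ignored.
    signatures = []
    for i, definition in enumerate(function["definitions"]):
        signature = tuple(p["dataType"] for p in definition["parameters"])
        if signature in signatures:
            raise ValueError(f"definition {i} repeats signature {signature}")
        signatures.append(signature)
    return signatures


# The overloaded "add" payload from the example above (impls omitted for brevity).
add = {
    "name": "add",
    "functionType": "SCALAR",
    "definitions": [
        {"parameters": [{"name": "x", "dataType": "integer"},
                        {"name": "y", "dataType": "integer"}],
         "returnType": "integer"},
        {"parameters": [{"name": "x", "dataType": "double"},
                        {"name": "y", "dataType": "double"}],
         "returnType": "double"},
    ],
}
print(overload_signatures(add))
```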

Using functions in compute engines

Once a function is registered in Gravitino, it can be used in supported compute engines. The engine's connector loads the function from Gravitino and invokes the appropriate implementation based on the runtime.
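To make the runtime-based selection concrete, here is a simplified sketch of how a connector might pick an implementation from a function's definitions. It is not the Spark connector's actual code. `select_impl` is hypothetical, and it matches overloads by exact parameter types, whereas a real engine may also apply implicit type coercion.

```python
def select_impl(function, arg_types, runtime):
    # Illustrative only: find the overload whose parameter types match the call
    # exactly, then the implementation registered for the engine's runtime.
    for definition in function["definitions"]:
        param_types = [p["dataType"] for p in definition["parameters"]]
        if param_types == list(arg_types):
            for impl in definition["impls"]:
                if impl["runtime"] == runtime:
                    return impl
            return None  # matching overload, but no implementation for this runtime
    return None  # no overload with these exact parameter types


# Hypothetical function with one overload and implementations for two runtimes.
add = {
    "name": "add",
    "definitions": [{
        "parameters": [{"name": "x", "dataType": "double"},
                       {"name": "y", "dataType": "double"}],
        "returnType": "double",
        "impls": [
            {"language": "SQL", "runtime": "TRINO", "sql": "x + y"},
            {"language": "JAVA", "runtime": "SPARK", "className": "com.example.AddFunction"},
        ],
    }],
}
chosen = select_impl(add, ["double", "double"], "SPARK")
```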

Engine | Runtime | Documentation
Spark | RuntimeType.SPARK | Spark connector - User-defined functions

Note: Support for additional engines (e.g. Trino, Flink) will be documented as it becomes available.