Version: 0.9.0-incubating

Gravitino server Lineage support

Overview

Gravitino server provides a pluggable lineage framework to receive, process, and sink OpenLineage events. With this framework, you can apply custom processing to lineage events and sink them to your own systems.

Lineage Configuration

| Configuration item | Description | Default value | Required | Since Version |
|---|---|---|---|---|
| gravitino.lineage.source | The name of the lineage event source. | http | No | 0.9.0-incubating |
| gravitino.lineage.${sourceName}.sourceClass | The name of the lineage source class, which should implement the org.apache.gravitino.lineage.source.LineageSource interface. | (none) | No | 0.9.0-incubating |
| gravitino.lineage.processorClass | The name of the lineage processor class, which should implement the org.apache.gravitino.lineage.processor.LineageProcessor interface. The default noop processor does nothing with the run event. | org.apache.gravitino.lineage.processor.NoopProcessor | No | 0.9.0-incubating |
| gravitino.lineage.sinks | The lineage event sink names (multiple sinks are supported, separated by commas). | log | No | 0.9.0-incubating |
| gravitino.lineage.${sinkName}.sinkClass | The name of the lineage sink class, which should implement the org.apache.gravitino.lineage.sink.LineageSink interface. | (none) | No | 0.9.0-incubating |
| gravitino.lineage.queueCapacity | The total capacity of the lineage event queues. When there are multiple lineage sinks, each sink uses an isolated event queue. The capacity of each queue is the value of gravitino.lineage.queueCapacity divided by the number of sinks. | 10000 | No | 0.9.0-incubating |
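To make the queue capacity rule concrete, the following sketch works through the arithmetic for a two-sink setup. It is only an illustration: the sink name mysink is hypothetical, integer division is assumed, and applying the 90% high watermark threshold (described in the High watermark status section) to each per-sink queue is also an assumption.

```shell
#!/bin/sh
# Values mirror the configuration items in the table above.
queue_capacity=10000     # gravitino.lineage.queueCapacity (default value)
sinks="log,mysink"       # gravitino.lineage.sinks; "mysink" is a hypothetical sink name

# Count the sinks by splitting the comma-separated list into lines.
sink_count=$(printf '%s\n' "$sinks" | tr ',' '\n' | grep -c .)

# Each sink gets an isolated queue with an equal share of the total capacity
# (integer division is an assumption of this sketch).
per_sink_capacity=$((queue_capacity / sink_count))

# Assumed: the 90% high watermark threshold applies to each per-sink queue.
high_watermark=$((per_sink_capacity * 90 / 100))

echo "sinks=$sink_count per_sink_capacity=$per_sink_capacity high_watermark=$high_watermark"
# prints: sinks=2 per_sink_capacity=5000 high_watermark=4500
```

Adding sinks without raising gravitino.lineage.queueCapacity shrinks each queue, so a slow sink would reach its high watermark sooner.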

Lineage HTTP source

The HTTP source provides an endpoint that follows the OpenLineage API spec to receive OpenLineage run events. The following example posts a run event to the endpoint:

cat <<EOF >source.json
{
  "eventType": "START",
  "eventTime": "2023-10-28T19:52:00.001+10:00",
  "run": {
    "runId": "0176a8c2-fe01-7439-87e6-56a1a1b4029f"
  },
  "job": {
    "namespace": "gravitino-namespace",
    "name": "gravitino-job1"
  },
  "inputs": [{
    "namespace": "gravitino-namespace",
    "name": "gravitino-table-identifier"
  }],
  "producer": "https://github.com/OpenLineage/OpenLineage/blob/v1-0-0/client",
  "schemaURL": "https://openlineage.io/spec/1-0-5/OpenLineage.json#/definitions/RunEvent"
}
EOF

curl -X POST \
-i -H 'Content-Type: application/json' \
-d '@source.json' \
http://localhost:8090/api/lineage

Lineage log sink

The log sink writes lineage events to a separate log file, gravitino_lineage.log. You can change the default behavior in conf/log4j2.properties.

High watermark status

When a lineage sink operates slowly, lineage events accumulate in its async queue. Once the queue size exceeds 90% of its capacity (the high watermark threshold), the lineage system enters a high watermark status. In this state, the lineage source must implement retry and logging mechanisms for rejected events to prevent system overload. The HTTP source returns the 429 Too Many Requests status code to the client.
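On the client side, a 429 response can be handled by resending the event after a delay. The following is a minimal sketch of such a retry loop, not part of Gravitino itself: the function names send_event and post_with_retry and the variables LINEAGE_URL, MAX_ATTEMPTS, and INITIAL_DELAY are hypothetical, and treating any 2xx status as success is an assumption.

```shell
#!/bin/sh
# Endpoint taken from the HTTP source example; adjust it for your deployment.
LINEAGE_URL="${LINEAGE_URL:-http://localhost:8090/api/lineage}"

# Post one event file and print only the HTTP status code.
send_event() {
  curl -s -o /dev/null -w '%{http_code}' \
    -X POST -H 'Content-Type: application/json' \
    -d "@$1" "$LINEAGE_URL"
}

# Resend while the server answers 429, doubling the wait after each attempt.
# MAX_ATTEMPTS and INITIAL_DELAY are illustrative knobs, not Gravitino settings.
post_with_retry() {
  file=$1
  attempts=${MAX_ATTEMPTS:-5}
  delay=${INITIAL_DELAY:-1}
  i=1
  while [ "$i" -le "$attempts" ]; do
    status=$(send_event "$file")
    if [ "$status" != "429" ]; then
      # Print the final status; succeed only for 2xx responses (assumed).
      echo "$status"
      [ "$status" -ge 200 ] && [ "$status" -lt 300 ]
      return $?
    fi
    echo "attempt $i: got 429, retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))
    i=$((i + 1))
  done
  # Every attempt was rejected with 429.
  echo "429"
  return 1
}
```

A call such as post_with_retry source.json would then send the event file from the HTTP source example. Backing off exponentially gives a slow sink time to drain its queue instead of adding more load.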