Version: Next

Gravitino server Lineage support

Overview

Gravitino server provides a pluggable lineage framework to receive, process, and sink OpenLineage events. By leveraging this, you could do custom process for the lineage event and sink to your dedicated systems.

Lineage Configuration

Configuration item	Description	Default value	Required	Since Version
`gravitino.lineage.source`	The name of lineage event source.	http	No	0.9.0-incubating
`gravitino.lineage.${sourceName}.sourceClass`	The name of the lineage source class which should implement `org.apache.gravitino.lineage.source.LineageSource` interface.	(none)	No	0.9.0-incubating
`gravitino.lineage.processorClass`	The name of the lineage processor class which should implement `org.apache.gravitino.lineage.processor.LineageProcessor` interface. The default noop processor do nothing about the run event.	`org.apache.gravitino.lineage.processor.NoopProcessor`	No	0.9.0-incubating
`gravitino.lineage.sinks`	The Lineage event sink names (support multiple sinks separated by commas).	log	No	0.9.0-incubating
`gravitino.lineage.${sinkName}.sinkClass`	The name of the lineage sink class which should implement `org.apache.gravitino.lineage.sink.LineageSink` interface.	(none)	No	0.9.0-incubating
`gravitino.lineage.queueCapacity`	The total capacity of lineage event queues. When there are multiple lineage sinks, each sink utilizes an isolated event queue. The capacity of each queue is calculated by dividing the value of `gravitino.lineage.queueCapacity` by the number of sinks.	10000	No	0.9.0-incubating

Lineage http source

Http source provides an endpoint which follows OpenLineage API spec to receive OpenLineage run event. The following use example:

cat <<EOF >source.json
{
  "eventType": "START",
  "eventTime": "2023-10-28T19:52:00.001+10:00",
  "run": {
    "runId": "0176a8c2-fe01-7439-87e6-56a1a1b4029f"
  },
  "job": {
    "namespace": "gravitino-namespace",
    "name": "gravitino-job1"
  },
  "inputs": [{
    "namespace": "gravitino-namespace",
    "name": "gravitino-table-identifier"
  }],
  "producer": "https://github.com/OpenLineage/OpenLineage/blob/v1-0-0/client",
  "schemaURL": "https://openlineage.io/spec/1-0-5/OpenLineage.json#/definitions/RunEvent"
}
EOF

curl -X POST \
  -i -H 'Content-Type: application/json' \
  -d '@source.json' \
  http://localhost:8090/api/lineage

Lineage log sink

Log sink prints the log in a separate log file gravitino_lineage.log, you could change the default behavior in conf/log4j2.properties.

High watermark status

When the lineage sink operates slowly, lineage events accumulate in the async queue. Once the queue size exceeds 90% of its capacity (high watermark threshold), the lineage system enters a high watermark status. In this state, the lineage source must implement retry and logging mechanisms for rejected events to prevent system overload. For the HTTP source, it returns the 429 Too Many Requests status code to the client.

Overview​

Lineage Configuration​

Lineage http source​

Lineage log sink​

High watermark status​

Overview

Lineage Configuration

Lineage http source

Lineage log sink

High watermark status