Skip to main content
Version: 1.2.1

Apache Gravitino Spark connector integration test

Overview

There are two types of integration tests in spark connector, normal integration test like SparkXXCatalogIT, and the golden file integration test.

Normal integration test

Normal integration test are mainly used to test the correctness of the metadata, it's enabled in the GitHub CI. You could run tests with specific Spark version like:

./gradlew :spark-connector:spark-3.3:test --tests "org.apache.gravitino.spark.connector.integration.test.hive.SparkHiveCatalogIT33.testCreateHiveFormatPartitionTable"

Golden file integration test

Golden file integration test are mainly to test the correctness of the SQL result with massive data, it's disabled in the GitHub CI, you could run tests with following command:

./gradlew :spark-connector:spark-3.3:test --tests  "org.apache.gravitino.spark.connector.integration.test.sql.SparkSQLRegressionTest33" -PenableSparkSQLITs

Please change the Spark version number if you want to test other Spark versions. If you want to change the test behaviour, please modify spark-connector/spark-common/src/test/resources/spark-test.conf.

Configuration itemDescriptionDefault valueRequiredSince Version
gravitino.spark.test.dirThe Spark SQL test base dir, include test-sqls and data.spark-connector/spark-common/src/test/resources/No0.6.0-incubating
gravitino.spark.test.sqlsSpecify the test SQLs, using directory to specify group of SQLs like test-sqls/hive, using file path to specify one SQL like test-sqls/hive/basic.sql, use , to split multi partrun all SQLsNo0.6.0-incubating
gravitino.spark.test.generateGoldenFilesWhether generate golden files which are used to check the correctness of the SQL resultfalseNo0.6.0-incubating
gravitino.spark.test.metalakeThe metalake name to run the testtestNo0.6.0-incubating
gravitino.spark.test.setupEnvWhether to setup Gravitino and Hive environmentfalseNo0.6.0-incubating
gravitino.spark.test.uriGravitino uri address, only available when gravitino.spark.test.setupEnv is falsehttp://127.0.0.1:8090No0.6.0-incubating
gravitino.spark.test.iceberg.warehouseThe warehouse location, only available when gravitino.spark.test.setupEnv is falsehdfs://127.0.0.1:9000/user/hive/warehouse-spark-testNo0.6.0-incubating

The test SQL files are located in spark-connector/spark-common/src/test/resources/ by default. There are three directories:

  • hive, SQL tests for Hive catalog.
  • lakehouse-iceberg, SQL tests for Iceberg catalog.
  • tpcds, SQL tests for tpcds in Hive catalog.

You could create a simple SQL file, like hive/catalog.sql, the program will check the output with hive/catalog.sql.out. For complex cases like tpcds, you could do some prepare work like create table&load data in prepare.sql.