gravitino.api.rel.expressions.distributions.distributions.Distributions

class gravitino.api.rel.expressions.distributions.distributions.Distributions

Bases: object

__init__()

Methods

__init__()

even(number, *expressions)

Create a distribution by evenly distributing the data across the number of buckets.

fields(strategy, number, *field_names)

Create a distribution on columns.

hash(number, *expressions)

Create a distribution by hashing the data across the number of buckets.

of(strategy, number, *expressions)

Create a distribution by the given strategy.

Attributes

HASH

Hash bucketing strategy. TODO (#1505): separate the bucket number from the Distribution.

NONE

NONE is used to indicate that there is no distribution.

RANGE

Range bucketing strategy. TODO (#1505): separate the bucket number from the Distribution.

HASH: Distribution = <gravitino.api.rel.expressions.distributions.distributions.DistributionImpl object>

Hash bucketing strategy. TODO (#1505): separate the bucket number from the Distribution.

NONE: Distribution = <gravitino.api.rel.expressions.distributions.distributions.DistributionImpl object>

NONE is used to indicate that there is no distribution.

RANGE: Distribution = <gravitino.api.rel.expressions.distributions.distributions.DistributionImpl object>

Range bucketing strategy. TODO (#1505): separate the bucket number from the Distribution.

static even(number: int, *expressions: Expression) → Distribution

Create a distribution by evenly distributing the data across the number of buckets.

Parameters:
  • number – The number of buckets.

  • expressions – The expressions to distribute by.

Returns:

The created even distribution.
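
As a rough illustration of what "evenly distributing the data across the number of buckets" can mean, the sketch below assigns rows to buckets in round-robin order. This is an assumption made for illustration only: this page does not say how an engine behind Gravitino places rows, and `assign_even` is a hypothetical helper, not part of the Gravitino API.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def assign_even(rows: Sequence[str], number: int) -> List[Tuple[str, int]]:
    """Pair each row with a bucket index, cycling through the buckets in order."""
    if number <= 0:
        raise ValueError("number of buckets must be positive")
    return [(row, index % number) for index, row in enumerate(rows)]


rows = ["r0", "r1", "r2", "r3", "r4", "r5", "r6"]
assignment = assign_even(rows, 3)

# Count how many rows landed in each bucket; with round-robin placement
# the sizes differ by at most one row, whatever the row values are.
bucket_sizes = Counter(bucket for _, bucket in assignment)
```

Unlike hashing, this placement ignores the row contents, so skewed values cannot overload a single bucket.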

static fields(strategy: Strategy, number: int, *field_names: List[str]) → Distribution

Create a distribution on columns, such as distribute by (a) or (a, b). For more complex cases, such as distributing by (func(a), b) or (func(a), func(b)), use DistributionImpl.Builder instead.

NOTE: a, b, c are column names.

SQL syntax: distribute by hash(a, b) buckets 5
Equivalent call: fields(Strategy.HASH, 5, ["a"], ["b"])

SQL syntax: distribute by hash(a, b, c) buckets 10
Equivalent call: fields(Strategy.HASH, 10, ["a"], ["b"], ["c"])

SQL syntax: distribute by EVEN(a) buckets 128
Equivalent call: fields(Strategy.EVEN, 128, ["a"])

Parameters:
  • strategy – The strategy to use.

  • number – The number of buckets.

  • field_names – The field names to distribute by.

Returns:

The created distribution.
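
The SQL examples above pass each column as its own list, which matches the `*field_names: List[str]` signature. The sketch below mirrors that argument shape and renders the corresponding clause. `render_distribute_clause` is a hypothetical helper written for this illustration, it takes the strategy as a plain string, and joining multi-part names with a dot is an assumption rather than documented behaviour.

```python
from typing import List


def render_distribute_clause(strategy: str, number: int, *field_names: List[str]) -> str:
    """Render a 'distribute by' clause from the same argument shape as fields()."""
    # Each positional argument is one column, given as a list of name parts.
    columns = ", ".join(".".join(parts) for parts in field_names)
    return f"distribute by {strategy}({columns}) buckets {number}"


two_columns = render_distribute_clause("hash", 5, ["a"], ["b"])
one_column = render_distribute_clause("EVEN", 128, ["a"])
```

Note that `["a"], ["b"]` denotes two columns, whereas a single argument `["a", "b"]` would be one column with a two-part name.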

static hash(number: int, *expressions: Expression) → Distribution

Create a distribution by hashing the data across the number of buckets.

Parameters:
  • number – The number of buckets.

  • expressions – The expressions to distribute by.

Returns:

The created hash distribution.
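
For readers unfamiliar with hash bucketing, the sketch below shows the general idea: hash the distribution key and take the result modulo the number of buckets. It is a conceptual illustration only. The hash function (CRC32 here) and the key encoding are assumptions, `hash_bucket` is a hypothetical helper, and the engine that actually stores the table chooses its own scheme.

```python
import zlib
from typing import Tuple


def hash_bucket(key: Tuple[str, ...], number: int) -> int:
    """Map a tuple of column values to a bucket index in [0, number)."""
    if number <= 0:
        raise ValueError("number of buckets must be positive")
    # Join the values with a separator so ("ab", "c") and ("a", "bc") differ.
    encoded = "\x00".join(key).encode("utf-8")
    # zlib.crc32 is stable across runs, unlike Python's built-in hash() for str.
    return zlib.crc32(encoded) % number


first = hash_bucket(("alice", "2024-01-01"), 5)
second = hash_bucket(("alice", "2024-01-01"), 5)
```

Rows with equal key values always land in the same bucket, which is the property that bucketed joins and lookups rely on.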

static of(strategy: Strategy, number: int, *expressions: Expression) → Distribution

Create a distribution by the given strategy.

Parameters:
  • strategy – The strategy to use.

  • number – The number of buckets.

  • expressions – The expressions to distribute by.

Returns:

The created distribution.
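
The signatures suggest that `hash` and `even` are shorthands for `of` with a fixed strategy, although this page does not state that explicitly. The following stand-in model sketches that relationship. The `Strategy` enum, the `Distribution` dataclass, and the functions here are simplified hypothetical substitutes (expressions are plain strings), not the Gravitino classes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Strategy(Enum):
    """Hypothetical stand-in for the Strategy type named in the signatures."""
    NONE = "none"
    HASH = "hash"
    RANGE = "range"
    EVEN = "even"


@dataclass(frozen=True)
class Distribution:
    """Stand-in value object: a strategy, a bucket count, and the expressions."""
    strategy: Strategy
    number: int
    expressions: Tuple[str, ...]


def of(strategy: Strategy, number: int, *expressions: str) -> Distribution:
    return Distribution(strategy, number, tuple(expressions))


def hash_(number: int, *expressions: str) -> Distribution:
    # Trailing underscore avoids shadowing Python's built-in hash().
    return of(Strategy.HASH, number, *expressions)


def even(number: int, *expressions: str) -> Distribution:
    return of(Strategy.EVEN, number, *expressions)


by_hash = hash_(4, "a", "b")
by_of = of(Strategy.HASH, 4, "a", "b")
```

Under this assumption, `of` is the general entry point, and it is the one to reach for when the strategy is chosen at runtime.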