gravitino.api.rel.expressions.distributions.distributions.Distributions
- class gravitino.api.rel.expressions.distributions.distributions.Distributions
Bases: object
- __init__()
Methods
__init__()
even(number, *expressions) – Create a distribution by evenly distributing the data across the number of buckets.
fields(strategy, number, *field_names) – Create a distribution on columns.
hash(number, *expressions) – Create a distribution by hashing the data across the number of buckets.
of(strategy, number, *expressions) – Create a distribution by the given strategy.
Attributes
HASH – List bucketing strategy hash, TODO: #1505 Separate the bucket number from the Distribution.
NONE – NONE is used to indicate that there is no distribution.
RANGE – List bucketing strategy range, TODO: #1505 Separate the bucket number from the Distribution.
- HASH: Distribution = <gravitino.api.rel.expressions.distributions.distributions.DistributionImpl object>
List bucketing strategy hash, TODO: #1505 Separate the bucket number from the Distribution.
- NONE: Distribution = <gravitino.api.rel.expressions.distributions.distributions.DistributionImpl object>
NONE is used to indicate that there is no distribution.
- RANGE: Distribution = <gravitino.api.rel.expressions.distributions.distributions.DistributionImpl object>
List bucketing strategy range, TODO: #1505 Separate the bucket number from the Distribution.
- static even(number: int, *expressions: Expression) → Distribution
Create a distribution by evenly distributing the data across the number of buckets.
- Parameters:
number – The number of buckets.
expressions – The expressions to distribute by.
- Returns:
The created even distribution.
- static fields(strategy: Strategy, number: int, *field_names: List[str]) → Distribution
Create a distribution on columns, as in distribute by (a) or (a, b). For more complex cases, such as distributing by (func(a), b) or (func(a), func(b)), use DistributionImpl.Builder instead.
NOTE: a, b, c are column names.
SQL syntax: distribute by hash(a, b) buckets 5
fields(Strategy.HASH, 5, ["a"], ["b"])
SQL syntax: distribute by hash(a, b, c) buckets 10
fields(Strategy.HASH, 10, ["a"], ["b"], ["c"])
SQL syntax: distribute by EVEN(a) buckets 128
fields(Strategy.EVEN, 128, ["a"])
- Parameters:
strategy – The strategy to use.
number – The number of buckets.
field_names – The field names to distribute by.
- Returns:
The created distribution.
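The calling convention for fields can be easy to misread: each column is passed as its own positional argument, and each argument is itself a list of strings. The following is a minimal, self-contained sketch of that convention. It does not import Gravitino; ToyStrategy, ToyDistribution, toy_fields and to_sql are hypothetical stand-ins invented for illustration, and the reading of a multi-element list as a nested field path is an assumption, not something this page states.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class ToyStrategy(Enum):
    """Hypothetical stand-in for the library's Strategy enum."""
    HASH = "hash"
    EVEN = "even"


@dataclass(frozen=True)
class ToyDistribution:
    """Hypothetical stand-in for a Distribution value."""
    strategy: ToyStrategy
    number: int
    columns: Tuple[Tuple[str, ...], ...]


def toy_fields(strategy: ToyStrategy, number: int, *field_names: List[str]) -> ToyDistribution:
    # One positional argument per column; each argument is a list of name parts.
    # Assumption: ["a"] names column a, and ["a", "b"] would name a nested field a.b.
    return ToyDistribution(strategy, number, tuple(tuple(parts) for parts in field_names))


def to_sql(dist: ToyDistribution) -> str:
    # Render the toy value in the SQL-like form used in the examples above.
    cols = ", ".join(".".join(parts) for parts in dist.columns)
    return f"distribute by {dist.strategy.value}({cols}) buckets {dist.number}"


dist = toy_fields(ToyStrategy.HASH, 5, ["a"], ["b"])
print(to_sql(dist))  # distribute by hash(a, b) buckets 5
```

Passing ["a", "b"] as a single argument would therefore describe one column, not two, which is why the examples above write ["a"], ["b"] separately.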
- static hash(number: int, *expressions: Expression) → Distribution
Create a distribution by hashing the data across the number of buckets.
- Parameters:
number – The number of buckets.
expressions – The expressions to distribute by.
- Returns:
The created hash distribution.
- static of(strategy: Strategy, number: int, *expressions: Expression) → Distribution
Create a distribution by the given strategy.
- Parameters:
strategy – The strategy to use.
number – The number of buckets.
expressions – The expressions to distribute by.
- Returns:
The created distribution.
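Judging by the signatures alone, hash and even look like shorthands for of with the strategy fixed in advance. The sketch below models that relationship with hypothetical stand-ins (ToyStrategy, ToyDistribution, toy_of, toy_hash, toy_even); it is an assumption about how the factories relate, not the library's actual implementation, and plain strings stand in for Expression objects.

```python
from enum import Enum
from typing import NamedTuple, Tuple


class ToyStrategy(Enum):
    """Hypothetical stand-in for the library's Strategy enum."""
    HASH = "hash"
    EVEN = "even"


class ToyDistribution(NamedTuple):
    """Hypothetical stand-in for a Distribution value."""
    strategy: ToyStrategy
    number: int
    expressions: Tuple[str, ...]


def toy_of(strategy: ToyStrategy, number: int, *expressions: str) -> ToyDistribution:
    # General factory: the caller picks the strategy explicitly.
    return ToyDistribution(strategy, number, tuple(expressions))


def toy_hash(number: int, *expressions: str) -> ToyDistribution:
    # Assumed shorthand: same as toy_of with the strategy fixed to HASH.
    return toy_of(ToyStrategy.HASH, number, *expressions)


def toy_even(number: int, *expressions: str) -> ToyDistribution:
    # Assumed shorthand: same as toy_of with the strategy fixed to EVEN.
    return toy_of(ToyStrategy.EVEN, number, *expressions)


by_hash = toy_hash(4, "a", "b")
by_even = toy_even(128, "a")
print(by_hash == toy_of(ToyStrategy.HASH, 4, "a", "b"))
```

Under this reading, of is the form to reach for when the strategy is chosen at runtime, and the named helpers are for call sites where it is known up front.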