any_value
Obtains an arbitrary row from each aggregated group.
approx_count_distinct
Returns an approximate distinct count, similar to the result of COUNT(DISTINCT col).
approx_top_k
Returns the top `k` most frequently occurring item values in an `expr` along with their approximate counts.
avg
Returns the average value of selected fields.
bitmap
Illustrates the usage of several Bitmap aggregate functions with a simple example.
bool_or
Returns true if at least one row for `expr` is true.
corr
Returns the Pearson correlation coefficient between two expressions.
count
Returns the total number of rows specified by an expression.
count_if
Returns the number of records that meet the specified condition or `0` if no records satisfy the condition.
covar_pop
Returns the population covariance of two expressions.
covar_samp
Returns the sample covariance of two expressions.
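The difference between the population and sample covariance is only the divisor (`n` versus `n - 1`). The following Python sketch illustrates that definition; it is not the engine's implementation, and the helper names simply mirror the function names for readability. It assumes both inputs have the same length, contain no NULLs, and have at least two rows.

```python
def _sum_of_products(xs, ys):
    # Sum of (x - mean_x) * (y - mean_y) over paired rows.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))


def covar_pop(xs, ys):
    # Population covariance: divide by n.
    return _sum_of_products(xs, ys) / len(xs)


def covar_samp(xs, ys):
    # Sample covariance: divide by n - 1 (requires at least two rows).
    return _sum_of_products(xs, ys) / (len(xs) - 1)
```

For the same input, the sample covariance is always larger in magnitude than the population covariance by a factor of `n / (n - 1)`.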
ds_hll_accumulate
Accumulates values into a DataSketches HyperLogLog (HLL) sketch and returns the serialized sketch as VARBINARY for approximate distinct counting.
ds_hll_combine
Combines multiple serialized DataSketches HyperLogLog (HLL) sketches into a single serialized sketch for approximate distinct counting.
ds_hll_count_distinct
Returns the approximate distinct count using DataSketches HyperLogLog (HLL). Similar to APPROX_COUNT_DISTINCT but with higher precision.
ds_hll_estimate
Estimates the approximate distinct count from a serialized DataSketches HyperLogLog (HLL) sketch.
ds_theta_count_distinct
Returns the approximate distinct count using a DataSketches Theta sketch. Faster than COUNT(DISTINCT) with lower memory usage for high-cardinality columns.
group_concat
Concatenates non-null values from a group into a single string, with a `sep` argument, which is `,` by default if not specified.
grouping
Indicates whether a column is an aggregate column.
grouping_id
`grouping_id` distinguishes the grouping statistics results that share the same grouping standard.
hll_raw_agg
An aggregate function that aggregates HLL fields.
hll_union
Returns the union of a set of HLL values.
hll_union_agg
HLL is an engineering implementation based on the HyperLogLog algorithm, which is used to save the intermediate results of the HyperLogLog calculation process.
mann_whitney_u_test
`mann_whitney_u_test` performs the Mann-Whitney rank test on samples derived from two populations.
max
Returns the maximum value of `expr`.
max_by
Returns the value of `x` associated with the maximum value of `y`.
min
Returns the minimum value of `expr`.
min_by
Returns the value of `x` associated with the minimum value of `y`.
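As an illustration of the `max_by` and `min_by` semantics, the Python sketch below picks `x` from the row whose `y` is largest or smallest. It is a conceptual model only: rows are assumed to be `(x, y)` tuples without NULLs, and how ties are broken in the real functions is not specified here.

```python
def max_by(rows):
    # rows: iterable of (x, y) pairs; return x from the pair with the largest y.
    x, _ = max(rows, key=lambda pair: pair[1])
    return x


def min_by(rows):
    # rows: iterable of (x, y) pairs; return x from the pair with the smallest y.
    x, _ = min(rows, key=lambda pair: pair[1])
    return x
```

For example, with rows `("a", 3)`, `("b", 7)`, `("c", 5)`, `max_by` returns `"b"` and `min_by` returns `"a"`.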
multi_distinct_count
Returns the number of distinct values of `expr`, equivalent to count(distinct expr).
multi_distinct_sum
Returns the sum of distinct values in `expr`, equivalent to sum(distinct expr).
percentile_approx
Returns the approximate value for a given percentile p, or an array of values for corresponding percentiles if p is an array.
percentile_approx_weight
Returns the approximation of the p-th percentile with weight. A weighted version of PERCENTILE_APPROX that accepts a weight column or constant for each input value.
percentile_cont
Computes the percentile value of `expr` with linear interpolation.
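Linear interpolation here means that when the requested percentile falls between two sorted values, the result is a weighted blend of both. The Python sketch below shows one common definition (position `(n - 1) * p` in the sorted input); it is an assumption for illustration and may differ in detail from the engine's behavior. Inputs are assumed to be non-empty and free of NULLs, with `p` between 0 and 1.

```python
import math


def percentile_cont(values, p):
    # Sort the input and locate the fractional position of the percentile.
    ordered = sorted(values)
    pos = (len(ordered) - 1) * p
    lower = math.floor(pos)
    upper = math.ceil(pos)
    # Blend the two neighbouring values by the fractional part of the position.
    fraction = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction
```

With the values 10, 20, 30, 40 and `p = 0.5`, the position is 1.5, so the result lies halfway between 20 and 30.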
percentile_disc
Returns a percentile value based on a discrete distribution of the input column `expr`.
percentile_disc_lc
Returns a percentile value based on a discrete distribution of the input column `expr`.
retention
Calculates the user retention rate within a specified period of time.
std
Returns the standard deviation of an expression.
stddev,stddev_pop,std
Returns the population standard deviation of `expr`.
stddev_samp
Returns the sample standard deviation of an expression.
sum
Returns the sum of non-null values for `expr`.
sum_map
Aggregates MAP values by summing numeric values for matching keys across multiple rows.
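The key-wise summation can be pictured with a short Python sketch. This is only a model of the described behavior, using Python dicts in place of MAP values; skipping `None` rows is an assumption made for the illustration.

```python
def sum_map(maps):
    # Accumulate numeric values per key across all input maps.
    totals = {}
    for m in maps:
        if m is None:
            # Assumption: NULL maps contribute nothing.
            continue
        for key, value in m.items():
            totals[key] = totals.get(key, 0) + value
    return totals
```

Keys that appear in only one row keep their single value, while keys shared by several rows are summed.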
var_samp,variance_samp
Returns the sample variance of an expression.
variance,var_pop,variance_pop
Returns the population variance of an expression.
window_funnel
Searches for an event chain in a sliding window and calculates the maximum number of consecutive events in the event chain.
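To make the event-chain search concrete, here is a simplified Python sketch. It is not the engine's algorithm: it assumes events are `(timestamp, name)` tuples, that `chain` is a non-empty ordered list of event names, that the window is measured from the first matched event and is inclusive, and it ignores any mode options the real function may offer.

```python
def window_funnel(events, chain, window):
    # events: list of (timestamp, name); chain: ordered event names;
    # window: maximum allowed span, measured from the first matched event.
    ordered = sorted(events)
    best = 0
    for i, (start_ts, name) in enumerate(ordered):
        if name != chain[0]:
            continue
        depth = 1  # the first event of the chain is matched
        for ts, event in ordered[i + 1:]:
            if ts - start_ts > window:
                break  # outside the sliding window
            if depth < len(chain) and event == chain[depth]:
                depth += 1
        best = max(best, depth)
    return best
```

Each occurrence of the first event opens a candidate window, later events are matched greedily in chain order, and the longest matched prefix across all windows is returned.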