Aggregate Functions¶
Overview¶
Aggregate functions operate on a set of input rows and compute one result per group. The functions on this page are available in Polars SQL.
Except for count(), count_if(), and approx_distinct(),
aggregate functions ignore null input values and return null when there are no
non-null input values. For example, sum() returns null rather than zero
for an empty input group, and avg() does not include null values in the
count.
General Aggregate Functions¶
- any_value(x) -> [same as input]()¶
This is an alias for
arbitrary().
- arbitrary(x) -> [same as input]()¶
Returns an arbitrary non-null value of
x, if one exists.
- avg(x) -> double()¶
Returns the average (arithmetic mean) of all non-null input values.
SELECT avg(v) FROM (VALUES (1), (NULL), (3)) AS t(v); -- 2.000
- count(*) -> bigint()¶
Returns the number of input rows.
SELECT count(*) FROM (VALUES (1), (NULL), (3)) AS t(v); -- 3
- count(x) -> bigint()¶
Returns the number of non-null input values.
SELECT count(v) FROM (VALUES (1), (NULL), (3)) AS t(v); -- 2
- count_if(x) -> bigint()¶
Returns the number of
TRUEinput values.SELECT count_if(flag) FROM (VALUES (TRUE), (FALSE), (NULL)) AS t(flag); -- 1
- max(x) -> [same as input]()¶
Returns the maximum non-null input value.
SELECT max(v) FROM (VALUES (1), (NULL), (3)) AS t(v); -- 3
- min(x) -> [same as input]()¶
Returns the minimum non-null input value.
SELECT min(v) FROM (VALUES (1), (NULL), (3)) AS t(v); -- 1
- sum(x) -> [same as input]()¶
Returns the sum of all non-null input values.
SELECT sum(v) FROM (VALUES (1), (NULL), (3)) AS t(v); -- 4
Approximate Aggregate Functions¶
- approx_distinct(x) -> bigint()¶
Returns the approximate number of distinct non-null input values. This function provides an approximation of
count(DISTINCT x). Zero is returned if all input values are null.SELECT approx_distinct(v) FROM (VALUES (1), (1), (2), (NULL)) AS t(v); -- 2
Statistical Aggregate Functions¶
- corr(y, x) -> double()¶
Returns the correlation coefficient of the non-null input pairs.
SELECT corr(y, x) FROM (VALUES (1, 2), (3, 4), (5, 6)) AS t(y, x); -- 1.0
- covar_pop(y, x) -> double()¶
Returns the population covariance of the non-null input pairs.
SELECT covar_pop(y, x) FROM (VALUES (1, 2), (3, 4), (5, 6)) AS t(y, x); -- 2.6666666666666665
- covar_samp(y, x) -> double()¶
Returns the sample covariance of the non-null input pairs.
SELECT covar_samp(y, x) FROM (VALUES (1, 2), (3, 4), (5, 6)) AS t(y, x); -- 4.0
- regr_count(y, x) -> bigint()¶
Returns the number of input rows where both
yandxare non-null.yis the dependent value andxis the independent value.
- regr_avgx(y, x) -> double()¶
Returns the average of the independent value for non-null input pairs.
- regr_avgy(y, x) -> double()¶
Returns the average of the dependent value for non-null input pairs.
- regr_sxx(y, x) -> double()¶
Returns the sum of squares of the independent value for non-null input pairs.
- regr_syy(y, x) -> double()¶
Returns the sum of squares of the dependent value for non-null input pairs.
- regr_sxy(y, x) -> double()¶
Returns the sum of products of the dependent and independent values for non-null input pairs.
- regr_slope(y, x) -> double()¶
Returns the slope of the linear regression line.
yis the dependent value andxis the independent value.
- regr_intercept(y, x) -> double()¶
Returns the intercept of the linear regression line.
yis the dependent value andxis the independent value.
- regr_r2(y, x) -> double()¶
Returns the coefficient of determination of the linear regression.
SELECT regr_count(y, x), regr_avgx(y, x), regr_avgy(y, x), regr_sxx(y, x), regr_syy(y, x), regr_sxy(y, x), regr_slope(y, x), regr_intercept(y, x), regr_r2(y, x) FROM ( VALUES (1, 2), (3, 4), (5, 6), (CAST(NULL AS INTEGER), 8), (9, CAST(NULL AS INTEGER)) ) AS t(y, x); -- 3, 4.0, 3.0, 8.0, 8.0, 8.0, 1.0, -1.0, 1.0
- stddev(x) -> double()¶
This is an alias for
stddev_samp().
- stddev_pop(x) -> double()¶
Returns the population standard deviation of all non-null input values.
- stddev_samp(x) -> double()¶
Returns the sample standard deviation of all non-null input values.
- variance(x) -> double()¶
This is an alias for
var_samp().
- var_pop(x) -> double()¶
Returns the population variance of all non-null input values.
- var_samp(x) -> double()¶
Returns the sample variance of all non-null input values.
SELECT var_pop(v), stddev_pop(v), var_samp(v), variance(v), stddev_samp(v), stddev(v) FROM (VALUES (1), (2), (3)) AS t(v); -- 0.6666666666666666, 0.816496580927726, 1.0, 1.0, 1.0, 1.0