Window Functions¶

Window functions perform calculations across rows of the query result. They run after the HAVING clause but before the final query ORDER BY clause.

Invoking a window function requires an OVER clause:

function(args) OVER (
    [PARTITION BY expression]
    [ORDER BY expression [ASC|DESC]]
    [frame]
)

The PARTITION BY clause separates rows into independent window partitions. The window ORDER BY clause determines the order used by the window function; ordinal references such as ORDER BY 1 are not supported in window ORDER BY clauses.

The verified frame forms are:

{ROWS|RANGE} frame_start
{ROWS|RANGE} BETWEEN frame_start AND frame_end

frame_start and frame_end can be:

UNBOUNDED PRECEDING
expression PRECEDING
CURRENT ROW
expression FOLLOWING
UNBOUNDED FOLLOWING

If no frame is specified, a window with ORDER BY uses RANGE UNBOUNDED PRECEDING through CURRENT ROW. A window without ORDER BY uses the whole partition.

Aggregate Window Functions¶

Aggregate functions can be used as window functions by adding an OVER clause. The aggregate is computed for each row over the rows in the current window frame.

SELECT grp, v,
       sum(v) OVER (
           PARTITION BY grp
           ORDER BY v
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS running_sum
FROM (VALUES (1, 10), (1, 20), (2, 5), (2, 7)) AS t(grp, v)
ORDER BY grp, v;
-- (1, 10, 10), (1, 20, 30), (2, 5, 5), (2, 7, 12)

Ranking Functions¶

cume_dist() -> double()¶: Returns the cumulative distribution of the current row in the window partition. Peer rows in the window ordering return the same value.

dense_rank() -> bigint()¶: Returns the rank of the current row without gaps between peer groups.

ntile(n) -> bigint()¶: Divides the rows for each window partition into n buckets numbered from 1 to n. The bucket count must be a positive constant integer.

percent_rank() -> double()¶: Returns (r - 1) / (n - 1), where r is rank() for the current row and n is the number of rows in the partition. A single-row partition returns 0.0.

rank() -> bigint()¶: Returns the rank of the current row. Peer rows have the same rank, and the next rank includes gaps for the peer rows.

row_number() -> bigint()¶: Returns a sequential number for each row, starting with one, according to the ordering within the window partition.

SELECT v,
       rank() OVER (ORDER BY v),
       dense_rank() OVER (ORDER BY v),
       percent_rank() OVER (ORDER BY v),
       cume_dist() OVER (ORDER BY v)
FROM (VALUES (10), (20), (20), (30)) AS t(v)
ORDER BY v;
-- (10, 1, 1, 0.0, 0.25)
-- (20, 2, 2, 0.3333333333333333, 0.75)
-- (20, 2, 2, 0.3333333333333333, 0.75)
-- (30, 4, 3, 1.0, 1.0)

Value Functions¶

Value functions support RESPECT NULLS and IGNORE NULLS after the function call and before OVER. Null values are respected by default. If IGNORE NULLS is specified, rows where the value expression is null are excluded from the value-function calculation.

first_value(x) -> [same as input]()¶: Returns the first value of the window frame.

last_value(x) -> [same as input]()¶: Returns the last value of the window frame.

nth_value(x, offset) -> [same as input]()¶: Returns the value at the specified one-based offset from the beginning of the window frame. The offset must be a positive constant integer. If the offset is greater than the number of values in the frame, null is returned.

lead(x[, offset [, default_value]]) -> [same as input]()¶: Returns the value at offset rows after the current row in the window partition. The default offset is 1. If the offset refers to a row outside the partition, default_value is returned, or null when no default value is specified. Offset and default arguments must be constants.

lag(x[, offset [, default_value]]) -> [same as input]()¶: Returns the value at offset rows before the current row in the window partition. The default offset is 1. If the offset refers to a row outside the partition, default_value is returned, or null when no default value is specified. Offset and default arguments must be constants.

SELECT id,
       first_value(v) OVER win,
       last_value(v) OVER win,
       nth_value(v, 2) OVER win,
       lag(v) OVER win,
       lead(v, 2) OVER win
FROM (VALUES (1, 10), (2, 20), (3, 30), (4, 40)) AS t(id, v)
WINDOW win AS (
    ORDER BY id
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)
ORDER BY id;
-- (1, 10, 40, 20, NULL, 30)
-- (2, 10, 40, 20, 10, 40)
-- (3, 10, 40, 20, 20, NULL)
-- (4, 10, 40, 20, 30, NULL)