Skip to main content

Window Functions

SQL Dialect, Version, and Credit

Unless otherwise noted, all code examples were run using version 8.0.30 of MySQL. Intentionally, nearly all window function features outlined on this page also apply to window function usage in other SQL dialects (e.g., Postgres, SQL Server, etc.).

MySQL documentation as well as the MySQL tutorial site have been heavily used and referenced (implicitly or explicitly) throughout this page.

Guideposts

Tabular reference

Frame Clause Validity

The frame clause of a window function after ORDER BY (i.e., [ROWS|RANGE] ...) only applies to the following window functions:

  • aggregate functions used as window functions (e.g., AVG(), SUM(), MIN(), MAX(), COUNT(), etc.)
  • [FIRST|LAST|NTH]_VALUE() non-aggregate window functions

This is because the SQL standard specifies that window functions that operate on the entire partition (e.g., all non-aggregate window functions except [FIRST|LAST|NTH]_VALUE()) should have no frame clause. See the frame_clause section for more details.

FunctionDocsTutorialDescription
ROW_NUMBER()DocsTutorialNumber of current row within its partition. Used to generate a sequential number for each row within a partition of a result set.
RANK()DocsTutorialRank of current row within its partition, with gaps. Used to assign a rank to each row within a partition of a result set (with gaps).
DENSE_RANK()DocsTutorialRank of current row within its partition, without gaps. Used to assign a rank to each row within a partition of a result set (without gaps).
LEAD()DocsTutorialValue of argument from row leading current row within partition. Used to access data of a subsequent row from the current row in the same result set (it looks forward a number of rows and accesses data of that row from the current row).
LAG()DocsTutorialValue of argument from row lagging current row within partition. Used to access data of a previous row from the current row in the same result set (it looks back a number of rows and accesses data of that row from the current row).
FIRST_VALUE()DocsTutorialValue of argument from first row of window frame. Used to get the first value of a frame, partition, or result set.
LAST_VALUE()DocsTutorialValue of argument from last row of window frame. Used to get the last value of a frame, partition, or result set.
NTH_VALUE()DocsTutorialValue of argument from N-th row of window frame. Used to get the N-th value of a frame, partition, or result set.
NTILE()DocsTutorialBucket number of current row within its partition. Used to divide rows into a specified number of groups where each group is assigned a bucket number starting at 1. This function returns a bucket number for each row that represents the group to which the row belongs.
PERCENT_RANK()DocsTutorialPercentage rank value. Used to calculate the percentile ranking of a row within a partition or result set. Returns the percentage of partition values less than the value in the current row, excluding the highest value.
CUME_DIST()DocsTutorialCumulative distribution value. Used to calculate cumulative distribution value. Returns the percentage of partition values less than or equal to the value in the current row.

Descriptions

A lucid description of window functions may be found in [1]:

After the database server has completed all of the steps necessary to evaluate a query, including joining, filtering, grouping, and sorting, the result set is complete and ready to be returned to the caller. Imagine if you could pause the query execution at this point and take a walk through the result set while it is still held in memory; what types of analysis might you want to do? If your result set contains sales data, perhaps you might want to generate rankings for salespeople or regions, or calculate percentage differences between one time period and another. If you are generating results for a financial report, perhaps you would like to calculate subtotals for each report section, and a grand total for the final section. Using analytic functions [i.e., window functions], you can do all of these things and more.

We may find a somewhat more dry, albeit equally useful, description in [4]:

Once you understand the concept of grouping and using aggregates in SQL, understanding window functions is easy. Window functions, like aggregate functions, perform an aggregation on a defined set (a group) of rows, but rather than returning one value per group, window functions can return multiple values for each group. The group of rows to perform the aggregation on is the window.

Query execution order

When are window functions executed or their actions performed? We may find our answer in [4]:

It is important to note that window functions are performed as the last step in SQL processing prior to the ORDER BY clause.

Hence, SQL query execution order for SELECT queries that include window functions may be described as follows:

  1. FROM/JOIN (and all associated ON conditions)
  2. WHERE
  3. GROUP BY
  4. HAVING
  5. SELECT (including window functions)
  6. DISTINCT
  7. ORDER BY
  8. LIMIT/OFFSET

See the query execution order doc entry for more on the order in which SQL queries are executed.

Syntax

The general and concise syntax for using a window function can be described as follows:

window_function_name(expression) OVER (window_spec);

window_spec

In the syntax block above, window_spec refers to the window specification, which has several parts, all of which are optional:

window_spec:
[window_name] [partition_clause] [order_clause] [frame_clause]

Hence, the complete syntax for using a window function is as follows (this syntax also applies to any aggregate function that may also be used as a window function):

window_function_name(expression) OVER ( 
[window_name]
[partition_clause]
[order_clause]
[frame_clause]
)

The syntax block above illustrates how the OVER clause is most responsible for defining how a window function will behave.

over_clause

The OVER clause takes one of two possible forms:

over_clause:
{OVER (window_spec) | OVER window_name}

Both forms define how the window function should process query rows:

  • OVER(window_spec): The window specification, window_spec, defines the window and appears directly in the OVER clause between the parentheses. This is the form most frequently encountered.
  • OVER window_name: The window definition is provided by window_name, a supplied reference that refers to a window specification defined by a WINDOW clause elsewhere in the query (i.e., window_name is basically a window alias).

As stated previously, all parts that comprise a window specification, window_spec, are optional; hence, if the OVER clause is empty (i.e., all optional parts of window_spec have been omitted, thus resulting in OVER()), then the window consists of all query rows, and the window function computes a result using all rows. If, however, any or all of the clauses

  • window_name
  • partition_clause
  • order_clause
  • frame_clause

are present in the window specification, then these clauses will determine how query rows are partitioned and ordered as well as how these query rows are used to compute the function result. Each clause is discussed more thoroughly below.

window_name

syntax
WINDOW window_name AS (window_spec)
[, window_name AS (window_spec)] ...
TLDR

window_name refers to the name of a window defined by a WINDOW clause elsewhere in the query. If window_name appears by itself within the OVER clause, then window_name completely defines the window. If, however, partitioning, ordering, or framing clauses are also present, then these clauses will modify how the named window is to be interpreted.

Windows can be defined and given names by which to refer to them in OVER clauses. To do this, use a WINDOW clause. If present in a query, the WINDOW clause falls between the positions of the HAVING and ORDER BY clauses, and has the following syntax:

WINDOW window_name AS (window_spec)
[, window_name AS (window_spec)] ...

For each window definition, window_name is the window name, and window_spec is the same type of window specification as given between the parentheses of an OVER clause, as previously described.

window_spec:
[window_name] [partition_clause] [order_clause] [frame_clause]

A WINDOW clause is useful for queries in which multiple OVER clauses would otherwise define the same window. Instead, you can define the window once, give it a name, and then refer to the name in the OVER clauses (i.e., a WINDOW clause essentially lets you define a window alias).

Consider the following query on the working data set, which defines the same window multiple times:

SELECT
profit,
# highlight-start
ROW_NUMBER() OVER (ORDER BY profit) AS 'row_number',
RANK() OVER (ORDER BY profit) AS 'rank',
DENSE_RANK() OVER (ORDER BY profit) AS 'dense_rank'
# highlight-end
FROM sales;

The highlighted lines above show that the same window specification, namely ORDER BY profit, is referred to multiple times. The query can be written more simply by using a WINDOW clause to define the window once and then refer to the window by name in the OVER clauses:

SELECT
profit,
# highlight-start
ROW_NUMBER() OVER w AS 'row_number',
RANK() OVER w AS 'rank',
DENSE_RANK() OVER w AS 'dense_rank'
# highlight-end
FROM sales
# highlight-next-line
WINDOW w AS (ORDER BY profit);

A named window can make it easy to experiment with window definitions to see the effect on query results — you only need to modify the window definition in the WINDOW clause rather than multiple OVER clause definitions.

If an OVER clause uses OVER(window_name ...) rather than OVER window_name, then the named window can be modified by the addition of other clauses. For example, the following query utilizes a named window w that is defined only by a partition clause but uses ORDER BY in the OVER clauses to modify w in different ways:

SELECT
DISTINCT year, country,
# highlight-start
FIRST_VALUE(year) OVER (w ORDER BY year ASC) AS first,
FIRST_VALUE(year) OVER (w ORDER BY year DESC) AS last
# highlight-end
FROM sales
# highlight-next-line
WINDOW w AS (PARTITION BY country);

An OVER clause can only add properties to a named window, not modify them. If the named window definition includes a partitioning, ordering, or framing property, then the OVER clause that refers to the window name cannot also include the same kind of property or an error occurs:

The following construct is not permitted because the OVER clause specifies PARTITION BY for a named window that already has PARTITION BY:

# highlight-error-next-line
OVER (w PARTITION BY year)
# highlight-error-next-line
... WINDOW w AS (PARTITION BY country)

The definition of a named window can itself begin with a window_name. In such cases, forward and backward references are permitted but not cycles:

The following is not permitted because it contains a cycle:

# highlight-error-next-line
WINDOW w1 AS (w2), w2 AS (w3), w3 AS (w1)

partition_clause

syntax
partition_clause:
PARTITION BY expr [, expr] ...

A PARTITION BY clause indicates how to divide the query rows into groups. The window function result for a given row is based on the rows of the partition that contains the row. If PARTITION BY is omitted, then there will be a single partition consisting of all query rows.

MySQL supports partitioning by expressions as well as columns

Standard SQL requires PARTITION BY to be followed by column names only. A MySQL extension is to permit expressions, not just column names. For example, if a table contains a TIMESTAMP column named ts, standard SQL permits PARTITION BY ts but not PARTITION BY HOUR(ts), whereas MySQL permits both.

order_clause

syntax
order_clause:
ORDER BY expr [ASC|DESC] [, expr [ASC|DESC]] ...

An ORDER BY clause indicates how to sort rows in each partition. Partition rows that are equal according to the ORDER BY clause are considered peers. If ORDER BY is omitted, then partition rows are unordered, with no processing order implied, and all partition rows are peers.

Each ORDER BY expression optionally can be followed by ASC or DESC to indicate sort direction. The default is ASC if no direction is specified. NULL values sort first for ascending sorts, last for descending sorts.

An ORDER BY in a window definition applies within individual partitions. To sort the result set as a whole, include an ORDER BY at the query top level.

frame_clause

syntax
frame_clause:
frame_units frame_extent

frame_units:
{ROWS | RANGE}

frame_extent:
{frame_start | frame_between}

frame_between:
BETWEEN frame_start AND frame_end

frame_start, frame_end: {
CURRENT ROW
| UNBOUNDED PRECEDING
| UNBOUNDED FOLLOWING
| expr PRECEDING
| expr FOLLOWING
}
TLDR
  • Frame definition: A frame is a subset of the current partition and the frame_clause specifies how to define the subset. The frame clause has many subclauses of its own (see below for more details).

  • Default window frame: If a window frame is not explicitly specified, then a default window frame definition is used. What definition is used by default generally depends on whether or not ORDER BY is specified in the window specification. If ORDER BY is not specified, then the query simply treats all rows as the window frame for each row, which is equivalent to the following frame definition:

    RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

    If ORDER BY is specified, then generally this means RANGE UNBOUNDED PRECEDING will be used as the default window frame definition, which is equivalent to the following:

    RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  • Frame validity: As noted in the cautionary box below, frame clauses only apply to aggregate functions used as window functions and the FIRST_VALUE(), LAST_VALUE(), and NTH_VALUE() non-aggregate window functions. In MySQL, frame clauses can still be provided for other window functions, but they will be ignored (instead of throwing an error).

  • Frame units (ROWS vs RANGE): Be mindful when choosing the frame units for the frame clause of a window specification:

    • ROWS: Choosing ROWS will mean the frame is defined by beginning and ending row positions, where offsets (i.e., PRECEDING and FOLLOWING) are differences in row numbers from the current row number (i.e., all rows are essentially numbered in a frame that uses ROWS as its units, where each numbered row is considered to be its own entity).
    • RANGE: Choosing RANGE will mean the frame is defined by rows within a value range, where offsets (i.e., PRECEDING and FOLLOWING) are differences in row values from the current row value (i.e., all rows are essentially grouped by value in a frame that uses RANGE as its units, where each group of rows by value is considered to be its own entity).

    Basically, the difference between ROWS and RANGE is that RANGE will take into account all rows that have the same value in the column(s) by which we order while ROWS will only take into account a single (internally numbered) row; that is, ROWS always treats rows individually like the ROW_NUMBER() window function while RANGE treats rows as blocks/groups based on value equality of the column(s) being ordered by like the RANK() window function.

    As noted immediately above, the difference between ROWS and RANGE is similar to the difference between the ranking functions ROW_NUMBER() and RANK(). The query using ROWS will perform calculations on all rows which have their ROW_NUMBER() less than or equal to the row number of the current row. The query using RANGE will perform calculations on all rows which have their RANK() less than or equal to the rank of the current row (i.e., rows are given the same rank when the row values are considered to be equivalent by some ranking criteria). See more details in the exploration of frame units section later in this article.

Frame clauses only apply to aggregate functions used as window functions and [FIRST|LAST|NTH]_VALUE() non-aggregate window functions

As the MySQL docs note, aggregate functions used as window functions operate on rows in the current row frame, as do the following nonaggregate window functions:

FIRST_VALUE()
LAST_VALUE()
NTH_VALUE()

But the SQL standard specifies that window functions that operate on the entire partition should have no frame clause (e.g., it would be very strange if the RANK() window function were allowed a frame specification). MySQL permits a frame clause for such functions but ignores it (other RDBMS may throw an error). The following functions use the entire partition even if a frame is specified:

ROW_NUMBER()
RANK()
DENSE_RANK()
LEAD()
LAG()
NTILE()
CUME_DIST()
PERCENT_RANK()

The definition of a window used with a window function can include a frame clause, as indicated by the presence of the optional frame_clause for any window specification:

window_spec:
[window_name] [partition_clause] [order_clause] [frame_clause]

A frame is a subset of the current partition and the frame_clause specifies how to define that subset. Frames are determined with respect to the current row, which enables a frame to move within a partition depending on the location of the current row within its partition. For example:

  • Running totals: By defining a frame to be all rows from the partition start to the current row, you can compute running totals for each row.
  • Rolling averages: By defining a frame as extending N rows on either side of the current row, you can compute rolling averages.

The following query demonstrates the use of moving frames to compute running totals within each product group of year- and country-ordered values:

Query
SELECT
product,
country,
year,
profit,
# highlight-start
SUM(profit) OVER (w ROWS UNBOUNDED PRECEDING) AS running_total,
AVG(profit) OVER (w ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS rolling_avg
# highlight-end
FROM
sales
WINDOW w AS (PARTITION BY product ORDER BY year, country)
ORDER BY
product, year, country;
Result Set
+------------+---------+------+--------+---------------+-------------+
| product | country | year | profit | running_total | rolling_avg |
+------------+---------+------+--------+---------------+-------------+
| Calculator | India | 2000 | 75 | 75 | 75.0000 |
| Calculator | India | 2000 | 75 | 150 | 75.0000 |
| Calculator | USA | 2000 | 75 | 225 | 66.6667 |
| Calculator | USA | 2001 | 50 | 275 | 62.5000 |
| Computer | Finland | 2000 | 1500 | 1500 | 1350.0000 |
| Computer | India | 2000 | 1200 | 2700 | 1400.0000 |
| Computer | USA | 2000 | 1500 | 4200 | 1400.0000 |
| Computer | USA | 2001 | 1500 | 5700 | 1400.0000 |
| Computer | USA | 2001 | 1200 | 6900 | 1350.0000 |
| Phone | Finland | 2000 | 100 | 100 | 55.0000 |
| Phone | Finland | 2001 | 10 | 110 | 55.0000 |
| TV | USA | 2001 | 150 | 150 | 125.0000 |
| TV | USA | 2001 | 100 | 250 | 125.0000 |
+------------+---------+------+--------+---------------+-------------+

For the running_average column, there is no frame row preceding the first row or following the last row. In these cases, AVG() computes the average of the rows that are available (e.g., see rolling_avg for rows where product = 'TV').

In general, if a frame clause is provided, then it should have the following syntax:

frame_clause:
frame_units frame_extent

frame_units:
{ROWS | RANGE}

If, however, a frame clause is not provided, then the default frame depends on whether or not an ORDER BY clause is present in the window specification, as described later in this section.

frame_units

The frame_units value indicates the type of relationship between the current row and frame rows:

  • ROWS: The frame is defined by beginning and ending row positions. Offsets are differences in row numbers from the current row number.
  • RANGE: The frame is defined by rows within a value range. Offsets are differences in row values from the current row value.

An exploration later in this section attempts to clarify the differences between ROWS and RANGE. The differences may seem insignificant or unclear at first, but they can be quite impactful.

frame_extent

The frame_extent value indicates the start and end points of the frame. You can specify just the start of the frame (in which case the current row is implicitly the end) or use BETWEEN to specify both frame endpoints:

frame_extent:
{frame_start | frame_between}

frame_between:
BETWEEN frame_start AND frame_end

frame_start, frame_end: {
CURRENT ROW
| UNBOUNDED PRECEDING
| UNBOUNDED FOLLOWING
| expr PRECEDING
| expr FOLLOWING
}

As noted above, if only the start of the frame is specified (i.e., frame_start), then the current row is implicitly the end and we can take advantage of the following abbreviated expressions:

AbbreviationMeaning
[ROWS|RANGE] UNBOUNDED PRECEDING[ROWS|RANGE] BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
[ROWS|RANGE] CURRENT ROW[ROWS|RANGE] BETWEEN CURRENT ROW AND CURRENT ROW
[ROWS|RANGE] n PRECEDING[ROWS|RANGE] BETWEEN n PRECEDING AND CURRENT ROW

With BETWEEN syntax, frame_start must not occur later than frame_end. The permitted frame_start and frame_end values have the following meanings:

  • CURRENT ROW:
    • ROWS: The bound is the current row.
    • RANGE: The bound is the peers of the current row.
  • UNBOUNDED PRECEDING: The bound is the first partition row.
  • UNBOUNDED FOLLOWING: The bound is the last partition row.
  • expr PRECEDING:
    • ROWS: The bound is expr rows before the current row.
    • RANGE: The bound is the rows with values equal to the current row value minus expr; if the current row value is NULL, then the bound is the peers of the row.
  • expr FOLLOWING:
    • ROWS: The bound is expr rows after the current row.
    • RANGE: The bound is the rows with values equal to the current row value plus expr; if the current row value is NULL, then the bound is the peers of the row.
n PRECEDING and n FOLLOWING with RANGE

It is often not a good idea (or sometimes impossible in an RDBMS other than MySQL) to use expr PRECEDING or expr FOLLOWING with RANGE. At the very least, implementation details will likely be different (e.g., see the MySQL docs and Postgres docs). Below we look at practical reasons as to why we may want to avoid the expr [PRECEDING|FOLLOWING] construction with RANGE.

With ROWS, we know there is always a single current row, meaning we can easily make calculations based on references to previous/next rows in relation to the current row (row references are simply positional/numeric).

With RANGE, however, referencing previous/next rows by value can become problematic. For example, consider what RANGE 3 PRECEDING might mean in different contexts. If we are dealing with days, then the meaning is not an issue: "three preceding days." But what if we are dealing with numbers and the current row has a numeric value of 14.5? MySQL makes it clear that RANGE 3 PRECEDING in such a context means "the bound is the rows with values equal to the current row value minus expr" (i.e., 14.5 - 3 = 11.5 in this case). Another RDBMS may do something different or have different restrictions.

The SQL standard may have defined the meaning of n PRECEDING and n FOLLOWING for RANGE, but many databases either do not implement this ability at all or differ significantly in how this is implemented. If you want to use this feature, then make sure you consult the documentation for your database before proceeding.

The following query demonstrates FIRST_VALUE(), LAST_VALUE(), and two instances of NTH_VALUE():

Query
SELECT
year,
country,
product,
profit,
FIRST_VALUE(profit) OVER w AS 'first',
LAST_VALUE(profit) OVER w AS 'last',
NTH_VALUE(profit, 2) OVER w AS 'second',
NTH_VALUE(profit, 4) OVER w AS 'fourth'
FROM
sales
WINDOW w AS (PARTITION BY product ORDER BY profit ROWS UNBOUNDED PRECEDING)
ORDER BY
product, profit;
Result Set
+------+---------+------------+--------+-------+------+--------+--------+
| year | country | product | profit | first | last | second | fourth |
+------+---------+------------+--------+-------+------+--------+--------+
| 2001 | USA | Calculator | 50 | 50 | 50 | NULL | NULL |
| 2000 | USA | Calculator | 75 | 50 | 75 | 75 | NULL |
| 2000 | India | Calculator | 75 | 50 | 75 | 75 | NULL |
| 2000 | India | Calculator | 75 | 50 | 75 | 75 | 75 |
| 2001 | USA | Computer | 1200 | 1200 | 1200 | NULL | NULL |
| 2000 | India | Computer | 1200 | 1200 | 1200 | 1200 | NULL |
| 2000 | Finland | Computer | 1500 | 1200 | 1500 | 1200 | NULL |
| 2001 | USA | Computer | 1500 | 1200 | 1500 | 1200 | 1500 |
| 2000 | USA | Computer | 1500 | 1200 | 1500 | 1200 | 1500 |
| 2001 | Finland | Phone | 10 | 10 | 10 | NULL | NULL |
| 2000 | Finland | Phone | 100 | 10 | 100 | 100 | NULL |
| 2001 | USA | TV | 100 | 100 | 100 | NULL | NULL |
| 2001 | USA | TV | 150 | 100 | 150 | 150 | NULL |
+------+---------+------------+--------+-------+------+--------+--------+

Each function uses the rows in the current frame, which, per the window definition shown, namely

PARTITION BY product ORDER BY profit ROWS UNBOUNDED PRECEDING

extends from the first partition row to the current row (the current row is implicitly the end row when only the start of the frame is specified). For the NTH_VALUE() calls, the current frame does not always include the requested row; in such cases, the return value is NULL.

frame_clause defaults

In the absence of a frame_clause, the default frame used in a window specification depends on whether or not an ORDER BY clause is present:

  • ORDER BY included: The default frame includes rows from the partition through the current row, including all peers of the current row (i.e., rows equal to the current row according to the ORDER BY clause). The default behavior with ORDER BY is thus equivalent to the following window frame specification:

    RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  • ORDER BY excluded: The default frame includes all partition rows. This is because all partition rows are considered to be peers or equal to each other in the absence of an ordering (how could partition rows not be considered equal if no ordering is imposed on the partitions?). The default behavior without ORDER BY is thus equivalent to the following window frame specification:

    RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
frame_units exploration (ROWS vs RANGE)

Perhaps the best way of understanding the difference between the ROWS and RANGE frame units is to see several examples of each in action. The first example, Example 0, is more expository/illustrative, and all subsequent examples are problem-based (Example 1 includes an extended discussion that sets a foundation for all other examples).

Attribution

Most of the content in the following examples come from learnsql.com. Specifically, much of the content for the first two examples comes from a a learnsql.com article on understanding the differences between the RANGE and ROWS window function frame units.

Example 0 (description-based)

The choice of frame units (i.e., ROWS or RANGE) for a window function frame specification clause limits the rows considered by a window function within a partition in different ways:

  • The ROWS clause limits the number of rows considered quite literally. It specifies a fixed number of rows that precede or follow the current row regardless of their value. These rows are used in the window function.
  • The RANGE clause, on the other hand, logically limits the rows considered based on their value compared to the current row.

A practical example where difference of frame unit choice leads to different result sets may help. Start by making use of the following example data set:

Let the revenue_consolidation table hold the following data (the next tab shows how to create this table along with its data using MySQL):

+------+---------+--------+------------+
| id | period | shop | revenue |
+------+---------+--------+------------+
| 1 | 2021/04 | Shop 2 | 341227.53 |
| 2 | 2021/05 | Shop 2 | 315447.24 |
| 3 | 2021/06 | Shop 1 | 1845662.35 |
| 4 | 2021/04 | Shop 2 | 21487.63 |
| 5 | 2021/05 | Shop 1 | 1489774.16 |
| 6 | 2021/06 | Shop 1 | 52489.35 |
| 7 | 2021/04 | Shop 1 | 154552.82 |
| 8 | 2021/05 | Shop 2 | 6548.49 |
| 9 | 2021/06 | Shop 2 | 387779.49 |
+------+---------+--------+------------+

As seen above, the revenue_consolidation table contains revenue data for two shops of one company for the second quarter of 2021. This table contains some "duplicates" in the sense of revenue being listed more than once for the same shop in the same period (e.g., the highlighted lines shown in the data set above are considered to be duplicates). Realistically, duplicates may exist in this scenario due to accounting adjustments (e.g., sometimes adjustments mean changing the total revenue after the books are closed for the month).

Now that we have some understanding of the sample data set, let's consider how using ROWS and RANGE can impact result sets for data we may want to report on. For example, suppose we want to calculate the cumulative revenue sum for every shop. Let's do this first with ROWS and then with RANGE and observe the differences in behavior.

ROWS

The ROWS units of a frame clause means a window frame will be defined as the number of rows preceding and/or following the current row.

SELECT
period,
shop,
revenue,
SUM(revenue) OVER(
PARTITION BY shop
ORDER BY period ASC
# highlight-next-line
ROWS UNBOUNDED PRECEDING
) AS rows_cumulative_revenue
FROM revenue_consolidation;
RANGE

The RANGE units of a frame clause means a window frame will be defined by the number of rows preceding and/or following the current row plus all other rows that have the same value.

SELECT
period,
shop,
revenue,
SUM(revenue) OVER(
PARTITION BY shop
ORDER BY period ASC
# highlight-next-line
RANGE UNBOUNDED PRECEDING
) AS range_cumulative_revenue
FROM revenue_consolidation;
Example 1 (problem-based with extended discussion)

Calculate the running sum for all orders in the single_order table (sorted by date). The result set should include columns id, placed, total_price, and running_sum.

Example 2 (ROWS, number of orders up to current order)

For each order from the single_order table, show the order's id, the placed date, and count which should show the count of the number of orders up to the current order when sorted by the placed date.

Use the COUNT() window function and ROWS in your solution.

Example 3 (ROWS, quantity of remaining items)

Warehouse workers always need to pick the products for orders by hand and one by one. For positions with order_id = 5 in the order_position table, calculate the remaining sum of all the products to pick. Specifically, for each position from orders with order_id = 5, show the position's id, product_id, quantity, and, most importantly, sum, the quantity of the remaining items (including the current row) when sorted by position id in ascending order.

Use the SUM() window function and ROWS in your solution.

Example 4 (ROWS, count of products introduced up to a point)

For each product from the product table, show the product's id, name, introduced date and count, the count of products introduced up to that point (ordered by introduced date in ascending order).

Use the COUNT() window function and ROWS in your solution.

Example 5 (ROWS, moving average)

For each order from the single_order table, show the order's placed date, total_price, average price avg calculated by taking 2 previous orders, the current order, and the 2 following orders (in terms of the placed date), and the ratio of the total_price to avg calculated above.

Use the AVG() window function and ROWS in your solution.

Example 6 (RANGE, daily average)

Calculate the average total_price for orders in the single_order table (sorted by date and reported for single days; that is, the average reported should not be different for the same day). The result set should include columns id, placed, total_price, and avg.

Use RANGE in your solution.

Example 7 (RANGE, number of stock changes for a day)

For each stock_change from the stock_change table with product_id = 7, show the stock change's id, quantity, changed date and another column count which will count the number of stock changes with product_id = 7 on that particular date.

Use RANGE in your solution.

Example 8 (RANGE, total quantity change for a product by date)

For each stock_change from the stock_change table, show the stock change's id, product_id, quantity, changed date, and sum, the total quantity change of stock changes for that product.

Use RANGE in your solution.

Example 9 (RANGE, number of stock changes for all products up through a date)

For each stock_change from the stock_change table, show the stock change's id, changed date, and count, the number of stock changes for all products that took place on the same day or any time earlier.

Use RANGE in your solution.

Example 10 (RANGE, total sum of order prices for same day or later)

For each order from the single_order table, show the order's id, placed date, total_price, and sum, the total sum of all prices of orders from the very same day or any later date.

Use RANGE in your solution.

Reminders

Most of the reminders below are remarked on in greater detail elsewhere in this document, but it may be helpful to have a quick "cheatsheet" of sorts before diving into all of the window function descriptions and examples.

Use MySQL for reproducibility

As noted at the beginning of this post, all code examples were run using version 8.0.30 of MySQL. If you want to reproduce the result or experiment with any of the code included in this post, then know that you may encounter issues if you try to use something other than MySQL version 8.0.30.

Validity of frame clauses

As noted in the frame_clause section, frame clauses (i.e., [ROWS|RANGE] ...) for window functions only apply to aggregate functions being used as window functions and the following non-aggregate window functions: FIRST_VALUE(), LAST_VALUE(), NTH_VALUE(). All other non-aggregate window functions operate on the entire partition, thus making usage of frame clauses nonsensical in such cases.

Null treatment

The following window functions permit a null_treatment clause:

Per the MySQL docs:

Some window functions permit a null_treatment clause that specifies how to handle NULL values when calculating results. This clause is optional. It is part of the SQL standard, but the MySQL implementation permits only RESPECT NULLS (which is also the default). This means that NULL values are considered when calculating results. IGNORE NULLS is parsed, but produces an error.

Consider usage of FIRST_VALUE() as an example:

NULL treatmentEffect
FIRST_VALUE(x) OVER [window_spec]Returns first value, including null of x in window_spec. This choice, which is conventional, implicitly uses RESPECT NULLS as the null_treatment clause.
FIRST_VALUE(x) RESPECT NULLS OVER [window_spec]Returns first value, including null of x in window_spec.
FIRST_VALUE(x) IGNORE NULLS OVER [window_spec]Returns first non-null value of x in window_spec. As the MySQL docs excerpt indicates above, IGNORE NULLS is parsed in MySQL but produces an error.

Working data set

The following data set will be used throughout; the second tab shows how this data set can be (re)created if desired:

+------+---------+------------+--------+
| year | country | product | profit |
+------+---------+------------+--------+
| 2000 | Finland | Computer | 1500 |
| 2000 | Finland | Phone | 100 |
| 2001 | Finland | Phone | 10 |
| 2000 | India | Calculator | 75 |
| 2000 | India | Calculator | 75 |
| 2000 | India | Computer | 1200 |
| 2000 | USA | Calculator | 75 |
| 2000 | USA | Computer | 1500 |
| 2001 | USA | Calculator | 50 |
| 2001 | USA | Computer | 1200 |
| 2001 | USA | Computer | 1500 |
| 2001 | USA | TV | 100 |
| 2001 | USA | TV | 150 |
+------+---------+------------+--------+

ROW_NUMBER()

Syntax
ROW_NUMBER() OVER (
[window_name]
[partition_clause]
[order_clause]
)
CATEGORYDESCRIPTION
SummaryReturns the number of the current row within its partition. Rows numbers range from 1 to the number of partition rows.
window_nameOptional. See the window_name section for more information.
partition_clauseOptional. See the partition_clause section for more information.
order_clauseOptional. See the order_clause section for more information.
Row numberingORDER BY affects the order in which rows are numbered. Row numbering is nondeterministic without ORDER BY.
Peer row numberingROW_NUMBER() assigns peers different row numbers. To assign peers the same value, use RANK() or DENSE_RANK().
Function argumentWe have ROW_NUMBER() OVER(...) as opposed to ROW_NUMBER(expression) OVER(...) for this function's signature; hence, there is no argument or expression to provide for this window function.
Frame clauseInvalid for this window function. See frame_clause section for more details.

Basic example

Using the working data set, the following query assigns a row number to each country based on ordering of profit (in descending order). Note how a different row number is assigned even if the profit is the same for the same country (see highlighted lines in the result set):

Query
SELECT
S.*,
ROW_NUMBER() OVER(PARTITION BY S.country ORDER BY S.profit DESC) AS 'row_number'
FROM
sales S;
Result Set
+------+---------+------------+--------+------------+
| year | country | product | profit | row_number |
+------+---------+------------+--------+------------+
| 2000 | Finland | Computer | 1500 | 1 |
| 2000 | Finland | Phone | 100 | 2 |
| 2001 | Finland | Phone | 10 | 3 |
| 2000 | India | Computer | 1200 | 1 |
| 2000 | India | Calculator | 75 | 2 |
| 2000 | India | Calculator | 75 | 3 |
| 2001 | USA | Computer | 1500 | 1 |
| 2000 | USA | Computer | 1500 | 2 |
| 2001 | USA | Computer | 1200 | 3 |
| 2001 | USA | TV | 150 | 4 |
| 2001 | USA | TV | 100 | 5 |
| 2000 | USA | Calculator | 75 | 6 |
| 2001 | USA | Calculator | 50 | 7 |
+------+---------+------------+--------+------------+

Application

TBD

LeetCode

TBD

RANK()

Syntax
RANK() OVER (
[window_name]
[partition_clause]
[order_clause]
)
CATEGORYDESCRIPTION
SummaryReturns the rank of the current row within its partition (with gaps). Peers are considered ties and receive the same rank. This function does not assign consecutive ranks to peer groups if groups of size greater than one exist; the result is noncontiguous rank numbers (i.e., gaps).
window_nameOptional. See the window_name section for more information.
partition_clauseOptional. See the partition_clause section for more information.
order_clauseOptional. See the order_clause section for more information.
Ranking achieved by orderingThis function should be used with ORDER BY to sort partition rows into the desired order. Without ORDER BY, all rows are peers.
Function argumentWe have RANK() OVER(...) as opposed to RANK(expression) OVER(...) for this function's signature; hence, there is no argument or expression to provide for this window function.
Frame clauseInvalid for this window function. See frame_clause section for more details.

Basic example

Using the working data set, the following query assigns a rank to each sale within a country partition based on ordering of profit (in descending order). Note how the same rank value is assigned to rows where the profit is the same for the same country, but subsequent rank values have gaps (see highlighted lines in the result set):

Query
SELECT
S.*,
RANK() OVER(PARTITION BY S.country ORDER BY S.profit DESC) AS 'rank'
FROM
sales S;
Result Set
+------+---------+------------+--------+------+
| year | country | product | profit | rank |
+------+---------+------------+--------+------+
| 2000 | Finland | Computer | 1500 | 1 |
| 2000 | Finland | Phone | 100 | 2 |
| 2001 | Finland | Phone | 10 | 3 |
| 2000 | India | Computer | 1200 | 1 |
| 2000 | India | Calculator | 75 | 2 |
| 2000 | India | Calculator | 75 | 2 |
| 2001 | USA | Computer | 1500 | 1 |
| 2000 | USA | Computer | 1500 | 1 |
| 2001 | USA | Computer | 1200 | 3 |
| 2001 | USA | TV | 150 | 4 |
| 2001 | USA | TV | 100 | 5 |
| 2000 | USA | Calculator | 75 | 6 |
| 2001 | USA | Calculator | 50 | 7 |
+------+---------+------------+--------+------+

Application

TBD

LeetCode

TBD

DENSE_RANK()

Syntax
DENSE_RANK() OVER (
[window_name]
[partition_clause]
[order_clause]
)
CATEGORYDESCRIPTION
SummaryReturns the rank of the current row within its partition (without gaps). Peers are considered ties and receive the same rank. This function assigns consecutive ranks to peer groups; the result is that groups of size greater than one do not produce noncontiguous rank numbers (i.e., gaps).
window_nameOptional. See the window_name section for more information.
partition_clauseOptional. See the partition_clause section for more information.
order_clauseOptional. See the order_clause section for more information.
Ranking achieved by orderingThis function should be used with ORDER BY to sort partition rows into the desired order. Without ORDER BY, all rows are peers.
Function argumentWe have DENSE_RANK() OVER(...) as opposed to DENSE_RANK(expression) OVER(...) for this function's signature; hence, there is no argument or expression to provide for this window function.
Frame clauseInvalid for this window function. See frame_clause section for more details.

Basic example

Using the working data set, the following query assigns a rank to each sale within a country partition based on ordering of profit (in descending order). Note how the same rank value is assigned to rows where the profit is the same for the same country, but subsequent rank values do not have gaps, unlike the basic example for RANK() (see highlighted lines in the result set):

Query
SELECT
S.*,
DENSE_RANK() OVER(PARTITION BY S.country ORDER BY S.profit DESC) AS 'dense_rank'
FROM
sales S;
Result Set
+------+---------+------------+--------+------------+
| year | country | product | profit | dense_rank |
+------+---------+------------+--------+------------+
| 2000 | Finland | Computer | 1500 | 1 |
| 2000 | Finland | Phone | 100 | 2 |
| 2001 | Finland | Phone | 10 | 3 |
| 2000 | India | Computer | 1200 | 1 |
| 2000 | India | Calculator | 75 | 2 |
| 2000 | India | Calculator | 75 | 2 |
| 2001 | USA | Computer | 1500 | 1 |
| 2000 | USA | Computer | 1500 | 1 |
| 2001 | USA | Computer | 1200 | 2 |
| 2001 | USA | TV | 150 | 3 |
| 2001 | USA | TV | 100 | 4 |
| 2000 | USA | Calculator | 75 | 5 |
| 2001 | USA | Calculator | 50 | 6 |
+------+---------+------------+--------+------------+

Application

TBD

LeetCode

TBD

LEAD()

Syntax
LEAD(scalar_expression [, offset[, default]]) [null_treatment] OVER (
[window_name]
[partition_clause]
[order_clause]
)
CATEGORYDESCRIPTION
SummaryReturns the value of scalar_expression from the row that "leads" (i.e., comes after) the current row by offset rows within the current row's partition. If there is no such row, then the return value is default. For example, if offset is 3, then the return value is default for the last three rows of a partition. If offset or default are missing, then the defaults are 1 and NULL, respectively.
scalar_expressionThe value to be returned based on the specified offset. It is an expression of any type (i.e., column, subquery, or other arbitrary expression) that returns a single (scalar) value.
offsetThe offset provided must be a literal nonnegative integer. If offset is 0, then scalar_expression is evaluated for the current row. In some languages (e.g., SQL Server), a subquery may be used to resolve to the offset value, but this is currently not supported in MySQL. See the MySQL docs for more restrictons/details concerning the offset value.
defaultMySQL does not provide details about restrictions concerning the default value. The SQL Server docs can be helpful here: "The value to return when offset is beyond the scope of the partition. If a default value is not specified, NULL is returned. default can be a column, subquery, or other expression, but it cannot be an analytic function. default must be type-compatible with scalar_expression." Note: MySQL does support the use of a subquery for the default value.
null_treatmentDefaults to RESPECT NULLS. See the NULL treatment reminder for more details.
window_nameOptional. See the window_name section for more information.
partition_clauseOptional. See the partition_clause section for more information.
order_clauseOptional. See the order_clause section for more information.
Function argumentsWe have LEAD(scalar_expression [, offset[, default]]) ... as the signature for this window function. Only the first argument, scalar_expression, is required (this is usually the name of a single column, but other expressions are valid). See summary row above for descriptions of arguments.
Frame clauseInvalid for this window function. See frame_clause section for more details.

Basic example

Using the working data set, the following query first groups rows into partitions based on country value, and then sorts these rows within each partition by profit value (in descending order). Finally, LEAD(S.profit, 3, -1) ... is used to peek at the S.profit value 3 rows ahead of the current row in the partition; if such a row exists, then the lead value for the current row is simply S.profit, but if the row we are trying to peek ahead to does not exist, then the lead value for the current row is -1, the chosen default value.

Query
SELECT
S.*,
LEAD(S.profit, 3, -1) OVER(PARTITION BY S.country ORDER BY S.profit DESC) AS 'lead'
FROM
sales S;
Result Set
+------+---------+------------+--------+------+
| year | country | product | profit | lead |
+------+---------+------------+--------+------+
| 2000 | Finland | Computer | 1500 | -1 |
| 2000 | Finland | Phone | 100 | -1 |
| 2001 | Finland | Phone | 10 | -1 |
| 2000 | India | Computer | 1200 | -1 |
| 2000 | India | Calculator | 75 | -1 |
| 2000 | India | Calculator | 75 | -1 |
| 2001 | USA | Computer | 1500 | 150 |
| 2000 | USA | Computer | 1500 | 100 |
| 2001 | USA | Computer | 1200 | 75 |
| 2001 | USA | TV | 150 | 50 |
| 2001 | USA | TV | 100 | -1 |
| 2000 | USA | Calculator | 75 | -1 |
| 2001 | USA | Calculator | 50 | -1 |
+------+---------+------------+--------+------+

The highlighted lines in the result set above show the rows for which the default value was not used. In general, if you provide the LEAD() window function with an offset value of N, then the last N rows of each partition will use whatever default value is provided, as illustrated above where all of the lines that have not been highlighted constitue the last three rows of each country partition.

Application

TBD

LeetCode

TBD

LAG()

Syntax
LAG(scalar_expression [, offset[, default]]) [null_treatment] OVER (
[window_name]
[partition_clause]
[order_clause]
)
CATEGORYDESCRIPTION
SummaryReturns the value of scalar_expression from the row that "lags" (i.e., comes before) the current row by offset rows within the current row's partition. If there is no such row, then the return value is default. For example, if offset is 3, then the return value is default for the first three rows of a partition. If offset or default are missing, then the defaults are 1 and NULL, respectively.
scalar_expressionThe value to be returned based on the specified offset. It is an expression of any type (i.e., column, subquery, or other arbitrary expression) that returns a single (scalar) value.
offsetThe offset provided must be a literal nonnegative integer. If offset is 0, then scalar_expression is evaluated for the current row. In some languages (e.g., SQL Server), a subquery may be used to resolve to the offset value, but this is currently not supported in MySQL. See the MySQL docs for more restrictons/details concerning the offset value.
defaultMySQL does not provide details about restrictions concerning the default value. The SQL Server docs can be helpful here: "The value to return when offset is beyond the scope of the partition. If a default value is not specified, NULL is returned. default can be a column, subquery, or other expression, but it cannot be an analytic function. default must be type-compatible with scalar_expression." Note: MySQL does support the use of a subquery for the default value.
null_treatmentDefaults to RESPECT NULLS. See the NULL treatment reminder for more details.
window_nameOptional. See the window_name section for more information.
partition_clauseOptional. See the partition_clause section for more information.
order_clauseOptional. See the order_clause section for more information.
Function argumentsWe have LAG(scalar_expression [, offset[, default]]) ... as the signature for this window function. Only the first argument, scalar_expression, is required (this is usually the name of a single column, but other expressions are valid). See summary row above for descriptions of arguments.
Frame clauseInvalid for this window function. See frame_clause section for more details.

Basic example

Using the working data set, the following query first groups rows into partitions based on country value, and then sorts these rows within each partition by profit value (in descending order). Finally, LAG(S.profit, 3, -1) ... is used to reference the S.profit value 3 rows behind the current row in the partition; if such a row exists, then the lag value for the current row is simply S.profit, but if the row we are trying to reference does not exist, then the lag value for the current row is -1, the chosen default value.

Query
SELECT
S.*,
LAG(S.profit, 3, -1) OVER(PARTITION BY S.country ORDER BY S.profit DESC) AS 'lag'
FROM
sales S;
Result Set
+------+---------+------------+--------+------+
| year | country | product | profit | lag |
+------+---------+------------+--------+------+
| 2000 | Finland | Computer | 1500 | -1 |
| 2000 | Finland | Phone | 100 | -1 |
| 2001 | Finland | Phone | 10 | -1 |
| 2000 | India | Computer | 1200 | -1 |
| 2000 | India | Calculator | 75 | -1 |
| 2000 | India | Calculator | 75 | -1 |
| 2001 | USA | Computer | 1500 | -1 |
| 2000 | USA | Computer | 1500 | -1 |
| 2001 | USA | Computer | 1200 | -1 |
| 2001 | USA | TV | 150 | 1500 |
| 2001 | USA | TV | 100 | 1500 |
| 2000 | USA | Calculator | 75 | 1200 |
| 2001 | USA | Calculator | 50 | 150 |
+------+---------+------------+--------+------+

The highlighted lines in the result set above show the rows for which the default value was not used. In general, if you provide the LAG() window function with an offset value of N, then the first N rows of each partition will use whatever default value is provided, as illustrated above where all of the lines that have not been highlighted constitue the first three rows of each country partition.

Application

TBD

LeetCode

TBD

FIRST_VALUE()

Syntax
FIRST_VALUE(scalar_expression) [null_treatment] OVER (
[window_name]
[partition_clause]
[order_clause]
[frame_clause]
)
CATEGORYDESCRIPTION
SummaryReturns the value of scalar_expression from the first row of the window frame.
scalar_expressionThe value to be returned. It is an expression of any type (i.e., column, subquery, or other arbitrary expression) that returns a single (scalar) value.
null_treatmentDefaults to RESPECT NULLS. See the NULL treatment reminder for more details.
window_nameOptional. See the window_name section for more information.
partition_clauseOptional. See the partition_clause section for more information.
order_clauseOptional. See the order_clause section for more information.
frame_clauseOptional. See the frame_clause section for more information.
Function argumentWe have FIRST_VALUE(scalar_expression) ... as the signature for this window function. The first and only argument, scalar_expression, is required (this is usually the name of a single column, but other expressions are also valid).
Frame clauseDefaults to RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING if ORDER BY is not used within the window specification (i.e., the whole partition is used as the window frame). If ORDER BY is used within the window specification, as it often is, then RANGE UNBOUNDED PRECEDING, which is shorthand for RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, is the default frame definition. See the frame_clause section for more details and examples.

Basic example

Using the working data set, the following query first groups rows into partitions based on year and product values, and then sorts these rows within each partition by profit value (in descending order). Finally, for each first row of each partition, FIRST_VALUE(S.profit) ... is used to assign the profit value of each partition's first row to all rows in the partition (these value assignments are made explicit in the first_value column).

Query
SELECT
S.year,
S.product,
S.country,
S.profit,
FIRST_VALUE(S.profit) OVER(PARTITION BY S.year, S.product ORDER BY S.profit DESC) AS 'first_value'
FROM
sales S
ORDER BY
S.year, S.product, S.profit DESC, S.country;
Result Set
+------+------------+---------+--------+-------------+
| year | product | country | profit | first_value |
+------+------------+---------+--------+-------------+
| 2000 | Calculator | India | 75 | 75 |
| 2000 | Calculator | India | 75 | 75 |
| 2000 | Calculator | USA | 75 | 75 |
| 2000 | Computer | Finland | 1500 | 1500 |
| 2000 | Computer | USA | 1500 | 1500 |
| 2000 | Computer | India | 1200 | 1500 |
| 2000 | Phone | Finland | 100 | 100 |
| 2001 | Calculator | USA | 50 | 50 |
| 2001 | Computer | USA | 1500 | 1500 |
| 2001 | Computer | USA | 1200 | 1500 |
| 2001 | Phone | Finland | 10 | 10 |
| 2001 | TV | USA | 150 | 150 |
| 2001 | TV | USA | 100 | 150 |
+------+------------+---------+--------+-------------+

Every highlighted line in the result set above denotes the first row for a given partition. There are seven partitions in total (the first row profit value for each partition appears after the -> symbol below):

2000 | Calculator -> 75
2000 | Computer -> 1500
2000 | Phone -> 100
2001 | Calculator -> 50
2001 | Computer -> 1500
2001 | Phone -> 10
2001 | TV -> 150

Note how the first value of profit for the first row of each partition above is used for all values of the first_value column for every row in the partition.

Application

TBD

LeetCode

TBD

LAST_VALUE()

Syntax
LAST_VALUE(scalar_expression) [null_treatment] OVER (
[window_name]
[partition_clause]
[order_clause]
[frame_clause]
)
CATEGORYDESCRIPTION
SummaryReturns the value of scalar_expression from the last row of the window frame.
scalar_expressionThe value to be returned. It is an expression of any type (i.e., column, subquery, or other arbitrary expression) that returns a single (scalar) value.
null_treatmentDefaults to RESPECT NULLS. See the NULL treatment reminder for more details.
window_nameOptional. See the window_name section for more information.
partition_clauseOptional. See the partition_clause section for more information.
order_clauseOptional. See the order_clause section for more information.
frame_clauseOptional. See the frame_clause section for more information.
Function argumentWe have LAST_VALUE(scalar_expression) ... as the signature for this window function. The first and only argument, scalar_expression, is required (this is usually the name of a single column, but other expressions are also valid).
Frame clauseDefaults to RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING if ORDER BY is not used within the window specification (i.e., the whole partition is used as the window frame). If ORDER BY is used within the window specification, as it often is, then RANGE UNBOUNDED PRECEDING, which is shorthand for RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, is the default frame definition. See the frame_clause section for more details and examples.

Basic example

Using the working data set, the following query first groups rows into partitions based on year and product values, and then sorts these rows within each partition by profit value (in descending order). Finally, for each last row of each partition, LAST_VALUE(S.profit) ... is used to assign the profit value of each partition's last row to all rows in the partition (these value assignments are made explicit in the last_value column).

Note: This example is almost identical to the basic example for the FIRST_VALUE() window function, but here we must change the default frame definition to effectively access our desired last value for each partition. See the frame_clause section for more details and examples.

Query
SELECT
S.year,
S.product,
S.country,
S.profit,
LAST_VALUE(S.profit) OVER (PARTITION BY S.year, S.product ORDER BY S.profit DESC
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS 'last_value'
FROM
sales S
ORDER BY
S.year, S.product, S.profit DESC, S.country;
Result Set
+------+------------+---------+--------+------------+
| year | product | country | profit | last_value |
+------+------------+---------+--------+------------+
| 2000 | Calculator | India | 75 | 75 |
| 2000 | Calculator | India | 75 | 75 |
| 2000 | Calculator | USA | 75 | 75 |
| 2000 | Computer | Finland | 1500 | 1200 |
| 2000 | Computer | USA | 1500 | 1200 |
| 2000 | Computer | India | 1200 | 1200 |
| 2000 | Phone | Finland | 100 | 100 |
| 2001 | Calculator | USA | 50 | 50 |
| 2001 | Computer | USA | 1500 | 1200 |
| 2001 | Computer | USA | 1200 | 1200 |
| 2001 | Phone | Finland | 10 | 10 |
| 2001 | TV | USA | 150 | 100 |
| 2001 | TV | USA | 100 | 100 |
+------+------------+---------+--------+------------+

Every highlighted line in the result set above denotes the last row for a given partition. There are seven partitions in total (the last row profit value for each partition appears after the -> symbol below):

2000 | Calculator -> 75
2000 | Computer -> 1200
2000 | Phone -> 100
2001 | Calculator -> 50
2001 | Computer -> 1200
2001 | Phone -> 10
2001 | TV -> 100

Note how the last value of profit for the last row of each partition above is used for all values of the last_value column for every row in the partition.

Application

TBD

LeetCode

TBD

NTH_VALUE()

Syntax
NTH_VALUE(scalar_expression, offset) [from_first_last] [null_treatment] OVER (
[window_name]
[partition_clause]
[order_clause]
[frame_clause]
)
CATEGORYDESCRIPTION
SummaryLet offset equal N. Returns the value of scalar_expression from the N-th row of the window frame. If there is no such row, then the return value is NULL.
scalar_expressionThe value to be returned based on the specified offset. It is an expression of any type (i.e., column, subquery, or other arbitrary expression) that returns a single (scalar) value.
offsetMust be a literal positive integer value.
from_first_lastThis clause is part of the SQL standard, but the MySQL implementation permits only FROM FIRST (which is also the default). This means that calculations begin at the first row of the window. FROM LAST is parsed, but produces an error. To obtain the same effect as FROM LAST (begin calculations at the last row of the window), use ORDER BY to sort in reverse order.
null_treatmentDefaults to RESPECT NULLS. See the NULL treatment reminder for more details.
window_nameOptional. See the window_name section for more information.
partition_clauseOptional. See the partition_clause section for more information.
order_clauseOptional. See the order_clause section for more information.
frame_clauseOptional. See the frame_clause section for more information.
Function argumentsWe have NTH_VALUE(scalar_expression, offset) ... as the signature for this window function. Both arguments, scalar_expression and offset, are required. scalar_expression is usually the name of a single column, but other expressions are also valid; offset must be a literal positive integer value.
Frame clauseDefaults to RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING if ORDER BY is not used within the window specification (i.e., the whole partition is used as the window frame). If ORDER BY is used within the window specification, as it often is, then RANGE UNBOUNDED PRECEDING, which is shorthand for RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, is the default frame definition. See the frame_clause section for more details and examples.

Basic example

Using the working data set, the following query first groups rows into partitions based on year and product values, and then sorts these rows within each partition by profit value (in descending order). Finally, for the second row of each partition, NTH_VALUE(S.profit, 2) ... is used to assign the profit value of each partition's second row to all rows in the partition (these value assignments are made explicit in the last_value column).

Note: This example is identical to the basic example for the LAST_VALUE() window function; again, we must change the default frame definition to effectively access our desired second row value for each partition. See the frame_clause section for more details and examples.

Query
SELECT
S.year,
S.product,
S.country,
S.profit,
NTH_VALUE(S.profit, 2) OVER(PARTITION BY S.year, S.product ORDER BY S.profit DESC
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS 'nth_value'
FROM
sales S
ORDER BY
S.year, S.product, S.profit DESC, S.country;
Result Set
+------+------------+---------+--------+-----------+
| year | product | country | profit | nth_value |
+------+------------+---------+--------+-----------+
| 2000 | Calculator | India | 75 | 75 |
| 2000 | Calculator | India | 75 | 75 |
| 2000 | Calculator | USA | 75 | 75 |
| 2000 | Computer | Finland | 1500 | 1500 |
| 2000 | Computer | USA | 1500 | 1500 |
| 2000 | Computer | India | 1200 | 1500 |
| 2000 | Phone | Finland | 100 | NULL |
| 2001 | Calculator | USA | 50 | NULL |
| 2001 | Computer | USA | 1500 | 1200 |
| 2001 | Computer | USA | 1200 | 1200 |
| 2001 | Phone | Finland | 10 | NULL |
| 2001 | TV | USA | 150 | 100 |
| 2001 | TV | USA | 100 | 100 |
+------+------------+---------+--------+-----------+

Every highlighted line in the result set above denotes the second row for a given partition. There are seven partitions in total (the second row profit value for each partition appears after the -> symbol below; if no second row is present for a partition, then NULL is reported):

2000 | Calculator -> 75
2000 | Computer -> 1500
2000 | Phone -> NULL
2001 | Calculator -> NULL
2001 | Computer -> 1200
2001 | Phone -> NULL
2001 | TV -> 100

Note how the second value of profit for the second row of each partition above is used for all values of the nth_value column for every row in the partition.

Application

TBD

LeetCode

TBD

NTILE()

Syntax
NTILE(literal_positive_integer) OVER (
[window_name]
[partition_clause]
[order_clause]
)
CATEGORYDESCRIPTION
SummaryDistributes the rows in an ordered partition into a specified number of groups or "buckets". The groups are numbered, starting at 1. For each row, NTILE returns the number of the group to which the row belongs.
literal_positive_integerLiteral positive integer that specifies the number of groups or "buckets" into which each partition must be divided. This expression cannot be a subquery; it must be a literal positive integer value (in MySQL at least). Bucket number return values range from 1 to literal_positive_integer.
window_nameOptional. See the window_name section for more information.
partition_clauseOptional. See the partition_clause section for more information.
order_clauseOptional. See the order_clause section for more information.
Bucketing behaviorAs the SQL Server docs note, if the number of rows in a partition is not divisible by literal_positive_integer, then this will cause groups of two sizes that differ by one member. Larger groups come before smaller groups in the order specified by the OVER clause. For example, if the total number of rows is 53 and the number of groups is five, then the first three groups will have 11 rows and the two remaining groups will have 10 rows each. If on the other hand the total number of rows is divisible by the number of groups, then the rows will be evenly distributed among the groups. For example, if the total number of rows is 50, and there are five groups, then each bucket will contain 10 rows.
Function argumentWe have NTILE(literal_positive_integer) ... as the signature for this window function. The first and only argument, literal_positive_integer, is required.
Frame clauseInvalid for this window function. See frame_clause section for more details.

Basic example

Using the working data set, the following query first partitions rows by product and then orders the rows within each partition by profit value (ascending). Finally, partition rows are further divided into two and four buckets, ntile2 and ntile4, respectively (row_num provides partition row numbers for reference).

Query
SELECT
S.*,
ROW_NUMBER() OVER w AS 'row_num',
NTILE(2) OVER w AS 'ntile2',
NTILE(4) OVER w AS 'ntile4'
FROM
sales S
WINDOW w AS (PARTITION BY S.product ORDER BY S.profit);
Result Set
+------+---------+------------+--------+---------+--------+--------+
| year | country | product | profit | row_num | ntile2 | ntile4 |
+------+---------+------------+--------+---------+--------+--------+
| 2001 | USA | Calculator | 50 | 1 | 1 | 1 |
| 2000 | USA | Calculator | 75 | 2 | 1 | 2 |
| 2000 | India | Calculator | 75 | 3 | 2 | 3 |
| 2000 | India | Calculator | 75 | 4 | 2 | 4 |
| 2001 | USA | Computer | 1200 | 1 | 1 | 1 |
| 2000 | India | Computer | 1200 | 2 | 1 | 1 |
| 2000 | Finland | Computer | 1500 | 3 | 1 | 2 |
| 2001 | USA | Computer | 1500 | 4 | 2 | 3 |
| 2000 | USA | Computer | 1500 | 5 | 2 | 4 |
| 2001 | Finland | Phone | 10 | 1 | 1 | 1 |
| 2000 | Finland | Phone | 100 | 2 | 2 | 2 |
| 2001 | USA | TV | 100 | 1 | 1 | 1 |
| 2001 | USA | TV | 150 | 2 | 2 | 2 |
+------+---------+------------+--------+---------+--------+--------+
  • The first highlighted block in the result set above illustrates the bucketing behavior remarked on in the table at the beginning of this section, specifically how buckets of different sizes are created when there is unequal distribution (with larger bucket sizes coming first).
  • The second highlighted block shows how it is not a problem to have too few rows to fill all buckets. Even though ntile4 specifies that four buckets should be used, there are only two rows to place into different buckets (hence we just have bucket numbers 1 and 2).

Application

TBD

LeetCode

TBD

PERCENT_RANK() and CUME_DIST()

How values are calculated

It is often hard to find practical guidance concerning how the PERCENT_RANK and CUME_DIST window functions calculate their values, especially in window specificiation terms. For the sake of clarity in the descriptions that follow, consider the sales table (i.e., the working data set) as having its rows partitioned by country and then its partition rows ordered by profit (ascending). This equates to the following in SQL terms:

[PERCENT_RANK|CUME_DIST]() OVER(PARTITION BY country ORDER BY profit)

The descriptions below, where the prc_ prefix stands for partition row count and is always in reference to the current row of the country partition for which a PERCENT_RANK|CUME_DIST value is being calculated, along with the details above, may help in visualizing how these functions calculate their values in different ways:

  • PERCENT_RANK()
    • prc_smaller_profit_preceding: How many rows preceding the current row have a smaller profit?
    • prc_minus_last_row: How many rows are there in the entire partition, excluding the last partition row (the last row is excluded from consideration because it contains the highest profit value in the partition)?
    • percent_rank (calculated value): prc_smaller_profit_preceding / prc_minus_last_row
  • CUME_DIST()
    • prc_frame: How many rows are there in the current row's window frame?
    • prc_total: How many rows are there in the entire partition?
    • cume_dist (calculated value): prc_frame / prc_total
Note about prc_frame

In this example, prc_frame values are calculated as follows (as shown in the result sets that follow this note widget):

COUNT(*) OVER (PARTITION BY S.country ORDER BY S.profit)

As noted in the frame clause section, the presence of ORDER BY in the window specification above means the default window frame is as follows for each row:

RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

Since RANGE is the default frame unit (as opposed to ROWS), this means the frame will be defined by rows within a value range, where offsets (i.e., PRECEDING and FOLLOWING) are differences in row values from the current row — all rows are essentially grouped by value in a frame that uses RANGE as its units, where each group of rows by value is considered to be its own entity.

This may sound a little abstract, but the practical meaning becomes easier to understand in light of the following result set:

+---------+--------+---------+------------------+-------------------+-----------+---------------------+
| country | profit | row_num | prc_frame_w_rows | prc_frame_w_range | prc_total | cume_dist |
+---------+--------+---------+------------------+-------------------+-----------+---------------------+
| Finland | 10 | 1 | 1 | 1 | 3 | 0.3333333333333333 |
| Finland | 100 | 2 | 2 | 2 | 3 | 0.6666666666666666 |
| Finland | 1500 | 3 | 3 | 3 | 3 | 1 |
| India | 75 | 4 | 1 | 2 | 3 | 0.6666666666666666 |
| India | 75 | 5 | 2 | 2 | 3 | 0.6666666666666666 |
| India | 1200 | 6 | 3 | 3 | 3 | 1 |
| USA | 50 | 7 | 1 | 1 | 7 | 0.14285714285714285 |
| USA | 75 | 8 | 2 | 2 | 7 | 0.2857142857142857 |
| USA | 100 | 9 | 3 | 3 | 7 | 0.42857142857142855 |
| USA | 150 | 10 | 4 | 4 | 7 | 0.5714285714285714 |
| USA | 1200 | 11 | 5 | 5 | 7 | 0.7142857142857143 |
| USA | 1500 | 12 | 6 | 7 | 7 | 1 |
| USA | 1500 | 13 | 7 | 7 | 7 | 1 |
+---------+--------+---------+------------------+-------------------+-----------+---------------------+
  • prc_frame_w_range shows the partition row count when ROWS is being used as the frame unit:

    COUNT(*) OVER (PARTITION BY S.country ORDER BY S.profit ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS prc_frame_w_rows
  • prc_frame_w_range shows the partition row count when RANGE is being used as the frame unit (default behavior):

    COUNT(*) OVER (PARTITION BY S.country ORDER BY S.profit) AS prc_frame_w_range

The highlighted block shows how the rows in the USA partition that have the same profit value (i.e., 1500) also have the same prc_frame_w_range value (i.e., 7) — this is because all rows with the same profit value are considered to be grouped by value and to be in the same frame, along with all preceding rows:

RANGE BETWEEN  UNBOUNDED PRECEDINGrow count:  117+1=5row_num rows 711  AND  CURRENT ROWrow_num rows 1213row count:  1312+1=2prc_frame_w_range  =  5  +  2  =  7\texttt{RANGE BETWEEN}\; \underbrace{\overbrace{\underbrace{\texttt{UNBOUNDED PRECEDING}}_{\text{row count:}\; 11-7+1=5}}^{\text{\texttt{row\_num} rows \texttt{7}--\texttt{11}}}\; \texttt{AND}\; \underbrace{\overbrace{\texttt{CURRENT ROW}}^{\text{\texttt{row\_num} rows \texttt{12}--\texttt{13}}}}_{\text{row count:}\; 13-12+1=2}}_{\texttt{prc\_frame\_w\_range}\;=\;\texttt{5}\;+\;\texttt{2}\;=\;\texttt{7}}

The computations above, where RANGE is used, are very different from the computations where ROWS is used. Choosing ROWS in this example for the same frame range (i.e., BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) results in calculations equivalent to those produced by ROW_NUMBER(), as illustrated above in the prc_frame_w_rows column.

+---------+--------+------------------------------+--------------------+---------------------+
| country | profit | prc_smaller_profit_preceding | prc_minus_last_row | percent_rank |
+---------+--------+------------------------------+--------------------+---------------------+
| Finland | 10 | 0 | 2 | 0 |
| Finland | 100 | 1 | 2 | 0.5 |
| Finland | 1500 | 2 | 2 | 1 |
| India | 75 | 0 | 2 | 0 |
| India | 75 | 0 | 2 | 0 |
| India | 1200 | 2 | 2 | 1 |
| USA | 50 | 0 | 6 | 0 |
| USA | 75 | 1 | 6 | 0.16666666666666666 |
| USA | 100 | 2 | 6 | 0.3333333333333333 |
| USA | 150 | 3 | 6 | 0.5 |
| USA | 1200 | 4 | 6 | 0.6666666666666666 |
| USA | 1500 | 5 | 6 | 0.8333333333333334 |
| USA | 1500 | 5 | 6 | 0.8333333333333334 |
+---------+--------+------------------------------+--------------------+---------------------+
+---------+--------+-----------+-----------+---------------------+
| country | profit | prc_frame | prc_total | cume_dist |
+---------+--------+-----------+-----------+---------------------+
| Finland | 10 | 1 | 3 | 0.3333333333333333 |
| Finland | 100 | 2 | 3 | 0.6666666666666666 |
| Finland | 1500 | 3 | 3 | 1 |
| India | 75 | 2 | 3 | 0.6666666666666666 |
| India | 75 | 2 | 3 | 0.6666666666666666 |
| India | 1200 | 3 | 3 | 1 |
| USA | 50 | 1 | 7 | 0.14285714285714285 |
| USA | 75 | 2 | 7 | 0.2857142857142857 |
| USA | 100 | 3 | 7 | 0.42857142857142855 |
| USA | 150 | 4 | 7 | 0.5714285714285714 |
| USA | 1200 | 5 | 7 | 0.7142857142857143 |
| USA | 1500 | 7 | 7 | 1 |
| USA | 1500 | 7 | 7 | 1 |
+---------+--------+-----------+-----------+---------------------+

PERCENT_RANK()

Syntax
PERCENT_RANK() OVER (
[window_name]
[partition_clause]
[order_clause]
)
CATEGORYDESCRIPTION
SummaryCalculates the relative rank of a row within a group of rows; specifically, returns the percentage of partition values less than the value in the current row, excluding the highest value. More simply, the calculation generated by PERCENT_RANK for each row of a partition is equivalent to the number of rows in the partition that have a value less than the current row value (numerator) divided by the total number of rows in the partition excluding the last row which contains the highest value in the partition (denominator).

Essentially, PERCENT_RANK is used to evaluate the relative standing of a value within a query result set or partition, where return values range from 0 (always present) to 1 (only present when the highest partition value has no ties) and represent the row relative rank, calculated as the result of the following formula, where rank is the partition row rank and rows is the number of partition rows: (rank - 1) / (rows - 1). This formula roughly translates to the following using other window functions:

(RANK() OVER([partition_clause] ORDER BY [order_clause]) - 1) / (COUNT(*) OVER([partition_clause]) - 1)

This function should be used with ORDER BY to sort partition rows into the desired order. Without ORDER BY, all rows are peers. See the extended discussion following this section's basic example for more details.
window_nameOptional. See the window_name section for more information.
partition_clauseOptional. See the partition_clause section for more information.
order_clauseOptional. See the order_clause section for more information.
MeaningConsider what it would mean for PERCENT_RANK to yield a calculation of 0.42 for a particular row of a partition. This would mean that the partition's current row value is higher than 42% of other row values in the partition (excluding the last row which contains the highest value).
Alternative calculation(RANK() OVER([partition_clause] ORDER BY [order_clause]) - 1) / (COUNT(*) OVER([partition_clause]) - 1)
Function argumentWe have PERCENT_RANK() OVER(...) as opposed to PERCENT_RANK(expression) OVER(...). There is no argument or expression to provide for this window function.
Frame clauseInvalid for this window function. See frame_clause section for more details.
How to remember how values are calculated

Sometimes rephrasing descriptions provided in official documentation can be helpful (or not; do what works best for you):

The calculation generated by PERCENT_RANK() OVER([partition_clause] ORDER BY [order_clause]) for each row of a partition is equivalent to the number of rows in the partition that have a value less than the current row value (numerator) divided by the total number of rows in the partition excluding the last row which contains the highest value in the partition (denominator).

Above, the value being referred to is given by [order_clause], which is most often an expression that makes direct reference to a table column. For concreteness, consider what it would mean for PERCENT_RANK to yield a calculation of 0.42 for a particular row of a partition. This would mean that the partition's current row value is higher than 42% of other row values in the partition (excluding the last row which contains the highest value).

How function values are calculated for PERCENT_RANK()

We are told that PERCENT_RANK values are calculated by means of the formula (rank - 1) / (rows - 1), which may be translated into actual SQL as follows:

(RANK() OVER([partition_clause] ORDER BY [order_clause]) - 1) 
/ (COUNT(*) OVER([partition_clause]) - 1)

The meaning of the denominator in the calculation above is fairly simple: COUNT(*) OVER([partition_clause]) gives us the total number of rows in the partition, and subtracting 1 ensures we exclude the partition's last row, which contains the highest value in the partition.

The numerator is not as straightforward. We can see that RANK() OVER([partition_clause] ORDER BY [order_clause]) ranks each row in a partition by the row ordering resulting from [order_clause], which is most often an expression making explicit reference to a column name, but we then subtract 1 from each row ranking. Why do we subtract 1? The answer is easiest to see by means of an example, but first recall what the numerator is supposed to mean: the number of rows in the partition that have a value less than the current row value.

Suppose we have a table vals with the following val entries:

1, 1, 3, 4, 4, 4, 6, 6, 7, 7

What would a numerator of RANK() OVER(ORDER BY val) - 1 mean in this case? See for yourself:

+------+-----------+
| val | numerator |
+------+-----------+
| 1 | 0 |
| 1 | 0 |
| 3 | 2 |
| 4 | 3 |
| 4 | 3 |
| 4 | 3 |
| 6 | 6 |
| 6 | 6 |
| 7 | 8 |
| 7 | 8 |
+------+-----------+
valnumerator explanation
1How many rows of the partition (in this case, the entire vals table is the partition) have a val less than the current row val of 1? Answer: 0.
1How many rows of the partition have a val less than the current row val of 1? Answer: 0.
3How many rows of the partition have a val less than the current row val of 3? Answer: 2.
4How many rows of the partition have a val less than the current row val of 4? Answer: 3.
4How many rows of the partition have a val less than the current row val of 4? Answer: 3.
4How many rows of the partition have a val less than the current row val of 4? Answer: 3.
6How many rows of the partition have a val less than the current row val of 6? Answer: 6.
6How many rows of the partition have a val less than the current row val of 6? Answer: 6.
7How many rows of the partition have a val less than the current row val of 7? Answer: 8.
7How many rows of the partition have a val less than the current row val of 7? Answer: 8.

There are two noteworthy points about the calculation behaviors exhibited above, both of which unsurprisingly revolve around how to best answer the question being asked for each row:

Calculation question: How many rows of the partition have a val less than the current row val?

  1. Consistency: A block of rows that have the same val should have the same answer to the calculation question above. This is achieved by RANK assigning the same ranking to all rows that have the same val.
  2. Ranking gaps: As noted in the point above, a block of rows that have the same val should have the same answer to the calculation question, but the next row block should take all members of the previous row block into account when answering the calculation question.

For example, there are three rows in the block of rows above with a val of 4, and the answer to the calculation question is the same for each row (i.e., answer of 3). The next row block should take each member of the previous row block into account when answering the calculation above — in this case, the answer should not be 4 but, "3 plus however many rows were in the previous row block", namely 3 + 3 = 6.

The fact that RANK produces gaps between rankings and then resumes rankings as if the values were sequential is what ensures correct values are produced in answer to the calculation question; that is, all rows with equivalent ranking criteria values get the same rank value, but the next rank value will take into account the number of all previous rows. Consider the very real-life example of college football rankings, specifically the AP poll from November 4, 2012:

November 4, 2012 AP Football Poll
+---------+-------------------+-----------------------------+
| ap_rank | team | num_teams_w_smaller_ap_rank |
+---------+-------------------+-----------------------------+
| 1 | Alabama | 0 |
| 2 | Oregon | 1 |
| 3 | Kansas State | 2 |
| 4 | Notre Dame | 3 |
| 5 | Ohio State | 4 |
| 5 | Georgia | 4 |
| 7 | Florida | 6 |
| 8 | Florida State | 7 |
| 9 | LSU | 8 |
| 10 | Clemson | 9 |
| 11 | Louisville | 10 |
| 12 | South Carolina | 11 |
| 13 | Oregon State | 12 |
| 14 | Oklahoma | 13 |
| 15 | Texas A&M | 14 |
| 16 | Stanford | 15 |
| 17 | UCLA | 16 |
| 18 | Nebraska | 17 |
| 19 | Louisiana Tech | 18 |
| 19 | Texas | 18 |
| 21 | USC | 20 |
| 22 | Mississippi State | 21 |
| 23 | Toledo | 22 |
| 24 | Rutgers | 23 |
| 25 | Texas Tech | 24 |
+---------+-------------------+-----------------------------+

The far right column is produced by RANK() OVER(ORDER BY R.ap_rank) - 1 and answers the question as to how many teams are ranked higher than the current team under consideration (i.e., the current row in this context). The highlighted lines show where ties occurred and illustrates how ranks were assigned within a block of ties and subsequently resumed (i.e., by taking the gaps into account).

Basic example

Using the working data set, the following query first partitions rows by country and then sorts rows within each partition by profit value (in ascending order). Percent ranks are then computed for each row of each partition. The row number row is included only as a reference point in the explanation that follows the result set.

Query
SELECT
ROW_NUMBER() OVER(ORDER BY S.country, S.profit) AS 'row',
S.*,
PERCENT_RANK() OVER(PARTITION BY S.country ORDER BY S.profit) AS 'percent_rank'
FROM
sales S;
Result Set
+-----+------+---------+------------+--------+---------------------+
| row | year | country | product | profit | percent_rank |
+-----+------+---------+------------+--------+---------------------+
| 1 | 2001 | Finland | Phone | 10 | 0 |
| 2 | 2000 | Finland | Phone | 100 | 0.5 |
| 3 | 2000 | Finland | Computer | 1500 | 1 |
| 4 | 2000 | India | Calculator | 75 | 0 |
| 5 | 2000 | India | Calculator | 75 | 0 |
| 6 | 2000 | India | Computer | 1200 | 1 |
| 7 | 2001 | USA | Calculator | 50 | 0 |
| 8 | 2000 | USA | Calculator | 75 | 0.16666666666666666 |
| 9 | 2001 | USA | TV | 100 | 0.3333333333333333 |
| 10 | 2001 | USA | TV | 150 | 0.5 |
| 11 | 2001 | USA | Computer | 1200 | 0.6666666666666666 |
| 12 | 2001 | USA | Computer | 1500 | 0.8333333333333334 |
| 13 | 2000 | USA | Computer | 1500 | 0.8333333333333334 |
+-----+------+---------+------------+--------+---------------------+

The table below shows how the percent_rank value is computed for each row.

Rowpercent_rank calculation explanation
1How many rows in the Finland partition have a profit value less than the current row profit value of 10? Answer: 0. How many rows are there in the Finland partition excluding the row with the highest profit value? Answer: 2. Thus, the percent_rank calculation for this row is as follows:

0/10=00/10=0
2How many rows in the Finland partition have a profit value less than the current row profit value of 100? Answer: 1 (i.e., the previous row with a profit value of 10). How many rows are there in the Finland partition excluding the row with the highest profit value? Answer: 2. Thus, the percent_rank calculation for this row is as follows:

1/2=0.51/2=0.5
3How many rows in the Finland partition have a profit value less than the current row profit value of 1500? Answer: 2 (i.e., previous rows with profit values of 10 and 100). How many rows are there in the Finland partition excluding the row with the highest profit value? Answer: 2. Thus, the percent_rank calculation for this row is as follows:

2/2=12/2=1
4How many rows in the India partition have a profit value less than the current row profit value of 75? Answer: 0. How many rows are there in the India partition excluding the row with the highest profit value? Answer: 2. Thus, the percent_rank calculation for this row is as follows:

0/2=00/2=0
5How many rows in the India partition have a profit value less than the current row profit value of 75? Answer: 0. How many rows are there in the India partition excluding the row with the highest profit value? Answer: 2. Thus, the percent_rank calculation for this row is as follows:

0/2=00/2=0
6How many rows in the India partition have a profit value less than the current row profit value of 1200? Answer: 2 (i.e., previous rows with a profit value of 75). How many rows are there in the India partition excluding the row with the highest profit value? Answer: 2. Thus, the percent_rank calculation for this row is as follows:

2/2=12/2=1
7How many rows in the USA partition have a profit value less than the current row profit value of 50? Answer: 0. How many rows are there in the USA partition excluding the row with the highest profit value? Answer: 6. Thus, the percent_rank calculation for this row is as follows:

0/6=00/6=0
8How many rows in the USA partition have a profit value less than the current row profit value of 75? Answer: 1 (i.e., the previous row with a profit value of 50). How many rows are there in the USA partition excluding the row with the highest profit value? Answer: 6. Thus, the percent_rank calculation for this row is as follows:

1/60.16671/6\approx0.1667
9How many rows in the USA partition have a profit value less than the current row profit value of 100? Answer: 2 (i.e., previous rows with profit values of 50 and 75). How many rows are there in the USA partition excluding the row with the highest profit value? Answer: 6. Thus, the percent_rank calculation for this row is as follows:

2/6=1/30.33332/6=1/3\approx0.3333
10How many rows in the USA partition have a profit value less than the current row profit value of 150? Answer: 3 (i.e., previous rows with profit values of 50, 75, and 100). How many rows are there in the USA partition excluding the row with the highest profit value? Answer: 6. Thus, the percent_rank calculation for this row is as follows:

3/6=1/20.53/6=1/2\approx0.5
11How many rows in the USA partition have a profit value less than the current row profit value of 1200? Answer: 4 (i.e., previous rows with profit values of 50, 75, 100, and 150). How many rows are there in the USA partition excluding the row with the highest profit value? Answer: 6. Thus, the percent_rank calculation for this row is as follows:

4/6=2/30.66674/6=2/3\approx0.6667
12How many rows in the USA partition have a profit value less than the current row profit value of 1500? Answer: 5 (i.e., previous rows with profit values of 50, 75, 100, 150, and 1200). How many rows are there in the USA partition excluding the row with the highest profit value? Answer: 6. Thus, the percent_rank calculation for this row is as follows:

5/60.83335/6\approx0.8333
13How many rows in the USA partition have a profit value less than the current row profit value of 1500? Answer: 5 (i.e., previous rows with profit values of 50, 75, 100, 150, and 1200). How many rows are there in the USA partition excluding the row with the highest profit value? Answer: 6. Thus, the percent_rank calculation for this row is as follows:

5/60.83335/6\approx0.8333

Extended discussion

This is one of the more complicated window functions. Not just because of how calculations are made but also because it takes some work to interpret what the calculations actually mean. The basic example above attempts to address both of these concerns, but additional examples (i.e., this "extended discussion") can go a long way in answering unresolved questions.

Example 1 (basic example with only percent_rank)

Compute the percentile rank for each row's profit value when rows are partitioned by country value and rows are ordered within each partition by profit (ascending). Use the data from the sales table (i.e., the working data set). Report all rows from the sales table in the result set along with each row's percentile rank, percentile_rank as described above.

Note: This is just the basic example for this window function but recast as a problem.

Example 2 (basic example fleshed out)

Recall from the description of the PERCENT_RANK function that the following is an alternative way to make equivalent row calculations to those of PERCENT_RANK:

(RANK() OVER(ORDER BY expr) - 1) / (COUNT(*) OVER() - 1)

The numerator may be referred to as partition_row_rank, and the denominator is the partition_row_count (excluding the final row with the highest value in the partition). The quotient, which makes use of both, can be referred to as percent_rank_w_formula (i.e., the percent rank calculated by means of a formula).

Problem: Compute the percentile rank, percent_rank, for each row's profit value when rows are partitioned by country value and rows are ordered within each partition by profit (ascending). Use the data from the sales table (i.e., the working data set). Report for each row the country, profit, partition_row_rank, partition_row_count, percent_rank_w_formula, and percent_rank, as described above.

Note: This is just the basic example for this window function but recast as a problem that requires more granular details.

Application

TBD

LeetCode

TBD

CUME_DIST()

Syntax
CUME_DIST() OVER (
[window_name]
[partition_clause]
[order_clause]
)
CATEGORYDESCRIPTION
SummaryCalculates the cumulative distribution of a value within a group of values (i.e., returns the percentage of partition values less than or equal to the value in the current row). In other words, CUME_DIST calculates the relative position of a specified value in a group of values. The calculation represents the number of rows preceding or peer with the current row in the window ordering of the window partition (i.e., window frame) divided by the total number of rows in the window partition. Return values, which represent relative row positioning, range from 0 to 1 and are calculated as the result of the following formula:

partition_frame_row_count / partition_total_row_count

This formula roughly translates to the following in SQL:

COUNT(*) OVER([partition_clause] ORDER BY [order_clause]) / COUNT(*) OVER([partition_clause])

This function should be used with ORDER BY to sort partition rows into the desired order. Without ORDER BY, all rows are peers and have value N/N = 1, where N is the partition size.. See the discussion following this section's basic example for more details.
window_nameOptional. See the window_name section for more information.
partition_clauseOptional. See the partition_clause section for more information.
order_clauseOptional. See the order_clause section for more information.
Value calculationsCOUNT(*) OVER([partition_clause] ORDER BY [order_clause]) / COUNT(*) OVER([partition_clause]) is a formula that could also be used to achieve the same behavior as CUME_DIST([partition_clause] ORDER BY [order_clause]).
Function argumentWe have CUME_DIST() OVER(...) as opposed to CUME_DIST(expression) OVER(...). There is no argument or expression to provide for this window function.
Frame clauseInvalid for this window function. See frame_clause section for more details.

Basic example

Using the working data set, the following query first partitions rows by country and then sorts rows within each partition by profit value (in ascending order). Cumulative distribution values are then computed for each row of each partition.

Query
SELECT
S.*,
CUME_DIST() OVER(PARTITION BY S.country ORDER BY s.profit) AS 'cume_dist'
FROM
sales S;
Result Set
+------+---------+------------+--------+---------------------+
| year | country | product | profit | cume_dist |
+------+---------+------------+--------+---------------------+
| 2001 | Finland | Phone | 10 | 0.3333333333333333 |
| 2000 | Finland | Phone | 100 | 0.6666666666666666 |
| 2000 | Finland | Computer | 1500 | 1 |
| 2000 | India | Calculator | 75 | 0.6666666666666666 |
| 2000 | India | Calculator | 75 | 0.6666666666666666 |
| 2000 | India | Computer | 1200 | 1 |
| 2001 | USA | Calculator | 50 | 0.14285714285714285 |
| 2000 | USA | Calculator | 75 | 0.2857142857142857 |
| 2001 | USA | TV | 100 | 0.42857142857142855 |
| 2001 | USA | TV | 150 | 0.5714285714285714 |
| 2001 | USA | Computer | 1200 | 0.7142857142857143 |
| 2001 | USA | Computer | 1500 | 1 |
| 2000 | USA | Computer | 1500 | 1 |
+------+---------+------------+--------+---------------------+

See the how values are calculated section for more details on how the values above are calculated and how they compare to the similar PERCENT_RANK() window function.

Application

TBD

LeetCode

TBD

Further reading