Query Execution Order
TLDR
The logical processing order of the SELECT
statement is generally as follows:
FROM/JOIN
(and all associatedON
conditions)WHERE
GROUP BY
HAVING
SELECT
(including window functions)DISTINCT
ORDER BY
LIMIT/OFFSET
Motivation
Without hesitation, compute the following as quickly as you can in your head:
1 + 2 / 3 * 4 ^ 5 - 6
This immediately comes off as a "trick PEMDAS problem," where the goal is to "test" whether or not someone "really" understands the order of operations for evaluating mathematical expressions. But you would never encounter an expression like the one above in the wild. Why? Because it's been intentionally made unclear as to how to go about evaluating the given expression. Specifically, the complete absence of parentheses makes it unnecessarily difficult to understand what the result of the mathematical expression should be.
Importance
Trying to write a SQL SELECT
statement without understanding its logical processing order is like trying to evaluate a complicated mathematical expression with no parentheses. It's unnecessarily difficult, and it can lead to uncertainty in what the result set for a query should be.
Documentation
If you search for order of execution of a SQL query, then you are likely to stumble across threads on Stack Overflow, and who knows what else. But there is little in the way of "official documentation" except the following helpful excerpt from the SQL Server docs
The following steps show the logical processing order, or binding order, for a
SELECT
statement. This order determines when the objects defined in one step are made available to the clauses in subsequent steps. For example, if the query processor can bind to (access) the tables or views defined in theFROM
clause, these objects and their columns are made available to all subsequent steps. Conversely, because theSELECT
clause is step 8, any column aliases or derived columns defined in that clause cannot be referenced by preceding clauses. However, they can be referenced by subsequent clauses such as theORDER BY
clause. The actual physical execution of the statement is determined by the query processor and the order may vary from this list.
FROM
ON
JOIN
WHERE
GROUP BY
WITH CUBE
orWITH ROLLUP
HAVING
SELECT
DISTINCT
ORDER BY
TOP
This excerpt ends with a warning:
The preceding sequence is usually true. However, there are uncommon cases where the sequence may differ.
Mental model
For the sake of simplicity, the list above may be expressed in the following manner to capture the (usually true) logical processing order of the SELECT
statement:
FROM/JOIN
(and all associatedON
conditions)WHERE
GROUP BY
HAVING
SELECT
(including window functions)DISTINCT
ORDER BY
LIMIT/OFFSET
The listing above is how you should think about a SELECT
query, but this is different from how you would write a SELECT
query. How do we typically write a SELECT
query?
Writing model
If we were to hop over to the Postgres docs, then we might be a little overwhelmed by what's provided for the "synopsis" of the SELECT
statement (reproduced below).
SELECT
statement synopsis (Postgres)
[ WITH [ RECURSIVE ] with_query [, ...] ]
SELECT [ ALL | DISTINCT [ ON ( expression [, ...] ) ] ]
[ * | expression [ [ AS ] output_name ] [, ...] ]
[ FROM from_item [, ...] ]
[ WHERE condition ]
[ GROUP BY [ ALL | DISTINCT ] grouping_element [, ...] ]
[ HAVING condition ]
[ WINDOW window_name AS ( window_definition ) [, ...] ]
[ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] select ]
[ ORDER BY expression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
[ LIMIT { count | ALL } ]
[ OFFSET start [ ROW | ROWS ] ]
[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } { ONLY | WITH TIES } ]
[ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF table_name [, ...] ] [ NOWAIT | SKIP LOCKED ] [...] ]
where from_item can be one of:
[ ONLY ] table_name [ * ] [ [ AS ] alias [ ( column_alias [, ...] ) ] ]
[ TABLESAMPLE sampling_method ( argument [, ...] ) [ REPEATABLE ( seed ) ] ]
[ LATERAL ] ( select ) [ AS ] alias [ ( column_alias [, ...] ) ]
with_query_name [ [ AS ] alias [ ( column_alias [, ...] ) ] ]
[ LATERAL ] function_name ( [ argument [, ...] ] )
[ WITH ORDINALITY ] [ [ AS ] alias [ ( column_alias [, ...] ) ] ]
[ LATERAL ] function_name ( [ argument [, ...] ] ) [ AS ] alias ( column_definition [, ...] )
[ LATERAL ] function_name ( [ argument [, ...] ] ) AS ( column_definition [, ...] )
[ LATERAL ] ROWS FROM( function_name ( [ argument [, ...] ] ) [ AS ( column_definition [, ...] ) ] [, ...] )
[ WITH ORDINALITY ] [ [ AS ] alias [ ( column_alias [, ...] ) ] ]
from_item join_type from_item { ON join_condition | USING ( join_column [, ...] ) [ AS join_using_alias ] }
from_item NATURAL join_type from_item
from_item CROSS JOIN from_item
and grouping_element can be one of:
( )
expression
( expression [, ...] )
ROLLUP ( { expression | ( expression [, ...] ) } [, ...] )
CUBE ( { expression | ( expression [, ...] ) } [, ...] )
GROUPING SETS ( grouping_element [, ...] )
and with_query is:
with_query_name [ ( column_name [, ...] ) ] AS [ [ NOT ] MATERIALIZED ] ( select | values | insert | update | delete )
[ SEARCH { BREADTH | DEPTH } FIRST BY column_name [, ...] SET search_seq_col_name ]
[ CYCLE column_name [, ...] SET cycle_mark_col_name [ TO cycle_mark_value DEFAULT cycle_mark_default ] USING cycle_path_col_name ]
TABLE [ ONLY ] table_name [ * ]
We can produce a more useful writing model by stripping the general SELECT
statement down to its essentials:
-- writing model for a general SELECT statement
SELECT DISTINCT
column, AGG_FUNC(column_or_expression), ...
FROM
mytable M JOIN another_table A ON M.column = A.column
WHERE
constraint_expression
GROUP BY
column
HAVING
constraint_expression
ORDER BY
column ASC/DESC
LIMIT
returnCount OFFSET skipCount;
Tabular summary
Order | Operation | Can use column alias? | Note |
---|---|---|---|
1 | FROM /JOIN | ❌ | Declare table aliases here and FROM what table (or combined tables when using one or more JOIN s) you will be selecting data; the total working set of data is determined here--the resultant working set of data may be thought of as a "virtual table" to which all subsequent operations will apply (this is the idea behind a view which is essentially a commonly used working set of data or "virtual table" created by a query, often a somewhat complicated one, whose result set is given a name (i.e., the name of the view); subsequent queries can then be easily written to target this predetermined working set of data, virtual table, named query, view or whatever you want to call it--such a construct is most often called a view) |
2 | WHERE | ❌ | Apply first-pass constraints to individual rows |
3 | GROUP BY | ❌ | Group remaining rows (i.e., the rows remaining after the first-pass WHERE constraints have filtered out all individual rows not meeting the specified criteria) based on common values in specified column(s); be mindful when using more than one field in this clause, namely GROUP BY ColOrExpr means put all those rows with the same value for ColOrExpr in one group while something like GROUP BY ColOrExpr1, ..., ColOrExprN means put all those rows with the same values for all N columns or expressions in one group |
4 | HAVING | ❌ | Apply constraints to an aggregate or a group of rows that have been GROUPed BY some criteria; essentially, WHERE applies to individual rows while HAVING applies to groups of rows |
5 | SELECT | ― | Declare column aliases here |
6 | DISTINCT | ― | Use a more restrictive version of SELECT by specifying which rows should be considered duplicates, and thus removed from the result set, by using the syntax SELECT DISTINCT which generally takes one of two forms:
|
7 | ORDER BY | ✅ | Sort a query result set by using the syntax ORDER BY SortExpression1 [ASC | DESC], ..., SortExpressionN [ASC | DESC] where SortExpression can be either a column or an expression and ASC | DESC specifies whether to sort the column or expression in ASC ending order (default) or in DESC ending order; note that you can sort a result set by multiple columns and/or expressions where each listed SortExpression takes precedence over the next (e.g., something like ORDER BY LastName DESC, Age ASC would order rows first by LastName in descending order and then, within that ordered set, sort the remaining results by AGE in ascending order) |
8 | LIMIT /OFFSET | ✅ | Return a (possibly offset) subset of rows generated by a query; the syntax LIMIT RowsToReturn OFFSET RowsToSkip indicates the maximum number of rows that should be returned by a query (i.e, RowsToReturn ) as well as how many rows to skip from the initial query results (i.e., RowsToSkip ); almost always use ORDER BY when using LIMIT/OFFSET to ensure a predictable result set--this is due to the fact that the order in which records are returned from a database is often random unless order is imposed by means of ORDER BY |
List summary
FROM/JOIN
: TheFROM
clause and subsequentJOIN
s are first executed to determine the total working set of data that is being queried. This includes subqueries in this clause, and can cause temporary tables to be created under the hood containing all the columns and rows of the tables being joined.WHERE
: Once we have the total working set of data, the first-passWHERE
constraints are applied to the individual rows. The rows that do not satisfy the constraint are discarded. Each of the constraints can only access columns directly from the tables requested in theFROM
clause. Aliases in theSELECT
part of the query are not accessible in most databases since they may include expressions dependent on parts of the query that have not yet executed.GROUP BY
: The remaining rows after theWHERE
constraints are applied are then grouped based on common values in the column(s) specified in theGROUP BY
clause. As a result of the grouping, there will only be as many rows as there are unique values in that column. Implicitly, this means that you should only need to use this when you have aggregate functions in your query.HAVING
: If the query has aGROUP BY
clause, then the constraints in theHAVING
clause are then applied to the grouped rows, discarding the grouped rows that do not satisfy the constraint(s). Like theWHERE
clause, aliases are also not accessible from this step in most databases. Finally, just remember that theWHERE
clause is to individual rows whatHAVING
is to grouped rows by means ofGROUP BY
(i.e.,HAVING
is the multi-row version ofWHERE
).SELECT
: Any expressions in theSELECT
part of the query are finally computed.DISTINCT
: Of the remaining rows, rows with duplicate values in the column(s) marked asDISTINCT
will be discarded.ORDER BY
: If an order is specified by theORDER BY
clause, the rows are then sorted by the specified data in either ascending or descending order. Since all the expressions in theSELECT
part of the query have been computed, you can reference aliases in this clause.LIMIT/OFFSET
: Finally, the rows that fall outside the range specified by theLIMIT
andOFFSET
are discarded, leaving the final set of rows to be returned from the query.