Chapter 32 – Searching with Set Operators¶
Note
You are reading a digital copy of SQL-99 Complete, Really, a book that documents the SQL-99 standard.
For more information specific to CrateDB, check out the CrateDB Reference.
SQL supports three set operators: UNION
, EXCEPT
and INTERSECT
. We
described their general effects in our discussion of set theory (see our
chapter on “General Concepts”). Here’s a quick recap, using two Tables:
TABLE_1
contains these three, one-Column rows: {1,2,3
} and TABLE_2
contains these three, one- Column rows: {1,3,5
}.
The expression:
(Table_1) UNION (Table_2)
yields all non-duplicate rows from both TABLE_1
and TABLE_2
, so the
result is these four rows: {1,2,3,5
}. The expression:
(Table_1) EXCEPT (Table_2)
yields all rows that are in TABLE_1
and are not also in TABLE_2
, so the
result is this row: {2
}. The expression:
(Table_1) INTERSECT (Table_2)
yields all rows that are in TABLE_1
and are also in TABLE_2
, so the
result is these two rows: {1,3
}
The result in all three cases is, of course, another Table: UNION
and
EXCEPT
and INTERSECT
take two Tables as input, and produce one result
Table as output (that’s why some sources – particularly FIPS – refer to
UNION
and EXCEPT
and INTERSECT
as “Table operators” rather than
“set operators”) – and so these constructs can also be used as a <Table
reference> in an SQL statement.
Table of Contents
<query expression>¶
In this chapter, we’ll discuss both the implementation problems and the details for the three set operators. You’ll notice that the list of implementation-specific restrictions is quite long, which reflects the difficulty that DBMS vendors have supporting set operators. Before we begin our discussion though, we want to emphasize that each of the set operators can be used to form what the SQL Standard calls a <query expression> – that is, an expression that results in a Table. Here is the full syntax required for a <query expression>:
<query expression> ::=
[ WITH [ RECURSIVE ] <with list> ] <query expression body>
<with list> ::=
<with list element> [ {,<with list element> } ... ]
<with list element> ::=
<query name> [ (<Column name list>) ] AS (<query expression> )
[ <search or cycle clause> ]
<query expression body> ::=
<non-join query expression> | <joined table>
<non-join query expression> ::=
<non-join query term> |
<query expression> UNION [ ALL | DISTINCT ]
[ <corresponding spec> ] <query term> |
<query expression> EXCEPT [ ALL | DISTINCT ]
[ <corresponding spec> ] <query term>
<query term> ::=
<non-join query term> |
<joined table>
<non-join query term> ::=
<non-join query primary> |
<query term> INTERSECT [ ALL | DISTINCT ]
[ <corresponding spec> ] <query primary>
<query primary> ::=
<non-join query primary> |
<joined table>
<non-join query primary> ::=
<simple table> |
(<non-join query expression>)
<simple table> ::=
<query specification> |
VALUES (<row value constructor>s) |
TABLE <Table name>
<corresponding spec> ::=
CORRESPONDING [ BY (<Column name list>) ]
<Column name list> ::=
<Column name> [ {,<Column name>} ...]
<search or cycle clause> ::=
<search clause> |
<cycle clause> |
<search clause> <cycle clause>
<search clause> ::=
SEARCH <recursive search order> SET <Column name>
<recursive search order> ::=
DEPTH FIRST BY <sort specification list> |
BREADTH FIRST BY <sort specification list>
<cycle clause> ::=
CYCLE <cycle column list>
SET <Column name> TO <cycle mark value>
DEFAULT <non-cycle mark value> USING <Column name>
<cycle column list> ::=
<Column name> [ {,<Column name>}... ]
<cycle mark value> ::= <value expression>
<non-cycle mark value> ::= <value expression>
Although this syntax diagram looks extremely complicated, don;t be too
discouraged! Most SQL queries are fairly simple SELECT
statements, and
those are mostly the types of examples we’ll show you. And, in this chapter,
we’re just going to concentrate on the set operators anyway. However, a brief
description of some of the terms in the diagram is warranted.
UNION
andEXCEPT
may appear in a <non-join query expression> andINTERSECT
may appear in a <non-join query term>. In none of the cases are the input Columns involved allowed to have aBLOB
,CLOB
orNCLOB
<data type>.A <query term> is either a <joined table> (see our chapter on “Searching with Joins”) or a <non-join query term>.
A <non-join query term> is either a <non-join query primary> or a <query term> that uses
INTERSECT
.You use a <query primary> to specify the second Table operand of an
INTERSECT
operation. A <query primary> is either a <joined table>, a parenthesized <non-join query expression>, aSELECT
statement (i.e.,SELECT
…FROM
…WHERE
…), a Table constructor, orTABLE
<Table name> (which evaluates toSELECT * FROM
<Table name>). If you want to restrict your code to Core SQL, don’t useTABLE
<Table name>.A Table constructor is the <keyword>
VALUES
followed by a comma- delimited, parenthesized list of rows or an expression that evaluates to a Table. Here’s an example of the first case:... VALUES (10,'hello',B'1011'), (20,'goodbye',B'1111'), (30,'bobby',B'0000') ...
is a Table constructor that constructs a three-Column Table with three rows. (Each value inside the
VALUES
clause may be any expression that evaluates to a single value.) Here’s another example, this time of the second case:... VALUES (SELECT column_1, column_5 FROM Table_1 WHERE column_1=10) ...
is a Table constructor that constructs a two-Column Table.
If you want to restrict your code to Core SQL, don’t use a Table constructor
that involves a Table expression, don’t use a Table constructor anywhere but in
an INSERT
statement and make all your Table constructors contain exactly
one row.
Set Operation Syntax¶
The required syntax for the simplest form of a set operation is:
<query expression> {UNION | EXCEPT | INTERSECT} <query expression>
Here’s an example of each:
SELECT boss_name, salary FROM Bosses
UNION
SELECT employee_name, salary FROM Employees;
SELECT boss_name, salary FROM Bosses
EXCEPT
SELECT employee_name, salary FROM Employees;
SELECT boss_name, salary FROM Bosses
INTERSECT
SELECT employee_name, salary FROM Employees;
Despite the complicated <query expression> syntax we showed you earlier, assume
that <query expression> means “SELECT
…” or “VALUES
…” or
“TABLE
<Table name>” — we want deliberately to avoid some complications
of the official syntax description. For now, it is enough to say that there are
slight restrictions on the type of <query expression> that can be an argument
for UNION
or EXCEPT
and slightly different restrictions for
INTERSECT
. These restrictions are obscure and have little or no practical
effect.
In the previous examples, the first SELECT
statement retrieves two Columns
(BOSS_NAME
, SALARY
) and so does the second SELECT
statement
(EMPLOYEE_NAME
, SALARY
). This reflects a mandatory rule — each input
Table must have the same number of Columns and each corresponding pair of
Columns must have comparable <data type>s. This is because each pair of Columns
will merge into a single Column of the result Table — that is, BOSS_NAME
and EMPLOYEE_NAME
will merge into one result Column and so will SALARY
and SALARY
. Usually, your DBMS decides which Column to merge with which
Column by looking at ordinal position; it merges the first Columns from each
input Table, then the second Columns, and so on. Merging by ordinal position is
the simplest method, but there are alternative ways, which we’ll discuss later.
First, though, we’ll add a bit more to our basic set operation syntax.
ALL | DISTINCT¶
The required syntax for a slightly more complicated form of a set operation is:
<query expression>
{UNION | EXCEPT | INTERSECT}
[ ALL | DISTINCT ]
<query expression>
Although the default for all three set operations is duplicate elimination, you
can instruct your DBMS otherwise. The optional <keyword> ALL
after
UNION
, EXCEPT
, or INTERSECT
means “keep all duplicates,” while the
optional <keyword> DISTINCT
means “eliminate duplicates.” In SQL-92, the
<keyword> DISTINCT
was not valid syntax, but it has always been the default
— that is, if you do not explicitly say ALL
, then the DBMS assumes that
you want to eliminate duplicates. Assume that TABLE_1
contains these five,
one-Column rows: {0,1,2,2,3
} and TABLE_2
contains these five,
one-Column rows: {1,2,3,5,5
}.
The expressions:
(Table_1) UNION (Table_2)
(Table_1) UNION DISTINCT (Table_2)
both yield all non-duplicate rows from TABLE_1
and TABLE_2
:
{0,1,2,3,5
}. The expression:
(Table_1) UNION ALL (Table_2)
yields all rows from TABLE_1
and TABLE_2
: {0,1,2,2,2,3,3,5,5
}. The
expressions:
(Table_1) EXCEPT (Table_2)
(Table_1) EXCEPT DISTINCT (Table_2)
both yield all non-duplicate rows that are in TABLE_1
and are not also in
TABLE_2
: {0
}. The expression:
(Table_1) EXCEPT ALL (Table_2)
yields all rows that are in TABLE_1
and are not also in TABLE_2
:
{0,2
}. (For EXCEPT ALL
, if a row appears x times in the first Table
and y times in the second Table, it will appear z times in the result Table
(where z is x - y or zero, whichever is greater). The expressions:
(Table_1) INTERSECT (Table_2)
(Table_1) INTERSECT DISTINCT (Table_2)
both yield all non-duplicate rows that are in TABLE_1
and are also in
TABLE_2
: {1,2,3
} The expression:
(Table_1) INTERSECT ALL (Table_2)
yields all rows that are in TABLE_1
and are also in TABLE_2
:
{1,2,3
}. (For INTERSECT ALL
, if a row appears x times in the first
Table and y times in the second Table, it will appear z times in the result
Table (where z is the lesser of x and y).
That is, when you use an expression like “SELECT
… <set operator>
SELECT
…,” the effect is the same as if you’d used “SELECT DISTINCT
… <set operator> SELECT DISTINCT
…” When you use an expression like
“SELECT
… <set operator> ALL SELECT
…,” then any duplicates are
paired off against each other, one by one.
Caution
The [ALL | DISTINCT]
option also appears at the beginning of a SELECT
expression (i.e., SELECT [ALL|DISTINCT]
<select list> FROM
<Table-reference> …). But in that case, the default is ALL
rather than
DISTINCT
. By contrast, after a set operator, the default is DISTINCT
.
There’s an inconsistency here — maybe the original assumption was that when
you SELECT
you don’t need DISTINCT
, but when you UNION
you do
need DISTINCT
.
Probably UNION ALL
is a faster operation than UNION DISTINCT
, but for
theoretical reasons you shouldn’t go out of your way to preserve duplicate
rows. Use the ALL
option only for those cases where duplicates don’t
matter, where duplicates won’t happen anyway or where the result of UNION
ALL
is only going to be used for summary-information purposes.
For the purposes of set operation, two NULL
s are “duplicates” of each
other. Now let’s expand our set operation syntax a little more.
CORRESPONDING¶
The required syntax for the final form of a set operation is:
<query expression>
{UNION | EXCEPT | INTERSECT}
[ ALL | DISTINCT ]
[ CORRESPONDING [ BY (<Column name list>) ] ]
<query expression>
In a well-designed database, when Tables have similar Columns, the Columns have
similar names. For example, suppose two Tables, VILLAS
and MANSIONS
,
are defined with these CREATE TABLE
statements:
CREATE TABLE Villas (
county char(20),
acreage DECIMAL(4,2),
price DECIMAL(8));
CREATE TABLE Mansions (
owner char(20),
acreage DECIMAL(4,2),
house_rating INT,
price DECIMAL(8));
Notice that two <Column name>s – ACREAGE
and PRICE
– are in both
Tables, although the Columns they represent don’t occupy the same ordinal
positions. Now, if we want a list of properties which are both villas and
mansions, we can use this simple intersection:
SELECT acreage, price FROM Villas
INTERSECT
SELECT acreage, price FROM Mansions;
But such a query is lots of work. To formulate it, we must look at the
definitions for each Table, figure out which Columns they have in common and
then list those Columns, in the right order, in our query. It would be far
simpler to be able to say: “select Columns with the same names”. The optional
CORRESPONDING
clause for a set operation does just that. Here’s an example
that is equivalent to the last one:
SELECT * FROM Villas
INTERSECT CORRESPONDING
SELECT * FROM Mansions;
The <keyword> CORRESPONDING
, used alone after a set operator, means
“instead of going by Column position as in the ‘simple’ format, merge all those
Columns which have the same name, regardless of their position”. In this case,
the Columns which have the same name in both Tables are ACREAGE
and
PRICE
, so those are the Columns that will appear in the result Table.
Now suppose that we only wanted the ACREAGE
Column in our result – we’re
not interested in the PRICE
at all. Since CORRESPONDING
, by itself,
automatically merges every pair of Columns with the same name, we can’t use it
to form a quick query for the answer we want. Frankly, the easy solution is to
return to the simple format and say:
SELECT acreage FROM Mansions
INTERSECT
SELECT acreage FROM Villas;
But let’s pretend that we have some reason to avoid using a SELECT
in this
situation. There are alternatives, one of which is the expression “TABLE
<Table-name>”, so let’s consider this expression:
TABLE Mansions
INTERSECT CORRESPONDING
TABLE Villas;
which gives us two Columns – ACREAGE
and PRICE
– and now let’s
consider this expression:
TABLE Mansions
INTERSECT CORRESPONDING BY (acreage)
TABLE Villas;
which gives us one Column – ACREAGE
. It’s a contrived example, but if we
assume that the query is not SELECT
, then “INTERSECT CORRESPONDING BY
...
” is the simplest way to use a set operator and merge specific same-name
Columns.
The rules for this operation are:
If
CORRESPONDING BY
is used with a set operator, at least one <Column name> must be in the comma-delimited and parenthesizedBY
list and each of the <Column name>s in the list must identify a Column that appears in both input Tables. The number of Columns in the result Table will equal the number of Columns in theBY
list.If
CORRESPONDING
is used with a set operator, you may not specify a Column list. The number of Columns in the result Table will equal the number of Columns that both Tables have in common.If you omit the
CORRESPONDING
clause entirely, the number of Columns in the result Table will equal the number of Columns that you explicitly specify in your query.
To sum it up, there are three ways to pick which Columns to merge with a set
operation: (a) merge “all” Columns by ordinal position (the simple format),
(b) merge “all Columns with the same name” (the CORRESPONDING
format) and
(c) merge “specified Columns with the same name (the CORRESPONDING BY
format).
Result Names and ORDER BY¶
When you’re doing a set operation, if the <Column name>s of two merged Columns
are the same, then the merged result Column has that name too. If the <Column
name>s of two merged Columns are not the same, then, by default, the merged
Column is given some unique temporary name by your DBMS. Why would anyone care
what the name of a merged Column is? Well, if you want the rows in the result
Table to come out in some specific order, you’ll have to use an ORDER BY
clause — which can only operate on the result Table and that means the names
of the result Table’s Columns must be known. An ORDER BY
clause may
optionally appear after a <query expression>; here is the required syntax:
<query expression> [ ORDER BY <sort specification list> ]
<sort specification list> ::=
<sort specification> [ {,<sort specification> }. . . ]
<sort specification> ::=
<sort key> [ COLLATE <Collation name> ] [ {ASC | DESC} ]
<sort key> ::= scalar_expression
The ORDER BY
clause provides your DBMS with a list of items to sort and the
order in which to sort them; either ascending order (ASC
, the default) or
descending order (DESC
). It must follow a <query expression> — this means
it operates on the final Table that results from the evaluation of the query;
it cannot operate on any interim result Tables at all. The <sort key> you
specify can be any expression that evaluates to a single value that identifies
a Column of the final result Table — this is almost always a <Column name>
and thus our concern to know what name will result from a set operation.
Suppose, then, you want to specify a sort order for the result of our earlier examples:
SELECT boss_name, salary FROM Bosses
UNION
SELECT employee_name, salary FROM Employees;
In this case, it isn’t possible to add an ORDER BY
clause that specifies
“ORDER BY boss_name
” or “ORDER BY employee_name
” at the end of the
query – these names identify the original Columns, not the merged result
Column. The name of the merged result Column is defined by the Standard to be
implementation-dependent: that is, it will be a name provided by your DBMS, but
what name will be provided doesn’t even have to be documented; it’s considered
to be an internal thing.
In SQL-92, you can solve this dilemma with an ORDER BY
clause that
specifies “ORDER BY 1
” – this instructs the DBMS to sort by the first
result Column, no name required. This, however, is not legal in SQL3, so you’ll
have to force your own names for the result Columns by providing an AS
clause in each select list for corresponding Columns and then using that name
in the ORDER BY
clause. Here’s an example:
SELECT boss_name AS sort_name, salary FROM Bosses
UNION
SELECT employee_name AS sort_name, salary FROM Employees
ORDER BY sort_name;
With this syntax, you’re forcing the corresponding Columns in each input Table
to have the same name. This name is thus the name given to the merged result
Column and can therefore be used in an ORDER
by clause to identify that
result Column.
[Obscure Rule] The Collation of a merged Column with a character string <data
type> will depend on the coercibility characteristics of the two input Columns,
and whether you add the optional COLLATE
specification or not. If you want
to restrict your code to Core SQL, don’t add the COLLATE
clause.
Result <data type>s and Compatibility¶
Consider this SQL statement:
``SELECT 'A', 5.0 FROM Table_1
UNION
SELECT 'BB', 12 FROM Table_2;``
The two queries that make up the UNION
operator’s Table operands are union
compatible: each corresponding pair of Columns is comparable (that is, we can
compare 'A'
with 'BB'
and we can compare 5.0
with 12
). The
rules about the resultant merged Columns are:
The first result Column has a
CHAR(2)
<data type> (see Rules of Aggregation for theCASE
expression, in our chapter on “Simple Search Conditions”), belongs to the same Character set that'A'
does (the result always belongs to the Character set of the first input Column), and hasCOERCIBLE
coercibility.The second result Column has a
DECIMAL
orNUMERIC
<data type>, with implementor-defined precision and a scale of 2 (again, see “Rules of Aggregation” for theCASE
expression).
Now consider this SQL statement:
SELECT 'A', 5.0 FROM Table_1
UNION
SELECT 12, 'BB' FROM Table_;
This statement will fail: it’s an error to try to merge Columns which are not union compatible.
Set Operation Examples¶
Earlier in this chapter, we defined two example Tables: VILLAS
and
MANSIONS
. Let’s suppose that they have these contents:
VILLAS |
||
---|---|---|
|
|
|
Lacombe |
39.00 |
100000 |
Stettler |
15.78 |
900000 |
Roseland |
15.77 |
200000 |
Victoria |
17.90 |
|
Victoria |
17.90 |
|
MANSIONS |
|||
---|---|---|---|
|
|
|
|
Bodnar |
15.77 |
|
200000 |
Melnyk |
39.00 |
15 |
900000 |
Skoreyko |
39.00 |
0 |
900000 |
Mudriy |
|
|
|
Based on this data, here are some example of SQL statements that use set operators, and their results.
Example Statement:
-- "self-union"
SELECT * FROM Villas
UNION DISTINCT
SELECT * FROM Villas
ORDER BY acreage DESC;
|
|
|
Lacombe |
39.00 |
100000 |
Victoria |
17.90 |
|
Stettler |
15.78 |
900000 |
Roseland |
15.77 |
200000 |
Example Statement:
-- "NULL is the same as NULL"
SELECT price FROM Villas
EXCEPT ALL
SELECT price FROM Mansions;
Result Table:
|
100000 |
|
Example Statement:
-- "NULL is the same as NULL"
SELECT DISTINCT price FROM Villas
EXCEPT ALL
SELECT DISTINCT price FROM Mansions;
Result table:
|
100000 |
|
Example Statement:
SELECT * FROM Villas
INTERSECT CORRESPONDING
SELECT * FROM Mansions;
Result table:
|
|
15.77 |
200000 |
Example Statement:
SELECT acreage, 'Villa' AS type
FROM Villas
UNION
SELECT acreage, 'Mansion' AS type
FROM Mansions;
Result table:
|
|
39.00 |
|
15.78 |
|
15.77 |
|
17.90 |
|
15.77 |
|
39.00 |
|
|
|
Updatability¶
Like joined Tables, Tables merged with a set operator aren’t updatable in
SQL-92. Usually, this means that a View which is based on a query containing a
set operator can’t be the object of a DELETE
, INSERT
or UPDATE
statement.
But such Views might be updatable in SQL3. For example, if the set operator is
UNION ALL
, and the original Tables are both updatable, then so is the
result Table. This feature causes some complications, since the DBMS has to
track the original Tables. Most DBMSs support set operations by creating new
temporary Tables, therefore a data-change operation would only affect the
result Table and not the original Table.
Consider a situation where you want to change the rows that result from a
UNION
of two Tables. There are three possible situations:
You want to delete a row. In this case, if the row you want to
DELETE
was derived from the first input Table, then theDELETE
operation will remove that row from the first input Table. If the row you want toDELETE
was derived from the second input Table, then theDELETE
operation will remove that row from the second input Table.You want to update a row. Once again, if the row you want to
UPDATE
was derived from the first input Table, then theUPDATE
operation will change that row in the first input Table. If the row you want toUPDATE
was derived from the second input Table, then theUPDATE
operation will change that row in the second input Table.You want to insert a row. This isn’t allowed, because your DBMS wouldn’t know which underlying Table the new row should be put into.
Now consider a situation where you want to change the rows that result from an
EXCEPT
operation of two Tables. There are two possible situations:
You want to update a row. Since every row of the result is derived from a row of the first input Table, the
UPDATE
operation will change that row in the first input Table.You want to delete a row or insert a row. Neither of these are allowed. In the first case, if you delete a row and that row was derived from a row that is duplicated in the first input Table, the row you “deleted” would still be in the result, causing confusion. In the second case, if you insert a row that also happens to be duplicated in the second input Table, the row you “inserted” wouldn’t be part of the result, causing confusion.
Finally, consider a situation where you want to change the rows that result
from an INTERSECT
operation of two Tables. There are three possible
situations:
You want to insert a row. If the first input Table doesn’t contain a row with the same values, that row is inserted into the first input Table and if the second input Table doesn’t contain a row with the same values, that row is inserted into the second input Table. However, if the first input Table already contains a row with the same values as the row you want to insert – but the result Table doesn’t have a row derived from that row of the first input Table, then the new row is inserted into the first input Table. The same thing holds for the second input Table too: if the second input Table already contains a row with the same values as the row you want to insert – but the result Table doesn’t have a row derived from that row of the second input Table, then the new row is inserted into the second input Table.
You want to update a row. If the row you want to
UPDATE
was derived from the first input Table, then theUPDATE
operation will change that row in the first input Table. If the row you want toUPDATE
was derived from the second input Table, then theUPDATE
operation will change that row in the second input Table.You want to delete a row. This isn’t allowed, because your DBMS wouldn’t know which underlying Table the new row should be put into.
Recursive Unions¶
[Obscure Rule] applies for this entire section.
Suppose you make widgets, which are made of doohickeys and framistats, and
framistats in turn are made of a certain number of screws, nails and so on. You
get a request: list all parts – that is, list all widgets, all doohickeys, all
framistats, all screws and all nails. This is a “bill of materials” problem.
It’s not a burning issue, but it’s hard to solve with SQL-92, where the only
way you can generate a list of parts is to use several queries to scan the
PARTS
Table – whenever your DBMS finds a part that’s made up of other
parts, it has to start another query to determine its components, and so on.
SQL3 supports a set operation – the RECURSIVE UNION
– to solve this
problem: it follows a trail of rows for you. Let’s go back to the full syntax
required for a <query expression> that we showed you at the beginning of this
chapter. The parts we haven’t talked about yet are used to form a RECURSIVE
UNION
. Here’s the relevant syntax:
<query expression> ::=
[ WITH [RECURSIVE] <with list> ] <query expression body>
<with list> ::=
<with list element> [ {,<with list element> } . . . ]
<with list element> ::=
<query name> [ (<Column name list>) ] AS (<query expression>)
[ <search or cycle clause> ]
<query expression body> ::=
<non-join query expression> | <joined table>
<search or cycle clause> ::=
<search clause> | <cycle clause> | <search clause> <cycle clause>
<search clause> ::=
SEARCH <recursive search order> SET <Column name>
<recursive search order> ::=
DEPTH FIRST BY <sort specification list> |
BREADTH FIRST BY <sort specification list>
<cycle clause> ::=
CYCLE <cycle column list>
SET <Column name> TO <cycle mark value>
DEFAULT <non-cycle mark value> USING <Column name>
<cycle column list> ::=
<Column name> [ {,<Column name>} . . . ]
<cycle mark value> ::= <value expression>
<non-cycle mark value> ::= <value expression>
The optional WITH
clause for a <query expression> is made up of one or more
elements, each of which is a named query that optionally includes names for the
Columns that result from that query. (If you include the optional <Column name>
list, you must provide a name for every result Column.) The definition of a
WITH
element may also include the optional <search or cycle clause>.
A WITH
clause preceding a <query expression> provides a list of related
element queries for a search. WITH
by itself identifies a non-recursive set
of element queries – this means that an element in the list may not contain
the <query name> of an element that follows it in the list. WITH RECURSIVE
,
on the other hand, identifies a set of element queries that are potentially
recursive – so an element can contain the <query name> of a following element.
If a WITH
clause’s elements cycle at least once – that is, if element one
uses element two and element two uses element three and element three cycles
back to use element one – and if at least one element is a <non-join query
expression> formed with UNION
, then the WITH
clause is RECURSIVE
:
a potentially recursive WITH
clause’s element queries each depend on the
query which follows them in the list – that is, the first element in the list
is a query that contains the <query name> of the second element in the list
(and so depends on that query), the second element is a query that contains the
<query name> of the third element, and so on and a WITH RECURSIVE
clause’s
element queries each depend on another element in a way that causes a least one
cycle of each element query to be executed. A linearly recursive WITH
clause is a clause where each element query has at most one reference to
another query in the list.
The optional <Column name> subclause within the WITH
clause allows you to
specify names for the Columns returned by the query. (Note that the element
query may not use an expression which would result in a result Column with a
NO COLLATION
coercibility attribute.) If any two Columns returned by a
WITH
element query have the same name, or if the WITH
query is
potentially recursive, you must use the <Column name> subclause to uniquely
name each Column of the result Table.
For each element query in a WITH
clause, the element is a recursive query
if it depends on another element query. (Such queries may not include set
operators unless the second operand includes the name of the element query this
element depends on, nor may they invoke routines that affect the element query
this element depends on, nor may they include table subqueries that affect the
element query this element depends on, nor may they include OUTER JOIN
specifications that affect a <Table reference> that names the element query
this element depends on, nor may they include a <query specification> that
names the element query this element depends on.) Each element query
accumulates rows for the final result. An element’s <query name> is the name
used in an iteration of the entire <query expression> to identify the parents
of any row being accumulated into the result. If each accumulated row has only
one parent (that is, it results from a query that uses only one other query in
the WITH
list), the query is a linear RECURSIVE UNION
. If accumulated
rows have more than one parent (that is, they result from a query that uses
more than one other query in the WITH
list), the query is a nonlinear
RECURSIVE UNION
.
An expandable element query is a linearly recursive element query whose query
contains UNION
or UNION ALL
. If a WITH
clause element query is not
expandable, the element definition may not include a <search or cycle clause>.
A WITH
element query defined with a <search clause> can specify a search
order of either “SEARCH DEPTH FIRST BY
” or “SEARCH BREADTH FIRST BY
”
followed, in both cases, by a <sort key> and either ASC
or DESC
(the
syntax for the <sort specification list> is the same as the syntax we showed
you earlier for the ORDER BY clause’s <sort specification list>). SEARCH
DEPTH FIRST
means “travel down the tree structure first” and SEARCH BREADTH
FIRST
means “travel across the tree structure first”. The Column named in the
SET
subclause that follows must be a Column with an INTEGER
<data
type>; your DBMS will set that Column to a value that indicates when a row
accumulated into the result was found.
A WITH
element query defined with a <cycle clause> helps your DBMS
determine whether a row that has already been accumulated is found a second
time. This rarely happens in real life and so it probably means an error when
it happens for a query. By adding a <cycle clause> to a WITH
element
definition, you can avoid getting into infinite loops by setting a limit on how
long you want the search to go on. The <cycle column list> names the Columns
affected by the <cycle clause>. The SET
<column name> and the USING
<column name> may not identify any of the Columns returned by the query, nor
may they be the same Column. The <cycle mark value> and the <non-cycle mark
value> must both be <character string literal>s that are one character long.
They may not be the same value.
Dialects¶
Here are just some of the many restrictions that some DBMSs had placed on set operations in the past:
IBM DB2: the <data type>s of corresponding Columns must be exactly the same and be defined with exactly the same size.
IBM SQL/DS: neither of the input <query expression>s may contain
DISTINCT
.Various: neither the input <query expression>s nor the result Table may be used with set functions, with
GROUP BY
or withHAVING
.dBASE IV, SQL Server: all set operators are unsupported, period.
Those restrictions are history now. The ones which you are much more likely to encounter are the ones that are allowed for Entry-Level SQL-92:
No derived Column list is allowed, which effectively means that you can’t use <literal>s or expressions in queries that use set operators.
Set operators are illegal inside subqueries, in
INSERT ... SELECT
and in View definitions.EXCEPT
,INTERSECT
andCORRESPONDING
are illegal.
Even if your DBMS supports SQL3, it might only be Core SQL. If you want to
restrict your code to Core SQL, don’t use WITH [RECURSIVE]
, the
INTERSECT
set operator, a CORRESPONDING
clause with any set operator or
any set operator with an explicit DISTINCT
.