Chapter 29 – Simple Search Conditions¶
Note
You are reading a digital copy of SQL-99 Complete, Really, a book that documents the SQL-99 standard.
For more information specific to CrateDB, check out the CrateDB Reference.
We’ve come at last to the point where we can begin showing you the most important aspect of SQL – how to query your “database”. In this chapter, we’ll begin by describing simple search conditions.
Table of Contents
Truth Values¶
In our chapter on the Boolean <data type>, we noted what SQL’s truth values
are: TRUE
, FALSE
and UNKNOWN
. But what are truth values for?
There are four possible answers to that question:
To support conditional branches and loops, as in traditional programming languages. There are optional SQL3 constructs like “
IF
(condition isTRUE
)THEN
(perform this operation)”. However, that’s advanced stuff. We’ll discuss program control operations in a later chapter.To store in the database. For that, we have the
BOOLEAN
<data type>.To support the
CASE
expression.To support relational restriction.
It is answer #4 – “to support relational restriction” – that most people
think of first. Commonly, they would use simple terms for the process – like
“finding” or “searching” – since the essence of the affair is indeed simple:
they just want to pick certain rows from a Table; say, the ones where the
NAME
Column contains Smith, or the ones where the value in the SPENDING
Column is greater than the value in the BUDGET
Column. These are such
simple questions that they seem trite, but they already illustrate some
distinguishing features of SQL searching.
First of all, they show that rows are picked by content, not by address. Although most SQL packages have some option for selecting rows according to their position or ordinal location, the makers of relational theory did not intend that to be an important part of the search process.
Secondarily, they show that Column values must meet certain conditions (such as being equal or being greater). We can easily contrive exceptions to this statement; for example some searches do not involve Column values. But certainly the idea that “conditions must be met” is universal. For successful searching, you have to know how to conceive and specify these conditions.
As we said at the beginning, in this chapter we will concern ourselves only
with the simpler conditional expressions which are necessary for searching.
Before we do that, though, we’ll briefly discuss the SQL statement that you
will use with search conditions most often: the SELECT
statement.
SELECT Statement¶
The SELECT
statement retrieves data from one or more Tables. Here is the
required syntax for a simple SELECT
statement (or, as the SQL Standard
calls it, a <query specification>):
SELECT [ DISTINCT | ALL ]
{Column expression [ AS name ]} [ ,... ] | *
FROM <Table reference> [ {,<Table reference>} ... ]
[ WHERE search condition ]
[ GROUP BY Columns [ HAVING condition ] ]
The SELECT
statement queries your SQL-data: it returns a results Table
derived from the Tables referred to in the FROM
clause. The derivation of
the result Table can be described as a sequence of operations in which the
result of each operation is input for the next. The sequence of the operations
is FROM
, then WHERE
, then GROUP BY
, then HAVING
, then the
select list (that is, the list of Column expressions) and the descriptions of
the clauses that follow appear in this order.
FROM Clause¶
The FROM
clause supplies a list of <Table references> for the query. A
<Table reference> is any expression which results in a Table but is usually
just a <Table name> or a <Correlation name> that identifies a Table that
contains SQL-data you want to query. <Column name>s throughout the SELECT
statement must be unambiguous; that is, they must be qualified with a
<Correlation name> if one was defined for the Table that owns them, and with
their <Table name> if SELECT
is retrieving data from multiple Tables which
have Columns with identical names. The required syntax for the FROM
clause
is:
FROM <Table reference> [ {,<Table reference>} ... ]
Table Reference¶
Several times throughout this book, we make the comment that the result of an SQL query is a Table. To understand the entire syntax that you may use to formulate a query then, you’ll have to start with what a Table is – and we’ve already shown you that a Table is a set of Columns and row. Up until now, though, whenever we’ve talked about Tables, we’ve referred to them only with a <Table name>. Since the result of a query is a Table derived by evaluating that query, not all SQL Tables have an explicit <Table name> – so SQL allows you to refer to any Table using a <Table reference>. The required syntax for a <Table reference> is:
<Table reference> ::=
[ ONLY ]{<Table name> | <query name>} [ [ AS ] <Correlation name> [ (<derived Column list>) ] ] |
<Table subquery> [ AS ] <Correlation name> [ (<derived Column list>) ] |
<joined Table> |
LATERAL (<query expression>) [ AS ] <Correlation name> [ (<derived Column list>) ]
<derived Column list> ::=
<Column name> [ {,<Column name> }... ]
A <Table reference> is simply a reference to some Table: this may be a reference to a named Table (that is, a <Table name> that identifies a Base table or a View) or a reference to a Table returned by a query. Thus there are five possible options for a <Table reference>:
It can refer to a Base table or a View using a <Table name>. The optional <keyword>
ONLY
in front of such a reference can only be used if the reference is to a typed Table. In that case, the <Table reference> refers to every row of the Table (or result Table), except for any rows that have a subrow in a proper subtable of that Table.It can refer to a result Table using a <query name>.
It can refer to the result of a Table subquery.
It can refer the result of a join of multiple Tables.
It can refer to a lateral Table: the result of a parenthesized <query expression> preceded by the <keyword>
LATERAL
.
In each case, you can optionally provide a <Correlation name> for the Table being referred to, as well as explicit names for each of the Columns belonging to the Table being referred to (the names specified must, of course, be unique for that reference).
If you want to restrict your code to Core SQL, don’t use a <query name> to make
up a <Table reference> and don’t use the <keyword> ONLY
to make up a <Table
reference> that refers to a typed Table.
WHERE Clause¶
The optional WHERE
clause is used to set the retrieval conditions for rows.
Any rows that don’t fall into the guidelines specified are eliminated from the
results Table. The search conditions specified may include the arithmetic and
Boolean operators, the SQL predicates (e.g.: comparison, BETWEEN
, LIKE
)
and the SQL scalar functions, as well as parentheses to set the desired
evaluation order. The required syntax for the WHERE
clause is:
WHERE <search condition>
GROUP BY Clause¶
The optional GROUP BY
clause logically rearranges the interim result
returned by the WHERE
clause into groups. The result is a set of rows where
each common datum is gathered into one group. That is, within a group, all
values of a grouping Column are the same value. For grouping, all NULL
s
are considered equal: they form one group. Because every row that contains a
group contains the same value for that group, the name of a grouping Column can
be used in a condition for the HAVING
clause or to identify a result Column
in the select list. We’ll show you the required syntax for the GROUP BY
clause in our chapter on grouping.
HAVING Clause¶
The optional HAVING
clause is used to set the retrieval conditions for
groups. Any groups that don’t fall into the guidelines specified are eliminated
from the results Table. The search conditions specified may include the
arithmetic and Boolean operators, the SQL predicates (e.g.: comparison,
BETWEEN
, LIKE
) and the SQL scalar functions, as well as parentheses to
set the desired evaluation order. HAVING
is normally applied to the interim
result returned by the GROUP BY
clause. If a SELECT
statement doesn’t
include a GROUP BY
clause, then HAVING
treats all rows of the interim
result as a single group. We’ll show you the required syntax for the HAVING
clause in our chapter on grouping.
SELECT LIST Clause¶
The select list produces a final results Table by selecting only those Columns
(or Column expressions) specified. The select list may include <Column name>s,
<Column reference>s, Column expressions (that is, any expression which
evaluates to a single Column, such as a scalar subquery) or an asterisk, as
well as one of the <keyword>s DISTINCT
or ALL
. (The asterisk is a
shorthand for a list of all the Columns of the Tables named in the FROM
clause. The DISTINCT
option ensures that duplicate rows are eliminated from
the result. The ALL
option, which is the default, ensures that duplicate
rows are included in the result.) A Column expression can be a <literal>, a
scalar function or some other expression derived from the Columns whose values
you want to retrieve but may not include any expression that will evaluate to a
Column with a BLOB
, CLOB
or NCLOB
<data type> if DISTINCT
is
specified. You can use the optional AS
name clause to specify a name for a
Column expression; it will be used to identify that result for the entire
SELECT
statement. The required syntax for a select list is:
SELECT [ ALL | DISTINCT ] Column list
Column list ::=
expression [ [ AS ] <Column name> ] [ , ... ] |
*
Let’s try some SELECT
examples on a small group of Tables. They look like
this (a question mark in a Column represents a null value):
|
|||
---|---|---|---|
|
|
|
|
A |
SMITH A |
ACCOUNTING |
1 |
B |
JONES B |
INF SYSTEMS |
2 |
C |
BROWN C |
CUST REL |
3 |
D |
BLACK D |
OPERATIONS |
4 |
E |
GREEN E |
SALES |
5 |
|
||||
---|---|---|---|---|
|
|
|
|
|
1 |
A |
KOO |
SARA |
234 WEST |
2 |
B |
MARSH |
JOHN |
456 EAST |
3 |
C |
JONES |
MABEL |
567 NORTH |
4 |
D |
MORGAN |
CHUCK |
963 SOUTH |
10 |
A |
SMITH |
ALICE |
234 WEST |
11 |
B |
JONES |
BOB |
325 RIVER |
20 |
E |
FRANCIS |
CHRIS |
861 BERLIN |
28 |
B |
TURNER |
LINDA |
114 ROBIN |
35 |
E |
OLSEN |
CAROL |
555 RIVER |
40 |
B |
WARREN |
NANCY |
? |
|
||||
---|---|---|---|---|
|
|
|
|
|
1 |
6.00 |
10TH FLOOR |
1989-10-31 |
10:15:00 |
2 |
5.00 |
16TH FLOOR |
1989-09-30 |
10:20:00 |
3 |
5.00 |
WAREHOUSE |
1989-09-30 |
10:30:00 |
4 |
8.00 |
BASEMENT |
1989-10-15 |
12:00:10 |
10 |
16.00 |
16TH FLOOR |
1989-09-30 |
12:30:00 |
11 |
16.00 |
16TH FLOOR |
1989-10-15 |
13:15:10 |
20 |
9.00 |
WAREHOUSE |
1989-10-15 |
14:00:00 |
28 |
? |
16TH FLOOR |
1989-09-15 |
14:10:00 |
35 |
9.00 |
10TH FLOOR |
1989-10-31 |
14:20:00 |
40 |
16.00 |
10TH FLOOR |
1989-10-31 |
14:35:07 |
Simple Retrieval¶
To find all departments with employees (retrieve a single Column from a Table), the following SQL statements are equivalent:
SELECT dept FROM Employee;
SELECT ALL dept FROM Employee;
SELECT Employee.dept FROM Employee;
SELECT ALL Employee.dept FROM Employee;
The first two examples use unqualified <Column name>s in the select list, while the last three use <Column reference>s (that is, <Column name>s qualified with their <Table name>s). Unless the lack of a qualifier makes a <Column name> ambiguous, the qualifier is unnecessary. The result in all cases is:
|
A |
B |
C |
D |
A |
B |
E |
B |
E |
B |
Departments are duplicated in the result because SELECT
doesn’t eliminate
them unless the DISTINCT
option is used, as in these equivalent SQL
statements:
SELECT DISTINCT dept FROM Employee;
SELECT DISTINCT Employee.dept FROM Employee;
The result in both cases is:
|
A |
B |
C |
D |
E |
To find the name of each department’s manager (retrieve multiple Columns from one Table):
SELECT dept,manager FROM Department;
The result is:
|
|
A |
SMITH A |
B |
JONES B |
C |
BROWN C |
D |
BLACK D |
E |
GREEN E |
To retrieve all Columns of one Table, these three SQL statements are equivalent:
SELECT empnum,rate,location,paid FROM Payroll;
SELECT * FROM Payroll;
SELECT Payroll.* FROM Payroll;
The result in all three cases is the entire PAYROLL
Table. (An asterisk
can be used as shorthand for “all Columns” and can be qualified just as a
<Column name> can be.)
Qualified Retrieval¶
To find all employees working in department A (retrieve one Column which fulfills one search condition):
SELECT surname FROM Employee WHERE dept='A';
The result is:
|
KOO |
SMITH |
Remember that <character string literal>s must always be enclosed in single quotes.
To find department A employees with an employee number smaller than 10 (retrieve one Column fulfilling multiple search conditions):
SELECT surname FROM Employee
WHERE dept='A' AND empnum<10;
The result is:
|
KOO |
To find the full name of the department A employee whose employee number is 10 (retrieve multiple Columns fulfilling multiple conditions from a Table):
SELECT gname,surname FROM Employee
WHERE dept='A' AND empnum=10;
The result is:
|
|
ALICE |
SMITH |
Retrieval with a <literal>¶
To include a <literal> in a result:
SELECT empnum,
'Hourly Rate=' AS hourly_rate,
rate
FROM Payroll
WHERE empnum=1 OR empnum=10;
The result is:
|
|
|
1 |
Hourly Rate= |
6.00 |
10 |
Hourly Rate= |
16.00 |
The second Column of the result is derived from the <character string literal> expression in the select list.
Retrieval with an Arithmetic Expression¶
To calculate an employee’s daily pay from the hourly rate earned (retrieve multiple Columns from a Table with an arithmetic expression):
SELECT empnum,
'Daily Rate=' AS comment,
rate*8 AS daily_rate
FROM Payroll
WHERE empnum=1 OR empnum=10;
The result is:
|
|
|
1 |
Daily Rate= |
48.00 |
10 |
Daily Rate= |
128.00 |
The third Column of the result is derived from the arithmetic expression in the select list.
Retrieval with LIKE¶
To find all employees with surnames beginning with “M” (retrieve all values matching a simple string pattern):
SELECT empnum,surname FROM Employee
WHERE surname LIKE 'M%' AND empnum<3;
The result is:
|
|
2 |
MARSH |
To find the departments whose manager’s surname has the letter “R” as the second character:
SELECT dept,manager FROM Department
WHERE surname LIKE '_R%';
The result is:
|
|
C |
BROWN C |
E |
GREEN E |
To find all employees whose given name does not include the letter “A” (retrieve values which do not match a simple string pattern):
SELECT empnum,gname FROM Employee
WHERE gname NOT LIKE '%A%';
The result is:
|
|
2 |
JOHN |
4 |
CHUCK |
11 |
BOB |
(We discussed the LIKE
predicate in our chapter on character strings.)
Retrieval with SIMILAR¶
To find all employees whose location starts with 2 digits (retrieve all values matching a complicated string pattern):
SELECT empnum,location FROM Payroll
WHERE location SIMILAR TO '[:DIGIT:][:DIGIT:]%';
The result is:
|
|
1 |
10TH FLOOR |
2 |
16TH FLOOR |
10 |
16TH FLOOR |
11 |
16TH FLOOR |
28 |
16TH FLOOR |
35 |
10TH FLOOR |
40 |
10TH FLOOR |
To find all employees whose location doesn’t start with 2 digits (retrieve all values that don’t match a complicated string pattern):
SELECT empnum,location FROM Payroll
WHERE location NOT SIMILAR TO '[:DIGIT:][:DIGIT:]%';
The result is:
|
|
3 |
WAREHOUSE |
4 |
BASEMENT |
20 |
WAREHOUSE |
(We discussed the SIMILAR
predicate in our chapter on character
strings.)
Retrieval with IS NULL¶
To find all employees with unknown addresses on file (retrieve all rows containing a null value):
SELECT empnum,surname,gname
FROM Employee
WHERE address IS NULL;
The result is:
|
|
|
40 |
WARREN |
NANCY |
To find the departments with known managers (retrieve all rows that don’t contain null values):
SELECT manager FROM Department
WHERE manager IS NOT NULL;
The result is:
|
SMITH A |
JONES B |
BROWN C |
BLACK D |
GREEN E |
(We discussed the IS NULL
predicate in our chapter on NULL
s.)
Retrieval with a Scalar Function¶
To concatenate an employee’s first initial and surname:
SELECT empnum,
SUBSTRING(gname FROM 1 FOR 1) || '. ' || surname AS fullname
FROM Employee
WHERE empnum=10;
The result is:
|
|
10 |
A. SMITH |
To concatenate the values retrieved from a Column with a <literal>:
SELECT 'HELLO ' || gname AS greeting
FROM Employee
WHERE empnum=4;
The result is:
|
HELLO CHUCK |
To find the length of a Column value and a <literal>:
SELECT surname,
CHAR_LENGTH(surname) AS surname_length,
CHAR_LENGTH('MARY') AS literal_length
FROM Employee
WHERE dept='A';
The result is:
|
|
|
KOO |
3 |
4 |
SMITH |
5 |
4 |
(The CHAR_LENGTH
function returns a character string’s length inclusive
of blanks and trailing zeros. This example assumes that SURNAME
is a
variable length Column.)
Retrieval using Date Arithmetic¶
To find the number of days since the last pay date (assume the current date is November 10, 1989):
SELECT paid,
(DATE '1989-11-10' - paid) INTERVAL DAY AS last_paid
FROM PAYROLL
WHERE empnum=1;
The result is:
|
|
1989-10-31 |
10 |
To add three months and two days to the last pay date:
SELECT empnum,
paid,
((paid + INTERVAL '3' MONTH) + INTERVAL '2' DAY) AS new_date
FROM PAYROLL
WHERE empnum=1;
The result is:
|
|
|
1 |
1989-10-31 |
1990-02-02 |
Joins¶
The ability to join a Table to others is one of the most powerful features of SQL. A join is an operation in which data is retrieved from multiple Tables. Here are some examples.
To find all information available on all employees (retrieve a join of all Columns) the following SQL statements are equivalent:
SELECT Employee.*,Payroll.*
FROM Employee,Payroll
WHERE Employee.empnum=Payroll.empnum;
SELECT *
FROM Employee,Payroll
WHERE Employee.empnum=Payroll.empnum;
The result is the entire EMPLOYEE
Table joined with the entire PAYROLL
Table over their matching employee numbers; ten rows and ten columns in all.
Note the <Column reference>s for the EMPNUM
Column, necessary to avoid
ambiguity. To eliminate duplicate Columns from the result, specific <Column
reference>s (rather than *
) must also be listed in the select list, as in
this SQL statement:
SELECT Employee.empnum,dept,surname,rate,location
FROM Employee,Payroll
WHERE Employee.empnum=1 AND Employee.empnum=Payroll.empnum;
The result is:
|
|
|
|
|
1 |
A |
KOO |
6.00 |
10TH FLOOR |
To find an employee’s manager (retrieve one Column from multiple Tables):
SELECT surname,manager
FROM Employee,Department
WHERE empnum=28 AND Employee.dept=Department.dept;
The result is:
|
|
TURNER |
JONES B |
To find the pay rates and locations of all department A employees (join values fulfilling multiple conditions from multiple Tables):
SELECT Employee.*,Payroll.*
FROM Employee,Payroll
WHERE dept='A' AND Employee.empnum=Payroll.empnum;
The result is the EMPLOYEE
Table joined with the PAYROLL
Table, for
all rows where the DEPT
column contains 'A'
.
To find the department and payroll data for employee 35:
SELECT Employee.empnum,surname,Employee.dept,manager,rate
FROM Employee,Department,Payroll
WHERE Employee.empnum=35 AND
Employee.empnum=Payroll.empnum AND
Employee.dept=Department.dept;
The result is:
|
|
|
|
|
35 |
OLSEN |
E |
GREEN E |
9.00 |
To find the manager and locations of department C’s employees:
SELECT Department.dept,manager,location
FROM Department,Payroll,Employee
WHERE Department.dept='C' AND
Department.dept=Employee.dept AND
Employee.empnum=Payroll.empnum;
The result is:
|
|
|
C |
BROWN C |
WAREHOUSE
|
Predicates¶
You’ll have noticed by now that the fundamental SQL condition is the predicate.
It states a condition whose result is either TRUE
, FALSE
or
UNKNOWN
. An example of a predicate is the familiar expression:
Example #1: x = 5
Specifically, Example#1 is an example of a “<comparison predicate>”. Note that the predicate contains a single condition. This is not a predicate:
Example #2: x = 5 AND y = 6
because AND
is a Boolean operator that combines two predicates. However,
both Example #1 and Example #2 are examples of “search conditions”, and we will
get back to Boolean operators later on.
The one thing we can say about predicates in general is that they describe
operations which take non-truth-value arguments and return truth-value results.
The truth value might be TRUE
or FALSE
; for most predicates (but not
all) the truth value might also be UNKNOWN
.
Listed below are the fourteen SQL3 predicates, of which ten are SQL-92
predicates and seven are acceptable in Core SQL, and of which all but five may
return any of TRUE
, FALSE
or UNKNOWN
(T,F,U).
|
|
|
|
|
---|---|---|---|---|
<comparison predicate> |
|
YES |
YES |
TFU |
<quantified comparison predicate> |
|
YES |
YES |
TFU |
<between predicate> |
|
YES |
YES |
TFU |
<in predicate> |
|
YES |
YES |
TFU |
<like predicate> |
|
YES |
YES |
TFU |
<null predicate> |
|
YES |
YES |
TF |
<exists predicate> |
|
YES |
YES |
TF |
<unique predicate> |
|
YES |
TF |
|
<match predicate> |
|
YES |
TF |
|
<overlaps predicate> |
|
YES |
TFU |
|
<similar predicate> |
|
TFU |
||
<quantified predicate> |
|
TFU |
||
<distinct predicate> |
|
TF |
||
<type predicate> |
|
TFU |
<comparison predicate>¶
The required syntax for a <comparison predicate> is:
<comparison predicate> ::=
expression_1 comparison operator expression_2
A <comparison predicate> compares two values and returns either TRUE
,
FALSE
or UNKNOWN
. (If either argument is NULL
, the <comparison
predicate> returns UNKNOWN
.) There are six comparison operators. Listed
below are their symbols, their official names, their converses (i.e.: what the
operator would be if the predicate was negated) and two examples of predicates
which contain the symbol and are TRUE
.
Symbol |
Name |
Examples of |
Negation |
---|---|---|---|
|
equals |
|
|
|
greater than |
|
|
|
greater than or equals |
|
|
|
less than |
|
|
|
less than or equals |
|
|
|
not equals |
|
|
There is nothing about the SQL comparison operators which would surprise the
experienced programmer, although the use of the symbol <>
for “not equals”
is worth noting. There are some old DBMSs which accept != instead, but only
<>
is acceptable in standard SQL. Most commonly, the first comparand is a
<Column name> and the second is a <literal>, e.g.:
city = 'PARIS'
-- a Column/<literal> comparison
Some general rules:
The two expression arguments must be “comparable”. Generally, this is the case if they have comparable <data type>s; for example, if both are numeric. If the comparands are rows, each corresponding pair of Fields must have comparable <data type>s. We discussed what is meant by a comparable <data type> in our chapters on each of the SQL predefined <data type>s; refer to those discussions for complete details on how comparisons work on a specific <data type>.
For
BLOB
s,CLOB
s,NCLOB
s,REF
s andARRAY
s, the only legal comparison operators are=
and<>
. There may also be some restrictions for UDTs.
It’s remarkable to think that, in primitive DBMSs, the Column/<literal> comparison was pretty well all that there was. But today we can get much fancier:
UPPER(city) = 'PARIS'
-- an expression/<literal> comparison
spending >= budget
-- a Column/Column comparison
'PARIS' = city
-- a <literal>/Column comparison (not recommended)
column_1+5=column_2+7
-- an expression/expression comparison
Finally, we could use row values instead of scalar values in our comparisons:
(column_1,column_2) = ('ADAMS',77)
-- a row-value/row-value comparison
Use row-value comparisons sparingly, because some DBMSs won’t support them.
<between predicate>¶
The required syntax for a <between predicate> is:
<between predicate> ::=
expression_1 [ NOT ] BETWEEN [ ASYMMETRIC | SYMMETRIC ]
expression_2 AND expression_3
A <between predicate> compares a value to a range of values and returns either
TRUE
, FALSE
or UNKNOWN
. (If any argument is NULL
, the <between
predicate> returns UNKNOWN
.) BETWEEN
searches for data in a specific
range. NOT BETWEEN
searches for data that do not fall into the range given.
Some general rules:
The three expressions must be comparable. If the comparands are rows, they must be of the same degree and each corresponding pair of Fields must have comparable <data type>s.
The <between predicate> can’t be used with
BLOB
s,CLOB
s,NCLOB
s,REF
s andARRAY
s.
The ASYMMETRIC
<between predicate> is TRUE
if the value of
expression_1
is greater than or equals the value of expression_2
and
the value of expression_1
is less than or equals the value of
expression_3
. For example, these <between predicate>s are all TRUE
:
1 BETWEEN 0 AND 2
'B' BETWEEN 'A' AND 'B'
CURRENT_TIME BETWEEN TIME '00:00:00.000000' AND TIME '23:59:59.999999'
NOT BETWEEN
is simply the negation of BETWEEN
so this predicate is
also TRUE
:
3 NOT BETWEEN 0 AND 2
ASYMMETRIC
is the predicate’s default condition.
The SYMMETRIC
<between predicate> is TRUE
either (a) if the value of
expression_1
is greater than or equals the value of expression_2
and
the value of expression_1
is less than or equals the value of
expression_3
or (b) if the value of expression_1
is greater than or
equals the value of expression_3
and the value of expression_1
is less
than or equals the value of expression_2
. For example, these <between
predicate>s are both TRUE
:
1 BETWEEN SYMMETRIC 2 AND 0
3 NOT BETWEEN SYMMETRIC 2 AND 0
The SYMMETRIC
option is new to SQL with SQL3.
In short, BETWEEN
is just a shorthand for a combination of >=
and
<=
comparisons, so the same rules that apply for comparisons of specific
<data type>s apply to BETWEEN
as well.
Retrieval with BETWEEN¶
Here are two <between predicate> retrieval examples from the sample Tables shown at the beginning of this chapter. First, to find the names of the manager of departments A, B and C (retrieve values that match any in a specified range):
SELECT dept,manager FROM Department
WHERE dept BETWEEN 'A' AND 'C';
The result is:
|
|
A |
SMITH A |
B |
JONES B |
C |
BROWN C |
To find the employee numbers of all employees whose pay rate is either less than 8.00 or greater than 16.00 (retrieve values not falling into a specified range):
SELECT empnum,rate FROM Payroll
WHERE rate NOT BETWEEN 8 AND 16;
The result is:
|
|
1 |
6.00 |
2 |
5.00 |
3 |
5.00 |
<distinct predicate>¶
The required syntax for a <distinct predicate> is:
<distinct predicate> ::=
expression_1 IS DISTINCT FROM expression_2
A <distinct predicate> tests two values to see whether they are distinct and
returns either TRUE
or FALSE
. The <distinct predicate> is new to SQL
with SQL3.
Some general rules:
The two expressions must be comparable. If the comparands are rows, they must be of the same degree and each corresponding pair of Fields must have comparable <data type>s.
The <distinct predicate> can’t be used with
BLOB
s,CLOB
s orNCLOB
s.
The <distinct predicate> is TRUE
if the value of expression_1
is not
equal to the value of expression_2
– that is, IS DISTINCT FROM
is the
same as the <>
<comparison predicate> in every way – except one. If
expression_1
is NULL
and expression_2
is NULL
, this predicate
returns UNKNOWN
:
expression_1 <> expression_2
(all comparisons return UNKNOWN
if either argument is NULL
). On the
other hand, this predicate will return FALSE
:
expression_1 IS DISTINCT FROM expression_2
That is, two NULL
values are not distinct from one other. And if
expression_1
is NULL
and expression_2
is non-null, this predicate
returns UNKNOWN
:
expression_1 <> expression_2
On the other hand, this predicate will return TRUE
:
expression_1 IS DISTINCT FROM expression_2
That is, a NULL
value is distinct from every non-null value. (For arrays,
IS DISTINCT FROM
tests each corresponding pair of array elements. If the
arrays are empty, or if any element of the first array contains the same value
as its corresponding element in the second array, or if both elements are
NULL
, IS DISTINCT FROM
returns FALSE
.) Except for the difference
when NULL
s are involved, IS DISTINCT FROM
is just a shorthand for the
<>
comparison, so the same rules that apply for comparisons of specific
<data type>s apply to IS DISTINCT FROM
as well.
Retrieval with IS DISTINCT FROM¶
Here is a <distinct predicate> retrieval example from the sample Tables shown at the beginning of this chapter. To find the names of employees who don’t live at 234 West (retrieve values that are distinct from a specified value):
SELECT gname,surname FROM Employee
WHERE address IS DISTINCT FROM '234 West';
The result is:
|
|
JOHN |
MARSH |
MABEL |
JONES |
CHUCK |
MORGAN |
BOB |
JONES |
CHRIS |
FRANCIS |
LINDA |
TURNER |
CAROL |
OLSEN |
NANCY |
WARREN |
Note that the final row is included in the result despite the fact that
Nancy’s Warren address is unknown; that is, there is a null value in the
ADDRESS
Column for that employee.
If you want to restrict your code to Core SQL, don’t use the <distinct predicate>.
Search Conditions¶
A search condition consists of one or more Boolean value expressions. A Boolean
value is either TRUE
or FALSE
or UNKNOWN
. In SQL-92, only a
predicate can return TRUE
, FALSE
or UNKNOWN
and therefore all
search conditions are predicates (or multiple predicates) with Boolean
operators. In SQL3, the definition is broader because there is another source
for Boolean values, namely <literal>s or Column values of <data type>
BOOLEAN
, therefore search conditions might also contain values of <data
type> BOOLEAN
.
Here are six examples of search conditions. The first two consist of a simple predicate and a simple Boolean, respectively. The other four are more complex: they contain Boolean operators.
x = 1
NOT x = 1
x = 1 AND y = 3
x = 1 OR y = 3
(boolean_column_1 OR boolean_column_2) IS FALSE
We discussed the Boolean operators in our chapter on the BOOLEAN
<data
type>. Here’s a quick recap.
The
IS [NOT] {TRUE | FALSE | UNKNOWN}
Boolean operator is rarely seen. Its effect is to change a Boolean value (which isTRUE
orFALSE
orUNKNOWN
) to eitherTRUE
orFALSE
. For example, given a search condition “(x = 5) IS UNKNOWN
”, ifx
isNULL
then the predicate “x = 5
” returnsUNKNOWN
, therefore the search condition as a whole isTRUE
.The
NOT
Boolean operator reversesTRUE
s andFALSE
s, but leavesUNKNOWN
s alone. There is a trap here: the search condition “NOT (x = 1)
” differs slightly from “(x = 1) IS FALSE
” (you can see why if you assume again that the value ofx
isNULL
). In any case, “NOT (x = 1)
” is a bad style choice because people have trouble reading Boolean operators. The better choice is “(x <> 1)
”.The
AND
Boolean operator combines two Boolean values: if both areTRUE
, the result isTRUE
. For example, this search condition is true: “5 > 0 AND 0 = 0 AND 0 <= 5
”. However, this search condition is false: “5 > 0 AND 0 = 0 AND 0 >0 5
”.The
OR
Boolean operator combines two Boolean values: if either one isTRUE
, the result isTRUE
.
Search Conditions in Clauses¶
Since an SQL3 search condition returns a Boolean value, it can be used in situations which would appear exotic/erroneous in SQL-92, for example:
SELECT (numeric_column=5)
FROM Table_1
WHERE boolean_column=(char_column LIKE 'A%');
But our immediate interest is not exotica. We want to look at some plain everyday clauses which contain search conditions as a matter of course. The interesting thing about these clauses is that they contain implicit Boolean operators. Here are the clauses and the implicit Boolean operators.
Clause |
Implicit Boolean Operator |
---|---|
|
|
|
|
|
|
|
|
|
(“satisfies”) |
|
|
Notice that the implicit Boolean looks for TRUE
in every case but one – we
talked about the CHECK
clause in our chapter on Constraints and Assertions
and noted that a condition that evaluates to UNKNOWN
also satisfies a
Constraint. Everywhere else though, although SQL has lots of rules for
generating and manipulating UNKNOWN
values, it normally throws them away in
the final step of a search. So, if you want to see results which are either
TRUE
or UNKNOWN
, you must force them through with this expression:
"... WHERE (search condition) IS NOT FALSE"
Since IS NOT FALSE
is evaluated before WHERE
, all UNKNOWN
s are
converted to TRUE
s before the WHERE
's implicit IS TRUE
Boolean
operator takes effect. (Actually, most people don’t want to see results which
are UNKNOWN
, but forcing them through is handy if you want to simulate the
effect of a CHECK
Constraint search condition.)
There is also an implicit Boolean AND
operator between clauses. For
example, this SQL statement:
SELECT Table_1.*,
Table_2.*
FROM Table_1 INNER JOIN Table_2 ON Table_1.column_1 = Table_2.column_1
WHERE Table_1.column_2 = 5;
is effectively the same as this SQL statement:
SELECT Table_1.*,
Table_2.*
FROM Table_1, Table_2
WHERE Table_1.column_1 = Table_2.column_1 AND
Table_1.column2 = 5;
However, such a transformation ignores evaluation order. We’ll get back to joins in a later chapter.
Some example searches¶
The main use of search conditions is to search. Let’s try some examples on a
small Table. This Table, which we’ll call TEAMS
, contains this data:
|
||||
---|---|---|---|---|
|
|
|
|
|
Calgary |
Stampeders |
45000 |
5 |
15000000.00 |
Edmonton |
Eskimos |
60000 |
4 |
20000000.00 |
Hamilton |
Tiger Cats |
22000 |
1 |
25000000.00 |
Montreal |
Alouettes |
18000 |
3 |
30000000.00 |
Regina |
Roughriders |
31000 |
6 |
35000000.00 |
Toronto |
Argonauts |
80000 |
2 |
40000000.00 |
Vancouver |
Lions |
? |
7 |
45000000.00 |
Winnipeg |
Blue Bombers |
31000 |
8 |
50000000.00 |
Question: What city is represented by the Blue Bombers?
SQL query:
SELECT city
FROM Teams
WHERE team_name = 'Blue Bombers';
Answer: Winnipeg.
Question: What teams, other than the Edmonton Eskimos, have stadia with a capacity of over 40000?
SQL query
SELECT team_name
FROM Teams
WHERE (city<>'Edmonton' OR team_name<>'Eskimos') AND
stadium_capacity > 40000;
Answer: Calgary, Toronto.
Question: What teams, other than the Edmonton Eskimos, might have stadia with a capacity of over 40000?
SQL query:
SELECT team_name
FROM Teams
WHERE (city<>'Edmonton' OR team_name<>'Eskimos') AND
(stadium_capacity > 40000 ) IS NOT FALSE;
Answer: Calgary, Toronto, Vancouver.
Question: Show teams whose revenue per seat is more than $1000, as well as teams which are in the top half of the standings.
SQL query:
SELECT team_name
FROM Teams
WHERE revenue/stadium_capacity > 1000 OR standing > 4;
Answer: Hamilton, Montreal, Regina, Winnipeg, Toronto, Edmonton.
Question: Show all teams.
SQL query:
SELECT *
FROM Teams
WHERE TRUE;
Answer: Calgary, Edmonton, Hamilton, Montreal, Regina, Toronto, Vancouver, Winnipeg.
This SQL statement is the same as SELECT * FROM Teams
and it shows an
interesting SQL3 development. Whereas in SQL-92 it was necessary to express
“always true” and “always false” with explicit literal expressions like 1 =
1
and 1 <> 2
, it’s now possible to use a simple <Boolean literal>.
Here’s a few more examples, this time from the sample Tables shown at the beginning of this chapter. First, to find the employees who are located in the basement or whose pay rate is between 5.00 and 6.00:
SELECT empnum,rate,location FROM Payroll
WHERE location='BASEMENT' OR rate BETWEEN 5 AND 6;
The result is:
|
|
|
1 |
6.00 |
10TH FLOOR |
2 |
5.00 |
16TH FLOOR |
3 |
5.00 |
WAREHOUSE |
4 |
8.00 |
BASEMENT |
To find the names of employees with employee numbers less than 10 who also work in Department B:
SELECT gname,surname FROM Employee
WHERE dept='B' AND empnum<10;
The result is:
|
|
JOHN |
MARSH |
SQL’s <case expression>¶
The CASE
expression is not a filtering operation, but it fits in this
context because CASE
specifies a conditional value: it takes a search
condition as input and returns a scalar value. The required syntax for the
<case expression> is:
<case expression> ::= <case abbreviation> | <case specification>
<case abbreviation> ::=
NULLIF(<value expression> ,<value expression>) |
COALESCE(<value expression> {,<value expression>}... )
<case specification> ::= <simple case> | <searched case>
<simple case> ::=
CASE
<value expression>
{WHEN <value expression> THEN <result>}...
[ ELSE <result> ]
END
<searched case> ::=
CASE
{WHEN <search condition> THEN <result>}...
[ ELSE <result> ]
END
<result> ::= <value expression> | NULL
The <case abbreviation>s NULLIF
and COALESCE
are both shorthands for
a simple CASE
expression. For example, this expression:
NULLIF(value_1,value_2)
is equivalent to this expression:
CASE WHEN value_1=value_2 THEN NULL
ELSE value_1
END
Use NULLIF
when you have some special value instead of NULL
, for
example the displayable ?
to represent a null value.
This expression:
COALESCE(value_1,value_2)
is equivalent to this expression:
CASE WHEN value_1 IS NOT NULL THEN value_1
ELSE value_2
END
And an expression containing more than two COALESCE
values is equivalent to
a series of COALESCE
expressions, which are in turn equivalent to a series
of CASE
conditions. For example, this expression:
COALESCE(value_1,value_2,value_3)
is equivalent to this expression:
CASE WHEN value_1 IS NOT NULL THEN value_1
ELSE COALESCE(value_2,value_3)
END
A simple CASE
expression operates on a <value expression>: any expression
that returns a scalar value, except for a routine that is possibly
non-deterministic or that might modify SQL-data. The <data type> of this
CASE
operand must be comparable with the <data type> of the WHEN
clause’s <value expression>(since the simple CASE
expression compares the
two to see if they are equal) and with the ELSE
clause’s <result>. (If you
omit the ELSE
clause, it defaults to ELSE NULL
.)
A searched CASE
expression also operates on a <value expression> that must
be comparable with the <data type> of the WHEN
clause’s operands and the
<data type> of the ELSE
clause’s <result>. You may specify any appropriate
search condition in a searched CASE
expression’s WHEN
clause. Once
again, the CASE
operand may be any expression that returns a scalar value,
except for a routine that is possibly non-deterministic or that might modify
SQL-data, and the default ELSE
clause is ELSE NULL
.
Here are two equivalent examples of CASE
expressions (the first example is
a searched CASE
, the second is a simple CASE
).
CASE
WHEN column_1 = 1 THEN 'one!'
WHEN column_1 = 2 THEN 'two!'
ELSE 'many'
END
CASE column_1
WHEN 1 THEN 'one!'
WHEN 2 THEN 'two!'
ELSE 'many'
END
The searched CASE
expression works as follows:
Find the first
WHEN
clause whose search condition isTRUE
. Return the value given in thatWHEN
clause’sTHEN
sub-clause.If no
WHEN
search condition isTRUE
, return the value given in theELSE
clause.All returnable values (in
THEN
clauses and in theELSE
clause) must have <data type>s that are comparable with theCASE
operand.At least one of the
THEN
values must be non-null. For example, this expression is not legal:
CASE column_1
WHEN 1 THEN NULL
ELSE NULL
END
The simple CASE
expression works the same way – simple CASE
is merely a
shorthand form of searched CASE
, where each WHEN
clause is taken to mean “WHEN
case operand = when expression”.
We prefer to use searched CASE
expressions on stylistic grounds, but many
people prefer simple CASE
expressions because they’re similar to Pascal’s
case, or C’s switch. (Notice, however, that the parallels are inexact, because
SQL’s CASE
is an expression, while Pascal’s case and C’s switch are statements.)
We will now repeat one of the questions that we asked in the previous
section, about the TEAMS
Table.
Question: What city is represented by the Blue Bombers?
SQL query:
SELECT CASE
WHEN team_name = 'Blue Bombers' THEN city
ELSE '*********'
END AS city
FROM Teams;
Answer:
The answer is the same, but with a lot of dross. It’s possible to use CASE
expressions this way as retrieval substitutes, but the more common applications
are (a) to make up for SQL’s lack of an enumerated <data type>, (b) to
perform complicated if/then calculations, (c) for translation and (d) to
avoid exceptions. We find CASE
expressions to be indispensable, and it
amazes us that in pre-SQL-92 DBMSs they didn’t exist.
Rules of Aggregation¶
The THEN
values in a CASE
expression, as well as the results of set
operations or arrays, are considered to be aggregations; that is, they have a
<data type> that is determined by evaluating the <data type>s of each value in
a set. The aggregation rules determine which <data type>s are compatible, and
what the <data type> of the result is. Stated in a very general way, the
aggregation rules are that “what’s compatible for assignment is compatible for
aggregation”, “varying trumps fixed” and “long trumps short”. More
specifically:
All numeric –
INT
,SMALLINT
,NUMERIC
,DECIMAL
,FLOAT
,REAL
,DOUBLE PRECISION
– values are compatible. If any aggregated value is approximate numeric, the result is approximate numeric; otherwise the result is exact numeric and the result’s scale is the biggest scale of any of the aggregated values. For example, this expression:
CASE ...
THEN CAST(x AS DECIMAL(9)) ...
THEN CAST (y AS DECIMAL(5,3))
ELSE NULL
END
should return a result with a <data type> of DECIMAL(9,3)
. (We say should
because some of the decisions here are implementation-defined – the only
guaranteed facts are that the scale will be 3 and the <data type> will be some
sort of exact numeric.)
All
BIT
orBIT VARYING
values are compatible. If any aggregated value is aBIT VARYING
then the result is aBIT VARYING
; otherwise the result is aBIT
. The result’s length is the length of the longest aggregated value. For example, if a Table was defined with “... column_1 BIT(5),column_2 BIT VARYING(4) ...
”, this expression:
CASE ... THEN column_1 ... ELSE column_2 END
will return a result with a <data type> of BIT VARYING(5)
.
All
BLOB
values are compatible.All
CHAR
,NCHAR
,VARCHAR
,NCHAR VARYING
,CLOB
andNCLOB
values are compatible, provided the Character set is the same. If any aggregated value is aCLOB
then the result is aCLOB
; otherwise if any value is aVARCHAR
then the result is aVARCHAR
; otherwise the result is aCHAR
. The result’s length is the length of the longest aggregated value. The result’s Collation will depend on coercibility; see the appropriate tables in our chapter on character strings. For example, this expression:
CASE ... THEN 'a' ... THEN 'abcd' ... ELSE 'abc' END
will return a result with a <data type> of CHAR(4)
, because <character
string literal>s are fixed-length character strings and the size of the largest
aggregated <literal> is 4 characters.
All
DATE
values are compatible.All
TIME
andTIME WITH TIME ZONE
values are compatible. If any aggregated value isTIME WITH TIME ZONE
, the result isTIME WITH TIME ZONE
; otherwise the result isTIME
. The fractional precision of the result is the largest fractional precision of any of the aggregated values.All
TIMESTAMP
andTIMESTAMP WITH TIME ZONE
values are compatible. If any aggregated value isTIMESTAMP WITH TIME ZONE
, the result isTIMESTAMP WITH TIME ZONE
; otherwise the result isTIMESTAMP
. The fractional precision of the result is the largest fractional precision of any of the aggregated values.All year-month
INTERVAL
values are compatible. The datetime fields of the result are from the earliest to the latest fields in any aggregated value. For example, this expression:
CASE ... THEN INTERVAL '1' YEAR ... THEN INTERVAL '1' MONTH END
will return a result with a <data type> of INTERVAL YEAR TO MONTH
.
All day-time
INTERVAL
values are compatible. The datetime fields of the result are from the earliest to the latest fields in any aggregated value.All
BOOLEAN
values are compatible.All
REF
values of the same referenced type (UDT) are compatible.All UDT values whose most specific types have some common supertype are compatible.
If you want to restrict your code to Core SQL, don’t use BLOB
s,
CLOB
s or NCLOB
s in a CASE
expression.
Dialects¶
The truth is, most vendors are just starting to get used to the SQL-92 search conditions, so don’t depend heavily on these SQL3 features:
BOOLEAN
<data type><distinct predicate>
SYMMETRIC
<between predicate>
You’ll find even stricter limitations if you don’t use a fairly powerful DBMS
– there are some that don’t support row-value comparisons, IS [NOT]
{TRUE|FALSE|UNKNOWN}
and the more difficult predicates (e.g.: MATCH
and
OVERLAPS
).