Chapter 10 – Collection types¶
Note
You are reading a digital copy of SQL-99 Complete, Really, a book that documents the SQL-99 standard.
For more information specific to CrateDB, check out the CrateDB Reference.
[Obscure Rule] applies for this entire chapter.
In SQL, a <collection type> is an array: a composite constructed SQL <data type>.
Table of Contents
Collection <data type>s¶
A <collection type> is defined by a descriptor that contains three pieces of information:
The <data type>’s name:
ARRAY
.The maximum number of elements in the array.
The array’s element <data type> specification.
ARRAY¶
The required syntax for a <collection type> specification is as follows.
<collection type> ::=
<data type> <array specification>
<array specification> ::=
ARRAY[unsigned integer] | ARRAY??(unsigned integer??)
A <collection type> specification defines an array – it consists of an element
<data type> specification followed by an indication of the array’s maximum
element size. For example, both of these <collection type> specifications
define an array of up to ten numbers, all of which must fall into the range
supported by SQL’s SMALLINT
<data type>:
SMALLINT ARRAY[10]
SMALLINT ARRAY??(10??)
A <collection type> may not specify, either directly or indirectly (for
example, via a Domain), a <data type> of REF
or ARRAY
. All other SQL
<data type>s are supported.
It is important to note that the square brackets in the <collection type>’s
syntax diagrams should not be read as if they are BNF brackets – that is, they
don’t mean that ARRAY
can optionally be followed by an integer. Instead,
you use either square brackets or trigraphs to delimit the integer that
specifies the size of the array. We’ll use square brackets in the rest of our
examples.
An array is an ordered collection of elements. There are some similarities
between SQL ARRAY
s and C or Pascal arrays, but there are also some
significant differences: (a) SQL ARRAY
s are only “vectors”, so an array
of an array is not supported and (b) SQL ARRAY
s are variable size: the
specification defines an array’s maximum size, rather than its exact size. In
earlier SQL versions there was no support for ARRAY
s. In part, this lack
of support was due to a well-known principle of database design, the rule of
first normal form: “A field (a column/row intersection) may have only one
atomic value.” If rows contain arrays, they violate this principle, so a
well-designed database doesn’t define arrays all over the place.
[Obscure Rule] A <collection type> is a subtype of a <data type> if (a) both are arrays and (b) the element <data type> of the first array is a subtype of the element <data type> of the second array.
ARRAY <element reference>¶
An <element reference> returns an element of an array. The required syntax for an <element reference> is as follows.
<element reference> ::=
array_argument[numeric_argument] |
array_argument??(numeric_argument??)
An <element reference> allows you to access a specific element of an array. It
operates on two arguments: the first must evaluate to an array and the second
must evaluate to an integer. (The integer refers to the ordinal position of the
element in the array: the first element in an array is element number 1, the
second element is element number 2 and so on.) If either portion of an element
reference is NULL
, that element is also NULL
. Here are two equivalent
examples of an <element reference>:
array_column[4]
array_column??(4??)
In each case, the reference would return the fourth element of the array.
If numeric_argument
is a negative number, or if it is greater than the
number of elements in the array, the <element reference> will fail: your DBMS
will return the SQLSTATE error 2202E "data exception-array element error"
.
<array value constructor>¶
An <array value constructor> is used to construct an array. The required syntax for an <array value constructor> is as follows.
<array value constructor> ::=
ARRAY[element_expression [ {,element_expression}... ]] |
ARRAY??(element_expression [ {,element_expression}... ]??)
An <array value constructor> allows you to assign values to the elements of an
array. An element_expression
may be any expression that evaluates to a
scalar value with a <data type> that is assignable to the array’s element <data
type>. The result is an array whose n-th element value
is the value of
the n-th element_expression you specify. Here are two equivalent examples of
an <array value constructor>:
ARRAY['hello','bob','and','sally']
ARRAY??('hello','bob','and','sally'??)
These examples both construct an array with four elements. The first element
has a value of 'hello'
, the second has a value of 'bob'
, the third has
a value of 'and'
and the fourth has a value of 'sally'
. Both <array
value constructor>s would be valid for this <collection type> specification:
VARCHAR(5) ARRAY[4]
An <array value constructor> serves the same purpose for an array as a
<literal> does for a predefined <data type>. It has the same format as the
<collection type>’s ARRAY
specification – that is, ARRAY[ ]
or
ARRAY??( ??)
- - but instead of a number inside the size delimiters, it
contains comma- delimited values of the correct <data type> (these are called
array elements in SQL, though in other languages, an instance of an array is
called an “occurrence”). For example, if your <collection type> specification
is:
INT ARRAY[3]
a valid <array value constructor> would be:
ARRAY[20,10,52]
And if your <collection type> specification is:
BIT(4) ARRAY[3]
a valid <array value constructor> would be:
ARRAY[B'1011',B'0011',B'0110']
The number of elements in an <array value constructor> may vary from zero
elements (for example ARRAY[]
, an empty specification) to any number of
elements up to the array’s maximum size.
If you want to restrict your code to Core SQL, don’t define any ARRAY
<data
type>s and don’t use any of SQL’s array syntax.
Collection Operations¶
An array is compatible with, and comparable to, any array with a compatible
element <data type> – that is, arrays are mutually comparable and mutually
assignable only if their element <data type>s are mutually comparable and
mutually assignable. Arrays may not be directly compared with, or directly
assigned to, any other <data type> class, though implicit type conversions can
occur in expressions, SELECT
s, INSERT
s, DELETE
s and
UPDATE
s. Explicit array type conversions can be forced with the CAST
operator.
CAST¶
In SQL, CAST
is a scalar operator that converts a given scalar value to a
given scalar <data type>. The required syntax for the CAST
operator is:
CAST (<cast operand> AS <cast target>)
<cast operand> ::= scalar_expression
<cast target> ::= <Domain name> | <data type>
The CAST
operator converts values of a source <data type> into values of a
target <data type>, where each <data type> is an SQL pre-defined <data type>
(data conversions between UDTs are done with a user-defined cast). The source
<data type>, or <cast operand>, can be any expression that evaluates to a
single value. The target <data type>, or <cast target>, is either an SQL
predefined <data type> specification or the name of a Domain whose defined
<data type> is the SQL predefined <data type> that you want to convert the
value of “scalar_expression” into. (If you use CAST (... AS <Domain name>)
,
your current <AuthorizationID> must have the USAGE
Privilege on that
Domain.)
It isn’t, of course, possible to convert the values of every <data type> into the values of every other <data type>. For <collection type>s, the rules are:
CAST
(NULL AS
<data type>) andCAST
(array_source_is_a_null_value AS
<data type>) both result inNULL
.You can
CAST
a <collection type> source only to a <collection type> target whose element <data type> is an appropriateCAST
target for the source’s element <data type>.
When you CAST
an array to an array target, you’re actually asking your DBMS
to CAST
the value of the source element <data type> to the target element
<data type>, so the CAST
implied therein must be a legal CAST
– see
our descriptions of CAST
for each predefined SQL <data type>. Your source
may be an empty array – that is, <cast operand> may be ARRAY[]
or
ARRAY??(??)
. In this case, the result of the CAST
is an empty array
whose element <data type> is the target element <data type>. Otherwise, the
result of the CAST
is an array with the same number of elements as the
source and the target element <data type>.
Assignment¶
In SQL, when an array is assigned to an array target, the assignment is done one element at a time – that is, the source value’s first element is assigned to the target’s first element, the source’s next element is assigned to the target’s next element, and so on. Two arrays are assignable if their element <data type>s are mutually assignable.
When an array is taken from SQL-data to be assigned to an array target, if the
number of elements in the source array equals the maximum number of elements
in the target array, assignment is straightforward: the value of each element
of the source is assigned to the corresponding element of the target. If the
maximum number of elements in the target array is less than the number of
elements in the source array, then assignment of as many of the source element
values to the target elements as is possible occurs and your DBMS will return
the SQLSTATE warning 0102F "warning-array data, right truncation"
. If the
maximum number of elements in the target array is greater than the number of
elements in the source array, then assignment of each of the source element
values to the target elements occurs and the size of the target for that row
becomes the number of elements assigned. (Note that this doesn’t mean that the
maximum size of the target for other assignments lessens.)
[Obscure Rule] Since only SQL accepts null values, if your source is NULL
,
then your target’s value is not changed. Instead, your DBMS will set its
indicator parameter to -1, to indicate that an assignment of the null value
was attempted. If your target doesn’t have an indicator parameter, the
assignment will fail: your DBMS will return the SQLSTATE error 22002 "data
exception-null value, no indicator parameter"
. We’ll talk more about indicator
parameters in our chapters on SQL binding styles.
When an array is assigned to a SQL-data array target, if the number of elements
in the source array equals the maximum number of elements in the target
array, assignment is straightforward: the value of each element of the source
is assigned to the corresponding element of the target. If the maximum number
of elements in the target array is less than the number of elements in the
source array, but the extra source elements are all NULL
, then the value of
each non-null element of the source is assigned to the corresponding element of
the target. If the maximum number of elements in the target array is less
than the number of elements in the source array and the extra source elements
are not all NULL
, the assignment will fail: your DBMS will return the
SQLSTATE error 2202F "data exception-array data, right truncation"
. If the
maximum number of elements in the target array is greater than the number of
elements in the source array, then assignment of each of the source element
values to the target elements occurs and the size of the target for that row
becomes the number of elements assigned. (Note, once again, that this doesn’t
mean that the maximum size of the target for other assignments lessens.)
[Obscure Rule] There are two ways to assign a null value to an SQL-data target.
Within SQL, you can use the <keyword> NULL
in an INSERT
or an
UPDATE
statement to indicate that the target should be set to NULL
;
that is, if your source is NULL
, your DBMS will set your target to
NULL
. Outside of SQL, if your source has an indicator parameter that is set
to -1, your DBMS will set your target to NULL (regardless of the value of the
source). An indicator parameter with a value less than -1 will cause an error:
your DBMS will return the SQLSTATE error 22010 "data exception-invalid
indicator parameter value"
.
Comparison¶
SQL provides only two scalar comparison operators – = and <> – to perform
operations on arrays. Both will be familiar; there are equivalent operators in
other computer languages. Two arrays are comparable if their element <data
type>s are mutually comparable. During comparison, the elements of the
comparands are compared pairwise in element order – that is, the first element
of the first array is compared to the first element of the second array, the
second element of the first array is compared to the second element of the
second array, and so on. The two arrays are equal if (a) they both have the
same number of elements and (b) each pair of elements is equal. The arrays
are not equal if (a) they do not have the same number of elements or (b) at
least one pair of elements is not equal. If either comparand is NULL
, or if
at least one element is NULL
and all non-null element pairs are equal, the
result of the operation is UNKNOWN
. For example:
ARRAY[1] <> ARRAY[2]
returns TRUE
.
array_column = {array result is NULL}
returns UNKNOWN
.
But:
ARRAY [1,NULL] = ARRAY[1]
returns FALSE
because the number of elements is not equal.
Other Operations¶
With SQL, you have several other operations that you can perform on <collection type>s.
Scalar functions¶
SQL provides two scalar functions that return a <collection type>: the <array concatenation function> and the <cardinality expression>.
<array concatenation function>¶
The required syntax for an <array concatenation function> is as follows.
<array concatenation function> ::=
CONCATENATE(array_argument_1 { , | WITH } array_argument_2)
CONCATENATE
operates on two operands, both of which must evaluate to an
array. It joins the arrays together in the order given and returns an array
value that consists of every element of array_argument_1
followed by every
element of array_argument_2
. If either operand is NULL
, CONCATENATE
returns NULL
. For this function, you can use either a comma or the
<keyword> WITH
to separate your operands. Here are two equivalent examples
of array concatenations:
CONCATENATE(array_column,ARRAY['element_1','element_2'])
CONCATENATE(array_column WITH ARRAY['element_1','element_2'])
[Obscure Rule] There are various details about the precise characteristics of
the elements in the result array; if you’re interested, see “Rules of
Aggregation”, in our description of the CASE
expression in our chapter on
simple search conditions.
If you want to restrict your code to Core SQL, don’t use CONCATENATE
.
<cardinality expression>¶
The required syntax for a <cardinality expression> is as follows.
<cardinality expression> ::=
CARDINALITY (array_argument)
CARDINALITY
operates on an argument that evaluates to an array. It counts
the number of elements in the array and returns this as an exact numeric
integer. If the argument is NULL
, CARDINALITY
returns NULL
. Here is
an example:
CARDINALITY(ARRAY[10,20,30,40])
--returns 4
[NON-PORTABLE] The precision of CARDINALITY
’s result is non-standard
because the SQL Standard requires implementors to define the result’s
precision. [OCELOT Implementation] The OCELOT DBMS that comes with this book
gives the result of CARDINALITY
an INTEGER
<data type>.
If you want to restrict your code to Core SQL, don’t use CARDINALITY
.
Set Functions¶
SQL provides three set functions that operate on a <collection type>: COUNT
and GROUPING
. Since none of these operate exclusively with array arguments,
we won’t discuss them here; look for them in our chapter on set functions.
Predicates¶
In addition to the comparison operators, SQL provides four other predicates
that operate on arrays: the <null predicate>, the <exists predicate>, the
<quantified predicate> and the <distinct predicate>. Each will return a boolean
value: either TRUE
, FALSE
or UNKNOWN
. None of these operate
strictly on arrays, so we won’t discuss them here. Look for them in our
chapters on search conditions.
Comprehensive Example¶
Now that we’ve described SQL’s <collection type>, let’s look at some example SQL statements that put it to use.
For our example, we’ve chosen a street address: a plausible choice for an array because its subdivisions (“lines”) are not meaningful – e.g.: we don’t require that the fourth line of a street address must contain a city name. Here is an SQL statement that creates a Table with three Columns, the third of which is an array:
CREATE TABLE Mailouts (
given_name CHAR(5),
surname CHAR(5),
street_address CHAR(20) ARRAY[5]);
In this Table definition, ARRAY[5]
indicates that the street_address
Column is an array which occurs up to 5 times.
This SQL statement adds a row to the Mailouts
Table:
INSERT INTO TABLE Mailouts (
given_name,
surname,
street_address)
VALUES (
'Jean', -- given_name
'Boyer', -- surname
ARRAY['line#1','line#2','line#3']); -- street_address
In this INSERT
statement, ARRAY['line#1','line#2','line#3']
is an
<array value constructor> that specifies the values for the first three
elements of the street_address
Column array. The two possible remaining
elements are not provided with values, so they aren’t constructed for this row.
An element of an ARRAY
can be updated using an <element reference>. For
example, this SQL statement would change the rows in the Mailouts
Table by
updating the second element of every street_address
value:
UPDATE Mailouts SET
street_address[2] = 'line#2 after update';
This example uses a <character string literal> to change the value of
street_address
’s second element. (Remember that the <data type> of the
array – and therefore of each of the array’s elements – is CHAR(20)
, so
any assignment of a compatible character string value would be accepted.) The
result is that the second array element contains the string 'line#2 after
update'
and the other elements are unchanged. We could have achieved the same
result by assigning an <array value constructor> to the street_address
Column as a whole, as in this example:
UPDATE Mailouts SET
street_address = ARRAY['line#1','line #2 after update','line#3'];
Note
If you assign a 2-element <array value constructor> to a 3-element array, the result is a 2-element array – not a 3-element array with the final element unchanged. That is, if the above example was:
UPDATE Mailouts SET
street_address = ARRAY['line#1','line #2 after update'];
the result in the STREET_ADDRESS Column would be 'line#1','line #2 after
update'
rather than 'line#1','line #2 after update','line#3'
.
Tip
To avoid “array element error”, ensure all occurrences of an array are fixed
size – pad with NULL
s if necessary.
Both of the above examples updated the value of an existing array element. It’s also possible to place a value into an element that hasn’t been constructed for the row yet. For example, this SQL statement:
UPDATE Mailouts SET
street_address[5] = 'line#5';
has a two-fold effect: as the UPDATE
explicitly requires, it creates
element number five for the row, placing the string 'line#5'
into it, and
it creates element number four for the row, placing the null value into it
(since the UPDATE
statement doesn’t provide an explicit value for the
fourth element and since the fifth element can’t exist unless the first four
also exist).
Here is an example of a query on the Mailouts
Table:
SELECT given_name, surname, street_address
FROM Mailouts
WHERE street_address <> ARRAY[];
In this SELECT
statement, the street_address
Column is being referred
to as a whole in both the <select list> and the WHERE
clause. We could also
have used an <element reference> (e.g.: street_address[3]
) in either
clause, to refer to a specific element of the Column (this is true in most
situations). Our sample query searches for all rows of the Mailouts
Table
where street_address
is not an empty array; TRUE for the row we inserted
earlier.
This SQL statement uses CONCATENATE
to query the Mailouts
Table:
SELECT CONCATENATE(street_address WITH ARRAY['line#4','line#5'])
FROM Mailouts;
In this example, we’ve used CONCATENATE
to join the elements of two arrays:
one array-Column reference and one <array value constructor>. The resulting is
an array containing five elements: three from the street_address
Column and
two from the <array value constructor> (in that order). Each element of the
result is a CHAR
string.
Illegal Operations¶
SQL places a number of restrictions on where you may validly use arrays. These are worth pointing out, since most of the language is orthogonal in this respect.
You may not use arrays in a
JOIN
Column list, as a grouping Column or in a Constraint definition (forUNIQUE
orPRIMARY KEY
orFOREIGN KEY
Constraints – Constraints are illegal in most predicates and can’t containREF
s orARRAY
s.Be very careful with <element reference>s: the fact that an array is defined as
ARRAY[5]
does not guarantee that array element[4]
exists. Above all, it is a mistake to assume that an <element reference> is allowed wherever a <Column reference> is allowed – SQL’s rules aren’t there to help you break the rules of database design.