Chapter 7 – Character strings¶
Note
You are reading a digital copy of SQL-99 Complete, Really, a book that documents the SQL-99 standard.
For more information specific to CrateDB, check out the CrateDB Reference.
In SQL, a character string is any sequence of zero or more alphanumeric characters that belong to a given Character set. A Character set is a named characters that belong to a given Character set. A Character set is a named character repertoire that includes a Form-of-use encoding scheme which defines how the repertoire’s characters are encoded as numbers. The SQL Standard defines several standard Character sets that your DBMS must support. A character string value may be a <literal>, an <identifier>, the value of a parameter or a host language variable or the result of any expression or argument that evaluates to a character string. All character strings belong to some Character set and are governed by the rules of some Collation during comparisons. A Collation is a named collating sequence. Character strings that belong to the same Character set are compatible.
A character string has a length – a non-negative integer equal to the number
of characters in the string. Characters in a character string are numbered from
left to right beginning with 1. A Character string also has a coercibility
attribute; this helps your DBMS determine which Collation to use for a
comparison that doesn’t provide an explicit COLLATE
clause. The
coercibility attribute can be either COERCIBLE
, EXPLICIT
, IMPLICIT
or NO COLLATION
. (A coercibility attribute of COERCIBLE
, EXPLICIT
or IMPLICIT
means the string has a current default Collation. A
coercibility attribute of NO COLLATION
means the string does not have a
current default Collation.) Character strings are stored in either of the six
character string <data type>s: CHARACTER
, CHARACTER VARYING
,
CHARACTER LARGE OBJECT
, NATIONAL CHARACTER
, NATIONAL CHARACTER
VARYING
or NATIONAL CHARACTER LARGE OBJECT
.
Table of Contents
Character string <literal>s¶
An SQL <character string literal> has five parts:
Its value: the sequence of characters that make up the <literal>.
Its length: the number of characters that make up the <literal>.
The name of the Character set that the <literal> belongs to.
The name of the <literal>’s default Collation. (This is the Collation that may be used to compare the <literal> with another character string in the absence of an explicit
COLLATE
clause.)The <literal>’s coercibility attribute: normally
COERCIBLE
, but can beEXPLICIT
. (The coercibility attribute helps your DBMS determine which Collation to use for a comparison that doesn’t provide an explicitCOLLATE
clause.)
A <character string literal> is either a <character string literal> or a national <character string literal>.
<character string literal>¶
The required syntax for a <character string literal> is as follows.
<character string literal> ::=
[ <Character set name> ]'string' [ COLLATE <Collation name> ]
A <character string literal> is a string of zero or more alphanumeric
characters inside a pair of single quote marks. The string’s characters must
all belong to the same Character set. Its <data type> is fixed length
CHARACTER
, though it is compatible with the CHARACTER
, CHARACTER
VARYING
, CHARACTER LARGE OBJECT
, NATIONAL CHARACTER
, NATIONAL
CHARACTER VARYING
and NATIONAL CHARACTER LARGE OBJECT
<data type>s. The
<literal>’s length is the number of characters inside the quote marks; the
delimiting single quotes aren’t part of the <literal>, so they’re not included
in the calculation of the <character string literal>’s size. Two consecutive
single quotes within a character string (i.e.: ‘’) represent one single quote
mark; together, they count as one character when calculating the size of the
<literal>. Here is an example of a <character string literal>:
'This is a <character string literal>'
[Obscure Rule] The optional Character set specification – an underline
character immediately preceding a <Character set name> (no space is allowed
between them) – names the Character set that the <literal> belongs to. Your
current <AuthorizationID> must have the USAGE
Privilege for that Character
set. For example, this <character string literal>:
LATIN1 'Hello'
belongs to the LATIN1
Character set. Note: For qualified names, the
underline character always precedes the highest level of explicit qualification
in the <Character set name>. If you omit the Character set specification, the
characters in the <literal> must belong to the Character set of the SQL-client
Module that contains the <literal>. Here are two examples of a <character
string literal>:
'This is a string in the default Character set'
LATIN1'This is a string in the LATIN1 Character set'
[Obscure Rule] A <character string literal> normally has a coercibility
attribute of COERCIBLE
and a default Collation that is the Collation
defined for its Character set – See “Character Strings and Collations” on page
159. The optional COLLATE
clause names the <literal>´s EXPLICIT
Collation for an operation. The Collation named must be a Collation defined for
the relevant Character set, but you may specify a default Collation for a
<literal> that is different from the default Collation of it´s Character set.
If you´re using COLLATE
in an SQL-Schema statement, then the
<AuthorizationID> that iwns the containing Schema must have the USAGE
Privileg on <Collation name>. If you´re using COLLATE
in any other SQL
statement, then your current <AuthorizationID> must have the USAGE
Privilege on <Collation name>. Here are four more examples of a <character
string literal>.
'This string in the default Character set will use the default Character set''s Collation'
LATIN1'This in the LATIN1 Character set will use LATIN1''s Collation'
'This string in the default Character set will use a Collation named MY.COLLATION_1' COLLATE my.collation_1
LATIN1'This string in the LATIN1 Character set will use a Collation named MYCOLLATION_1' COLLATE my.collation_1
[Obscure Rule] SQL allows you to break a long <character string literal> up into two or more smaller <character string literal>s, split by a <separator> that includes a newline character. When it sees such a <literal>, your DBMS will ignore the <separator> and treat the multiple strings as a single <literal>. For example, here are two equivalent <character string literal>s:
'This is part of a string'
' and this is the other part'
'This is part of a string and this is the other part'
(In the first example, there is a carriage return newline <separator> between
string'
and and
.)
If you want to restrict your code to Core SQL, don’t add a Character set
specification to <character string literal>s, don’t add a COLLATE
clause to
<character string literal>s and don’t split long <character string literal>s
into smaller strings.
<national character string literal>¶
The required syntax for a <national character string literal> is as follows.
<national character string literal> ::=
N'string' [ COLLATE <Collation name> ]
A <national character string literal> is a <character string literal> preceded
by the letter N; it is a synonym for a <character string literal> that belongs
to a predefined “national” Character set. Its <data type> is fixed length
NATIONAL CHARACTER
, though it is compatible with the CHARACTER
,
CHARACTER VARYING
, CHARACTER LARGE OBJECT
, NATIONAL CHARACTER
,
NATIONAL CHARACTER VARYING
and NATIONAL CHARACTER LARGE OBJECT
<data
type>s.
Other than the fact that you may not add a Character set specification to a
<national character string literal> because N'string'
implies the same
national Character set used for NCHAR
, NCHAR VARYING
and NCLOB
<data type>s, the specifications for the two types of <character string
literal>s are the same. Here are two examples of a <national character string
literal>:
N'This string in the national Character set will use the national Character set''s Collation'
N'This string in the national Character set will use a Collation named
MY.COLLATION_1' COLLATE my.collation_1
[NON-PORTABLE] The national Character set used by <national character string
literal>s and the NCHAR
, NCHAR VARYING
and NCLOB
<data type>s is
non-standard because the SQL Standard requires implementors to define a
national Character set. [OCELOT Implementation] The OCELOT DBMS that comes with
this book defines the national Character set to be ISO8BIT
. For example,
here are two equivalent <character string literal>s:
N'Hello there'
ISO8BIT'Hello there'
If you want to restrict your code to Core SQL, don’t use <national character string literal>s.
Character string <data type>s¶
A character string <data type> is defined by a descriptor that contains five pieces of information.
The <data type>’s name: either
CHARACTER
,CHARACTER VARYING
,CHARACTER LARGE OBJECT
,NATIONAL CHARACTER
,NATIONAL CHARACTER VARYING
orNATIONAL CHARACTER LARGE OBJECT
.The <data type>’s fixed length or maximum length (as applicable).
The name of the Character set that the <data type>’s set of valid values belong to. (An operation that attempts to make a character string <data type> contain a character that does not belong to its Character set will fail: your DBMS will return the
SQLSTATE error 22021 "data exception-character not in repertoire"
.)The name of the <data type>’s default Collation. (This is the Collation that may be used to compare the <data type>’s values in the absence of an explicit
COLLATE
clause.)The <data type>’s coercibility attribute – normally
IMPLICIT
, but can beEXPLICIT
. (The coercibility attribute helps your DBMS determine which Collation to use for a comparison that doesn’t provide an explicitCOLLATE
clause.)
CHARACTER¶
The required syntax for a CHARACTER
<data type> specification is as
follows.
CHARACTER <data type> ::=
CHARACTER [ (length) ]
[ CHARACTER SET <Character set name> ]
[ COLLATE <Collation name> ]
CHARACTER
may be abbreviated as CHAR
and is a fixed length alphanumeric
string, exactly “length” characters long. It defines a set of character string
values that belong to a given Character set. For example, this <character
string literal>:
'BOB'
is a valid value for this <data type> specification:
CHAR(3)
The optional length, if specified, is an unsigned, positive integer that defines the exact length, in characters, of acceptable values. The minimum length and the default length are both 1. For example, these two <data type> specifications:
CHAR
CHAR(1)
both define a set of character string values that are exactly one character long.
[NON-PORTABLE] The maximum length for CHAR
is non-standard because the SQL
Standard requires implementors to define CHAR
’s maximum length. FIPS says
that CHAR
should have a maximum length of at least 1000 characters. [OCELOT
Implementation] The OCELOT DBMS that comes with this book allows you to define
a length ranging from 1 to 4096 for CHAR
.
[Obscure Rule] The optional CHARACTER SET
clause names the Character set
that the <data type>’s values belong to. Your current <AuthorizationID> must
have the USAGE
Privilege for that Character set. For example, this <data
type> specification:
CHAR(15) CHARACTER SET LATIN1
defines a set of character string values, exactly 15 characters long, that
belong to the LATIN1
Character set. If you omit the CHARACTER SET
clause when specifying a character string <data type> in a <Column definition
or a CREATE DOMAIN
statement, the <data type>’s Character set is the
Character set named in the DEFAULT CHARACTER SET
clause of the CREATE
SCHEMA
statement that defines the Schema that the <data type> belongs to. For
example, consider this SQL statement, which creates a Schema that has a default
Character set of LATIN1
:
CREATE SCHEMA schema_example
DEFAULT CHARACTER SET INFORMATION_SCHEMA.LATIN;
Based on this definition, the <data type> specification in this SQL statement
defines a set of character string values, exactly 15 characters long, that
belong to the LATIN1
Character set:
CREATE TABLE schema_example.Table_1 (
column_1 CHAR(15));
[NON-PORTABLE] If you omit the CHARACTER SET
clause when specifying a
character string <data type> anywhere other than in a <Column definition> or a
CREATE DOMAIN
statement, the <data type>’s Character set is non-standard
because the SQL Standard requires implementors to define a default Character
set for such situations. [OCELOT Implementation] The OCELOT DBMS that comes
with this book defines ISO8BIT
as the default Character set for such
situations.
[Obscure Rule] A CHAR
<data type> has a coercibility attribute of
IMPLICIT
. The optional COLLATE
clause defines the <data type>’s default
Collation. The Collation named must be a Collation defined for the relevant
Character set, but you may define a Column, Field or Domain with a default
Collation that is different from the default Collation of its Character set. If
you’re using COLLATE
in a SQL-Schema statement, then the <AuthorizationID>
that owns the containing Schema must have the USAGE Privilege on “<Collation
name>”. If you’re using COLLATE
in any other SQL statement, then your
current <AuthorizationID> must have the USAGE
Privilege on “<Collation
name>”. For example, these two <data type> specifications:
CHAR(15) COLLATE my.collation_1
CHAR(15) CHARACTER SET my.charset_1 COLLATE my.collation_1
both define a set of character string values exactly 15 characters long. The
first example defines the <data type>’s Character set to be the default
Character set. The second example defines the <data type>’s Character set to be
a Character set named my.charset_1
. Both examples define the <data type>’s
default Collation to be a Collation named my.colllation_1
. If you omit the
COLLATE
clause, the <data type> is defined as if its Character set’s
default Collation was explicitly specified – see “Character Strings and
Collations”.
If you want to restrict your code to Core SQL, don’t use the CHARACTER SET
clause or the COLLATE
clause for CHAR
<data type> specifications.
NATIONAL CHARACTER¶
The required syntax for a NATIONAL CHARACTER
<data type> specification is
as follows.
NATIONAL CHARACTER <data type> ::=
NATIONAL CHARACTER [ (length) ] [ COLLATE <Collation name> ]
NATIONAL CHARACTER
may be abbreviated as NATIONAL CHAR
and as
NCHAR
. NCHAR
is a synonym for a CHAR
<data type> that belongs to a
predefined “national” Character set.
Other than the fact that you may not add a CHARACTER SET
clause to an
NCHAR
<data type> specification because NCHAR
implies the same national
Character set used for <national character string literal>s and NCHAR
VARYING
and NCLOB
<data type>s, the specifications for the NCHAR and CHAR
<data type>s are the same. Here are two examples of an NCHAR
<data type>
specification:
NCHAR(10)
-- uses the national Character set's Collation
NCHAR(10) COLLATE my.collation_1
-- uses a Collation named MY.COLLATION_1
[NON-PORTABLE] The national Character set used by <national character string
literal>s and the NCHAR
, NCHAR VARYING
and NCLOB
<data type>s is
non-standard because the SQL Standard requires implementors to define a
national Character set. [OCELOT Implementation] The OCELOT DBMS that comes with
this book defines the national Character set to be ISO8BIT
. For example,
these two <data type> specifications both define the same set of valid values:
NCHAR(10)
CHAR(10) CHARACTER SET ISO8BIT
If you want to restrict your code to Core SQL, don’t use the NCHAR
<data
type>.
CHARACTER VARYING¶
The required syntax for a CHARACTER VARYING
<data type> specification is as
follows.
CHARACTER VARYING <data type> ::=
CHARACTER VARYING (length)
[ CHARACTER SET <Character set name> ]
[ COLLATE <Collation name> ]
CHARACTER VARYING
may be abbreviated as CHAR VARYING
and as VARCHAR
and is a variable length alphanumeric string, from zero to “length” characters
long. It defines a set of character string values that belong to a given
Character set. For example, the follwoing three <character string literal>s:
'BOB'
'BOBBY'
'ROBERT'
are all valid values for this <data type> specification:
VARCHAR(6)
The mandatory length specification is an unsigned, positive integer that defines the maximum length, in characters, of acceptable values. The minimum length is 1.
[NON-PORTABLE] The maximum length for VARCHAR
is non-standard because the
SQL Standard requires implementors to define VARCHAR
’s maximum length. FIPS
says that VARCHAR
should have a maximum length of at least 1000 characters.
[OCELOT Implementation] The OCELOT DBMS that comes with this book allows you to
define a length ranging from 1 to 4096 for VARCHAR
.
[Obscure Rule] The optional CHARACTER SET
clause names the Character set
that the <data type>’s values belong to; see the remarks under "CHARACTER"
.
For example, this <data type> specification:
VARCHAR(15) CHARACTER SET LATIN1
defines a set of character string values, 0 to 15 characters long, that belong
to the LATIN1
Character set. (Zero length strings may be stored in a
VARCHAR
field.)
[Obscure Rule] A VARCHAR
<data type> has a coercibility attribute of
IMPLICIT
. The optional COLLATE
clause defines the <data type>’s default
Collation; see the remarks under "CHARACTER"
. For example, these two <data
type> specifications:
VARCHAR(15) COLLATE my.collation_1
VARCHAR(15) CHARACTER SET my.charset_1 COLLATE my.collation_1
both define a set of character string values, 0 to 15 characters long, that
have a default Collation called my.collation_1
.
If you want to restrict your code to Core SQL, don’t use the CHARACTER SET
clause or the COLLATE
clause for VARCHAR
<data type> specifications.
NATIONAL CHARACTER VARYING¶
The required syntax for a NATIONAL CHARACTER VARYING
<data type>
specification is:
NATIONAL CHARACTER VARYING <data type> ::=
NATIONAL CHARACTER VARYING (length) [ COLLATE <Collation name> ]
NATIONAL CHARACTER VARYING
may be abbreviated as NATIONAL CHAR VARYING
and as NCHAR VARYING
. NCHAR VARYING
is a synonym for a VARCHAR
<data type> that belongs to a predefined “national” Character set.
Other than the fact that you may not add a CHARACTER SET
clause to an
NCHAR VARYING
<data type> specification because NCHAR VARYING
implies
the same national Character set used for <national character string literal>s
and NCHAR
and NCLOB
<data type>s, the specifications for the NCHAR
VARYING
and VARCHAR
<data type>s are the same. Here are two examples of
an NCHAR VARYING
<data type> specification:
NCHAR VARYING(10)
-- uses the national Character set's Collation
NCHAR VARYING(10) COLLATE my.collation_1
-- uses a Collation named MY.COLLATION_1
[NON-PORTABLE] The national Character set used by <national character string
literal>s and the NCHAR
, NCHAR VARYING
and NCLOB
<data type>s is
non-standard because the SQL Standard requires implementors to define a
national Character set. [OCELOT Implementation] The OCELOT DBMS that comes with
this book defines the national Character set to be ISO8BIT
.For example,
these two <data type> specifications both define the same set of valid values:
NCHAR VARYING(10)
VARCHAR(10) CHARACTER SET ASCII_FULL
If you want to restrict your code to Core SQL, don’t use the NCHAR VARYING
<data type>.
CHARACTER LARGE OBJECT¶
The required syntax for a CHARACTER LARGE OBJECT
<data type> specification
is as follows.
CHARACTER LARGE OBJECT <data type> ::=
{CHARACTER LARGE OBJECT | CLOB} [ (length) ]
[ CHARACTER SET <Character set name> ]
[ COLLATE <Collation name> ]
CHARACTER LARGE OBJECT
may be abbreviated as CHAR LARGE OBJECT
and as
CLOB
and is a variable length alphanumeric string, from zero to “length”
characters long. It defines a set of large object character string values that
belong to a given Character set. For example, these three <character string
literal>s:
'BOB'
'BOBBY'
'ROBERT'
are all valid values for this <data type> specification:
CLOB(6)
The optional length, if specified, is an unsigned positive integer, possibly
followed by a letter code (either “K”, “M” or “G”). It defines the maximum length
of acceptable values in the CLOB
field.
If the length n is not followed by a letter code, the
CLOB
may hold up to n characters.The length may include the letter code “K” (kilobyte), “M” (megabyte), or “G” (gigabyte). If the length is defined as nk, the
CLOB
may hold up to n*1024 characters. If the length is defined as nG, theCLOB
may hold up to n*1,073,741,824 characters.
For example, the following <data type> specification defines a set of large Object character string values that may range from zero to 20 characters.
CLOB(20)
(Zero length large object character strings can be stored in a CLOB
field.)
This <data type> specification defines a set of large object character string values that may range from zero to 2048 characters:
CLOB(2K)
This <data type> specification defines a set of large object character string values that may range from zero to 2,097,152 characters:
CLOB(2M)
And this <data type> specification defines a set of large object character string values that may range from zero to 2,147,483,648 characters:
CLOB(2G)
[NON-PORTABLE] The default length and the maximum length for CLOB
are non-
standard because the SQL Standard requires implementors to define CLOB
’s
default and maximum lengths.
[OCELOT Implementation] The OCELOT DBMS that comes with this book allows the
length of CLOB to range from 1 to 32 and sets the default length of a CLOB
<data type> to 1K. For example, these two <data type> specifications are
equivalent: both define a set of large object character string values that may
range from zero to 1024 characters:
CLOB
CLOB(1K)
[Obscure Rule] The optional CHARACTER SET
clause names the Character set
that the <data type>’s values belong to; see the remarks under "CHARACTER"
.
For example, this <data type> specification:
CLOB(5M) CHARACTER SET LATIN1
defines a set of large object character string values, 0 to 5,242,880
characters long, that belong to the LATIN1
Character set.
[Obscure Rule] A CLOB
<data type> has a coercibility attribute of
IMPLICIT
. The optional COLLATE
clause defines the <data type>’s default
Collation; see the remarks under "CHARACTER"
. For example, these two <data
type> specifications:
CLOB(3G) COLLATE my.collation_1
CLOB(3G) CHARACTER SET my.charset_1 COLLATE my.collation_1
both define a set of large object character string values, 0 to 3,221,225,472
characters long, that have a default Collation called my.colllation_1
.
If you want to restrict your code to Core SQL, don’t use the CHARACTER SET
clause or the COLLATE
clause for CLOB
<data type> specifications.
NATIONAL CHARACTER LARGE OBJECT¶
The required syntax for a NATIONAL CHARACTER LARGE OBJECT
<data type>
specification is as follows.
NATIONAL CHARACTER LARGE OBJECT <data type> ::=
NATIONAL CHARACTER LARGE OBJECT [ (length) ][ COLLATE <Collation name> ]
NATIONAL CHARACTER LARGE OBJECT
may be abbreviated as NCHAR LARGE
OBJECT
and as NCLOB
. NCLOB
is a synonym for a CLOB
<data type>
that belongs to a predefined “national” Character set.
Other than the fact that you may not add a CHARACTER SET
clause to an
NCLOB
<data type> specification because NCLOB
implies the same national
Character set used for <national character string literal>s and NCHAR
and
NCHAR VARYING
<data type>s, the specifications for the NCLOB and CLOB <data
type>s are the same. Here are two examples of an NCLOB
<data type>
specification:
NCLOB(2K)
-- uses the national Character set's Collation
NCLOB(2K) COLLATE my.collation_1
-- uses a Collation named MY.COLLATION_1
[NON-PORTABLE] The national Character set used by <national character string
literal>s and the NCHAR
, NCHAR VARYING
and NCLOB
<data type>s is
non-standard because the SQL Standard requires implementors to define a
national Character set. [OCELOT Implementation] The OCELOT DBMS that comes with
this book defines the national Character set to be ISO8BIT
. For example,
these two <data type> specifications both define the same set of valid values:
NCLOB(1G)
CLOB(1G) CHARACTER SET ISO8BIT
If you want to restrict your code to Core SQL, don’t use the NCLOB
<data
type>.
Now that we’ve described SQL’s character string <data type>s, let’s look at some example SQL statements that put them to use.
These SQL statements make a Table with six fixed length character string Columns, insert a row, then search for any string greater than ‘hi’.
CREATE TABLE Fixed_Char_Examples (
occurrence_char_1 CHAR(2),
occurrence_char_2 CHAR(2) CHARACTER SET LATIN1,
occurrence_char_3 CHAR(2) COLLATE my.collation_1,
occurrence_char_4 CHAR(2) CHARACTER SET LATIN1 COLLATE my.collation_1,
occurrence_nchar_1 NCHAR(2),
occurrence_nchar_2 NCHAR(2) COLLATE my.collation_1);
INSERT INTO Fixed_Char_Examples (
occurrence_char_1,
occurrence_char_2,
occurrence_char_3,
occurrence_char_4,
occurrence_nchar_1,
occurrence_nchar_2)
VALUES ('mm','mm','mm','mm','mm','mm');
SELECT occurrence_char_1,
occurrence_char_2,
occurrence_char_3,
occurrence_char_4,
occurrence_nchar_1,
occurrence_nchar_2
FROM Fixed_Char_Examples
WHERE occurrence_char_4 > 'hi';
SELECT occurrence_char_1,
occurrence_char_2,
occurrence_char_3,
occurrence_char_4,
occurrence_nchar_1,
occurrence_nchar_2
FROM Fixed_Char_Examples
WHERE occurrence_nchar_2 > N'hi';
These SQL statements make a Table with six variable length character string Columns, insert a row, then search for any string not equal to ‘hi’.
CREATE TABLE Varying_Char_Examples (
occurrence_varchar_1 VARCHAR(5),
occurrence_varchar_2 VARCHAR(5) CHARACTER SET LATIN1,
occurrence_varchar_3 VARCHAR(5) COLLATE my.collation_1,
occurrence_varchar_4 VARCHAR(5) CHARACTER SET LATIN1 COLLATE my.collation_1,
occurrence_nvchar_1 NCHAR VARYING(5),
occurrence_nvchar_2 NCHAR VARYING(5) COLLATE my.collation_1);
INSERT INTO Varying_Char_Examples (
occurrence_varchar_1,
occurrence_varchar_2,
occurrence_varchar_3,
occurrence_varchar_4,
occurrence_nvchar_1,
occurrence_nvchar_2)
VALUES ('mm','mm','mm','mm','mm','mm');
SELECT occurrence_varchar_1,
occurrence_varchar_2,
occurrence_varchar_3,
occurrence_varchar_4,
occurrence_nvchar_1,
occurrence_nvchar_2
FROM Varying_Char_Examples
WHERE occurrence_varchar_4 <> 'hi';
SELECT occurrence_varchar_1,
occurrence_varchar_2,
occurrence_varchar_3,
occurrence_varchar_4,
occurrence_nvchar_1,
occurrence_nvchar_2
FROM Varying_Char_Examples
WHERE occurrence_nvchar_2 <> N'hi';
These SQL statements make a Table with six large object character string Columns, insert a row, then search for any string equal to ‘hi’.
CREATE TABLE Large_Char_Examples (
occurrence_clob_1 CLOB(10),
occurrence_clob_2 CLOB(10K) CHARACTER SET LATIN1,
occurrence_clob_3 CLOB(10M) COLLATE my.collation_1,
occurrence_clob_4 CLOB(10G) CHARACTER SET LATIN1 COLLATE my.collation_1,
occurrence_nclob_1 NCLOB(2K),
occurrence_nclob_2 NCLOB COLLATE my.collation_1);
INSERT INTO Large_Char_Examples (
occurrence_clob_1,
occurrence_clob_2,
occurrence_clob_3,
occurrence_clob_4,
occurrence_nclob_1,
occurrence_nclob_2)
VALUES ('mm','mm','mm','mm','mm','mm');
SELECT occurrence_clob_1,
occurrence_clob_2,
occurrence_clob_3,
occurrence_clob_4,
occurrence_nclob_1,
occurrence_nclob_2
FROM Large_Char_Examples
WHERE occurrence_clob_4 = 'hi';
SELECT occurrence_clob_1,
occurrence_clob_2,
occurrence_clob_3,
occurrence_clob_4,
occurrence_nclob_1,
occurrence_nclob_2
FROM Large_Char_Examples
WHERE occurrence_nclob_2 = N'hi';
Character String Operations¶
A character string is compatible with, and comparable to, all other character
strings from the same Character set – that is, character strings are mutually
comparable and mutually assignable as long as they belong to the same Character
set. Character strings may not be directly compared with, or directly assigned
to, any other <data type> class, though implicit type conversions can occur in
expressions, SELECTs
, INSERTs
, DELETEs
and UPDATEs
. Explicit
character string type conversions can be forced with the CAST operator.
CAST¶
In SQL, CAST
is a scalar operator that converts a given scalar value to a
given scalar <data type>. The required syntax for the CAST
operator is as
follows.
CAST (<cast operand> AS <cast target>)
<cast operand> ::= scalar_expression
<cast target> ::= <Domain name> | <data type>
The CAST
operator converts values of a source <data type> into values of a
target <data type>, where each <data type> is an SQL pre-defined <data type>
(data conversions between UDTs are done with a user-defined cast). The source
<data type>, or <cast operand>, can be any expression that evaluates to a
single value. The target <data type>, or <cast target>, is either an SQL
predefined <data type> specification or the name of a Domain whose defined
<data type> is the SQL predefined <data type> that you want to convert the
value of “scalar_expression” into. (If you use CAST
()… AS <Domain
name>
) your current <AuthorizationID> must have the USAGE
Privilege on
that Domain.)
It isn’t, of course, possible to convert the values of every <data type> into the values of every other <data type>. For character strings, the rules are:
CAST
(NULL AS
<data type>) andCAST
(character_string_source_is_a_null_value
AS
<data type>) both result inNULL
.You can
CAST
a fixed length or variable length character string or aCLOB
orNCLOB
source to these targets:exact numeric
approximate numeric
fixed length character string (if source and target belong to the same Character set)
variable length character string (if source and target belong to the same Character set)
CLOB
(if source and target belong to the same Character set)NCLOB
(if source and target belong to the same Character set),fixed length bit string
variable length bit string
date
time
timestamp
year-month interval
day-time interval
boolean.
You can also
CAST
a fixed length or variable length character string or aCLOB
orNCLOB
source to a UDT target or a <reference type> target if a user-defined cast exists for this purpose and your current <AuthorizationID> has theEXECUTE
Privilege on that user-defined cast.
When you CAST
a character string to an exact numeric target or an
approximate numeric target, your DBMS strips any leading or trailing spaces
from the source and converts the remaining string – which must be the
character representation of a number – to that number. For example, CAST
('-25' AS SMALLINT
) results in a SMALLINT
value of -25. If your source
string doesn’t represent a number, the CAST
will fail: your DBMS will
return the SQLSTATE error 22018 "data exception-invalid character value for
cast"
.
When you CAST
a character string to a fixed length character string target,
a variable length character string target or a CLOB
or NCLOB
target,
both source and target must belong to the same Character set, the result has
the COERCIBLE
coercibility attribute and the Collation of the result is the
default Collation of the target’s Character set.
For fixed length character string targets, if the length of the source equals the fixed length of the target, the result of the
CAST
is the source string. If the length of the source is shorter than the fixed length of the target, the result of theCAST
is the source string padded on the right with however many spaces are required to make the lengths match. If the length of the source is longer than the fixed length of the target, the result of theCAST
is a character string that contains as much of the source string as possible – in this case, if the truncated characters are not all spaces, your DBMS will return theSQLSTATE warning 01004 "warning-string data, right truncation"
.For variable length character string or
CLOB
orNCLOB
targets, if the length of the source is less than or equals the maximum length of the target, the result of theCAST
is the source string. If the length of the source is longer than the maximum length of the target, the result of theCAST
is a character string that contains as much of the source string as possible – in this case, if the truncated characters are not all spaces, your DBMS will return theSQLSTATE warning 01004 "warning-string data, right truncation"
.
When you CAST
a character string to a fixed length bit string or a variable
length bit string target, the result is the character string converted to a
string of bits. That is, when you CAST
a character string (which has a
“form- of-use encoding”) to a bit string (which has no encoding), you get the
bits that make up the characters in the source string. For example, assume the
source character string for a CAST
belongs to the ISO8BIT
Character
set. In this Character set, the code for the letter 'A'
is 41 hexadecimal
(the binary number 01000001
) and the code for the letter 'B'
is 42
hexadecimal (the binary number 01000010
) – so CAST
('AB' TO
BIT(16)
) will result in B'0100000101000010'
.
For fixed length bit string targets, if the bit length of the converted source string equals the fixed bit length of the target, the result of the
CAST
is the converted source string. If the converted source value’s bit length is larger than the fixed bit length of the target, the result of theCAST
is a bit string that contains as much of the converted source string as possible. In this case, if the truncated bits are not all zero-bits, your DBMS will return theSQLSTATE warning 01004 "warning-string data, right truncation"
. If the converted source value’s bit length is less than the fixed bit length of the target, the result of theCAST
is the converted bit string, padded on the least significant end with as many zero-bits as required to make the lengths match – in this case, your DBMS will return theSQLSTATE warning 01008 "warning-implicit zero-bit padding"
.For variable length bit string targets, if the bit length of the converted source string is less than or equals the maximum bit length of the target, the result of the
CAST
is the converted source string. If the converted source value’s bit length is larger than the maximum bit length of the target, the result of theCAST
is a bit string that contains as much of the converted source string as possible – in this case, if the truncated bits are not all zero-bits, your DBMS will return theSQLSTATE warning 01004 "warning-string data, right truncation"
.
When you CAST
a character string to a date target, your DBMS strips any
leading or trailing spaces from the source and converts the remaining string –
which must be the character representation of a valid date – to that date. If
your source string doesn’t represent a valid date, the CAST
will fail: your
DBMS will return the SQLSTATE error 22007 "data exception-invalid datetime
format"
.
When you CAST
a character string to a time target, your DBMS strips any
leading or trailing spaces from the source and converts the remaining string -
- which must be the character representation of a valid time – to that time.
If your source string doesn’t represent a valid time, the CAST
will fail: your
DBMS will return the SQLSTATE error 22007 "data exception-invalid datetime
format"
. If your source string isn’t a string that could represent any time
(even an invalid one), the CAST
will also fail: your DBMS will return the
SQLSTATE error 22018 "data exception-invalid character value for cast"
.
When you CAST
a character string to a timestamp target, your DBMS strips any
leading or trailing spaces from the source and converts the remaining string -
- which must be the character representation of a valid timestamp – to that
timestamp. If your source string doesn’t represent a valid timestamp, the CAST
will fail: your DBMS will return the SQLSTATE error 22007 "data
exception-invalid datetime format"
. If your source string isn’t a string that
could represent any timestamp (even an invalid one), the CAST
will also fail:
your DBMS will return the SQLSTATE error 22018 "data exception-invalid
character value for cast"
.
When you CAST
a character string to an interval target, your DBMS strips
any leading or trailing spaces from the source and converts the remaining
string - - which must be the character representation of a valid interval for
the target – to that interval. If your source string doesn’t represent a valid
interval for the target, the CAST
will fail: your DBMS will return the
SQLSTATE error 22XXX "data exception-invalid interval format"
. If your
source string isn’t a string that could represent any interval (even an invalid
one for the target), the CAST
will also fail: your DBMS will return the
SQLSTATE error 22007 "data exception-invalid datetime format"
.
When you CAST
a character string to a boolean target, your DBMS strips any
leading or trailing spaces from the source and converts the remaining string -
- which must be the character representation of one of the truth values
TRUE
, FALSE
or UNKNOWN
– to that truth value. If your source
string doesn’t represent a truth value, the CAST
will fail: your DBMS will
return the SQLSTATE error 22018 "data exception-invalid character value for
cast"
.
When you CAST
a character string to a UDT or a <reference type> target,
your DBMS invokes the user defined cast routine, with the source value as the
routine’s argument. The CAST
result is the value returned by the user
defined cast.
Here are some examples of CAST
operations with character string source
operands:
CAST (N'The rain in Spain' AS CHAR (18))
-- result is the string 'The rain in Spain'
CAST ('050.00' AS DECIMAL (3,1))
-- result is the number 50.0
CAST ('1997-04-04' AS DATE)
--result is the date DATE '1997-04-04'
CAST ('FALSE' AS BOOLEAN)
--result is the truth value FALSE
If you want to restrict your code to Core SQL, (a) don’t use <Domain name> as
a CAST
target: CAST
only to a <data type> and (b) don’t use CAST
to convert any CLOB
or NCLOB
values to another <data type>.
Assignment¶
SQL allows you to assign only compatible character strings – that is,
character strings are mutually assignable only if the source string and the
target string belong to the same Character set. If you need to assign a
character string to a target that belongs to a different Character set, use the
TRANSLATE
function to translate the source into an equivalent string that
belongs to the target’s Character set.
In SQL, when a character string is assigned to a character string target, the assignment is done one character at a time, from left to right.
When a character string is taken from SQL-data to be assigned to a fixed length
character string target and the source is shorter than the target, the source
is padded (on the right) with spaces until it matches the target’s size. If the
source is longer than the target, the source is truncated to fit the target.
In this case, your DBMS will return the SQLSTATE warning 01004
"warning-string data, right truncation"
. When a character string is taken
from SQL-data to be assigned to a variable length character string or CLOB
target, the size of the target is first set either to the size of the source or
to its own maximum length, whichever is less. The source may then be truncated,
if necessary, to match the size of the target. In this case, your DBMS will
return the SQLSTATE warning 01004 "warning-string data, right truncation"
.
[Obscure Rule] Since only SQL accepts null values, if your source is NULL
,
then your target’s value is not changed. Instead, your DBMS will set its
indicator parameter to -1, to indicate that an assignment of the null value was
attempted. If your target doesn’t have an indicator parameter, the assignment
will fail: your DBMS will return the SQLSTATE error 22002 "data
exception-null value, no indicator parameter"
. If your source is a non-null
value that fits into your target, your DBMS will set the target’s indicator
parameter (if any) to zero. If your source is longer than your target, your
DBMS will set your target’s indicator parameter to the length of the source;
that is, if your source is 12 characters long and your target can accept only
10 characters, your DBMS will set the target’s indicator parameter to 12, to
indicate that 2 characters were lost on assignment. If the source’s length is
too big to be assigned to the indicator, the assignment will fail: your DBMS
will return the SQLSTATE error 22022 "data exception-indicator overflow"
.
We’ll talk more about indicator parameters in our chapters on SQL binding
styles.
When a character string is assigned to a fixed length SQL-data character string
target and the source is shorter than the target, the source is padded (on
the right) with spaces until it matches the target’s size. If the source is
larger than the target, but the extra characters are all spaces, the source’s
significant character string value is assigned to the target. If the source is
larger than the target and the extra characters are not all spaces, the
assignment will fail: your DBMS will return the SQLSTATE error 22001 "data
exception-string data, right truncation"
. When a character string is assigned
to a variable length SQL-data character string or CLOB
target, the size of
the target is first set either to the size of the source or to its own maximum
length, whichever is less. If the source is larger than the target, but the
extra characters are all spaces, the source’s significant character string
value is assigned to the target. If the source is larger than the target and
the extra characters are not all spaces, the assignment will fail: your DBMS
will return the SQLSTATE error 22001 "data exception-string data, right
truncation"
.
[Obscure Rule] There are two ways to assign a null value to an SQL-data target.
Within SQL, you can use the <keyword> NULL
in an INSERT
or an
UPDATE
statement to indicate that the target should be set to NULL
;
that is, if your source is NULL
, your DBMS will set your target to
NULL
. Outside of SQL, if your source has an indicator parameter that is set
to -1, your DBMS will set your target to NULL
(regardless of the value of
the source). An indicator parameter with a value less than -1 will cause an
error: your DBMS will return the SQLSTATE error 22010 "data exception-invalid
indicator parameter value"
.
Comparison¶
SQL provides the usual scalar comparison operators, = and <> and < and <= and >
and >= – to perform operations on CHAR
, VARCHAR
, NCHAR
and NCHAR
VARYING
character strings but provides only the = and <> operators to perform
operations on CLOB
and NCLOB
character strings. All of them will be
familiar; there are equivalent operators in other computer languages. If any of
the comparands are NULL
, the result of the operation is UNKNOWN
. For
example:
'hello' < 'zebra'
returns TRUE
.
'hello' > {result is NULL}
returns UNKNOWN
.
SQL also provides three quantifiers – ALL
, SOME
, ANY
– which you
can use along with a comparison operator to compare a CHAR
, VARCHAR
,
NCHAR
or NCHAR VARYING
value with the collection of values returned by
a <table subquery>. (You can’t use quantifiers in CLOB
or NCLOB
comparisons.) Place the quantifier after the comparison operator, immediately
before the <table subquery>. For example:
SELECT char_column
FROM Table_1
WHERE char_column = ALL (
SELECT char_column
FROM Table_2);
ALL
returns TRUE
either (a) if the collection is an empty set (i.e.:
if it contains zero rows) or (b) if the comparison operator returns TRUE
for every value in the collection. ALL
returns FALSE
if the comparison
operator returns FALSE
for at least one value in the collection.
SOME
and ANY
are synonyms. They return TRUE
if the comparison
operator returns TRUE
for at least one value in the collection. They return
FALSE
either (a) if the collection is an empty set or (b) if the
comparison operator returns FALSE
for every value in the collection. The
search condition = ANY (collection)
is equivalent to IN (collection)
.
SQL allows you to compare character strings only if (a) they belong to the same Character set and (b) have, or can be coerced into having, the same Collation for the comparison – that is, character strings are mutually comparable only if both their Character sets and their Collations are the same. When a character string is compared to another character string, the comparison is done one character at a time, from left to right. The result of the comparison is determined by the rules of the relevant Collation. Because of this, two strings with different lengths may, or may not, compare as equal.
Although a comparison involves two character strings, both of which have a default Collation, only one of the Collations can be used to govern the result of the comparison. So, when you (a) compare character strings that have different current default Collations and (b) don’t explicitly specify a Collation for the comparison, your DBMS will use the coercibility attribute of each string to choose the relevant Collation.
A <Column name>, <Column reference> or other character string value that
includes a COLLATE
clause has a coercibility attribute of EXPLICIT
–
its Collation is the Collation named. If a <Column name> or <Column reference>
doesn’t include a COLLATE
clause, it has a coercibility attribute of
IMPLICIT
– its Collation is the Collation specified when the Column was
created (see the <data type> definitions). If any other character string value
(e.g.: a host variable or a <literal>) doesn’t include a COLLATE
clause, it
normally has a coercibility attribute of COERCIBLE
– its Collation is the
default Collation for its Character set. Sometimes a character string value is
the result of an expression that joins strings with different Collations (e.g.:
a concatenation operation that doesn’t include a COLLATE
clause). These
character strings have a coercibility attribute of NO COLLATION
.
After determining the coercibility attribute of each character string in a comparison, your DBMS will choose the relevant Collation using these rules:
Strings with
COERCIBLE
coercibility may be compared to strings with any coercibility attribute exceptNO COLLATION
. If both comparands haveCOERCIBLE
coercibility, the relevant Collation is the default Collation of their mutual Character set. If one comparand hasCOERCIBLE
coercibility and the other hasEXPLICIT
coercibility, the relevant Collation is theEXPLICIT
Collation. If one comparand hasCOERCIBLE
coercibility and the other hasIMPLICIT
coercibility, the relevant Collation is theIMPLICIT
Collation.Strings with
EXPLICIT
coercibility may be compared to strings with any coercibility attribute. If one comparand hasEXPLICIT
coercibility and the other hasCOERCIBLE
,IMPLICIT
orNO COLLATION
coercibility, the relevant Collation is theEXPLICIT
Collation. If both comparands haveEXPLICIT
coercibility, they must also have the same Collation. The relevant Collation is their mutualEXPLICIT
Collation.Strings with
IMPLICIT
coercibility may be compared to strings with any coercibility attribute exceptNO COLLATION
.If one comparand hasIMPLICIT
coercibility and the other hasCOERCIBLE
coercibility, the relevant Collation is theIMPLICIT
Collation. If one comparand hasIMPLICIT
coercibility and the other hasEXPLICIT
coercibility, the relevant Collation is theEXPLICIT
Collation. If both comparands haveIMPLICIT
coercibility, they must also have the same Collation. The relevant Collation is their mutualIMPLICIT
Collation.Strings with
NO COLLATION
coercibility may only be compared to strings with a coercibility attribute ofEXPLICIT
. The relevant Collation is theEXPLICIT
Collation.
When you compare character strings that have different lengths, the result also
depends on whether the relevant Collation has the PAD SPACE
attribute or
the NO PAD
attribute. If the relevant Collation has the PAD SPACE
attribute, your DBMS will extend the shorter character string to the length of
the larger string (by padding it on the right with spaces) before comparing the
strings. If the relevant Collation has the NO PAD
attribute, then – all
other things being equal – the longer string will evaluate as greater than
the shorter string. That is, with a NO PAD
Collation, the result of these
rules is that a shorter comparand which is equal to the same-length substring
of a larger comparand will evaluate as less than the larger comparand – even
if the remainder of the larger string consist only of spaces or controll
characters. For example, a comparison of these two <literal>s:
'BOB'
'BOB '
would result in the first <literal> being evaluated as less than the second
with a PAD SPACE
Collation and as less than the second with a NO PAD
Collation (assuming that the Collations both use the familiar Latin collating
rules.)
CREATE TABLE Table_1 (
char_column CHAR(5));
INSERTS INTO Table_1 (char_column)
VALUES ('A');
In this example, the string actually inserted is five characters long, i.e.:
'A '
Thus, with a PAD SPACE
Collation, this predicate is TRUE
:
... WHERE char_column = 'A'
and with a NO PAD
Collation, the same predicate is FALSE
.
To summarize, SQL doesn’t allow character strings to be compared unless they
belong to the same Character set and have the same Collation for the
comparison. You may explicitly specify the relevant Character set or allow it
to default to an implicit Character set chosen by your DBMS. You may also
explicitly specify the relevant Collation by adding a COLLATE
clause to
your expression; this will override the expression’s default collating
sequence. If you omit the COLLATE
clause, your DBMS will choose the
relevant Collation for you – see “Character Strings and Collations”, later in
this chapter.
If you want to restrict your code to Core SQL, don’t use CLOBs
or
NCLOBs
in comparisons.
Other Operations¶
With SQL, you have a wide range of operations that you can perform on character strings, or on other values to get a character string result.
Concatenation¶
The required syntax for a character string concatenation is as follows.
character concatenation ::=
character_string_operand_1 || character_string_operand_2
[ COLLATE <Collation name> ]
The concatenation operator operates on two operands, both of which must
evaluate to character strings belonging to the same Character set. It joins the
strings together in the order given and returns a character string with a
length equal to the sum of the lengths of its operands. If either of the
operands is NULL
, the result of the operation is also NULL
. Here are
two examples of character string concatenations:
'hello' || ' bob'
-- returns hello bob
char_column || 'hello'
-- returns CHAR_COLUMN's value followed by hello
[Obscure Rule] If both operands are fixed length character strings, the concatenation result is a fixed length character string with a length equal to the sum of the lengths of the operands – this length may not exceed the maximum allowed for a fixed length character string.
[Obscure Rule] If either operand is a variable length character string and the
sum of their lengths is not greater than the maximum allowed length for a
variable length character string, the concatenation result is a variable length
character string with a length equal to the sum of the lengths of the operands.
If the sum of the operands’ lengths is greater than the maximum allowed, but
the extra characters are all spaces, the concatenation result is a variable
length character string with a length equal to the maximum allowed length. If
the sum of the operands’ lengths is greater than the maximum allowed, and the
extra characters are not all spaces, the concatenation will fail: your DBMS
will return the SQLSTATE error 22001 "data exception-string data, right
truncation"
.
[Obscure Rule] The result of a character string concatenation normally has a
coercibility attribute and Collation determined by Table 7-2 “Collating
Sequences and Coercibility Rules for Dyadic Operations”, but you can use the
optional COLLATE
clause to force EXPLICIT
coercibility with a specific
Collation. The Collation named must be a Collation defined for the relevant
Character set. If you’re using COLLATE
in an SQL-Schema statement, then the
<AuthorizationID> that owns the containing Schema must have the USAGE Privilege
on “<Collation name>”. If you’re using COLLATE
in any other SQL statement,
then your current <AuthorizationID> must have the USAGE
Privilege on
“<Collation name>”. For example:
'hello' || 'bob' COLLATE my.collation_1
specifies that the result of the concatenation should use a Collation named
my.collation_1
.
If you want to restrict your code to Core SQL, don’t use the concatenation
operator with CLOBs or NCLOBs and don’t use the COLLATE
clause to force an
EXPLICIT
Collation for any character string concatenation.
Scalar functions¶
SQL provides eleven scalar functions that return a character string: the <case expression>, the <cast specification>, the <char substring function>, the <char overlay function>, the <char trim function>, the <fold function>, the <character translation function>, the <form-of-use conversion function>, the <regular expression substring function>, the <specific type function> and the <niladic user function>. It also provides four scalar functions that operate on character strings, returning a number: the <char position expression>, the <bit length expression>, the <char length expression> and the <octet length expression>. We’ll discuss all but the <specific type function>, the <niladic user function>, the <case expression> and the <cast specification> here. Look for the rest in other chapters; for now, just remember that they all evaluate to a character string and can therefore be used anywhere in SQL that a character string could be used.
<char substring function>¶
The required syntax for a <char substring function> is as follows.
<char substring function> ::=
SUBSTRING (character_string_argument
FROM start_argument [ FOR length_argument ]
[ COLLATE <Collation name> ])
SUBSTRING
operates on three arguments: the first must evaluate to a
character string, the other two must evaluate to exact numeric integers. It
extracts a substring from character_string_argument
and returns a variable
length character string with a maximum length that equals the fixed length or
maximum variable length (as applicable) of the character string argument. If
any of the arguments are NULL
, SUBSTRING
returns NULL
.
The start_argument
is a number that marks the first character you want to
extract from character_string_argument
. If SUBSTRING
includes the
(optional) FOR
clause, “length_argument” is the total number of characters
you want to extract. If you omit the FOR
clause, SUBSTRING
will begin
at start_argument
and extract all the rest of the characters in
character_string_argument
. Here are some examples of SUBSTRING
.
SUBSTRING('epiphany' FROM 5)
-- returns hany
SUBSTRING('epiphany' FROM 5 FOR 3)
-- returns han
SUBSTRING(char_column FROM 1 FOR 4)
-- returns the first four characters of the value in CHAR_COLUMN
. . . WHERE SUBSTRING (char_column FROM 3 FOR 1) = 'A'
-- returns "true" if the third character of the value in CHAR_COLUMN is the letter A
If length_argument
is negative, your DBMS will return SQLSTATE error
22011 "data exeption-substring error."
If start_argument
is greater than
the length of character_string_argument
, or if (start_argument
+
length_argument
) is less than one, SUBSTRING
returns a zero-length
character string. If start_argument
is negative, or if (start_argument
+ length_argument
) is greater than the length of
character_string_argument
, that´s okay – the DBMS just ignores any
characters before the start of character_string_argument
or after the end
of character_string_argument
. Note that
SUBSTRING('abc' FROM -2 FOR 4)
is legal SQL syntax, but pointless because it won´t return the
“expected”result. The SQL Standart requires SUBSTRING
to return 'a'
for
this operation – not 'ab'
.
[Obscure Rule] The result of SUBSTRING
belongs to the same Character set
that its string argument does. It normally has a coercibility attribute and
Collation determined by Table 7-1 “Collating Sequences and Coercibility Rules
for Monadic Operations”, where character_string_argument
is the monadic
operator, but you can use the optional COLLATE
clause to force EXPLICIT
coercibility with a specific Collation. The Collation named must be a Collation
defined for the relevant Character set. If you’re using COLLATE
in an
SQL-Schema statement, then the <AuthorizationID> that owns the containing
Schema must have the USAGE
Privilege on “<Collation name>”. If you’re using
COLLATE
in any other SQL statement, then your current <AuthorizationID>
must have the USAGE
Privilege on “<Collation name>”. For example,
SUBSTRING(char_column FROM 1 FOR 4) COLLATE my.collation_1
specifies that the result of SUBSTRING
should use a Collation named
my.collation_1
.
[Obscure Rule] SUBSTRING
can also operate on a bit string and a BLOB
.
We’ve ignored these options for now – look for them in our chapters on bit
strings and BLOBs
.
If you want to restrict your code to Core SQL, don’t use SUBSTRING
with
NCLOBs
and don’t use the COLLATE
clause to force an EXPLICIT
Collation for any SUBSTRING
operation.
<char overlay function>¶
The required syntax for a <char overlay function> is as follows.
<character overlay function> ::=
OVERLAY (character_string_argument_1
PLACING character_string_argument_2
FROM start_argument [ FOR length_argument ]
[ COLLATE <Collation name> ])
OVERLAY
operates on four arguments: the first two must evaluate to
character strings belonging to the same Character set, the other two must
evaluate to exact numeric integers. It extracts a substring from
character_argument_1
, replacing it with character_string_argument_2
,
and returns the resulting character string. If any of the arguments are
NULL
, OVERLAY
returns NULL
.
The start_argument
is a number that marks the first character you want to
replace in character_string_argument_1
. If OVERLAY
includes the
(optional) FOR
clause, length_argument
is the total number of
characters you want to replace. Thus, start_argument
and
length_argument
identify the portion of character_string_argument_1
you
want to replace, while character_string_2
is what you want to with. If you
omit the FOR
clause, then length_argument
defaults to the length of
character_string_argument_2
. Here are some examples of OVERLAY
:
OVERLAY('epiphany' PLACING 'no' FROM 5)
-- returns epipnony
OVERLAY('epiphany' PLACING 'no' FROM 5 FOR 3)
-- returns epipnoy
[Obscure Rule] The result of OVERLAY
belongs to the same Character set that
its arguments do. It normally has a coercibility attribute and Collation
determined by Table 7-2 “Collating Sequences and Coercibility Rules for Dyadic
Operations”, but you can use the optional COLLATE
clause to force
EXPLICIT
coercibility with a specific Collation. The Collation named must
be a Collation defined for the relevant Character set. If you’re using
COLLATE
in an SQL-Schema statement, then the <AuthorizationID> that owns
the containing Schema must have the USAGE
Privilege on “<Collation name>”.
If you’re using COLLATE
in any other SQL statement, then your current
<AuthorizationID> must have the USAGE
Privilege on “<Collation name>”. For
example:
OVERLAY('epiphany' PLACING 'no' FROM 5) COLLATE my.collation_1
specifies that the result of OVERLAY
should use a Collation named
my.collation_1
.
[Obscure Rule] OVERLAY
can also operate on a BLOB
. We’ve ignored this
option for now – look for it in our chapter on BLOBs.
If you want to restrict your code to Core SQL, don’t use OVERLAY
with
character strings.
<char trim function>¶
The required syntax for a <char trim function> is as follows.
<char trim function> ::=
TRIM ([ [ {LEADING | TRAILING | BOTH} ] [ character_string_argument_1 ]
FROM ] character_string_argument_2
[ COLLATE <Collation name> ])
TRIM
operates on two arguments, both of which must evaluate to character
strings that belong to the same Character set and have the same Collation for
the operation. It strips all leading, all trailing or all leading and all
trailing trim characters from character_string_argument_2
and returns the
resulting variable length character string. The result has a maximum length
that equals the fixed length or maximum variable length (as applicable) of
character_string_argument_2
. If any of the arguments are NULL
, TRIM
returns NULL
.
The trim specification is either LEADING
(i.e.: trim all leading trim
characters), TRAILING
(i.e.: trim all trailing trim characters) or BOTH
(i.e.: trim all leading and all trailing trim characters). If this clause is
omitted, TRIM
defaults to BOTH
. For example, these two TRIM functions
are equivalent: they both strip away all leading and all trailing letters A:
TRIM('A' FROM char_column)
TRIM(BOTH 'A' FROM char_column)
The character_string_argument_1
defines the trim character: the character
that should be stripped away by the TRIM
function. If
character_string_argument_1
is omitted, TRIM
strips spaces away. For
example, these two TRIM
functions are equivalent: they both strip away all
trailing spaces:
TRIM(TRAILING FROM char_column)
TRIM(TRAILING ' ' FROM char_column)
These two TRIM
functions are equivalent – they both strip away all leading
spaces:
TRIM(LEADING FROM char_column)
TRIM(LEADING ' ' FROM char_column)
These two TRIM
functions are equivalent – they both strip away all leading
and all trailing spaces:
TRIM(char_column)
TRIM(BOTH ' ' FROM char_column)
If the length of character_string_argument_1
is not one character, TRIM
will fail: your DBMS will return the SQLSTATE error 22027 "data
exception-trim error"
.
[Obscure Rule] The result of TRIM
belongs to the same Character set that
its arguments do. It normally has a coercibility attribute and Collation
determined by Table 7.1, “Collating Sequences and coercibility rules for
monadic operations,” where character_string_argument_2
is the monadic
operand, but you can use the optional COLLATE
clause to force EXPLICIT
coercibility with a specific Collation. The Collation named must be a Collation
defined for the relevant Character set. If you’re using COLLATE
in an
SQL-Schema statement, then the <AuthorizationID> that owns the containing
Schema must have the USAGE
Privilege on “<Collation name>”. If you’re using
COLLATE
in any other SQL statement, then your current <AuthorizationID>
must have the USAGE
Privilege on “<Collation name>”. For example:
TRIM(BOTH ' ' FROM char_column) COLLATE my.collation_1
specifies that the result of TRIM
should use a Collation named
my.collation_1
.
[Obscure Rule] TRIM
can also operate on a BLOB
. We’ve ignored this
option for now – look for it in our chapter on BLOBs.
If you want to restrict your code to Core SQL, don’t use TRIM
with
NCLOBs
and don’t use the COLLATE
clause to force an EXPLICIT
Collation for any TRIM
operation.
<fold function>¶
The required syntax for a <fold function> is as follows.
<fold function> ::=
{ UPPER | LOWER } (character_string_argument [ COLLATE <Collation name> ])
UPPER
and LOWER
operate on an argument that evaluates to a character
string. UPPER
converts every lower case letter in
character_string_argument
to its corresponding upper case equivalent, while
LOWER
converts every upper case letter in character_string_argument
to
its corresponding lower case equivalent. Any character that has no upper or
lower case equivalent (as applicable) remains unchanged. The conversion
reflects the normal rules for letters of the simple Latin 26-letter alphabet –
that is abcdefghijklmnopqrstuvwxyz
converts to and from
ABCDEFGHIJKLMNOPQRSTUVWXYZ
– but it also reflects the normal rules for the
accented letters in character_string_argument
’s Character set, e.g.: ö
converts to and from Ö.
Both UPPER
and LOWER
return a character string with a length that
equals the fixed length or maximum variable length (as applicable) of
character_string_argument
. If the character string argument is NULL``
,
UPPER
and LOWER
return NULL
. Here are some examples:
UPPER('E. E. Cummings')
-- returns E. E. CUMMINGS
LOWER('E. E. Cummings')
-- returns e. e. cummings
UPPER(LOWER('E. E. Cummings'))
-- returns E. E. CUMMINGS
In the last example, UPPER
and LOWER
do not cancel each other out; the
output string is not the same as the input string. Such information loss occurs
because fold functions don’t affect characters which are already in the right
case.
Tip
A string which contains no letters will be the same after UPPER
and
LOWER
, so if you need to test whether a character string contains letters
do this:
SELECT character_column
FROM Table_Of_Character_Strings
WHERE UPPER(character_column) <> LOWER(character_column);
Such a query will find '1a 2'
but will not find '1$ 2'
; thus you can
use fold functions to filter out strings which contain letters.
[Obscure Rule] The result of UPPER
and LOWER
is a fixed length string
if character_string_argument
is a fixed length string and a variable length
string if character_string_argument
is a variable length string. In either
case, the result belongs to the same Character set that the argument does. It
normally has a coercibility attribute and Collation determined by Table 7-1
“Collating Sequences and Coercibility Rules for Monadic Operations”, but you
can use the optional COLLATE
clause to force EXPLICIT
coercibility with
a specific Collation. The Collation named must be a Collation defined for the
relevant Character set. If you’re using COLLATE
in an SQL-Schema statement,
then the <AuthorizationID> that owns the containing Schema must have the
USAGE
Privilege on “<Collation name>”. If you’re using COLLATE
in any
other SQL statement, then your current <AuthorizationID> must have the
USAGE
Privilege on “<Collation name>”. For example:
UPPER('hello') COLLATE my.collation_1
specifies that the result of UPPER
should use a Collation named
my.collation_1
and
LOWER('HELLO') COLLATE my.collation_1
specifies that the result of LOWER
should use a Collation named
my.collation_1
.
If you want to restrict your code to Core SQL, don’t use the COLLATE
clause
to force an EXPLICIT
Collation for any UPPER
or LOWER
operation.
<character translation function>¶
The required syntax for a <character translation function> is as follows.
<character translation function> ::=
TRANSLATE (character_string_argument USING <Translation name>
[ COLLATE <Collation name> ])
TRANSLATE
operates on an argument that evaluates to a character string. It
converts every character in character_string_argument
to its corresponding
equivalent in another Character set (by changing each character according to
some many-to-one or one-to-one mapping) and returns a variable length character
string that belongs to the target Character set defined for “<Translation
name>”. If the character string argument is NULL
, TRANSLATE
returns
NULL. Here is an example of TRANSLATE
:
TRANSLATE('hello' USING my.translation_1)
-- returns a string, equivalent to hello, that belongs to the target Character set defined for a Translation called MY.TRANSLATION_1
(Translations are defined using the CREATE TRANSLATION
statement.)
If you’re using TRANSLATE
in an SQL-Schema statement, then the
<AuthorizationID> that owns the containing Schema must have the USAGE
Privilege on “<Translation name>”. If you’re using TRANSLATE
in any other
SQL statement, then your current <AuthorizationID> must have the USAGE
Privilege on “<Translation name>”.
[Obscure Rule] The result of TRANSLATE
normally has a coercibility
attribute of IMPLICIT
and uses the default Collation of the Translation’s
target Character set, but you can use the optional COLLATE
clause to force
EXPLICIT
coercibility with a specific Collation. The Collation named must
be a Collation defined for the target Character set. If you’re using
COLLATE
in an SQL-Schema statement, then the <AuthorizationID> that owns
the containing Schema must have the USAGE
Privilege on “<Collation name>”.
If you’re using COLLATE
in any other SQL statement, then your current
<AuthorizationID> must have the USAGE
Privilege on “<Collation name>”. For
example:
TRANSLATE('hello' USING my.translation_1) COLLATE my.collation_1
specifies that the result of TRANSLATE
should use a Collation named
my.collation_1
.
If you want to restrict your code to Core SQL, don’t use TRANSLATE
.
<form-of-use conversion function>¶
The required syntax for a <form-of-use conversion function> is as follows.
<form-of-use conversion function> ::=
CONVERT (character_string_argument USING <Form-of-use conversion name>
[ COLLATE <Collation name> ])
CONVERT
operates on an argument that evaluates to a character string. It
converts every character in character_string_argument
to its corresponding
equivalent using another Form-of-use and returns the resulting variable length
character string. If the character string argument is NULL
, CONVERT
returns NULL. Here is an example of CONVERT
:
CONVERT('hello' USING INFORMATION_SCHEMA.new_form)
-- returns a string, equivalent to hello, whose characters are encoded using a Form-of-use called NEW_FORM
A Form-of-use is a character repertoire’s encoding scheme – the one-to-one
mapping scheme between each character in the repertoire and a set of internal
codes (usually 8-bit values) that define how the repertoire’s characters are
encoded as numbers. (These codes are also used to specify the order of the
characters within the repertoire.) Supported Forms-of-use are all predefined by
your DBMS and thus belong to INFORMATION_SCHEMA
. SQL provides no ability to
define your own Forms-of-use.
CONVERT
’s purpose is to allow you to transfer character strings between
SQL- data and your host application, therefore you may only use the function in
certain places. When transferring SQL-data to the host, CONVERT
is legal
only as part of a <select sublist>. For example,
SELECT CONVERT(char_column USING INFORMATION_SCHEMA.new_form)
FROM Table_1
WHERE char_column = 'hello';
When transferring host values into SQL-data, use CONVERT
to change any host
parameter. For example:
INSERT INTO Table_1 (char_column)
VALUES (CONVERT(:char_parameter USING INFORMATION_SCHEMA.new_form));
Tip
You might want to use CONVERT
to change a character string’s encoding
scheme from 8-bit to 16-bit.
[NON-PORTABLE] Whether you can use CONVERT
or not is non-standard because the
SQL Standard requires implementors to define all Forms-of-use supported – but
has no requirement that a DBMS must support any Form-of-use at all. However,
you can use TRANSLATE
to provide an equivalent operation.
[OCELOT Implementation] The OCELOT DBMS that comes with this book does
not provide support for any Form-of-use.
[Obscure Rule] The result of CONVERT
belongs to a Character set that
consists of the same character repertoire that its argument’s Character set has
– but with a different Form-of-use encoding. It normally has a coercibility
attribute of IMPLICIT
and uses the default Collation of its Character set,
but you can use the optional COLLATE
clause to force EXPLICIT
coercibility with a specific Collation. The Collation named must be a Collation
defined for the target Character set. If you’re using COLLATE
in an
SQL-Schema statement, then the <AuthorizationID> that owns the containing
Schema must have the USAGE Privilege on “<Collation name>”. If you’re using
COLLATE
in any other SQL statement, then your current <AuthorizationID>
must have the USAGE
Privilege on “<Collation name>”. For example:
CONVERT('hello' USING INFORMATION_SCHEMA.new_form) COLLATE my.collation_1
specifies that the result of CONVERT
should use a Collation named
my.collation_1
.
If you want to restrict your code to Core SQL, don’t use CONVERT
.
<regular expression substring function>¶
The required syntax for a <regular expression substring function> is as follows.
<regular expression substring function> ::=
SUBSTRING (character_string_argument FROM pattern FOR escape_character
[ COLLATE <Collation name> ])
SUBSTRING
operates on three arguments, all of which evaluate to character
strings that belong to the same Character set. It extracts a substring based on
pattern
from character_string_argument
and returns a variable length
character string with a maximum length that equals the fixed length or maximum
variable length (as applicable) of character_string_argument
. Both
pattern
and escape_character
must be regular expressions (see our
discussion of the SIMILAR
predicate) and escape_character
must be
exactly one character long. If any of the arguments are NULL
, SUBSTRING
returns NULL
.
The pattern
shows the substring you want to extract from
character_string_argument
. It’s actually a triple pattern: it must consist
of three regular expressions, the middle of which is a tagged regular
expression (a regular expression that is delimited by scape_character
immediately followed by a double quote sign). For example, if your escape
character is ?
, then pattern
must contain ?
exactly two times, as
in:
'The rain?"%?"Spain'
The three parts of pattern
– start pattern
?"middle pattern?"
end
pattern
– must match character_string_argument
’s start, middle and end;
that is, the expression:
'character_string_argument' SIMILAR TO 'pattern'
must be TRUE
if the “escape <double quote>” markers are stripped from the
pattern. If that’s not the case – that is, if the start or end patterns aren’t
in the string – SUBSTRING
returns NULL
. Otherwise the result of
SUBSTRING
is character_string_argument
’s middle string which
corresponds to the middle pattern. Thus, for this SUBSTRING
function:
SUBSTRING('The rain in Spain' FROM 'The rain?"%?"Spain' FOR '?')
the result is ' in '
– that is, the return from SUBSTRING
is the
string of characters which appears between the start pattern ('The rain'
)
and the end pattern ('Spain'
).
[Obscure Rule] The result of SUBSTRING
belongs to the same Character set
that its string arguments do. It normally has a coercibility attribute and
Collation determined by Table 7-1 “Collating Sequences and Coercibility Rules
for Monadic Operations”, where character_string_argument"
is the monadic
operator, but you can use the optional COLLATE
clause to force EXPLICIT
coercibility with a specific Collation. The Collation named must be a Collation
defined for the relevant Character set. If you’re using COLLATE
in an
SQL-Schema statement, then the <AuthorizationID> that owns the containing
Schema must have the USAGE
Privilege on “<Collation name>”. If you’re using
COLLATE
in any other SQL statement, then your current <AuthorizationID>
must have the USAGE
Privilege on “<Collation name>”. For example:
SUBSTRING(char_column FROM 'hi/"[b-o]/"by' FOR '/') COLLATE my.collation_1
specifies that the result of SUBSTRING
should use a Collation named
my.collation_1
.
If you want to restrict your code to Core SQL, don’t use the <regular
expression substring function> form of SUBSTRING
.
<char position expression>¶
The required syntax for a <char position expression> is as follows.
<char position expression> ::=
POSITION (character_string_argument_1 IN character_string_argument_2)
POSITION
operates on two arguments, both of which must evaluate to character
strings that belong to the same Character set. It determines the first
character position (if any) at which character_string_argument_1
is found in
character_string_argument_2
and returns this as an exact numeric integer. If
either of the arguments are NULL
, POSITION
returns NULL
. If
character_string_argument_1
is a zero-length character string, POSITION
returns one. If character_string_argument_1
is not found in
character_string_argument_2
, POSITION
returns zero. Here are some examples
of POSITION
:
POSITION('is' IN 'mistake')
-- returns 2
POSITION('yy' IN 'mistake')
-- returns 0
POSITION('' IN 'mistake')
-- returns 1
[NON-PORTABLE] The precision of POSITION
’s result is non-standard because
the SQL Standard requires implementors to define the result’s precision.
[OCELOT Implementation] The OCELOT DBMS that comes with this book gives the
result of POSITION
an INTEGER
<data type>.
[Obscure Rule] POSITION
can also operate on a bit string and a BLOB
.
We’ve ignored these options for now – look for them in our chapters on bit
strings and BLOBs.
If you want to restrict your code to Core SQL, don’t use POSITION with character strings.
<bit length expression>¶
The required syntax for a <bit length expression> is as follows.
<bit length expression> ::=
BIT_LENGTH (character_string_argument)
BIT_LENGTH
operates on an argument that evaluates to a character string. It
determines the length of the argument, in bits, and returns this as an exact
numeric integer, e.g., BIT_LENGTH('hello')
returns 40 (assuming that an 8-bit
Character set is in use). If the argument is NULL
, BIT_LENGTH
returns NULL
.
Tip
The length of a character string argument depends on the Character set it
belongs to. Most Character sets are 8-bit sets, so BIT_LENGTH
would
return 8 for each character in your argument. But if you’re using a DBCS,
remember that BIT_LENGTH
will allot 16 bits for each character.
Tip
BIT_LENGTH
will return the total length of your character string argument
– including any trailing (or leading) spaces. If you’re looking for the
length of the significant value only, use TRIM
with BIT_LENGTH
. For
example:
BIT_LENGTH('hello ')
-- returns 64; the length of hello followed by 3 spaces
BIT_LENGTH(TRIM('hello '))
-- returns 40; the length of hello
[NON-PORTABLE] The precision of BIT_LENGTH
’s result is non-standard because
the SQL Standard requires implementors to define the result’s precision.
[OCELOT Implementation] The OCELOT DBMS that comes with this book gives the
result of BIT_LENGTH
an INTEGER
<data type>.
[Obscure Rule] BIT_LENGTH
can also operate on a bit string and a BLOB.
We’ve ignored these options for now – look for them in our chapters on bit
strings and BLOBs.
If you want to restrict your code to Core SQL, don’t use BIT_LENGTH
with
NCLOBs
.
<char length expression>¶
The required syntax for a <char length expression> is as follows.
<char length expression> ::=
{CHAR_LENGTH | CHARACTER_LENGTH} (character_string_argument)
CHAR_LENGTH
(or CHARACTER_LENGTH
) operates on an argument that
evaluates to a character string. It determines the length of the argument, in
characters, and returns this as an exact numeric integer, e.g.,
CHAR_LENGTH('hello')
returns 5. If the argument is NULL
,
CHAR_LENGTH
returns NULL
.
Tip
CHAR_LENGTH
will return the total length of your character string argument –
including any trailing (or leading) spaces. If you’re looking for the length
of the significant value only, use TRIM
with CHAR_LENGTH
. For example:
CHAR_LENGTH('hello ')
-- returns 8; the length of hello followed by 3 spaces
CHAR_LENGTH(TRIM('hello '))
-- returns 5; the length of hello
Tip
CHAR_LENGTH
returns the number of Latin letters in your character string
argument. For example, if your argument is 'Chorizo'
and you’re using a
Spanish Collation, the second character is ‘h’ – it is not 'o'
despite the
digraph, because CHAR_LENGTH
doesn’t care about the Collation. Thus:
CHAR_LENGTH('Chorizo') COLLATE my.spanish_collation
returns 7, not 6.
[NON-PORTABLE] The precision of CHAR_LENGTH
’s result is non-standard
because the SQL Standard requires implementors to define the result’s
precision. [OCELOT Implementation] The OCELOT DBMS that comes with this book
gives the result of CHAR_LENGTH
an INTEGER
<data type>.
[Obscure Rule] CHAR_LENGTH
can also operate on a bit string and a BLOB
.
We’ve ignored these options for now – look for them in our chapters on bit
strings and BLOBs.
If you want to restrict your code to Core SQL, don’t use CHAR_LENGTH
with
NCLOBs
.
<octet length expression>¶
The required syntax for a <octet length expression> is as follows.
<octet length expression> ::=
OCTET_LENGTH (character_string_argument)
OCTET_LENGTH
operates on an argument that evaluates to a character string.
It determines the length of the argument, in octets, and returns this as an
exact numeric integer, e.g., OCTET_LENGTH('hello')
returns 5 (assuming
that an 8-bit Character set is in use; the octet length of a string is the bit
length divided by 8, ignoring any remainder.) If the argument is NULL
,
OCTET_LENGTH
returns NULL
.
Tip
The length of a character string argument depends on the Character set it
belongs to. Most Character sets are 8-bit sets, so OCTET_LENGTH
would return
1 for each character in your argument. But if you’re using a DBCS, remember
that OCTET_LENGTH
will allot 2 octets for each character.
Tip
OCTET_LENGTH
will return the total length of your character string
argument – including any trailing (or leading) spaces. If you’re looking for
the length of the significant value only, use TRIM
with OCTET_LENGTH
.
For example:
OCTET_LENGTH('hello ')
-- returns 8; the length of hello followed by 3 spaces
OCTET_LENGTH(TRIM('hello '))
-- returns 5; the length of hello
[NON-PORTABLE] The precision of OCTET_LENGTH
’s result is non-standard
because the SQL Standard requires implementors to define the result’s
precision. [OCELOT Implementation] The OCELOT DBMS that comes with this book
gives the result of OCTET_LENGTH
an INTEGER
<data type>.
[Obscure Rule] OCTET_LENGTH
can also operate on a bit string and a
BLOB
. We’ve ignored these options for now – look for them in our chapters
on bit strings and BLOBs.
If you want to restrict your code to Core SQL, don’t use OCTET_LENGTH
with
NCLOBs
.
Set functions¶
SQL provides five set functions that operate on CHAR
, VARCHAR
,
NCHAR
and NCHAR VARYING
character strings: COUNT
, MAX
, MIN
and GROUPING
. SQL also provides three set functions that operate on
CLOB
and NCLOB
character strings: COUNT
and GROUPING
. Since
none of these operate exclusively with character string arguments, we won’t
discuss them here; look for them in our chapter on set functions.
Predicates¶
In addition to the comparison operators, SQL provides ten other predicates that
operate on CHAR
, VARCHAR
, NCHAR
and NCHAR VARYING
character
strings: the <like predicate>, the <similar predicate>, the <between
predicate>, the <in predicate>, the <null predicate>, the <exists predicate>,
the <unique predicate>, the <match predicate>, the <quantified predicate> and
the <distinct predicate>. SQL also provides five predicates that operate on
CLOB
and NCLOB
character strings: the <like predicate>, the <similar
predicate>, the <null predicate>, the <exists predicate> and the <quantified
predicate>. Each will return a boolean value: either TRUE
, FALSE
or
UNKNOWN
. Only the <like predicate> and the <similar predicate> operate
strictly on strings; we’ll discuss them here. Look for the rest in our chapter
on search conditions.
<like predicate>¶
The required syntax for a <like predicate> is as follows.
<like predicate> ::=
character_string_argument [ NOT ] LIKE pattern [ ESCAPE escape_character ]
LIKE
is a predicate that operates on three operands that evaluate to
character strings belonging to the same Character set: it searches for values
that contain a given pattern. NOT LIKE
is the converse and lets you search
for values that don’t contain a given pattern. The
character_string_argument
is the character string you’re searching within,
the “pattern” is the pattern you’re searching for and the optional
escape_character
is a character that tells your DBMS to treat a
metacharacter in the pattern as itself (rather than as a metacharacter). If
character_string_argument
contains the pattern, LIKE
returns TRUE
and NOT LIKE
returns FALSE
. If character_string_argument
does not
contain the pattern, LIKE
returns FALSE
and NOT LIKE
returns
TRUE
. If any of the operands are NULL
, LIKE
and NOT LIKE
return
UNKNOWN
.
The pattern you specify in pattern
may contain any combination of regular
characters and metacharacters. Any single character in pattern
that is not
a metacharacter or the escape_character
represents itself in the pattern.
For example, this predicate:
char_column LIKE 'A'
is TRUE
for 'A'
.
Special significance is attached to metacharacters in a pattern. The
metacharacters are: _ and %. If the predicate doesn’t include an ESCAPE
clause, they are interpreted as follows:
_ An underline character means “any single character”. For example, the predicate:
char_column LIKE 'A_C'
is TRUE
for 'A C'
, 'AAC'
, 'ABC'
, 'AxxxxxxxxC'
and so on.
% A percent sign means “any string of zero or more characters”. For example, this predicate:
char_column LIKE 'A%C'
is TRUE
for 'AC'
, 'A C'
, 'AxC'
, 'AxxxxxxxxC'
and so on.
If you want to search for a character that would normally be interpreted as a
metacharacter, you must use the optional ESCAPE
clause. To do so,
Pick a character that you won’t need in the pattern and designate it as your escape character.
In the pattern, use your escape character followed immediately by the metacharacter, to designate the metacharacter as a character you want to search for. For example:
... LIKE 'B$%'
(without an
ESCAPE
clause) means “like the letter B followed by a dollar sign followed by anything at all”, while:... LIKE 'B$?%' ESCAPE '?'
means “like the letter B followed by a dollar sign followed by a percent sign” (since % is preceded by the escape character it has no special significance in this pattern). Your escape character can also be followed by itself in the pattern, if you want to search for the escape character. For example:
... LIKE 'B$??' ESCAPE '?'
means “like the letter B followed by a dollar sign followed by a question mark” (since ? is preceded by the escape character it has no special significance in this pattern). Your best choice for an escape character is an SQL special character which isn’t a
[NOT] LIKE
metacharacter. We suggest the question mark.
The escape_character
must be exactly one character long. If it isn’t,
[NOT] LIKE
will fail: your DBMS will return the SQLSTATE error 22019
"data exception-invalid escape character"
. If escape_character
is _ or %
and that metacharacter is used once only in your pattern, or if
escape_character
is used without being followed by a metacharacter (or by
itself) in your pattern, [NOT] LIKE
will fail: your DBMS will return the
SQLSTATE error 22025 "data exception-invalid escape sequence"
. For example,
this predicate will result in SQLSTATE 22025
:
LIKE 'B%B' ESCAPE '%'
For the purposes of [NOT] LIKE
, a substring of
character_string_argument
is a sequence of zero or more contiguous
characters, where each character belongs to exactly one such substring (this
includes any trailing spaces in the argument). A substring specifier of
pattern
is either (a) _: an arbitrary character specifier, (b) %: an
arbitrary string specifier, (c) escape_character
followed by _ or % or
escape_character
or (d) any other single character. If
character_string_argument
and pattern
are both variable length
character strings with a length of zero, LIKE
returns TRUE
. LIKE
also returns TRUE
if “pattern” is found in character_string_argument
.
That is, LIKE
returns TRUE
only if the number of substrings in
character_string_argument
equals the number of substring specifiers in
pattern
and all of these conditions are also met:
If the pattern’s n-th substring specifier is _, then the argument’s n-th substring must be any single character.
If the pattern’s n-th substring specifier is %, then the argument’s n-th substring must be any sequence of zero or more characters.
If the pattern’s n-th substring specifier is any other character, then the argument’s n-th substring must be equal (in length and character representation) to that substring specifier – without trailing spaces being added to the argument. Note that this means if the pattern is found in the argument, but the lengths don’t match,
LIKE
returnsFALSE
. For example, these four predicates all returnTRUE
:'bob' LIKE 'b_b' 'bob' LIKE 'b%b' 'bob ' LIKE 'b_b ' 'bob ' LIKE 'b%b '
But these two predicates return
FALSE
because of the trailing spaces incharacter_string_argument
that aren’t found in pattern:'bob ' LIKE 'b_b' 'bob ' LIKE 'b%b'
And these two predicates return
FALSE
because of the trailing spaces inpattern
that aren’t found incharacter_string_argument
:'bob' LIKE 'b_b ' 'bob' LIKE 'b%b '
Note that this is only a problem with fixed length character string arguments. Here’s a more complete example:
CREATE TABLE Test_Stuffs (
column_1 CHAR(4));
INSERT INTO Test_Stuffs (column_1)
VALUES ('ABC');
-- actually inserts 'ABC ' (four characters)
SELECT *
FROM Test_Stuffs
WHERE column_1 = 'ABC';
-- works because comparisons will pad the shorter argument (assuming the relevant Collation has the PAD SPACE attribute) so the test is WHERE 'ABC ' = 'ABC '
SELECT *
FROM Test_Stuffs
WHERE column_1 LIKE '%C';
-- fails because LIKE never pads the shorter argument, no matter what Collation is used, so the test is "find a value of any length that ends in C" -- and 'ABC ' ends in a space, not in C
To get around this, use TRIM
to get rid of trailing spaces in your
character_string_argument
, like this:
SELECT *
FROM Test_Stuffs
WHERE TRIM (TRAILING FROM column_1) LIKE '%C';
[Obscure Rule] The result of [NOT] LIKE
belongs to the same Character set
that its operands do. If you omit the ESCAPE
clause, then it has a
Collation determined by Table 7-3 “Collating Sequences used for Comparisons”,
where character_string_argument
is comparand 1 and pattern
is comparand
2. If you include the ESCAPE
clause, then it also has a Collation
determined by Table 7-3 “Collating Sequences used for Comparisons”, where
comparand 1 is determined by Table 7-2 “Collating Sequences and Coercibility Rules for Dyadic Operations”, where
character_string_argument
is operand 1 andpattern
is operand 2.comparand 2 is
escape_character
.
[Obscure Rule] [NOT] LIKE
can also operate on BLOBs
. We’ve ignored this
option for now – look for it in our chapter on BLOBs.
If you want to restrict your code to Core SQL, don’t use the [NOT] LIKE
predicate with CLOBs
or NCLOBs
and, when you do use [NOT LIKE]
,
make sure your character_string_argument
is a <Column reference> and that
your pattern
and your escape_character
are both <value specification>s.
<similar predicate>¶
The required syntax for a <similar predicate> is as follows.
<similar predicate> ::=
character_string_argument [ NOT ] SIMILAR TO pattern
[ ESCAPE escape_character ]
SIMILAR
is a predicate that operates on three operands that evaluate to
character strings belonging to the same Character set. It works much like
Unix’s grep: it searches for values that contain a given pattern. NOT
SIMILAR
is the converse and lets you search for values that don’t contain a
given pattern. The character_string_argument
is the character string you’re
searching within, the pattern
is the pattern you’re searching for and the
optional escape_character
is a character that tells your DBMS to treat a
metacharacter in the pattern as itself (rather than as a metacharacter). If
character_string_argument
contains the pattern, SIMILAR
returns
TRUE
and NOT SIMILAR
returns FALSE
. If
character_string_argument
does not contain the pattern, SIMILAR returns
``FALSE
and NOT SIMILAR
returns TRUE
. If any of the operands are
NULL
, SIMILAR
and NOT SIMILAR
return UNKNOWN
.
The pattern you specify in pattern
must be a regular expression: a sequence
of ordinary characters combined with some special characters (or
metacharacters). It may contain character ranges, repetitions and combinations.
Any single character in pattern
that is not a metacharacter or the
escape_character
represents itself in the pattern. For example, this
predicate,
char_column SIMILAR TO 'A'
is TRUE
for 'A'
.
Special significance is attached to metacharacters in a pattern. The metacharacters are: _ and % and * and + and | and ( and ) and [ and ] and ^ and - and :. If the predicate doesn’t include an ESCAPE clause, they are interpreted as follows:
_ An underline character means “any single character”. For example, this predicate:
char_column SIMILAR TO 'A_C'
is
TRUE
for'A C'
,'AAC'
,'ABC'
,'A#C'
and so on.% A percent sign means “any string of zero or more characters”. For example, this predicate:
char_column SIMILAR TO 'A%C'
is
TRUE
for'AC'
,'A C'
,'AxC'
,'AxxxxxxxxC'
and so on.“*” An asterisk means “preceding repeats indefinitely” (from zero to infinity times). For example, this predicate:
char_column SIMILAR TO 'A*'
is
TRUE
for'A'
,'AA'
,'AAA'
,'AAAA'
and so on.“+” A plus sign means “preceding repeats indefinitely” (from one to infinity times). For example, this predicate:
char_column SIMILAR TO 'A+'
is
TRUE
for'A'
,'AA'
,'AAA'
,'AAAA'
and so on.[ ] Brackets are used for character enumeration in the pattern. There are two ways to enumerate: as a simple list or with a minus sign, with the result that a match is made with any one of the characters inside the brackets. For example, this predicate:
char_column SIMILAR TO '[A]'
is
TRUE
for'A'
. This predicatechar_column SIMILAR TO '[AQZ]'
is
TRUE
for'A'
or'Q'
or'Z'
. This predicate:char_column SIMILAR TO '[A-E]'
is
TRUE
for'A'
or'B'
or'C'
or'D'
or'E'
. And this predicate:char_column SIMILAR TO '[A-EQ-S]'
is
TRUE
for'A'
or'B'
or'C'
or'D'
or'E'
or'Q'
or'R'
or'S'
.[^ ] A circumflex inside enumerating brackets means negative enumeration. The options are the same as for ordinary enumeration, with a negated meaning. For example, this predicate:
char_column SIMILAR TO '[^A-C]'
is
TRUE
for anything not equal to'A'
or to'B'
or to'C'
. This predicate:char_column SIMILAR TO '[^AQZ]'
is
TRUE
for anything not equal to'A'
or'Q'
or'Z'
. And this predicate:'ABCDE' SIMILAR TO '[^C-F]'
is
FALSE
, since the last character in the character string argument must not be'C'
or'D'
or'E'
or'F'
.[: :] Brackets containing colons surrounding one of:
ALPHA
,UPPER
,LOWER
,DIGIT
orALNUM
are used for set enumeration in the pattern. For example, this predicate:char_column SIMILAR TO '[:ALPHA:]'
is
TRUE
for values ofchar_column
that are equal to any simple Latin letter, i.e.: to any ofABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
. This predicate:char_column SIMILAR TO '[:UPPER:']
is
TRUE
for values ofchar_column
that are equal to any simple Latin upper case letter, i.e.: to any ofABCDEFGHIJKLMNOPQRSTUVWXYZ
. This predicate:char_column SIMILAR TO '[:LOWER:]'
is
TRUE
for values ofchar_column
that are equal to any simple Latin lower case letter, i.e.: to any ofabcdefghijklmnopqrstuvwxyzv
. This predicate:char_column SIMILAR TO '[:DIGIT:]'
is
TRUE
for values ofchar_column
that are equal to any digit, i.e.: to any of0123456789
. And this predicate:char_column SIMILAR TO '[:ALNUM:]'
is
TRUE
for values ofchar_column
that are equal to any simple Latin letter or to any digit, i.e.: to any ofABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789
.| The vertical bar means “the logical OR of the first and second expressions”. For example, this predicate,
char_column SIMILAR TO '[A-C']|[:DIGIT:]'
is
TRUE
for any ofABC0123456789
. The | operator has a lower priority than * and + have.|| The concatenation operator means “concatenate one element from first expression with one element from second expression”. For example, this predicate:
char_column SIMILAR TO '[A-C]||[:DIGIT:]'
is
TRUE
for:'A0'
,'A1'
,'A2'
,'A3'
,'A4'
,'A5'
,'A6'
,'A7'
,'A8'
,'A9'
,'B0'
,'B1'
,'B2'
,'B3'
,'B4'
,'B5'
,'B6'
,'B7'
,'B8'
,'B9'
,'C0'
,'C1'
,'C2'
,'C3'
,'C4'
,'C5'
,'C6'
,'C7'
,'C8'
and'C9'
.( ) Parentheses in a pattern force the order of evaluation, in the usual way. For example, this predicate:
char_column SIMILAR TO '[:UPPER:]|([:DIGIT:][:DIGIT:])'
is
TRUE
for any single upper case letter, or for any two digits.If
pattern
is not a valid pattern,[NOT] SIMILAR
will fail: your DBMS will return theSQLSTATE error 2201B "data exception-invalid regular expression"
. Here are two examples of invalid patterns:'^[:UPER:]' '[:abc:]'
If you want to search for a character that would normally be interpreted as a
metacharacter, you must use the optional ESCAPE
clause. To do so:
Pick a character that you won’t need in the pattern and designate it as your escape character.
In the pattern, use your escape character followed immediately by the metacharacter, to designate it as a character you want to search for. For example:
... SIMILAR TO 'B$%'
(without an
ESCAPE
clause) means “similar to the letter B followed by a dollar sign followed by anything at all”, while:... SIMILAR TO 'B$?%' ESCAPE '?'
means “similar to the letter B followed by a dollar sign followed by a percent sign” (since % is preceded by the escape character it has no special significance in this pattern). Your escape character can also be followed by itself in the pattern, if you want to search for the escape character. For example:
... SIMILAR TO 'B$??' ESCAPE '?'
means “similar to the letter B followed by a dollar sign followed by a question mark” (since ? is preceded by the escape character it has no special significance in this pattern). Your best choice for an escape character is an SQL special character which isn’t a
[NOT] SIMILAR
metacharacter. We suggest the question mark.
The escape_character
must be exactly one character long. If it isn’t,
[NOT] SIMILAR
will fail: your DBMS will return the SQLSTATE error 22019
"data exception-invalid escape character"
. If escape_character
is [ or ]
or ( or ) or | or ^ or - or + or * or _ or % and that metacharacter is used
once only in your pattern, or if escape_character
is used without being
followed by a metacharacter (or itself) in your pattern, [NOT] SIMILAR
will
fail: your DBMS will return the SQLSTATE error 2200C "data exception-invalid
use of escape character"
. For example, this predicate will result in
SQLSTATE 2200C
:
SIMILAR TO 'B?B' ESCAPE '?'
If escape_character
is a colon and your pattern contains that metacharacter
surrounding one of: ALPHA
, UPPER
, LOWER
, DIGIT
or ALNUM
,
[NOT] SIMILAR
will fail: your DBMS will return the SQLSTATE error 2200B
"data exception-escape character conflict"
.
[Obscure Rule] The result of [NOT] SIMILAR
belongs to the same Character set
that its operands do. If you omit the ESCAPE
clause, then it has a Collation
determined by Table 7-3 “Collating Sequences used for Comparisons”, where
character_string_argument
is comparand 1 and pattern
is comparand 2. If
you include the ESCAPE
clause, then it also has a Collation determined by
Table 7-3 “Collating Sequences used for Comparisons”, where:
comparand 1 is determined by Table 7-2 “Collating Sequences and Coercibility Rules for Dyadic Operations”, where
character_string_argument
is operand 1 andpattern
is operand 2.comparand 2 is “escape_character”.
If you want to restrict your code to Core SQL, don’t use the [NOT] SIMILAR
predicate.
Common checks¶
Although [NOT] SIMILAR
is not terribly useful in WHERE
clauses (it’s
too inefficient) it is great for CHECK
clauses. Here are some real-world
examples of strings which have rigid format specifications. [NOT] SIMILAR
is appropriate for making sure the strings meet the specifications.
Postal Codes
These strings must be “letter digit letter space digit letter digit” – and the
letters must be upper case simple Latin letters, e.g.: 'T5E 1G7'
, 'V1K
4K0'
. To make sure your data fits these requirements, use a simple Domain
Constraint:
ALTER DOMAIN postal_code
ADD CONSTRAINT postal_code_specs
CHECK (VALUE SIMILAR TO
'[:UPPER:][DIGIT:][:UPPER] [:DIGIT]:[UPPER:][:DIGIT:]');
Periodic Table Symbols
These strings are either a single upper case simple Latin letter capital letter (e.g.: ‘H’, ‘O’) or one upper case and one lower case letter (E.G.: ‘Al’, ‘Fe’). To make sure your data fits these requirements, use another simple Domain Constraint:
CREATE DOMAIN periodic_table_element CHAR(2)
CHECK (VALUE SIMILAR TO '[A-Z]|([A-Z][a-z])');
North American Telephone Numbers
These strings must be “digit digit digit minus-sign digit digit digit digit”; optionally preceded by “left-parenthesis digit digit digit right-parenthesis” (e.g.: ‘498-1234’ or ‘(604)498-1234’). This is a hard one: the string includes both an optional format and a special character that needs “escaping”. To make sure your data fits these requirements, use a Table Constraint:
CREATE TABLE Table_1 (
phone_number CHAR(13),
CHECK (phone_number SIMILAR TO
'([:DIGIT:][:DIGIT:][:DIGIT:]?-[:DIGIT:][:DIGIT:][:DIGIT:][:DIGIT:])
|
(?([:DIGIT:][:DIGIT:][:DIGIT:]?)[:DIGIT:][:DIGIT:][:DIGIT:]?-
[:DIGIT:][:DIGIT:][:DIGIT:][:DIGIT:])' ESCAPE '?');
(This example is shown on multiple lines for clarity; in reality the string may not contain carriage returns, nor would the “pattern”).
[NOT] LIKE
<B+>or [NOT] SIMILAR
?
[NOT] LIKE
and [NOT] SIMILAR
are both pattern-matching predicates. You
should continue to use [NOT] LIKE
if the pattern contains only _ and %
wildcards – although [NOT] SIMILAR
can use these wildcards too, there is
no advantage in using an SQL3 expression when an SQL-92 expression will do. For
more complex patterns, your choice is between [NOT] SIMILAR
and nothing.
When you make the switchover, remember that there are subtle differences
between [NOT] SIMILAR
and [NOT] LIKE
, in the way that collating
sequences are handled (which affects which characters are regarded as “equal”
and whether there are pad spaces at the end of a string).
Character Strings and Character Sets¶
[Obscure Rule] applies for this entire section.
In the last chapter, we made the observation that a computer character set has two parts: a character repertoire and an agreement on how the repertoire’s characters will be encoded as numbers. SQL has a similar rule: An SQL Character set is a combination of two things:
A character repertoire: the set of characters that belong to the Character set.
A Form-of-use: the repertoire’s encoding scheme – the one-to-one mapping scheme between each character in the repertoire and a set of internal codes (usually 8-bit values) that define how the repertoire’s characters are encoded as numbers. (These codes are also used to specify the order of the characters within the repertoire.)
All SQL character strings belong to some Character set. Whenever you’re working with an SQL character string, you may either specify the Character set it belongs to, or allow it to belong to a default Character set chosen by your DBMS. (To simplify matters, we recommend that you always follow the latter course. This will ensure that you get standard results across SQL-sessions.)
To explicitly specify a Character set for a character string, add a CHARACTER
SET
clause to a <data type> specification and/or _<Character set name> to a
<literal>, as shown in the appropriate syntax diagrams in this chapter. Your
current <AuthorizationID> must have the USAGE Privilege for the Character set
named.
If you choose not to specify a Character set for a character string, the current default Character set is implicit. Your DBMS will choose the current default Character set using these rules:
A character string <data type> specification (in
CREATE SCHEMA
,CREATE TABLE
,CREATE DOMAIN
,ALTER TABLE
andALTER DOMAIN
) that doesn’t include an explicit CHARACTER SET clause is treated as if the default Character set of the Schema it’s defined in was explicitly named.[NON-PORTABLE] In any operation other than defining a Domain, defining a Column or defining a Field (e.g.: in a
CAST
operation), a character string <data type> specification that doesn’t include aCHARACTER SET
clause will be treated as if it belongs to a Character set that is non-standard because the SQL Standard requires implementors to define what the operation’s default Character set is. [OCELOT Implementation] The OCELOT DBMS that comes with this book usesISO8BIT
– the DBMS’s initial default Character set – as the default Character set for such operations.Any other character string value that doesn’t include an explicit Character set specification must either consist only of <SQL language character>s or the value’s Character set defaults to (a) the default Character set of the Schema, if it’s found in a
CREATE SCHEMA
statement, (b) the default Character set of the SQL-session, if it’s found in a dynamic SQL statement or (c) the default Character set of the Module you’re running, if it’s found inany other SQL statement in a Module.
Every Character set has at least one Collation – its default Collation. You may define additional Collations for any Character set.
If you want to restrict your code to Core SQL, don’t explicitly define the Character set that any character string belongs to – always allow it to belong to the default Character set.
Character Strings and Collations¶
[Obscure Rule] applies for this entire section.
A Collation, or collating sequence, is a set of rules that determines the result when character strings are compared. The result of any character string comparison thus depends on the Collation used – we’ll call this the relevant Collation. Different Collations might result in different comparison results for the same two strings, e.g.: a case-sensitive Collation will determine that the letter “A” and the letter “a” are not equal, but a case-insensitive Collation will determine that “A” and “a” are equal.
Whenever you’re comparing an SQL character string, you may either specify the relevant Collation, or allow the comparison to be governed by a default Collation chosen by your DBMS. (To simplify matters, we recommend that you always follow the latter course. This will ensure that you get standard results across SQL-sessions.)
To explicitly specify a Collation for a comparison, add a COLLATE
clause to
your character string, as shown in the appropriate syntax diagrams in chapter.
The Collation you name must either (a) be the default Collation for the
relevant Character set or (b) be defined as a Collation for the relevant
Character set by some CREATE COLLATION
statement. If you’re using COLLATE
in an SQL-Schema statement, then the <AuthorizationID> that owns the containing
Schema must have the USAGE Privilege on “<Collation name>”. If you’re using
COLLATE
in any other SQL statement, then your current <AuthorizationID>
must have the USAGE
Privilege on “<Collation name>”.
If you choose not to specify a Collation for a comparison, the current default Collation is implicit. Your DBMS will choose the current default Collation for a character string using these rules.
First, to choose a character string’s default Collation:
If a character string <data type> specification doesn’t include a
COLLATE
clause but does include aCHARACTER SET
clause, the default Collation for that <data type>’s values is the default Collation of the Character set named.If any other character string value doesn’t include a
COLLATE
clause, the default Collation for that value is the default Collation of the value’s Character set.
Second, to choose one of the comparand’s default Collations for the comparison:
Expressions that involve only non-Columns (i.e.: a <literal>, host language variable, parameter or expression result) are compared using the default Collation for the character string values’ mutual Character set.
Expressions that involve both Columns (i.e.: <Column name>s or <Column reference>s) and non-Columns are compared using the Column(s)’ mutual default Collation. If you want to compare values from multiple Columns with different default Collations, you must include a
COLLATE
clause in your expression.
Table 7-1 shows how the collating sequence and coercibility attribute are determined for the result of a monadic operation.
Table 7.1 Collating sequences and coercibility rules for monadic operations
OPERAND’S COERCIBILITY ATRIBUTE |
OPERAND’S COLLATION |
RESULT’S COERCIBILITY ATTRIBUTE |
RESULT’S COLLATION |
---|---|---|---|
Coercible |
Default |
Coercible |
Default |
Implicit |
X |
Implicit |
X |
No Collation |
None |
No Collation |
None |
Explicit |
X |
Explicit |
X |
Table 7-2 shows how the collating sequence and coercibility attribute are determined for the result of a dyadic operation.
Table 7-3 shows how the collating sequence is determined for a particular comparison.
OPERAND_1’S COERCIBILITY ATRIBUTE |
OPERAND_1’S COLLATION |
OPERAND_2’S COERCIBILITY ATTRIBUTE |
OPERAND_2’S COLLATION |
RESULT’S COERBILITY ATRIBUTE |
RESULT’S COLLATION |
---|---|---|---|---|---|
Coercible |
Default |
Coercible |
Default |
Coercible |
Default |
Coercible |
Default |
Implicit |
X |
Implicit |
X |
Coercible |
Default |
No Collation |
None |
No Collation |
None |
Coercible |
Default |
Explicit |
X |
Explicit |
X |
Implicit |
X |
Coercible |
Default |
Implicit |
X |
Implicit |
X |
Implicit |
X |
Implicit |
X |
Implicit |
X |
Implicit |
Y<>X |
No Collation |
None |
Implicit |
X |
No Collation |
None |
No Collation |
None |
Implicit |
X |
Explicit |
Y |
Explicit |
Y |
No Collation |
None |
Coercible |
Default |
No Collation |
None |
No Collation |
None |
Implicit |
X |
No Collation |
None |
No Collation |
None |
No Collation |
None |
No Collation |
None |
No Collation |
None |
Explicit |
X |
Explicit |
X |
Explicit |
X |
Coercible |
Default |
Explicit |
X |
Explicit |
X |
Implicit |
Y |
Explicit |
X |
Explicit |
X |
No Collation |
None |
Explicit |
X |
Explicit |
X |
Explicit |
X |
Explicit |
X |
Explicit |
X |
Explicit |
Y<>X |
invalid syntax |
Table 7.3 shows how the collation sequence is determined for a particular comparison.
Table 7,3 Collating sequences used for comparison
OPERAND_1´S COERCIBILITY ATRIBUTE |
OPERAND_1´S COLLATION |
OPERAND_2´S COERCIBILITY ATTRIBUTE |
OPERAND_2´S COLLATION |
COLLATION USE FOR COMPARISON |
---|---|---|---|---|
Coercible |
Default |
Coercible |
Default |
Default |
Coercible |
Default |
Implicit |
X |
X |
Coercible |
Default |
No Collation |
None |
invalid syntax |
Coercible |
Default |
Explicit |
X |
X |
Implicit |
X |
Coercible |
Default |
X |
Implicit |
X |
Implicit |
X |
X |
Implicit |
X |
Implicit |
Y<>X |
invalid syntax |
Implicit |
X |
No Collation |
None |
invalid syntax |
Implicit |
X |
Explicit |
Y |
Y |
No Collation |
None |
Coercible |
Default |
invalid syntax |
No Collation |
None |
Implicit |
X |
invalid syntax |
No Collation |
None |
No Collation |
None |
invalid syntax |
No Collation |
None |
Explicit |
X |
X |
Explicit |
X |
Coercible |
Default |
X |
Explicit |
X |
Implicit |
Y |
X |
Explicit |
X |
No Collation |
None |
X |
Explicit |
X |
Explicit |
X |
X |
Explicit |
X |
Explicit |
Y<>X |
invalid syntax |
Note
For expressions involving more than two comparands, the collating sequnce is
effectively determined on a cumulative basis; the result for the first two
comparands becomes comparand_1
for the next comparison, the result for
this becomes comparand_1
for the comparison after that, and so on.
Dialects¶
The “typical” SQL DBMS supports most of the standard character data types, but
often uses preferred local names. For example, Oracle has a 2000-byte maximum
(2048 for Oracle Lite) for the CHAR
<data type> and offers a (non-standard)
LONG VARCHAR
type to define larger character string fields.
Sybase allows for a large variety of Character sets, with only one “group”
(Character set) allowed at a time. Baltic languages are in the East European
group (8859-2), though it doesn’t seem possible to get a correct result in this
case. Collations supported are: English+French+German (all together!), Spanish,
Hungarian, Russian; then everything else is binary. Sybase does not support SQL
CHARACTER SETs
, COLLATIONs
or TRANSLATIONs
, nor does it support
CONVERT
– to convert you need an offline utility. It does provide some
Unicode support.