Chapter 2 – General Concepts¶
Note
You are reading a digital copy of SQL-99 Complete, Really, a book that documents the SQL-99 standard.
For more information specific to CrateDB, check out the CrateDB Reference.
A database system can be described as essentially nothing more than a computerized record-keeping system. A database, then, is simply a collection of structured data files and any associated indexes. The user of such a system must be able to add, insert, retrieve, update, and delete data and files as necessary. Although the SQL Standard doesn’t actually define the nebulous concept “database”, SQL provides all of these functions and more.
In this chapter, we’ll briefly discuss SQL’s fundamental concepts – how the language fits into its overall environment, the data Objects you can expect to work with, and how SQL-data and SQL statements are structured. Then, in subsequent chapters, we’ll revisit each of these areas in greater detail.
Table of Contents
Set Theory¶
Georg Cantor was a German.
He invented Set Theory.
He was committed to a mental institution.
He died in 1918.
We can explain the preceding statements using Georg Cantor’s own theory: a set is any collection of definite distinguishable things. We can conceive of the set as a whole, and in fact we often do: for example, we speak of “the Germans” (a set) and can rephrase our first statement as “Georg Cantor was a member (or element) of the set of Germans”. By rephrasing, we emphasize the collection of individual things over the individual things themselves. That much is intuitive. But Cantor was careful in his choice of words. By “distinguishable” (or distinct), he meant that in looking at any two things which fit in the set, we must be able to decide whether they are different. By “definite”, he meant that if we know what the set is and we know what the thing is, we can decide whether the thing is a member of the set. Therefore, to know what a set “is”, it is sufficient to know what the members are.
Here are a few examples. The “Germans” set also included Kaiser Wilhelm. However, it can be proved from historical records that Cantor was not a pseudonym or alias that the Kaiser used while off duty – therefore, the two members are distinguishable. At the same time, we know that there were several million other Germans, also distinguishable, and we could define the set by taking a census of all the Germans. There might be some difficulty deciding the individual question “What is a German?”, but once that was done there would be no difficulty deciding the collective question “What are the Germans?”. Therefore, the members define the set, i.e., the members are definite.
The census we spoke of is possible because the Germans were a finite set (a fact which would have bored Cantor because he developed his theory to explain various gradations of infinity). We could enumerate the set thus:
{Georg Cantor, Kaiser Wilhelm, ...}
In this enumeration, we used braces to indicate “we are enumerating a set” and an ellipsis to indicate “and so on” – that is, the list could go on but we felt it unnecessary to continue for the sake of our exposition. These are standard conventions and we will use braces and ellipses again.
Enumeration is unwieldy for large sets, so let us revisit the question “What is a German?” by taking the bounds stated in the song “Deutschland Ueber Alles” – a German is a person living between the Maas, the Memel, the Esch and the Belt (four bodies of water which now lie respectively in Holland, Russia, Austria, and Denmark). In Cantor’s terms, that formula expresses a defining property. It is either true or it is false. If it is true, the person is in the set. If it is false, the person is outside the set.
Without stating the defining property in advance, the German census-takers would be unable to put together their forms and plan their information collection. The objective, though, is to produce an enumeration of all Germans. In computer terminology, we call the definition the database design and the enumeration the database itself. The “Germans” set can be broken up into several subsets such as:
{Berliners, Frankfurters, Hamburgers, ...}
These subsets are also sets, with defining properties (city of residence), and presumably, with members – but that is not necessary. For example, the set of Germans who live in the Rhine River is an empty set, i.e., it has no members, but it is still a set. The implicit breaking-up that happens when we ask “Among the Germans which ones are Frankfurters?” is an example of a set operation. A set operation is something we do with sets that results in the production of more sets.
Relations¶
First, let’s consider first a binary relation – that is, a relation between two things. The things don’t have to be of the same type; all we are concerned with is that they have some type of bond, and that they be in order. Getting back to our hero … there is a relationship between Georg Cantor and the concept Set Theory (he invented it). There is also a relationship between Kaiser Wilhelm and World War I (he started it). We could use our braces notation to enumerate this:
{ (Georg Cantor, Set Theory), {Kaiser Wilhelm, World War I) }
but it looks clearer when diagrammed as a two-dimensional Table:
|
|
Georg Cantor |
Set Theory |
Kaiser Wilhelm |
World War I |
There are some points to note about this diagram.
The Table shows ordered pairs – we couldn’t reverse them because Set Theory didn’t invent Georg Cantor and World War I didn’t cause Kaiser Wilhelm – there lationship between the
NAME
andACTIVITY
values is directional. Note, however, that the word “ordered” refers only to the horizontal order in the illustration – across the relation. We don’t care which member of the set is listed first.The Table shows binary pairs – there is no space on the line for marking Georg Cantor’s other achievements. Under “What did he do?” we can only express one thing. So: ordered means ordered, and pair means pair.
So what, precisely, is the “relation” here? Well, it’s the whole thing. The relationship is the set of all the ordered pairs, and the ordering itself (i.e., how part A relates to part B). What we have in the preceding diagram is a picture of a relation and nothing but a relation.
This relation is a set. It has members. The members define the set. However, the members are no longer “elements”, but ordered pairs. There’s no reason to limit ourselves to ordered pairs, though. That’s just the simplest relation, the binary relation, which is sometimes called “a relation of degree 2” (because there are two columns). We could have relations of degree 3, degree 4, degree 5, degree 6, degree 7, and so on, i.e., relations which are not binary but triple, quadruple, pentuple, sextuple, septuple … Did you notice how after a while all the words ended in “-tuple”? That’s why the general term for a relation with n elements is n-tuple. Here is a relation of degree 4 showing all the information we have so far:
|
|
|
|
Georg Cantor |
Set Theory |
Mental Institution |
1918 |
Kaiser Wilhelm |
World War |
I Imperial Palace |
??? |
...
Some differences now appear between the words Cantor used (“set theory terminology”) and the words we use (“database terminology”). The line:
{Georg Cantor, Set Theory,Mental Institution,1918}
would be one tuple to Cantor, but in SQL, we prefer the word row. Or, to be precise: the row value is the four-element:
{Georg Cantor,Set Theory,Mental Institution,1918)
and the row is the box that holds that information. (We don’t usually need to
be so precise in ordinary discussion.) Meanwhile, going down rather than
across, Cantor would say that we have four attributes, labelled NAME
,
ACTIVITY
, RESIDENCE
and DATE_OF_DEATH
. But in SQL, we prefer the
word column instead and we call each element going down (such as ‘Georg Cantor’
or ‘Kaiser Wilhelm’) a column value.
Here’s a quick summary. Moving across the relation are tuples (but don’t use that word) or rows. Moving up-and-down the relation are attributes (but don’t use that word) or columns. The contents of a row is a row value. The intersection of a row and a column is a column value. The column value is “atomic” – it has only one element. There is always exactly one column value in each atomic box that’s formed by the intersection of a row with a column.
Incidentally, in the diagram we used an ellipsis once more, to mean “and so on” – there are more Germans in the set. We also used the symbol “???” for the column value of “When did he die?” for Kaiser Wilhelm. This is not a standard symbol – there is no standard way of expressing the fact that not only do we not know when Kaiser Wilhelm died (i.e., “value is Unknown”), but we’re not even sure that he’s dead (i.e., “category is Not Applicable”). This is not a problem for Set Theory, but it is a problem in practice and we’ll return to it later. When we do, we’ll call the “???” a NULL, or null value.
And there we have it. A relation is an ordered n-tuple set, with all members having the same degree, which is representable as a table of rows and columns. It really does seem appropriate that we should know what a relation is, because (we trust the following is not an unpleasant surprise) SQL is a relational database management system, and that means the database is made of relations. To quote the Kellogg’s Rice Krispies (tm) commercial, “What the heck did you think they were made of?”
Admittedly, when anything looks that obvious, it must be a lie. And it is. Relational databases are actually made of tables, rather than relations. What’s the difference? Well, with a table we can have two Georg Cantors. This clearly breaks Cantor’s “distinguishable” rule, and it’s quite a big break; we all know that the famous “Set Of Integers” doesn’t go
{1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,...}
By definition, a set has no duplicates. A relation is a set, so a relation has no duplicates. But a table can have duplicates. This is a concession to practicalities, since we know that duplicate data happens in the “real world”. However, if our database contains two Georg Cantors who really are the same person, then we have an error – while if it contains two Georg Cantors whom we can’t distinguish from one another, then we have a design flaw. The long and the short of it all is: (a) a relational database consists of tables, (b) a table is not a relation, but the only difference between them is that a table may have rows with duplicate row values, (c) you should get rid of duplicates regardless, therefore (d) therefore all relational databases should consist of relations. (By the way, a table is sometimes called a multiset to distinguish it from a regular set. You’ll see the word multiset in some Microsoft publications.)
Set Operations¶
Since SQL operates on sets, there must be such things as “set operations” that
SQL can perform. In fact, SQL borrows all the standard operations from Set
Theory textbooks, along with the terminology. Mercifully, it does not also
borrow the notation. So to describe the set operations in this book, we’ve been
able to use some English-looking terms which, in fact, have fairly precise
meanings in Set Theory, and some English-looking terms which, in fact, are SQL
words for our notation. In the following quick introduction, we will describe a
sequence of set operations. A sequence, also known as a series, is an
ordered set. The order that we’ll follow is the order of execution of the
clauses of the most famous of SQL statement types: the query, or SELECT
statement.
Identy¶
The easiest operation is take a copy of the table we started with:
|
|
Georg Cantor |
Set Theory |
Kaiser Wilhelm |
World War I |
The SQL for this is:
... FROM Germans ...
Because this is the first time we’ve said something “in SQL”, we’ll regard it
as our equivalent of the famous “hello world” – so, some introductions are in
order. We’ve used the ellipsis before. As usual, it means “and so on”, so you
can see that there’s some stuff both before and after FROM GERMANS
that
we’re not going to spell out just here. We need the ellipsis to indicate that
we’re not illustrating a whole statement, merely one clause, called the
FROM
clause. The word FROM is an SQL keyword and the word Germans is an SQL
Table name. So: the FROM
clause takes the Germans table and ends up with a
result table that is the same as the Germans table.
Product¶
In the history of the world, there have been four great Mathematicians:
{Al-Khwarezm, Georg Cantor, Leonhard Euler, Rene Descartes}
If we put that side-by-side with our original Germans set, we get:
|
|
---|---|
Al-Khwarezm |
Georg Cantor |
Georg Cantor |
Kaiser Wilhelm |
Leonhard Euler |
|
Rene Descartes |
The Cartesian product operation yields the set of all pairs (x,y) such that x
is a member of some set X and y is a member of some set Y. The result of
GERMANS
[Cartesian Product] MATHEMATICIANS
is the following binary
relation.
|
|
|
|
Al-Khwarezm |
Georg Cantor |
Georg Cantor |
Georg Cantor |
Leonhard Euler |
Georg Cantor |
Rene Descartes |
Georg Cantor |
Al-Khwarezm |
Kaiser Wilhelm |
Georg Cantor |
Kaiser Wilhelm |
Leonhard Euler |
Kaiser Wilhelm |
Rene Descartes |
Kaiser Wilhelm |
You must not object that Al-Khwarezm is unrelated to Kaiser Wilhelm, because the spirit of a Cartesian-product relation is that everything relates to everything. The table above is a relation – it’s mathematically valid (even if it doesn’t seem to make any sense!). There are several ways to express this operation in SQL. The classic style is:
... FROM Germans, Mathematicians ...
That is, within a FROM
clause the expression “table-name-1 , table-name-2”
means “yield a table which is the Cartesian product of table-name-1 and
table-name-2”.
Search condition¶
In our Cartesian-product relation, there is something special about the row:
(Georg Cantor,Georg Cantor)
That row, and that row only, has two column values that are the same. This is significant because the columns are defined in a meaningful way: one ‘Georg Cantor’ is the name of a German, while the other ‘Georg Cantor’ is the name of a mathematician. Therefore, Georg Cantor must be the only person who is both a German and a mathematician. (This assumes that names are unique.) And again, there is an SQL way to say this:
... FROM Germans, Mathematicians
WHERE Germans.name = Mathematicians.name ...
We now have two clauses in our example. The FROM
clause exploded the tables
into a Cartesian product relation. The WHERE
clause reduces the result to a
subset of that relation. The result subset is also a relation – it contains
only those rows where this condition is TRUE
: “the name equals the name”.
(This is known as a search condition in official-SQL vocabulary.) Now we have
this result:
|
|
|
|
Georg Cantor |
Georg Cantor |
So: the WHERE
clause contains a search condition, takes as input the
(table) result of the FROM
clause, and produces as output a table which
contains only those rows where the search condition is TRUE
.
Join¶
In this example so far, the FROM
clause contained two table names and the
WHERE
clause contained a comparison between columns from each table. This
two-step process is usually called a join. In modern SQL, there are several
ways to ask for a join. We have started with the oldest and best-known way,
because it best illustrates that a typical join is two separate operations.
Projection¶
Our result table now has one row with two columns, and the value in each column is ‘Georg Cantor’ – but we only need one column to answer the question “Who is German and a mathematician?”. If you think of the earlier search condition as being an operation which picks out certain rows, it’s easy to get to the next step. A projection is a complementary operation which picks out certain columns. In SQL, we just list the columns we want:
SELECT Germans.name
FROM Germans, Mathematicians
WHERE Germans.name = Mathematicians.name ...
The column reference after the keyword SELECT
is called a select list. We
now have a result table that looks like this:
|
|
Georg Cantor |
So, the select list does a projection on the table produced by the WHERE
clause. The result is (as always) a table. The words SELECT
Germans.name
FROM
Germans
, Mathematicians
WHERE
Germans.name
= Mathematicians.name
constitute a valid and complete SQL
statement. Specifically, this sort of statement is called a query, presumably
because it translates a question (in this case, “What Germans are
mathematicians?”). It’s also commonly called the SELECT
statement.
Other Set Operations¶
SQL also handles the standard set operations intersect, union, and except. The
following is an example of each: two example tables (the “input”), one result
table (the “output”), and the SQL statement that makes it happen. To make the
examples short, we’ve used unrealistically small examples with one column per
table and no WHERE
clauses, but don’t worry about syntactical details or
the exact workings of the operations – that comes later. You understand enough
so far if you grasp that some well-known set operations can be expressed in
SQL, and work on tables.
|
|
|
|
---|---|---|---|
INTEGERS_1
COLUMN_1
5
4
13
|
INTEGERS_2
COLUMN_1
33
14
4
|
SELECT column_1
FROM Integers_1
INTERSECT
SELECT column_1
FROM Integers_2
|
INTEGERS_3
COLUMN_1
4
|
STRINGS_1
COLUMN_1
Spain
Greece
Yugoslavia
|
STRINGS_2
COLUMN_1
Italy
Denmark
Belgium
|
SELECT column_1
FROM Strings_1
UNION
SELECT column_1
FROM Strings_2
|
Strings_3
COLUMN_1
Spain
Greece
Italy
Denmark
Yugoslavia
Belgium
|
DECIMALS_1
COLUMN_1
5.32
4.17
13.99
|
DECIMALS_2
COLUMN_1
33.08
14.00
4.17
|
SELECT column_1
FROM Integers_1
EXCEPT
SELECT column_1
FROM Decimals_2
|
DECIMALS_3
COLUMN_1
5.32
13.99
|
Therefore God Exists¶
Leonhard Euler is famed for his remark that “(a+b)**n/n=X, therefore God
exists”, which shows that any argument looks imposing if you use lots of math
symbols. SQL does the opposite. It has a solid base on a mathematical theory,
but the operations of Set Theory are hidden behind a somewhat English-like
sentence structure. The result is, on balance, for the good. Most SQL beginners
have an easy time grasping the concepts behind WHERE
clauses or “select
lists”, while they might have a less easy time with the dense polysymbolic
notation of standard Set Theory. Unfortunately, the SQL language hides the
operations so well that frequent delusions arise:
SQL is a “nonprocedural” language. Perhaps people get this idea from the fact that any operation on a set should affect all set members simultaneously. But the set operations themselves are ordered. One step follows another.
A whole SQL query operates on the tables in the
FROM
clause. This mode of thinking causes people to make certain common errors which could be avoided if they kept in mind the truth; that each set operation produces a new, nameless, “virtual” table and passes it on. (Well, perhaps we should add “conceptually” – there will be hundreds of times in this book that we could add “conceptually” because your DBMS may do internal things in some out-of-order sequence, provided the method doesn’t affect the results. That is not our present concern. What we must know from the outset is how a human being is supposed to view the operations.)
Coming away from this chapter, you only need to know that SQL has a specialized vocabulary which to a large extent arises from Set Theory; and that SQL operations happen on tables. You should regard this as background information. This book takes a bottom-up approach, starting with the smallest units (the column and row values), so it will be several chapters before we reach the top, and begin to emphasize the sets once more.
Recap: the Relational Model¶
A relational database is one which appears to be nothing more than a collection of tables. The model it is based on has three major aspects: the structure, the manipulation, and the integrity of data.
In the relational model, data are logically presented in two- dimensional tables made up of columns and rows. The rows of a table consist of a collection of values that describe an entity; for example, an employee. The columns of a table consist of a collection of similar data among rows; e.g., employee surnames. Operations on the data are simplified by the fact that a table’s rows are unordered. The intersection of a row and a column contains individual data items called values. Values are always atomic; that is, each position in a table may contain only one datum.
Data manipulation is provided by a set of algebraic or calculus operators.
Data integrity is provided by two rules. Entity integrity requires that every value in the primary key of a table must be a unique, non-null data value. Referential integrity requires that every value in a foreign key must either equal some value of its related primary key, or it must be NULL.
Design of a Relational Database¶
One of the main advantages of the relational model of database design is that it is based on a foundation of formal mathematical theory that allows its concepts to be defined and examined with great precision.
Remember that a relation is (essentially) a two-dimensional table consisting of horizontal rows and vertical columns. The advantage of this form is that almost everyone is familiar with data presented as a simple table. The relational model stipulates that no two rows in the table (relation) may be identical; there must be some combination of columns, called a key, whose values will uniquely identify each row.
As an example of the model’s capabilities, we will design a structure for a personnel database which contains the following information: the name of each employee, the programming languages the employee is familiar with, the number of years the employee has used each language, the employee’s title, the employee’s length of service with the company, the employee’s hourly rate of pay, the current projects to which the employee is assigned, and the manager of each of these projects.
The following PERSONNEL_1
table shown below shows a sample of the data that
might be stored in such a database, along with one possible structure.
|
|||||||
|
|
|
|
|
|
|
|
Marvin
|
Cobol
Fortran
|
3
2
|
Sr. Prog.
|
4
|
25.00
|
Payroll
A/R
|
Smith
Jones
|
Brown
|
Cobol
Basic
Ada
|
2
1
3
|
Sr. Prog.
|
3
|
24.00
|
Inventory
|
Norman
|
Norman
|
Cobol
SQL
|
4
2
|
Prj.
Mgr.
|
2
|
35.00
|
Inventory
|
Norman
|
James
|
SQL
Pascal
|
1
3
|
Sys.
Ana.
|
2
|
29.00
|
A/R
Datcom
|
Jones
Harvey
|
Jones
|
Cobol
Pascal
SQL
Basic
|
1
5
2
9
|
Prj.
Mgr.
|
8
|
42.00
|
A/R
|
Jones
|
Each employee in this table has a unique name, so NAME
may be used as the
table’s key. In practice, of course, there may be more than one way to
construct a key - social security numbers or employee numbers are other values
that might be used to uniquely identify an employee. Using this structure, any
request of the form, “Tell me something about employee E”, is easily answered.
However, it isn’t as simple to respond to such requests as “Which employees can
use language L?”; “Who is the manager of project P?”; “Display all employees
assigned to project P”; or “Change the manager of project P to employee E”. But
through a process known as normalization, the organization of the data in
PERSONNEL_1
can be changed so that it can be used more flexibly.
The first step in normalizing the design is based on the relational rule that
each column of a table may take on only a single, non-repeating (atomic) value
for each row of the table. Looking at PERSONNEL_1
, it’s easy to see that
the columns LANG
, YRS_USE
, PROJECT
and MGR
violate this rule,
because an employee may know more than one programming language and may be
assigned to more than one project at a time. By duplicating the non-repeating
values of NAME
, TITLE
, YRS_EXP
, and PAY
for each combination of
values for the repeating groups, the entire table can be represented in first
normal form. The PERSONNEL_2
table shown below is in first normal form.
|
|||||||
|
|
|
|
|
|
|
|
Marvin |
Cobol |
3 |
Sr. Prog. |
4 |
25.00 |
Payroll |
Smith |
Marvin |
Fortran |
2 |
Sr. Prog. |
4 |
25.00 |
A/R |
Jones |
Brown |
Cobol |
2 |
Sr. Prog. |
3 |
24.00 |
Inventory |
Norman |
Brown |
Basic |
1 |
Sr. Prog. |
3 |
24.00 |
Inventory |
Norman |
Brown |
Ada |
3 |
Sr. Prog. |
3 |
24.00 |
Inventory |
Norman |
Norman |
Cobol |
4 |
Prj. Mgr. |
2 |
35.00 |
Inventory |
Norman |
Norman |
SQL |
2 |
Prj. Mgr. |
2 |
35.00 |
Inventory |
Norman |
James |
SQL |
1 |
Sys. Ana. |
2 |
29.00 |
A/R |
Jones |
James |
Pascal |
3 |
Sys. Ana. |
2 |
29.00 |
Datcomm |
Harvey |
Jones |
Cobol |
1 |
Prj. Mgr. |
8 |
42.00 |
A/R |
Jones |
Jones |
Pascal |
5 |
Prj. Mgr. |
8 |
42.00 |
A/R |
Jones |
Jones |
SQL |
2 |
Prj. Mgr. |
8 |
42.00 |
A/R |
Jones |
Jones |
Basic |
9 |
Prj. Mgr. |
8 |
42.00 |
A/R |
Jones |
We can now see that NAME
is no longer sufficient to uniquely identify a row
of PERSONNEL_2
, because multiple rows may be present for an employee who
knows more than one language or is assigned to more than one project. One
solution is to create a new key from a combination of columns. NAME
,
LANG
, and PROJECT
combined would be such a key, as those three values
together uniquely identify a single row.
PERSONNEL_2
appears to be a step backward in our design. Not only does it
require more space to present the data, but responding to requests such as,
“Change employee E’s title to T”; “Add the assignment of employee E to project
P”; or “Make employee E the manager of project P” is now more difficult. This
problem is addressed by the remaining normalization steps, which are based on
the concept of dependence and the relational rule that in every row of a table,
each column must be dependent on every part of the key.
If, for each row, the value of a column C1 uniquely determines the value of a
column C2, then C2 is functionally dependent on C1. If the value in C1 limits
the possible values in C2 to a specific set, then C2 is set dependent on C1.
For example, because each employee has only one title, we may say that NAME
determines TITLE
and that TITLE
is functionally dependent on NAME
.
PROJECT
is set dependent on NAME
, since each employee is assigned to a
specific set of projects.
The columns TITLE
, YRS_EXP
and PAY
are not dependent on the entire
key (NAME
, LANG
, PROJECT
) of PERSONNEL_2
– they are dependent
on NAME
alone. To solve this, we must create a new table containing only
NAME
, TITLE
, YRS_EXP
, and PAY
. The key for this table, called
EMPLOYEES
, will be NAME
. Of the remaining columns, YRS_USE
is
determined by both NAME
and LANG
and therefore cannot be part of
EMPLOYEES
. Another table, called LANGUAGES
, must be formed by these
three columns. LANGUAGES
will have a key formed by a combination of the
columns NAME
and LANG
. Because the table contains the NAME
column
as well, it is still possible to associate an employee’s language experience
with his employee data.
Splitting a table in this way prevents the experience part of the database from
having columns that are dependent on only part of the table’s key. A first
normal form relation (atomic values in each portion of the table) that also has
no partial key dependence is said to be in second normal form. The following
tables, EMPLOYEES
and LANGUAGES
, are in second normal form.
|
|||
|
|
|
|
Marvin |
Sr. Prog. |
4 |
25.00 |
Brown |
Sr. Prog. |
3 |
24.00 |
Norman |
Prj. Mgr. |
2 |
35.00 |
James |
Sys. Ani. |
2 |
29.00 |
Jones |
Prj. Mgr. |
8 |
42.00 |
|
||
|
|
|
Marvin |
Cobol |
3 |
Marvin |
Fortran |
2 |
Brown |
Cobol |
2 |
Brown |
Basic |
1 |
Brown |
Ada |
3 |
Norman |
Cobol |
4 |
Norman |
SQL |
2 |
James |
SQL |
1 |
James |
Pascal |
3 |
Jones |
Cobol |
1 |
Jones |
Pascal |
5 |
Jones |
SQL |
2 |
Jones |
Basic |
9 |
The situation with project assignments is slightly different. We have already
noted that an employee name determines the set of projects on which that
employee works. This is independent of the languages used by the employee. This
means that a table containing PROJECT
should not have LANG
in its key.
However, the project name uniquely determines the project manager. MGR
is
transitively dependent on NAME
, because NAME
determines a set of values
for PROJECT
, and PROJECT
functionally determines MGR
. To complete
our design, we should remove any transitive dependencies, according to the
relational rule that in every row of a table, all columns must depend directly
on the key, without any transitive dependencies through other columns. A second
normal form relation that has no transitive dependence is said to be in third
normal form.
Because each project has only one manager, we can form a PROJECTS
table
with the columns PROJECT
and MGR
. PROJECTS
’ key will be
PROJECT
. Note that MGR could also be a key, if each employee managed only
one project. Finally, since each employee works on one or more projects, we
will create a fourth table, called ASSIGNMENTS
, using the columns NAME
and PROJECT
. This table forms the association between the EMPLOYEES
and
PROJECTS
tables and is “all key” - i.e., it has no additional dependent
columns, because the only thing dependent on both NAME
and PROJECT
is
the fact that they are associated. Here are the third normal form tables.
|
|
|
|
Payroll |
Smith |
A/R |
Jones |
Inventory |
Norman |
Datcomm |
Harvey |
|
|
|
|
Marvin |
Payroll |
Marvin |
A/R |
Brown |
Inventory |
Norman |
Inventory |
James |
A/R |
James |
Datcomm |
Jones |
A/R |
At this point, our design is complete. All tables are in third normal form, and requests such as those listed earlier can easily be dealt with.
Here are some tips for good database design:
Don’t use an existing database as the basis for a new database structure – you don’t want to inadvertently duplicate awkward or inconsistent table definitions.
Make sure that each table represents just one subject – that is, either one object or one event. This avoids unnecessary duplication of data.
Define a primary key for every table – not only will it uniquely identify a row value, you’ll use it to join tables. A primary key should have these characteristics: its value must be unique and “known not nullable”, and its value should consist of the minimum number of columns to guarantee uniqueness.
Don’t define any multi-value columns – that is, don’t use the ROW or ARRAY <data type>s.
Implement data integrity; define unique keys and foreign keys for your tables.
The SQL-Environment¶
The SQL Standard says that all SQL operations are executed within an SQL-environment. An SQL-environment has six components.
One SQL-agent responsible for causing the execution of SQL statements. It is usually an application program that calls one or more externally-invoked procedures in an SQL-client Module.
One SQL-implementation; a database management system (DBMS) that executes SQL statements. Your SQL-agent considers your DBMS to have two components: (a) one SQL-client, to which the SQL-agent is bound, and (b) one or more SQL-servers to manage your SQL-data. (SQL-data consists of the descriptions of all the SQL Objects, plus all the data values you can access with your DBMS.) The SQL-client is the part of your DBMS that establishes connections to the SQL-servers; it maintains a diagnostics area and other state data that relate to the interactions between the DBMS and the SQL-agent. Each SQL- server has three responsibilities: (a) it manages the SQL-session taking place over the SQL-Connection between itself and the SQL-client, (b) it executes SQL statements received from the SQL-client, receiving and sending data as required, and (c) it maintains the state of the SQL-session, including the <AuthorizationID> and certain session defaults. The method of communication between the SQL-client and the SQL-server(s) is implementation-defined, but the Standard does specify how an SQL- agent will communicate with your DBMS ()*see* SQL Binding Styles).
Zero or more SQL-client Modules, each containing zero or more externally-invoked procedures. SQL-client Modules are programming modules – exactly one is associated with an SQL-agent at any time.
Zero or more <AuthorizationID>s. An SQL <AuthorizationID>, or authorization identifier, represents a user of SQL-data.
Zero or more Catalogs.
Zero or more sites (e.g., Base tables) that contain SQL-data.
In short, an SQL-environment can be thought of as a specific operation of a DBMS on the collection of Catalogs (that contain SQL-data) within a specified SQL-server by all the users (that is, all persons and programs) that have authority to access the SQL-server during the time the DBMS is operating.
SQL Objects¶
The SQL Standard describes the concepts on which SQL is based in terms of Objects, such as Tables. Each SQL Object is defined in terms of the characteristics (e.g., its name) that describe it – the Standard calls this the Object’s descriptor. Some Objects are dependent on other Objects, e.g., a Column is dependent on the Table it belongs to. If an Object is dropped (i.e., destroyed), then every Object dependent on it is also dropped.
Cluster¶
An SQL Cluster is the group of Catalogs available to an SQL-session at any point in time; that is, it contains all the SQL-data you may access through a given SQL-server. Clusters are created and dropped using implementation-defined methods. The Objects that belong to a Cluster are known as Cluster Objects; that is, they depend on some Cluster. Every Cluster Object has a name that must be unique (among Objects of its name class) within the Cluster it belongs to. The Cluster Object name classes are:
<AuthorizationID>s
Catalogs
<AuthorizationID>¶
An SQL <AuthorizationID> is a character string which identifies a user and the
set of Privileges belonging to that user. (A user is either an actual person or
an application program that has access to SQL-data.) An SQL Role is a set of
zero or more role authorizations. A role authorization allows a given
<AuthorizationID> to use every Privilege granted to that Role.
<AuthorizationID>s are dependent on some Cluster; they are created, dropped and
mapped to real users using implementation-defined methods. Roles are dependent
on some schema and are created and dropped with the CREATE ROLE
and DROP
ROLE
statements.
Privilege¶
An SQL Privilege authorizes a particular <AuthorizationID> to execute a given
operation – either DELETE
, EXECUTE
, INSERT
, REFERENCES
,
SELECT
, TRIGGER
, UNDER
, UPDATE
, or USAGE
on a given Schema
Object. It may also allow the grantee to pass the Privilege on to others.
Privileges, dependent on some <AuthorizationID>, are created and assigned to
<AuthorizationID>s with the GRANT
statement and are dropped and removed
with the REVOKE
statement.
Catalog¶
An SQL Catalog is a named group of Schemas, one of which must be an Ur-Schema
named INFORMATION_SCHEMA
. (The INFORMATION_SCHEMA
Schema is a set of
Views and Domains that contain the descriptions of all the SQL-data belonging
to that Catalog.) Catalogs are dependent on some Cluster and are created and
dropped using implementation-defined methods.
Schema¶
An SQL Schema is a named group of SQL-data that is owned by a particular <AuthorizationID>. Schemas are dependent on some Catalog and are created, altered, and dropped using the SQL-Schema statements. The Objects that may belong to a Schema are known as Schema Objects; that is, they depend on some Schema. Every Schema Object has a name that must be unique (among Objects of its name class) within the Schema it belongs to. The Schema Object name classes are:
Base tables and Views
Domains and UDTs
Constraints and Assertions
Character sets
Collations
Translations
Triggers
SQL-server Modules
SQL-invoked routines
Table¶
An SQL Table is a named set of rows – an ordered row of one or more <Column
name>s together with zero or more unordered rows of data values. Tables store
data about a specific entity; each row of the Table describes a single
occurrence of that entity. The SQL Standard defines three types of Tables: Base
tables, Views, and derived tables. Tables are dependent on some Schema or some
Module. Base tables are created, altered, and dropped with the CREATE
TABLE
, ALTER TABLE
, and DROP TABLE
statements, Views are created and
dropped with the CREATE VIEW
and DROP VIEW
statements, and derived
tables are created when you execute a query.
Column¶
An SQL Column is a named component of a Table – a set of similar data values
that describe the same attribute of an entity. A Column’s values all belong to
the same <data type>, or to the same Domain, and may vary over time. A Column
value is the smallest unit of data that can be selected from, or updated for, a
Table. Columns are dependent on some Table and are created, altered, and
dropped with Column definition clauses in the CREATE TABLE
and ALTER
TABLE
statements.
Domain and UDT¶
An SQL Domain and an SQL UDT, or user-defined type, are both named <data type>s
that identify a set of valid data values. Their characteristics are defined by
users and their purpose is to constrain the values that can be stored as
SQL-data. Domains are dependent on some Schema and are created, altered, and
dropped with the CREATE DOMAIN
, ALTER DOMAIN
and DROP DOMAIN
statements. UDTs are dependent on some Schema or some Module and are created.
altered, and dropped with the CREATE TYPE
, ALTER TYPE
, and DROP
TYPE
statements.
Constraint and Assertion¶
An SQL Constraint and an SQL Assertion are both named rules that identify sets
of valid data values. They constrain the allowable data values for Columns,
Domains, and Tables and are defined with two checking characteristics: a
deferral mode (either DEFERRABLE
or NOT DEFERRABLE
) and a constraint
check time (either DEFERRED
or IMMEDIATE
). Constraints are dependent on
some Table or some Domain and are created and dropped with Constraint clauses
in the CREATE TABLE
, ALTER TABLE
, CREATE DOMAIN
and ALTER
DOMAIN
statements. Assertions are dependent on some Schema and are created
and dropped with the CREATE ASSERTION
and DROP ASSERTION
statements.
Character Set¶
An SQL Character set is a named group of characters (the character repertoire)
combined with that repertoire’s Form-of-use, or coding scheme – the (usually
one-to-one) mapping scheme between each character in the repertoire and a set
of internal codes (usually 8-bit values) that give the characters’ order in the
repertoire and define how the characters are encoded as numbers. Every
Character set must contain a space character that is equivalent to the Unicode
character U+0020. Character sets are dependent on some Schema and are created
and dropped with the CREATE CHARACTER SET
and DROP CHARACTER SET
statements. Every Character set has a default Collation.
Collation¶
An SQL Collation is a named set of rules that describes a collating sequence.
Each Collation is defined for exactly one Character set and is used to
determine the results of comparisons between character strings based on that
Character set. All Character sets have a default Collation. Additional
Collations may also be created for any Character set, so for any character
string comparison, there are one or more Collations that may be invoked by the
COLLATE
function. Collations are dependent on some Schema and are created
and dropped with the CREATE COLLATION
and DROP COLLATION
statements.
Translation¶
An SQL Translation is a named set of rules that maps characters from a source
Character set to characters in a target Character set for conversion purposes.
For any pair of Character sets, there are zero or more Translations that may be
invoked by the TRANSLATE
function. Translations are dependent on some
Schema and are created and dropped with the CREATE TRANSLATION
and DROP
TRANSLATION
statements.
Trigger¶
An SQL Trigger is a named rule that is associated with a single Base table.
Each Trigger defines a trigger event specifying which action – either
INSERT
, DELETE
, or UPDATE
– on the Table will cause the triggered
actions, a trigger action time specifying whether the triggered action is to be
taken before or after the trigger event, and one or more triggered actions (the
action to take when the Trigger is fired, or invoked). Triggers are dependent
on some Schema and are created and dropped with the CREATE TRIGGER
and
DROP TRIGGER
statements.
Module¶
An SQL Module is an optionally-named group of SQL statements that is treated as
a unit of an application program. Such programs use SQL statements to carry out
database operations instead of routines written in the host language. There are
three kinds of SQL Modules: (a) an SQL-client Module contains SQL procedures
that are invoked by a host language and is defined with the MODULE statement,
(b) an SQL-session Module contains only SQL statements prepared in that
SQL-session and is usually an implicit Module (that is, its presence isn’t
obvious to the user), and (c) an SQL-server Module – the SQL/PSM type – is
dependent on some Schema, contains only SQL-invoked routines and is created,
altered, and dropped with the CREATE MODULE
, ALTER MODULE
, and DROP
MODULE
statements.
SQL-invoked Routine¶
An SQL-invoked routine is a function or a procedure that can be invoked from
SQL. An SQL-invoked function is invoked by a routine invocation in some value
expression, while an SQL-invoked procedure is a procedure invoked with the
CALL
statement. SQL- invoked routines are dependent either directly on some
Schema or on some Module and are created and dropped with the CREATE
PROCEDURE
, DECLARE PROCEDURE
, CREATE FUNCTION
, CREATE METHOD
,
DROP SPECIFIC ROUTINE
, DROP SPECIFIC FUNCTION
, and DROP SPECIFIC
PROCEDURE
statements.
SQL Data Types¶
Every data value belongs to some SQL <data type>. The logical representation of a data value is known as a <literal>. SQL supports three sorts of <data type>s – predefined <data type>s, constructed <data type>s, and <user-defined data type>s, or UDTs – all of which may be used to define a set of valid data values. The predefined <data type>s are all scalar types; they contain atomic values (i.e., values that are not composed of sets of values of other <data type>s). The constructed <data type>s are mostly composite types; they contain array values (i.e., values that are composed of sets of values, each of a declared predefined <data type>). The UDTs are composite types. Their values and attributes are totally user-defined.
Each host language supported by the SQL Standard has its own data types. These are distinct from SQL <data type>s, though they often have similar names. The Standard includes instructions on how to map SQL <data type>s to host language data types.
SQL-data values are either non-null values or the null value. The null value is a special implementation-dependent value that can be assigned to any SQL <data type>. It is used to represent “value unknown” or “value inapplicable” and is distinct from all non-null values. The null value is often denoted by the <keyword> NULL.
Predefined <data type>s¶
SQL’s predefined scalar <data type>s are identified by these <keyword>s:
INTEGER
, SMALLINT
, NUMERIC
, DECIMAL, ``FLOAT
, REAL
,
DOUBLE PRECISION
, BIT
, BIT VARYING
, BINARY LARGE OBJECT
,
CHARACTER
, CHARACTER VARYING
, NATIONAL CHARACTER
, NATIONAL
CHARACTER VARYING
, CHARACTER LARGE OBJECT
, NATIONAL CHARACTER LARGE
OBJECT
, DATE
, TIME
, TIME WITH TIME ZONE
, TIMESTAMP
,
TIMESTAMP WITH TIME ZONE
, INTERVAL
, and BOOLEAN
.
Number <data type>s¶
A numeric value is either an exact numeric (integer or decimal) number or an
approximate numeric (floating point) number. The numeric <data type>s
INTEGER
(or INT
), SMALLINT
, NUMERIC
, DECIMAL
(or DEC
),
FLOAT
, REAL
, and DOUBLE PRECISION
store numbers inserted in either
exact numeric form (e.g., 75, -6.2) or approximate numeric form (e.g.,
1.256E-4, -1.03E+5). INT
and SMALLINT
are exact numeric types with a
predefined precision and a scale of zero; NUMERIC
and DECIMAL
are exact
numeric types with definable precisions and scales; FLOAT
is an approximate
numeric type with a definable precision; and REAL
and DOUBLE PRECISION
are approximate numeric types with predefined precisions. All numbers are
mutually assignable and mutually comparable. Assignment and comparison are
performed in the familiar, algebraic manner. The following SQL operations
involve numbers: addition and unary plus, subtraction and unary minus,
multiplication, division, assignment, comparison, ABS
, BETWEEN
,
BIT_LENGTH
, CARDINALITY
, CHAR_LENGTH
, DISTINCT
, EXISTS
,
EXTRACT
, FOR ALL
, FOR SOME
, IN
, IS NULL
, MATCH
,
MOD
, OCTET_LENGTH
, POSITION
, and UNIQUE
.
Bit String <data types>s¶
A bit string value is any sequence of bits or hexits. The bit string <data
type>s BIT
and BIT VARYING
store bit string values inserted in either
binary form (any sequence of zero or more 0- bits or 1-bits) or hexadecimal
form (any sequence of zero or more 0-hexits, 1-hexits, 2-hexits, 3-hexits,
4-hexits, 5-hexits, 6- hexits, 7-hexits, 8-hexits, 9-hexits, A-hexits,
B-hexits, C- hexits, D-hexits, E-hexits, or F-hexits). BIT
has a definable
fixed length; BIT VARYING
has a definable variable length. All bit strings
are mutually assignable and mutually comparable. Assignment of a bit string is
performed bit-by-bit beginning with the source’s most significant bit. For
comparison purposes, a 0-bit is less than a 1-bit. The following SQL operations
involve bit strings: concatenation, assignment, comparison, BETWEEN
,
BIT_LENGTH
, CHAR_LENGTH
, DISTINCT
, EXISTS
, FOR ALL
, FOR
SOME
, IN
, IS NULL
, MATCH
, OCTET_LENGTH
, POSITION
,
SUBSTRING
, and UNIQUE
.
Binary String <data types>s¶
A binary string value is any sequence of octets that aren’t associated with a
Character set. The binary string <data type> BINARY LARGE OBJECT
(BLOB
)
stores binary string values inserted in hexadecimal form. BLOB
has a
definable variable length. All binary strings are mutually assignable and
mutually comparable. Assignment of a binary string is performed octet-by-octet
beginning with the source’s most significant octet. Comparison is supported
only for equality. The following SQL operations involve binary strings:
concatenation, assignment, comparison, BIT_LENGTH
, CHAR_LENGTH
,
EXISTS
, FOR ALL
, FOR SOME
, IS NULL
, LIKE
, OCTET_LENGTH
,
OVERLAY
, POSITION
, SUBSTRING
, and TRIM
.
Character String <data type>s¶
A character string value is any sequence of characters that belong to a given
Character set. The character string <data type>s CHARACTER
(CHAR
),
CHARACTER VARYING
(VARCHAR
), NATIONAL CHARACTER
(NCHAR
) and
NATIONAL CHARACTER VARYING
(NCHAR VARYING
) store character strings,
while the character string <data type>s CHARACTER LARGE OBJECT
(CLOB
)
and NATIONAL CHARACTER LARGE OBJECT
(NCLOB
) store large object
character strings. CHAR
and NCHAR
have a definable fixed length;
VARCHAR
, NCHAR VARYING
, CLOB
and NCLOB
have a definable
variable length. CHAR
, VARCHAR
, and CLOB
have a definable Character
set; NCHAR
, NCHAR VARYING
, and NCLOB
have a predefined Character
set. All of the character string <data type>s have a definable Collation or
collating sequence. All character strings that belong to the same Character set
are mutually assignable and mutually comparable if they have the same
Collation. Assignment of a character string is performed character-by-character
beginning with the source’s first character. The result of a character string
comparison is determined by the rules of the Collation used for the comparison.
The following SQL operations involve character strings: concatenation,
assignment, comparison, BETWEEN
, BIT_LENGTH
, CHAR_LENGTH
,
CONVERT
, DISTINCT
, EXISTS
, FOR ALL
, FOR SOME
, IN
, IS
NULL
, LIKE
, LOWER
, MATCH
, OCTET_LENGTH
, OVERLAY
,
POSITION
, SIMILAR
, SUBSTRING
, TRANSLATE
, TRIM
, UNIQUE
,
and UPPER
.
Temporal <data type>s¶
A temporal value is a date, a time, a timestamp, or an interval of time. The
temporal <data type> DATE stores dates, TIME
and TIME WITH TIME ZONE
store times, TIMESTAMP
and TIMESTAMP WITH TIME ZONE
store timestamps,
and INTERVAL
stores intervals. DATE
has a predefined precision;
TIME
, TIME WITH TIME ZONE
, TIMESTAMP
and TIMESTAMP WITH TIME
ZONE
have definable fractional seconds precisions. There are two classes of
INTERVAL
. The first, year-month intervals, has a definable precision that
includes some contiguous combination of the YEAR
and MONTH
datetime
fields. The second, day-time intervals, has a definable precision that includes
some contiguous combination of the DAY
, HOUR
, MINUTE
, and
SECOND
datetime fields. TIME WITH TIME ZONE
values are (and TIMESTAMP
WITH TIME ZONE
values include) times that are maintained in Universal
Coordinated Time (UTC) – with a portion of the value representing a time zone
offset. The time zone offset is an interval that specifies the difference
between UTC and the actual date and time in the value’s time zone. All temporal
values of the same type are mutually assignable and mutually comparable; the
results must follow the usual rules for temporal values according to the
Gregorian calendar and the 24-hour clock. The following SQL operations involve
temporal values: addition, subtraction, multiplication, division, assignment,
comparison, ABS
, BETWEEN
, CURRENT_DATE
, CURRENT_TIME
,
CURRENT_TIMESTAMP
, DISTINCT
, EXISTS
, EXTRACT
, FOR ALL
,
FOR SOME
, IN
, IS NULL
, LOCALTIME
, LOCALTIMESTAMP
,
MATCH
, OVERLAPS
, and UNIQUE
.
Boolean <data type>s¶
A boolean value is a truth value; either TRUE
, FALSE
, or UNKNOWN
.
(The truth value UNKNOWN
is sometimes represented by the null value.) The
boolean <data type> BOOLEAN
stores truth values. All truth values are
mutually assignable and mutually comparable. TRUE
and FALSE
may be
assigned to any boolean target; UNKNOWN
may only be assigned if the boolean
target allows NULLs
. For comparison purposes, TRUE
is greater than
FALSE
. The following SQL operations involve boolean values: AND
,
IS
, NOT
, OR
, and the results of any predicate or search condition.
Constructed <data types>¶
An SQL constructed <data type> is either a <reference type>, a <row type>, or a
<collection type>. A <reference type> is a scalar constructed <data type>
identified by the <keyword> REF
. A <row type> is a composite constructed
<data type> identified by the <keyword> ROW
. A <collection type> is a
composite constructed <data type>, identified by the <keyword> ARRAY
.
<reference type>s¶
A reference value points to some row of a referenceable Base table (that is, a
Base table that has a “with REF
- value” property).
<row type>s¶
A <row type> is a sequence of one or more (Field, <data type>) pairs. A value of a <row type> consists of one value for each of its Fields.
<collection type>s¶
A <collection type> is a composite data value that consists of zero or more elements of a specified <data type>, known as the element type – that is, in SQL3, a <collection type> is an array.
User-defined Types¶
The SQL user-defined types (UDTs) are Schema Objects that can be defined by a standard, by a DBMS, or by an SQL application. UDTs have no corresponding <literal>s.
Data Type Conversions¶
SQL allows for implicit <data type> conversion in expressions and in FETCH
,
‘`SELECT`, INSERT
, DELETE
, and UPDATE
operations. Explicit <data
type> conversions may be performed with the CAST
operator.
Sites¶
As defined in the SQL Standard, a site is “a place that holds an instance of
a value of a specified <data type>”. A site has a defined degree of persistence
– if it exists until deliberately destroyed, it is a persistent site; if it
ceases to exist at the end of an SQL statement, SQL transaction, or
SQL-session, it is a temporary site; if it exists only to hold an argument or
returned value, it is a transient site. The principal kind of persistent or
temporary site is a Base table. Some sites may be referenced by their names
(e.g., Base tables and SQL variables) or by a REF
value. A site occupied by
an element of an array may be referenced by its element number.
The instance at a site can be changed in two ways: by assignment or by mutation. Assignment is an operation that replaces the value at a site (the “target”) with a new value (the “source”). Mutation is an operation that changes the value of some attribute of an instance at a site whose <data type> is a UDT. Neither assignment nor mutation has any effect on the reference value of a site, if any.
Every site has a nullability characteristic, which indicates whether it may contain the null value (is “possibly nullable”) or not (is “known not nullable”). Only the Columns of Base tables may be constrained to be “known not nullable”, but the characteristic is inheritable.
Locators¶
An embedded host language variable, host parameter, SQL parameter, or external routine, or the value returned by an external function may all be specified to be a Locator. The purpose of a Locator is to allow very large data values to be operated on without transferring the entire value to and from your application program.
A Locator is not SQL-data; instead, it is an SQL-session Object that can be used to reference a specific value. The SQL Standard defines three types of Locators: the large object (LOB) Locator, the UDT Locator and the array Locator. A LOB Locator is either (a) a BLOB Locator (its value identifies a binary large object), (b) a CLOB Locator (its value identifies a character large object) or (c) an NCLOB Locator (its value identifies a national character large object). A UDT Locator identifies a value of a given user-defined type. An array Locator identifies a value of a given array.
SQL Language Elements¶
The SQL Standard has numerous rules for such basic issues as what makes a legal name and how to put together SQL syntax. The starting point for these rules is knowing what the basic scalar language elements are. The SQL basic scalar language elements are defined in the set of <SQL language character>s.
<SQL language character>¶
According to the SQL Standard, the syntactic element <SQL language character> defines “the terminal symbols of the SQL language and the elements of strings”. In other words, you’ll use <SQL language character>s to write SQL syntax or <token>s. <SQL language character>s are case insensitive; that is, uppercase and lowercase simple Latin letters are interchangeable so that, to an SQL parser, these three words are exactly alike:
SELECT
select
Select
The set of <SQL language character>s contains:
The uppercase simple Latin letters A to Z.
The lowercase simple Latin letters a to z.
The digits 0 to 9.
The set of <SQL special character>s.
<SQL special character>¶
The set of <SQL special character>s is part of the set of <SQL language character>s and contains:
-- The space character
( -- The left parenthesis
) -- The right parenthesis
" -- The double quote mark
' -- The single quote mark
% -- The percent sign
& -- The ampersand
* -- The asterisk or multiplication sign
/ -- The solidus or division sign
+ -- The plus sign
- -- The minus sign or dash
, -- The comma
. -- The period
: -- The colon
; -- The semicolon
< -- The less than operator
> -- The greater than operator
? -- The question mark
[ -- The left bracket
] -- The right bracket
_ -- The underline character
| -- The vertical bar
= -- The equals operator
{ -- The left brace
} -- The right brace
^ -- The circumflex
<token>¶
A <token> is either a <literal>, a <keyword>, an <identifier> or an <SQL
special character> or symbol – that is, a <token> is a group of characters
that is recognized as a single unit by an SQL parser. For example, there are a
total of 7 <token>s (SELECT
, “a”, “+”, “5”, FROM
, “t”, and “;”) in the
following SQL statement.
SELECT a+5 FROM t;
In SQL, <token>s are grouped into two types: <nondelimiter token>s and <delimiter token>s. The difference between them lies in the fact that, while any <token> may be followed by a <separator>, a <nondelimiter token> must be followed either by a <separator> or a <delimiter token>.
A <nondelimiter token> is an <unsigned numeric literal>, a <national character string literal>, a <bit string literal>, a <hex string literal>, a <keyword>, or a <regular identifier>. A <delimiter token> is a <character string literal>, a <date literal>, a <time literal>, a <timestamp literal>, an <interval literal>, a <delimited identifier>, an <SQL special character>, or one of these symbols:
<> -- The not equals operator
>= -- The greater than or equals operator
<= -- The less than or equals operator
|| -- The concatenation operator
??( -- The left trigraph
??) -- The right trigraph
-> -- The right arrow
=> -- The keyword parameter tag
For example, the <keyword> <token> SELECT
may be followed either by a
<separator> (usually a space) or by an <SQL special character>. Thus, both of
the following are examples of legal SQL syntax:
``SELECT`` column_1
is legal syntax because a space separates the <token> SELECT
from the
<token> “column_1”
SELECT*
is legal syntax because, although no space separates the <token> SELECT
from the <token> “*”, the asterisk is identified as a separate <token> because
it is a <SQL special character>.
A <token> may not include any <separator>s unless it is a <character string literal>, a <bit string literal>, a <hex string literal>, a <timestamp literal>, an <interval literal>, or a <delimited identifier>.
<separator>¶
[Obscure Rule] applies to this entire section.
Your SQL parser must know where one <token> ends and another begins. To do so, it recognizes white space, a newline character, a simple comment, and a bracketed comment as <separator>s.
White space is usually just one or more spaces, but it can also consist of any consecutive sequence of these Unicode characters:
U+0009 Horizontal Tab
U+000A Line Feed
U+000B Vertical Tabulation
U+000C Form Feed
U+000D Carriage Return
U+0020 Space
U+00A0 No-Break Space
U+2000 En Quad
U+2001 Em Quad
U+2002 En Space
U+2003 Em Space
U+2004 Three-Per-Em Space
U+2005 Four-Per-Em Space
U+2006 Six-Per-Em Space
U+2007 Figure Space
U+2008 Punctuation Space
U+2009 Thin Space
U+200A Hair Space
U+200B Zero Width Space
U+200C Zero Width Non-Joiner
U+200D Zero Width Joiner
U+200E Left-To-Right Mark
U+200F Right-To-Left Mark
U+3000 Ideographic Space
U+2028 Line Separator
U+2029 Paragraph Separator
U+FEFF Zero Width No-Break Space
[NON-PORTABLE] A newline character marks the end of a line. It is non-standard because the SQL Standard requires implementors to define which white space character(s) will be recognized as end- of-line indicators by their parsers. [OCELOT Implementation] The OCELOT DBMS that comes with this book recognizes carriage returns and line feeds as newline characters.
A simple comment begins with two or more consecutive dashes, contains any number of characters (including spaces and more dashes), and ends with a newline character. For example, these two SQL statements are both followed by a simple comment.
SELECT a+5 FROM t; -- this is a simple comment
SELECT a+5 FROM t; --- this is a simple comment too
A bracketed comment is a C-style comment. It begins with “/”, ends with “/” and contains any number of characters, including zero or more <separator>s. For example, this SQL statement is followed by a bracketed comment.
SELECT a+5 FROM t; /* this is a bracketed comment that contains a
carriage return */
If you want to restrict your code to Core SQL, don’t use bracketed comments.
<literal>¶
A <literal> is a <token> that represents a non-null data value. SQL values are
normally atomic – they cannot be subdivided – and are either non-null values
or the null value. The null value isn’t represented by a <literal>. Instead,
the <keyword> NULL
is used whenever it’s necessary to indicate that the
null value is represented.
[NON-PORTABLE] The logical representation of the null value is non-standard because the SQL Standard requires implementors to define that character used to display the null value. [OCELOT Implementation] The OCELOT DBMS that comes with this book displays a question mark to represent the null value.
In SQL, a <literal> is either a signed <numeric literal> (for example: +52.6), an unsigned <numeric literal> (for example: 15) or a general literal. (An unsigned literal is thus either an unsigned <numeric literal> or a general literal. A general literal is one of the following:
A <bit string literal>, for example,
B'1011'
A <hex string literal>, for example,
X'4A'
A <binary string literal>, for example,
X'44AF'
A <character string literal>, for example,
'hello'
A <national character string literal>, for example,
N'hello'
A <date literal>, for example,
DATE '1997-07-15'
A <time literal>, for example,
TIME
'19:30:20'
TIME
'19:30:20.05'
TIME
'19:30:20+03:00'
A <timestamp literal>, for example,
TIMESTAMP
'1997-07-15 19:30:20'
TIMESTAMP
'1997-07-15 19:30:20.05'`
TIMESTAMP
`'1997-07-15 19:30:20.05-10:30'
A <year-month literal>, for example,
INTERVAL
'20' YEAR
INTERVAL
'10' MONTH
INTERVAL
'20-10' YEAR TO MONTH
A <day-time literal>, for example,
INTERVAL
'20'
DAY
INTERVAL
'-10'
HOUR
INTERVAL
'15'
MINUTE
INTERVAL
'10'
SECOND
INTERVAL
'20
10:15:10' DAY TO SECOND
A <boolean literal>, either
TRUE
,FALSE
orUNKNOWN
.
<keyword>¶
A <keyword> is a word that has a special meaning for the SQL parser. There are two types of SQL <keyword>s: reserved <keyword>s and non-reserved <keyword>s. Reserved <keyword>s may not be used as <regular identifier>s. Non-reserved <keyword>s are not so restricted, but it’s probably not a good idea to use them as <regular identifier>s anyway.
A <keyword> is case insensitive because all its characters are part of the set of <SQL language character>s. That is, uppercase and lowercase letters within a <keyword> are interchangeable; so that, for example, these three <keyword>s are exactly alike to an SQL parser:
SELECT
select
Select
The Set of Reserved <keyword>s¶
ABSOLUTE CASCADED CURRENT_ROLE DO
ACTION CASE CURRENT_TIME DOMAIN
ADD CAST CURRENT_TIMESTAMP DOUBLE
ADMIN CATALOG CURRENT_USER DROP
AFTER CHAR CURSOR DYNAMIC
AGGREGATE CHARACTER CYCLE EACH
ALIAS CHECK DATA ELSE
ALL CLASS DATALINK ELSEIF
ALLOCATE CLOB DATE END
ALTER CLOSE DAY END-EXECUTE
AND COLLATE DEALLOCATE EQUALS
ANY COLLATION DEC ESCAPE
ARE COLUMN DECIMAL EVERY
ARRAY COMMIT DECLARE EXCEPT
AS COMPLETION DEFAULT EXCEPTION
ASC CONDITION DEFERRABLE EXEC
ASSERTION CONNECT DEFERRED EXECUTE
AT CONNECTION DELETE EXIT
AUTHORIZATION CONSTRAINT DEPTH EXPAND
BEFORE CONSTRAINTS DEREF EXPANDING
BEGIN CONSTRUCTOR DESC EXTERNAL
BINARY CONTAINS DESCRIBE FALSE
BIT CONTINUE DESCRIPTOR FETCH
BLOB CORRESPONDING DESTROY FIRST
BOOLEAN CREATE DESTRUCTOR FLOAT
BOTH CROSS DETERMINISTIC FOR
BREADTH CUBE DICTONARY FOREIGN
BY CURRENT DIAGNOSTICS FOUND
CALL CURRENT_DATE DISCONNECT FROM
CASCADE CURRENT_PATH DISTINCT FREE
FULL LANGUAGE OF REF
FUNCTION LARGE OFF REFERENCES
GENERAL LAST OLD REFERENCING
GET LATERAL ON RELATIVE
GLOBAL LEADING ONLY REPEAT
GO LEAVE OPEN RESIGNAL
GOTO LEFT OPERATION RESTRICT
GRANT LESS OPTION RESULT
GROUP LEVEL OR RETURN
GROUPING LIKE ORDER RETURNS
HANDLER LIMIT ORDINALITY REVOKE
HAVING LOCAL OUT RIGHT
HASH LOCALTIME OUTER ROLE
HOST LOCALTIMESTAMP OUTPUT ROLLBACK
HOUR LOCATOR PAD ROLLUP
IDENTITY LOOP PARAMETER ROUTINE
IF MATCH PARAMETERS ROW
IGNORE MEETS PARTIAL ROWS
IMMEDIATE MINUTE PATH SAVEPOINT
IN MODIFIES PERIOD SCHEMA
INDICATOR MODIFY POSTFIX SCROLL
INITIALIZE MODULE PRECEDES SEARCH
INITIALLY MONTH PRECISION SECOND
INNER NAMES PREFIX SECTION
INOUT NATIONAL PREORDER SELECT
INPUT NATURAL PREPARE SEQUENCE
INSERT NCHAR PRESERVE SESSION
INT NCLOB PRIMARY SESSION_USER
INTEGER NEW PRIOR SET
INTERSECT NEXT PRIVILEGES SETS
INTERVAL NO PROCEDURE SIGNAL
INTO NONE PUBLIC SIZE
IS NORMALIZE READ SMALLINT
ISOLATION NOT READS SOME
ITERATE NULL REAL SPACE
JOIN NUMERIC RECURSIVE SPECIFIC
KEY OBJECT REDO SPECIFICTYPE
SQL THAN UNDER VARIABLE
SQLEXCEPTION THEN UNDO VARYING
SQLSTATE TIME UNION VIEW
SQLWARNING TIMESTAMP UNIQUE WHEN
START TIMEZONE_HOUR UNKNOWN WHENEVER
STATE TIMEZONE_MINUTE UNTIL WHERE
STATIC TO UPDATE WHILE
STRUCTURE TRAILING USAGE WITH
SUCCEEDS TRANSACTION USER WITHOUT
SYSTEM_USER TRANSLATION USING WORK
TABLE TREAT VALUE WRITE
TEMPORARY TRIGGER VALUES YEAR
TERMINATE TRUE VARCHAR ZONE
Note
SQL-92 and SQL3 both added a considerable number of words to the set of SQL
reserved <keyword>s. The Standard acknowledges this and – as an aid to
users – suggests that you include either a digit or an underline character
in your <regular identifier>s, and avoid names that begin with
CURRENT_
, SESSION_
, SYSTEM_
or TIMEZONE_
and those that end
with _LENGTH
, to avoid conflicts with reserved <keyword>s added in
future revisions.
The Set of Non-Reserved <keyword>s¶
ABS CHARACTER_LENGTH CONDITION_NUMBER
ADA CHARACTER_SET_CATALOG CONNECTION_NAME
ASENSITIVE CHARACTER_SET_NAME CONSTRAINT_CATALOG
ASSIGNMENT CHARACTER_SET_SCHEMA CONSTRAINT_NAME
ASYMMETRIC CHECKED CONSTRAINT_SCHEMA
ATOMIC CLASS_ORGIN CONTAINS
AVG COALESCE CONTROL
BETWEEN COBOL CONVERT
BIT_LENGTH COLLATION_CATALOG COUNT
BITVAR COLLATION_NAME CURSOR_NAME
BLOCKED COLLATION_SCHEMA DATETIME_INTERVAL_CODE
C COLUMN_NAME DATETIME_INTERVAL_PRECISION
CARDINALITY COMMAND_FUNCTION DB
CATALOG_NAME COMMAND_FUNCTION_CODE DISPATCH
CHAIN COMMITTED DLCOMMENT
CHAR_LENGTH CONCATENATE DLFILESIZE
DLFILESIZEEXACT NULLABLE SERVER_NAME
DLLINKTYPE NUMBER SIMPLE
DLURLCOMPLETE NULLIF SOURCE
DLURLPATH OCTET_LENGTH SPECIFIC_NAME
DLURLPATHONLY OPTION SIMILAR
DLURLSCHEMA OVERLAPS STRUCTURE
DLURLSERVER OVERLAY SUBLIST
DLVALUE OVERRIDING SUBSTRING
DYNAMIC_FUNCTION PASCAL SUM
DYNAMIC_FUNCTION_CODE PARAMETER_MODE STYLE
EXISTING PARAMETER_ORDINAL_POSITION SUBCLASS_ORIGIN
EXISTS PARAMETER_SPECIFIC_CATALOG SYMMETRIC
EXTRACT PARAMETER_SPECIFIC_NAME SYSTEM
FILE PARAMETER_SPECIFIC_SCHEMA TABLE_NAME
FINAL PERMISSION TRANSACTIONS_COMMITTED
FORTRAN PLI TRANSACTIONS_ROLLED_BACK
GENERATED POSITION TRANSACTION_ACTIVE
HOLD RECOVERY TRANSFORM
INFIX REPEATABLE TRANSLATE
INSENSITIVE RESTORE TRIGGER_CATALOG
INSTANTIABLE RETURNED_LENGTH TRIGGER_SCHEMA
INTEGRITY RETURNED_OCTET_LENGTH TRIGGER_NAME
KEY_MEMBER RETURNED_SQLSTATE TRIM
KEY_TYPE ROUTINE_CATALOG TYPE
LENGTH ROUTINE_NAME UNCOMMITTED
LINK ROUTINE_SCHEMA UNLINK
LOWER ROW_COUNT UNNAMED
MAX ROW_TYPE_CATALOG UPPER
MIN ROW_TYPE_SCHEMA USER_DEFINED_TYPE_CATALOG
MESSAGE_LENGTH ROW_TYPE_NAME USER_DEFINED_TYPE_NAME
MESSAGE_OCTET_LENGTH SCALE USER_DEFINED_TYPE_SCHEMA
MESSAGE_TEXT SCHEMA_NAME YES
METHOD SELECTIVE
MOD SELF
MORE SENSITIVE
MUMPS SERIALIZABLE
NAME
The SQL Standard allows implementations to define more reserved words for their own DBMSs. Here are some words that are reserved in some dialect of one of the major vendors (e.g., Oracle, Sybase, Microsoft). You may be able to use these words as <regular identifier>s, but if you do so, you will lose portability.
ABORT DICTONARY NUMBER ROWNUN
ACCEPT DIGITS NUMBER_BASE ROWTYPE
ANALYZE DISPLACEMENT OFF RUN
ARCHIVELOG DISPOSE OID SEPERATE
ARRAY ELEMENT OLD_TABLE SEQUENCE
ASSIGN ENTRY OPERATOR SQLCA
ASYNCH EXCEPTION_INIT OPERATORS SQLCODE
ATTRIBUTES FACTOR OTHERS SQLERRM
AUDIT FORM PACKAGE SQLWARNING
BACKUP FREELISTS PARTITION STATEMENT
BINARY_INTEGER GENERTIC PCTFREE STDDEV
BODY IDENTIFIED PENDANT SUBTYPE
CACHE IGNORE POSITIVE SYMBOL
CHAR_BASE INCLUDE PRAGMA TABAUTH
CLUSTER INDEX PREORDERED TABLES
CLUSTERS INDEXES PRIVATE TASK
COLAUTH INFILE PROTECTED TERM
COLUMNS INSTEAD RAISE TEST
COMPRESS INSTANCE RANGE THERE
CONSTANT LIMITED RAW TUPLE
CRASH LIST RECORD USE
CURVAL MAXEXTENTS RELEASE VARCHAR2
DATA_BASE MINUS REM VARIANCE
DATABASE MLSLABEL RENAME VIEWS
DBA MODE REPLACE VIRTUAL
DEBUGOFF NEW RESOURCE VISIBLE
DEBUGON NEW_TABLE REUSE WAIT
DEFINITION NEXTVAL REVERSE XOR
DELAY NOCOMPRESS ROWID
DELTA NONE ROWLABLE
<identifier>¶
An <identifier> (a <token> that names an SQL Object) is a character string, up
to 128 characters long, from one Character set. Within a CREATE SCHEMA
statement, an <identifier> that doesn’t include an explicit <Schema name> names
an Object that belongs to the Schema you’re creating. In any other SQL
statement, an <identifier> that doesn’t include an explicit <Schema name> names
an Object that belongs to the Schema named in the SCHEMA
clause (or, if
there is no SCHEMA
clause, in the AUTHORIZATION
clause) of the
MODULE
statement that defines the Module you’re running. SQL recognizes
three types of <identifier>s: the <regular identifier>, the <SQL language
identifier>, and the <delimited identifier>.
<regular identifier>¶
The required syntax for a <regular identifier> is:
<regular identifier> ::=
Object name
A <regular identifier> is a character string, up to 128 characters long, that consists only of letters, digits, and underscore characters. It must begin with a letter.
[Obscure Rule] We usually think of a “letter” as one of the simple Latin letters, but in fact – depending on the Character set being used – a “letter” can also be an accented character, a character in a non-Latin alphabet, or a syllable or ideograph; i.e., it can be any character with the Unicode alphabetic property or ideographic property. The “letter” that begins a <regular identifier> may not have the Unicode combining property; the letters following it may, with the proviso that these characters are not legal anywhere in a <regular identifier>:
U+06DD Arabic End of Ayah
U+06DE Arabic Start of Rub El Hizb
U+20DD Combining Enclosing Circle
U+20DE Combining Enclosing Square
U+20DF Combining Enclosing Diamond
U+20E0 Combining Enclosing Circle Backslash
Depending on the Character set in use, you may also use these characters in a <regular identifier>, as long as they’re not used as the <identifier>’s first character:
U+00B7 |
Middle Dot |
U+02D0 |
Modifier Letter Triangular Colon |
U+20D1 |
Modifier Letter Half Triangular Colon |
U+0640 |
Arabic Tatweel |
U+0E46 |
Thai Character Maiyamok |
U+0EC6 |
Lao Ko La |
U+3005 |
Ideographic Iteration Mark |
U+3031 to |
|
U+3035 inclusive |
variations of Vertical Kana Repeat Mark |
U+309B to |
variations of Combining Katakana-Hiragana Sound Mark and |
U+309E inclusive |
Hiragana Iteration Mark |
U+30FC to |
variations of Katakana-Hiragana Prolonged Sound Mark and |
U+30FE inclusive |
Katakana Iteration Mark |
U+FF70 |
Halfwidth Katakana-Hiragana Prolonged Sound Mark |
U+FF9E |
Halfwidth Katakana Voiced Sound Mark |
U+FF9F |
Halfwidth Katakana Semi-voiced Sound Mark |
U+200C |
Zero Width Non-Joiner |
U+200D |
Zero Width Joiner |
U+200E |
Left-To-Right Mark |
U+200F |
Right-To-Left Mark |
U+202A |
Left-To-Right Embedding |
U+202B |
Right-To-Left Embedding |
U+202C |
Pop Directional Formatting |
U+202D |
Left-To-Right Override |
U+202E |
Right-To-Left Override. |
U+206A |
Inhibit Symmetric Swapping |
U+206B |
Activate Symmetric Swapping |
U+206C |
Inhibit Arabic Form Shaping |
U+206D |
Activate Arabic Form Shaping |
U+206E |
National Digit Shapes |
U+206F |
Nominal Digit Shapes |
U+FEFF |
Zero-Width No-Break Space |
U+203F |
Undertie |
U+2040 |
Character Tie |
U+FE33 |
Presentation Form for Vertical Low Line |
U+FE34 |
Presentation Form for Vertical Wavy Low Line |
U+FE4D |
Dashed Low Line |
U+FE4E |
Centreline Low Line |
U+FE4F |
Wavy Low Line |
U+FF3F |
Fullwidth Low Line |
A <regular identifier> is case insensitive. That is, uppercase and lowercase letters within a <regular identifier> are interchangeable; for example, these three <regular identifier>s are exactly alike to an SQL parser:
SAMS_TABLE
sams_table
Sams_Table
SQL doesn’t allow a reserved <keyword> to be used as a <regular identifier>.
When comparing a <regular identifier> and a reserved <keyword> to check for
equality, your DBMS will replace the lowercase letters in each with their
uppercase equivalents and assume that both belong to the SQL_``TEXT`` Character
set. In fact, your DBMS will replace all lowercase letters in a <regular
identifier> with their uppercase equivalents prior to any comparison and prior
to storing the <identifier> either in a Catalog’s INFORMATION_SCHEMA
or a
diagnostics area.
Here are some examples of <regular identifier>s:
|
a <regular identifier> |
|
another <regular identifier> |
|
a <regular identifier> that looks like a reserved <keyword> |
|
a <regular identifier> that doesn’t exclusively use simple Latin letters |
If you want to restrict your code to Core SQL, make sure your <regular identifier>s are no more than 18 characters long.
<SQL language identifier>¶
The required syntax for an <SQL language identifier> is:
<SQL language identifier> ::=
Object name
An <SQL language identifier> is a <regular identifier> that consists only of simple Latin letters, digits, and underscore characters. It must begin with a simple Latin letter. Here are two examples of <SQL language identifier>s:
TABLE_1
BOB_SCHEMA
<delimited identifier>¶
The required syntax for a <delimited identifier> is:
<delimited identifier> ::=
"Object name"
A <delimited identifier> is a character string, up to 128 characters long, surrounded by a pair of double quote marks. (The delimiting double quotes aren’t part of the <identifier>, so they’re not included in the calculation of its size.) Two consecutive double quotes within the character string (i.e., “”) represent one double quote mark; together, they count as one character when calculating the size of the <identifier>.
A <delimited identifier> is case sensitive. That is, uppercase and lowercase letters within a <delimited identifier> are not interchangeable; for example, to an SQL parser, these three <delimited identifier>s
"SAMS_TABLE"
"sams_table"
"Sams_Table"
represent three different names. Your DBMS will not replace lowercase letters in a <delimited identifier> with their uppercase equivalents prior to any comparison or storage operation.
Here are some examples of <delimited identifier>s:
|
a <delimited identifier> that uses lowercase letters and a special character |
|
a <delimited identifier> that includes spaces |
|
a <delimited identifier> that looks like a reserved <keyword> |
If you want to restrict your code to Core SQL, make sure your <delimited identifier>s are no more than 18 characters long.
<identifier> Equivalence¶
Two <regular identifier>s are the same if they consist of the same characters.
Your DBMS assumes the relevant Character set is SQL_TEXT
when comparing
them.
A <regular identifier> and a <delimited identifier> are the same if the
<regular identifier> consists of the same characters that make up the body
(i.e., the string of characters inside the double quote marks) of the
<delimited identifier>. Two <delimited identifier>s are the same if their
bodies consist of the same characters. Your DBMS assumes the relevant Character
set is SQL_TEXT
with a case sensitive Collation when comparing <regular
identifier>s to <delimited identifier>s and <delimited identifier>s to one
another.
Because of the difference in case sensitivity between <regular identifier>s and <delimited identifier>s, these two <regular identifier>s are the same:
P_TABLE
p_table
and both are equal to this <delimited identifier>:
"P_TABLE"
but neither are equal to this <delimited identifier>:
"p_table"
For another example, consider this group of <identifier>s:
|
A <delimited (uppercase) identifier>. |
|
A <delimited (lowercase) identifier>. |
|
A <regular identifier>. |
|
A <regular identifier>. |
|
A <regular identifier>. |
Because delimiting double quotes are not themselves part of an <identifier>,
the <delimited identifier> "E"
is the same as the <regular identifier>
E
, i.e., examples #1 and #3 are the same name. Because lowercase letters in
a <regular identifier> are mapped to uppercase letters before comparison and
storage, examples #3 and #4 are the same name – and they’re also the same name
as example #1. Because lowercase letters in a <delimited identifier> are not
mapped to uppercase letters at any time, example #2 is not the same name as
example #4. Because there is no mapping of accented characters in an
<identifier>, example #5 is not the same name as any of the others – but ë is
a letter, and so qualifies as a <regular identifier>. (This example assumes
that the MS-Windows encoding scheme – the one that Microsoft calls “ANSI” –
is in use. This is not always the case; the choice of possible Character sets
is broad.)
Qualification of <identifier>s¶
All SQL Objects have names which are some combination of <regular identifier>s, <delimited identifier>s, or <SQL language identifier>s in an appropriate hierarchy of qualification. The top of the hierarchy is [SQL-server name.], an implicit name, therefore never specified. Then comes [<Catalog name>.], which is the first level of the hierarchy that can be explicitly stated. The next level is [<Schema name>.], then comes [the name of an Object], and (if the Object is a Table) the final level of the hierarchy is <.Column name>. The entire qualification hierarchy always exists but is not necessarily visible; the Standard contains rules by which high-level parts of the combination may be omitted and their values assumed by default.