Data in CrateDB is stored in tables. Tables are organised into columns which have different data types depending on what data will be stored in them. In this video, we'll look at some of the primitive data types built into CrateDB.
Primitive types are used when defining columns that store scalar values such as numbers and strings. If you're familiar with SQL, you should recognise most of the data types described here.
We'll begin with the data types used for storing numerical data. CrateDB uses these data types to represent integer values. A SMALLINT can store 2 byte numbers in the range -32,768 to 32,767. Use the INTEGER type for integers that could be represented by 4 bytes and BIGINT for those that need an 8 byte range. When storing floating point numbers, choose REAL for single precision or DOUBLE PRECISION for more accuracy.
Character or string data can be represented with these data types. TEXT and VARCHAR represent variable length strings. To specify a string with a maximum length, use VARCHAR and specify the length shown as N here. Strings longer than this length will be truncated upon insertion. Use the CHARACTER type with an optional length value to represent fixed length blank padded strings. All of these data types support the storage of string data containing Unicode characters.
Date and time data is stored using the TIMESTAMP data type. A timestamp represents a specific date and time as a number of milliseconds since the UNIX epoch. There are two variants of this data type. TIMESTAMP WITHOUT TIME ZONE assumes all values are in UTC time. Use TIMESTAMP WITH TIME ZONE when you want to specify a time zone offset from UTC. When inserting data into these columns, it can be expressed as string literals or as UNIX timestamps supporting millisecond precision.
Finally, CrateDB has some other useful primitive data types. Use BOOLEAN to represent true or false data. The IP data type represents IP addresses and supports both V4 and V6 representations. The BIT data type stores bit sequences, useful for storing bit maps.
Here's a fragment of the CREATE TABLE statement used to define the schema for our taxi rides data in the Chicago data set, and here's some sample data for a taxi ride record. Trip IDs are variable length and contain alphanumeric data. They're stored in a TEXT column. For the numerical vehicle IDs, we're using an INTEGER representation. Start and end times for the ride are represented as UTC TIMESTAMPs, with the data here being inserted in UNIX timestamp format.
Chicago is divided into 77 community areas numbered 1 to 77. As we know the range of possible values when designing the schema, we've chosen SMALLINT for these columns. Each taxi ride has latitude and longitude values for the area that it started and ended in. Here, we're using DOUBLE PRECISION to represent these floating point numbers. CrateDB has specialised data types that handle geospatial data, and we'll explore those in a separate video.
In this video, we introduce some of CrateDB's primitive data types. To learn more about these and others, including aliases provided for compatibility with Postgres, please visit the CrateDB documentation website at cratedb.com/docs.