Query CrateDB

Note

This page documents the CrateDB client, implementing the Python Database API Specification v2.0 (PEP 249).

For help using the SQLAlchemy dialect, consult the SQLAlchemy dialect documentation.

See also

Supplementary information about the CrateDB Database API client can be found in the data types appendix.

Table of contents

Using a cursor

After connecting to CrateDB, you can execute queries via a database cursor.

Open a cursor like so:

>>> cursor = connection.cursor()

Inserting data

Regular inserts

Regular inserts are possible with the execute() method, like so:

>>> cursor.execute(
...     "INSERT INTO locations (name, date, kind, position) VALUES (?, ?, ?, ?)",
...     ("Einstein Cross", "2007-03-11", "Quasar", 7))

Here, the values of the tuple (the second argument) are safely interpolated into the query string (the first argument) where the ? characters appear, in the order they appear.

Warning

Never use string concatenation to build query strings.

Always use the parameter interpolation feature of the client library to guard against malicious input, as demonstrated in the example above.

Bulk inserts

Bulk inserts are possible with the executemany() method, which takes a list of tuples to insert:

>>> cursor.executemany(
...     "INSERT INTO locations (name, date, kind, position) VALUES (?, ?, ?, ?)",
...     [('Cloverleaf', '2007-03-11', 'Quasar', 7),
...      ('Old Faithful', '2007-03-11', 'Quasar', 7)])
[{'rowcount': 1}, {'rowcount': 1}]

The executemany() method returns a result dictionary for every tuple. This dictionary always has a rowcount key, indicating how many rows were inserted. If an error occurs, the rowcount value is -2, and the dictionary may additionally have an error_message key.

Selecting data

Executing a query

Selects can be performed with the execute() method, like so:

>>> cursor.execute("SELECT name FROM locations WHERE name = ?", ("Algol",))

Like with inserts, here, the single value of the tuple (the second argument) is safely interpolated into the query string (the first argument) where the ? character appears.

Warning

As with inserts, always use parameter interpolation.

After executing a query, you can fetch the results using one of three fetch methods, detailed below.

Fetching results

fetchone()

After executing a query, a fetchone() call on the cursor returns an list representing the next row from the result set:

>>> result = cursor.fetchone()
['Algol']

You can call fetchone() multiple times to return multiple rows.

If no more rows are available, None is returned.

Tip

The cursor object is an iterator, and the fetchone() method is an alias for next().

fetchmany()

After executing a query, a fetch_many() call with a numeric argument returns the specified number of result rows:

>>> cursor.execute("SELECT name FROM locations order by name")
>>> result = cursor.fetchmany(2)
>>> pprint(result)
[['Aldebaran'], ['Algol']]

If a number is not given as an argument, fetch_many() will return a result list with one result row:

>>> cursor.fetchmany()
[['Allosimanius Syneca']]

fetchall()

After executing a query, a fetchall() call on the cursor returns all remaining rows:

>>> cursor.execute("SELECT name FROM locations ORDER BY name")
>>> cursor.fetchall()
[['Aldebaran'],
 ['Algol'],
 ['Allosimanius Syneca'],
...
 ['Old Faithful'],
 ['Outer Eastern Rim']]

Accessing column names

Result rows are lists, not dictionaries. Which means that they do use contain column names for keys. If you want to access column names, you must use cursor.description.

The Python Database API Specification v2.0 defines seven description attributes per column, but only the first one (column name) is supported by this library. All other attributes are None.

Let’s say you have a query like this:

>>> cursor.execute("SELECT * FROM locations ORDER BY name")
>>> cursor.fetchone()
[1373932800000,
 None,
 'Max Quordlepleen claims that the only thing left ...',
...
 None,
 1]

The cursor description might look like this:

>>> cursor.description
(('date', None, None, None, None, None, None),
 ('datetime_tz', None, None, None, None, None, None),
 ('datetime_notz', None, None, None, None, None, None),
 ('description', None, None, None, None, None, None),
...
 ('nullable_datetime', None, None, None, None, None, None),
 ('position', None, None, None, None, None, None))

You can turn this into something more manageable with List Comprehensions:

>>> [column[0] for column in cursor.description]
['date', 'datetime_tz', 'datetime_notz', ..., 'nullable_datetime', 'position']

Data type conversion

The cursor object can optionally convert database types to native Python data types. There is a default implementation for the CrateDB data types IP and TIMESTAMP on behalf of the DefaultTypeConverter.

>>> from crate.client.converter import DefaultTypeConverter
>>> from crate.client.cursor import Cursor
>>> cursor = connection.cursor(converter=DefaultTypeConverter())
>>> cursor.execute("SELECT datetime_tz, datetime_notz FROM locations ORDER BY name")
>>> cursor.fetchone()
[datetime.datetime(2022, 7, 18, 18, 10, 36, 758000), datetime.datetime(2022, 7, 18, 18, 10, 36, 758000)]

Custom data type conversion

By providing a custom converter instance, you can define your own data type conversions. For investigating the list of available data types, please either inspect the DataType enum, or the documentation about the list of available CrateDB data type identifiers for the HTTP interface.

This example creates and applies a simple custom converter for converging CrateDB’s BOOLEAN type to Python’s str type. It is using a simple converter function defined as lambda, which assigns yes for boolean True, and no otherwise.

>>> from crate.client.converter import Converter, DataType
>>> converter = Converter()
>>> converter.set(DataType.BOOLEAN, lambda value: value is True and "yes" or "no")
>>> cursor = connection.cursor(converter=converter)
>>> cursor.execute("SELECT flag FROM locations ORDER BY name")
>>> cursor.fetchone()
['no']

TIMESTAMP conversion with time zone

Based on the data type converter functionality, the driver offers a convenient interface to make it return timezone-aware datetime objects, using the desired time zone.

For your reference, in the following examples, epoch 1658167836758 is Mon, 18 Jul 2022 18:10:36 GMT.

>>> import datetime
>>> tz_mst = datetime.timezone(datetime.timedelta(hours=7), name="MST")
>>> cursor = connection.cursor(time_zone=tz_mst)
>>> cursor.execute("SELECT datetime_tz FROM locations ORDER BY name")
>>> cursor.fetchone()
[datetime.datetime(2022, 7, 19, 1, 10, 36, 758000, tzinfo=datetime.timezone(datetime.timedelta(seconds=25200), 'MST'))]

For the time_zone keyword argument, different data types are supported. The available options are:

  • datetime.timezone.utc

  • datetime.timezone(datetime.timedelta(hours=7), name="MST")

  • pytz.timezone("Australia/Sydney")

  • zoneinfo.ZoneInfo("Australia/Sydney")

  • +0530 (UTC offset in string format)

Let’s exercise all of them.

>>> cursor.time_zone = datetime.timezone.utc
>>> cursor.execute("SELECT datetime_tz FROM locations ORDER BY name")
>>> cursor.fetchone()
[datetime.datetime(2022, 7, 18, 18, 10, 36, 758000, tzinfo=datetime.timezone.utc)]
>>> import pytz
>>> cursor.time_zone = pytz.timezone("Australia/Sydney")
>>> cursor.execute("SELECT datetime_tz FROM locations ORDER BY name")
>>> cursor.fetchone()
['foo', datetime.datetime(2022, 7, 19, 4, 10, 36, 758000, tzinfo=<DstTzInfo 'Australia/Sydney' AEST+10:00:00 STD>)]
>>> try:
...     import zoneinfo
... except ImportError:
...     from backports import zoneinfo
>>> cursor.time_zone = zoneinfo.ZoneInfo("Australia/Sydney")
>>> cursor.execute("SELECT datetime_tz FROM locations ORDER BY name")
>>> cursor.fetchone()
[datetime.datetime(2022, 7, 19, 4, 10, 36, 758000, tzinfo=zoneinfo.ZoneInfo(key='Australia/Sydney'))]
>>> cursor.time_zone = "+0530"
>>> cursor.execute("SELECT datetime_tz FROM locations ORDER BY name")
>>> cursor.fetchone()
[datetime.datetime(2022, 7, 18, 23, 40, 36, 758000, tzinfo=datetime.timezone(datetime.timedelta(seconds=19800), '+0530'))]