Query CrateDB¶
Note
This page documents the CrateDB client, implementing the Python Database API Specification v2.0 (PEP 249).
For help using the SQLAlchemy dialect, consult the SQLAlchemy dialect documentation.
See also
Supplementary information about the CrateDB Database API client can be found in the data types appendix.
Table of contents
Using a cursor¶
After connecting to CrateDB, you can execute queries via a database cursor.
Open a cursor like so:
>>> cursor = connection.cursor()
Inserting data¶
Regular inserts¶
Regular inserts are possible with the execute()
method, like so:
>>> cursor.execute(
... "INSERT INTO locations (name, date, kind, position) VALUES (?, ?, ?, ?)",
... ("Einstein Cross", "2007-03-11", "Quasar", 7))
Here, the values of the tuple
(the second argument) are safely
interpolated into the query string (the first argument) where the ?
characters appear, in the order they appear.
Warning
Never use string concatenation to build query strings.
Always use the parameter interpolation feature of the client library to guard against malicious input, as demonstrated in the example above.
Bulk inserts¶
Bulk inserts are possible with the
executemany()
method, which takes a list
of tuples to insert:
>>> cursor.executemany(
... "INSERT INTO locations (name, date, kind, position) VALUES (?, ?, ?, ?)",
... [('Cloverleaf', '2007-03-11', 'Quasar', 7),
... ('Old Faithful', '2007-03-11', 'Quasar', 7)])
[{'rowcount': 1}, {'rowcount': 1}]
The executemany()
method returns a result dictionary
for every tuple. This dictionary always has a rowcount
key, indicating
how many rows were inserted. If an error occurs, the rowcount
value is
-2
, and the dictionary may additionally have an error_message
key.
Selecting data¶
Executing a query¶
Selects can be performed with the execute()
method, like so:
>>> cursor.execute("SELECT name FROM locations WHERE name = ?", ("Algol",))
Like with inserts, here, the single value of the tuple (the
second argument) is safely interpolated into the query string (the first
argument) where the ?
character appears.
Warning
As with inserts, always use parameter interpolation.
After executing a query, you can fetch the results using one of three fetch methods, detailed below.
Fetching results¶
fetchone()
¶
After executing a query, a fetchone()
call on the cursor returns an list
representing the next row from the result set:
>>> result = cursor.fetchone()
['Algol']
You can call fetchone()
multiple times to return multiple rows.
If no more rows are available, None
is returned.
Tip
The cursor
object is an iterator, and the fetchone()
method is an alias for next()
.
fetchmany()
¶
After executing a query, a fetch_many()
call with a numeric argument
returns the specified number of result rows:
>>> cursor.execute("SELECT name FROM locations order by name")
>>> result = cursor.fetchmany(2)
>>> pprint(result)
[['Aldebaran'], ['Algol']]
If a number is not given as an argument, fetch_many()
will return a result
list with one result row:
>>> cursor.fetchmany()
[['Allosimanius Syneca']]
fetchall()
¶
After executing a query, a fetchall()
call on the cursor returns all
remaining rows:
>>> cursor.execute("SELECT name FROM locations ORDER BY name")
>>> cursor.fetchall()
[['Aldebaran'],
['Algol'],
['Allosimanius Syneca'],
...
['Old Faithful'],
['Outer Eastern Rim']]
Accessing column names¶
Result rows are lists, not dictionaries. Which means that they do use contain
column names for keys. If you want to access column names, you must use
cursor.description
.
The Python Database API Specification v2.0 defines seven description
attributes per column, but only the first one (column name) is supported by
this library. All other attributes are None
.
Let’s say you have a query like this:
>>> cursor.execute("SELECT * FROM locations ORDER BY name")
>>> cursor.fetchone()
[1373932800000,
None,
'Max Quordlepleen claims that the only thing left ...',
...
None,
1]
The cursor description
might look like this:
>>> cursor.description
(('date', None, None, None, None, None, None),
('datetime_tz', None, None, None, None, None, None),
('datetime_notz', None, None, None, None, None, None),
('description', None, None, None, None, None, None),
...
('nullable_datetime', None, None, None, None, None, None),
('position', None, None, None, None, None, None))
You can turn this into something more manageable with List Comprehensions:
>>> [column[0] for column in cursor.description]
['date', 'datetime_tz', 'datetime_notz', ..., 'nullable_datetime', 'position']
Data type conversion¶
The cursor object can optionally convert database types to native Python data
types. There is a default implementation for the CrateDB data types IP
and
TIMESTAMP
on behalf of the DefaultTypeConverter
.
>>> from crate.client.converter import DefaultTypeConverter
>>> from crate.client.cursor import Cursor
>>> cursor = connection.cursor(converter=DefaultTypeConverter())
>>> cursor.execute("SELECT datetime_tz, datetime_notz FROM locations ORDER BY name")
>>> cursor.fetchone()
[datetime.datetime(2022, 7, 18, 18, 10, 36, 758000), datetime.datetime(2022, 7, 18, 18, 10, 36, 758000)]
Custom data type conversion¶
By providing a custom converter instance, you can define your own data type
conversions. For investigating the list of available data types, please either
inspect the DataType
enum, or the documentation about the list of available
CrateDB data type identifiers for the HTTP interface.
This example creates and applies a simple custom converter for converging
CrateDB’s BOOLEAN
type to Python’s str
type. It is using a simple
converter function defined as lambda
, which assigns yes
for boolean
True
, and no
otherwise.
>>> from crate.client.converter import Converter, DataType
>>> converter = Converter()
>>> converter.set(DataType.BOOLEAN, lambda value: value is True and "yes" or "no")
>>> cursor = connection.cursor(converter=converter)
>>> cursor.execute("SELECT flag FROM locations ORDER BY name")
>>> cursor.fetchone()
['no']
TIMESTAMP
conversion with time zone¶
Based on the data type converter functionality, the driver offers a convenient
interface to make it return timezone-aware datetime
objects, using the
desired time zone.
For your reference, in the following examples, epoch 1658167836758 is
Mon, 18 Jul 2022 18:10:36 GMT
.
>>> import datetime
>>> tz_mst = datetime.timezone(datetime.timedelta(hours=7), name="MST")
>>> cursor = connection.cursor(time_zone=tz_mst)
>>> cursor.execute("SELECT datetime_tz FROM locations ORDER BY name")
>>> cursor.fetchone()
[datetime.datetime(2022, 7, 19, 1, 10, 36, 758000, tzinfo=datetime.timezone(datetime.timedelta(seconds=25200), 'MST'))]
For the time_zone
keyword argument, different data types are supported.
The available options are:
datetime.timezone.utc
datetime.timezone(datetime.timedelta(hours=7), name="MST")
pytz.timezone("Australia/Sydney")
zoneinfo.ZoneInfo("Australia/Sydney")
+0530
(UTC offset in string format)
Let’s exercise all of them.
>>> cursor.time_zone = datetime.timezone.utc
>>> cursor.execute("SELECT datetime_tz FROM locations ORDER BY name")
>>> cursor.fetchone()
[datetime.datetime(2022, 7, 18, 18, 10, 36, 758000, tzinfo=datetime.timezone.utc)]
>>> import pytz
>>> cursor.time_zone = pytz.timezone("Australia/Sydney")
>>> cursor.execute("SELECT datetime_tz FROM locations ORDER BY name")
>>> cursor.fetchone()
['foo', datetime.datetime(2022, 7, 19, 4, 10, 36, 758000, tzinfo=<DstTzInfo 'Australia/Sydney' AEST+10:00:00 STD>)]
>>> try:
... import zoneinfo
... except ImportError:
... from backports import zoneinfo
>>> cursor.time_zone = zoneinfo.ZoneInfo("Australia/Sydney")
>>> cursor.execute("SELECT datetime_tz FROM locations ORDER BY name")
>>> cursor.fetchone()
[datetime.datetime(2022, 7, 19, 4, 10, 36, 758000, tzinfo=zoneinfo.ZoneInfo(key='Australia/Sydney'))]
>>> cursor.time_zone = "+0530"
>>> cursor.execute("SELECT datetime_tz FROM locations ORDER BY name")
>>> cursor.fetchone()
[datetime.datetime(2022, 7, 18, 23, 40, 36, 758000, tzinfo=datetime.timezone(datetime.timedelta(seconds=19800), '+0530'))]