Features Overview¶

Introduction¶

SQLAlchemy is the most popular Object-Relational Mapping (ORM) library for Python.

The sqlalchemy-cratedb package provides the SQLAlchemy dialect for CrateDB. It is registered at installation time, and can be used without further configuration.

The CrateDB SQLAlchemy dialect is validated to work with SQLAlchemy versions 1.3, 1.4, and 2.0.

See also

For general help using SQLAlchemy, consult the SQLAlchemy tutorial or the SQLAlchemy library.

Supplementary information about the CrateDB SQLAlchemy dialect can be found in the data types appendix.

Code examples for using the CrateDB SQLAlchemy dialect can be found at Examples.

Connecting¶

Database URLs¶

In an SQLAlchemy context, database addresses are represented by Uniform Resource Locators (URL) called Database URLs.

The simplest database URL for CrateDB looks like this:

crate://<HOST>/[?option=value]

Here, <HOST> is the node host string. After the host, additional query parameters can be specified to adjust some connection settings.

A host string looks like this:

[<USERNAME>:<PASSWORD>@]<HOST_ADDR>:<PORT>

Here, <HOST_ADDR> is the hostname or IP address of the CrateDB node and <PORT> is a valid psql.port number.

When authentication is needed, the credentials can be optionally supplied using <USERNAME>:<PASSWORD>@. For connecting to an SSL-secured HTTP endpoint, you can add the query parameter ?sslmode=require to the database URI.

Example database URIs:

crate://localhost:4200
crate://crate-1.vm.example.com:4200
crate://username:password@crate-2.vm.example.com:4200/?sslmode=require
crate://198.51.100.1:4200

Tip

If <HOST> is blank (i.e. the database URI is just crate://), then localhost:4200 will be assumed.

Getting a connection¶

Create an engine¶

You can connect to CrateDB using the create_engine method. This method takes a database URL.

Import the sa module, like so:

>>> import sqlalchemy as sa

To connect to localhost:4200, you can do this:

>>> engine = sa.create_engine('crate://')

To connect to crate-1.vm.example.com:4200, you would do this:

>>> engine = sa.create_engine('crate://crate-1.vm.example.com:4200')

If your CrateDB cluster has multiple nodes, however, we recommend that you configure all of them. You can do that by specifying the crate:// database URL and passing in a list of host strings passed using the connect_args argument, like so:

>>> engine = sa.create_engine('crate://', connect_args={
...     'servers': ['198.51.100.1:4200', '198.51.100.2:4200']
... })

When you do this, the Database API layer will use its round-robin implementation.

The client validates SSL server certificates by default. For further adjusting this behaviour, SSL verification options can be passed in by using the connect_args dictionary.

For example, use ca_cert for providing a path to the CA certificate used for signing the server certificate:

>>> engine = sa.create_engine(
...     'crate://',
...     connect_args={
...         'servers': ['198.51.100.1:4200', '198.51.100.2:4200'],
...         'ca_cert': '<PATH_TO_CA_CERT>',
...     }
... )

In order to disable SSL verification, use verify_ssl_cert = False, like:

>>> engine = sa.create_engine(
...     'crate://',
...     connect_args={
...         'servers': ['198.51.100.1:4200', '198.51.100.2:4200'],
...         'verify_ssl_cert': False,
...     }
... )

Get a session¶

Once you have an CrateDB engine set up, you can create and use an SQLAlchemy Session object to execute queries:

>>> from sqlalchemy.orm import sessionmaker

>>> Session = sessionmaker(bind=engine)
>>> session = Session()

See also

SQLAlchemy has more documentation about this topic on Session Basics.

Connecting to CrateDB Cloud¶

Connecting to CrateDB Cloud works like this. Please note the ?sslmode=require query parameter at the end of the database URI.

>>> import sqlalchemy as sa
>>> dburi = "crate://admin:<PASSWORD>@example.aks1.westeurope.azure.cratedb.net:4200?sslmode=require"
>>> engine = sa.create_engine(dburi, echo=True)

Tables¶

Table definition¶

Here is an example SQLAlchemy table definition using the declarative system:

>>> from sqlalchemy.ext import declarative
>>> from uuid import uuid4
>>> from sqlalchemy_cratedb import ObjectType, ObjectArray

>>> def gen_key():
...     return str(uuid4())

>>> Base = declarative.declarative_base(bind=engine)

>>> class Character(Base):
...
...     __tablename__ = 'characters'
...     __table_args__ = {
...         'crate_number_of_shards': 3
...     }
...
...     id = sa.Column(sa.String, primary_key=True, default=gen_key)
...     name = sa.Column(sa.String, crate_index=False)
...     name_normalized = sa.Column(sa.String, sa.Computed("lower(name)"))
...     quote = sa.Column(sa.String, nullable=False)
...     details = sa.Column(ObjectType)
...     more_details = sa.Column(ObjectArray)
...     name_ft = sa.Column(sa.String)
...     quote_ft = sa.Column(sa.String)
...     even_more_details = sa.Column(sa.String, crate_columnstore=False)
...     created_at = sa.Column(sa.DateTime, server_default=sa.func.now())
...
...     __mapper_args__ = {
...         'exclude_properties': ['name_ft', 'quote_ft']
...     }

In this example, we:

Define a gen_key function that produces UUIDs
Set up a Base class for the table
Create the Characters class for the characters table
Use the gen_key function to provide a default value for the id column (which is also the primary key)
Use standard SQLAlchemy types for the id, name, and quote columns
Use nullable=False to define a NOT NULL constraint
Disable indexing of the name column using crate_index=False
Define a computed column name_normalized (based on name) that translates into a generated column
Use the ObjectType extension type for the details column
Use the ObjectArray extension type for the more_details column
Set up the name_ft and quote_ft fulltext indexes, but exclude them from the mapping (so SQLAlchemy doesn’t try to update them as if they were columns)
Disable the columnstore of the even_more_details column using crate_columnstore=False
Add a created_at column whose default value is set by CrateDB’s now() function.

Tip

This example table is used throughout the rest of this document.

See also

The SQLAlchemy documentation has more information about Describing Databases with MetaData.

Additional `__table_args__`¶

The example also shows the optional usage of __table_args__ to configure table-wide attributes. The following attributes can optionally be configured:

crate_number_of_shards: The number of primary shards the table will be split into
crate_clustered_by: The routing column to use for sharding
crate_number_of_replicas: The number of replicas to allocate for each primary shard
crate_partitioned_by: One or more columns to use as a partition key

See also

The CREATE TABLE documentation contains more information on each of the attributes.

`_id` as primary key¶

As with version 4.2 CrateDB supports the RETURNING clause, which makes it possible to use the _id column as fetched value for the PRIMARY KEY constraint, since the SQLAlchemy ORM always requires a primary key.

A table schema like this

CREATE TABLE "doc"."logs" (
  "ts" TIMESTAMP WITH TIME ZONE NOT NULL,
  "level" TEXT,
  "message" TEXT
)

would translate into the following declarative model:

>>> from sqlalchemy.schema import FetchedValue

>>> class Log(Base):
...
...     __tablename__ = 'logs'
...     __mapper_args__ = {
...         'exclude_properties': ['id']
...     }
...
...     id = sa.Column("_id", sa.String, server_default=FetchedValue(), primary_key=True)
...     ts = sa.Column(sa.DateTime, server_default=sa.func.current_timestamp())
...     level = sa.Column(sa.String)
...     message = sa.Column(sa.String)

>>> log = Log(level="info", message="Hello World")
>>> session.add(log)
>>> session.commit()
>>> log.id
...

Auto-generated identifiers¶

CrateDB does not provide traditional sequences or SERIAL data type support, which enable automatically assigning incremental values when inserting records. However, it offers server-side support by providing an SQL function to generate random identifiers of STRING type, and client-side support for generating INTEGER-based identifiers, when using the SQLAlchemy dialect.

`gen_random_text_uuid`¶

CrateDB 4.5.0 added the gen_random_text_uuid() scalar function, which can also be used within an SQL DDL statement, in order to automatically assign random identifiers to newly inserted records on the server side.

In this spirit, it is suitable to be used as a PRIMARY KEY constraint for SQLAlchemy. It works on SQLAlchemy-defined columns of type sa.String.

A table schema like this

CREATE TABLE "doc"."items" (
  "id" STRING DEFAULT gen_random_text_uuid() NOT NULL PRIMARY KEY,
  "name" STRING
)

would translate into the following declarative model:

>>> class Item(Base):
...
...     __tablename__ = 'items'
...
...     id = sa.Column("id", sa.String, server_default=func.gen_random_text_uuid(), primary_key=True)
...     name = sa.Column("name", sa.String)

>>> item = Item(name="Foobar")
>>> session.add(item)
>>> session.commit()
>>> item.id
...

Timestamp-based Autoincrement¶

By using SQLAlchemy’s sa.func.now(), you can assign automatically generated identifiers to SQLAlchemy columns of types sa.BigInteger, sa.DateTime, and sa.String.

This emulates autoincrement / sequential ID behavior for designated columns, based on assigning timestamps on record insertion.

>>> class Item(Base):
...     id = sa.Column("id", sa.BigInteger, default=func.now(), primary_key=True)
...     name = sa.Column("name", sa.String)

>>> item = Item(name="Foobar")
>>> session.add(item)
>>> session.commit()
>>> item.id
...

There is a support utility which emulates autoincrement / sequential ID behavior for designated columns, based on assigning timestamps on record insertion. See Synthetic Autoincrement using Timestamps.

Extension types¶

In the example SQLAlchemy table definition above, we are making use of the two extension data types that the CrateDB SQLAlchemy dialect provides.

See also

The appendix has a full data types reference.

`ObjectType`¶

Objects are a common, and useful, data type when using CrateDB, so the CrateDB SQLAlchemy dialect provides a custom Object type extension for working with these values.

Here’s how you use the SQLAlchemy Session to insert two records:

>>> # use the crate engine from earlier examples
>>> Session = sessionmaker(bind=crate)
>>> session = Session()

>>> arthur = Character(name='Arthur Dent')
>>> arthur.details = {}
>>> arthur.details['gender'] = 'male'
>>> arthur.details['species'] = 'human'
>>> session.add(arthur)

>>> trillian = Character(name='Tricia McMillan')
>>> trillian.details = {}
>>> trillian.quote = "We're on a space ship Arthur. In space."
>>> trillian.details['gender'] = 'female'
>>> trillian.details['species'] = 'human'
>>> trillian.details['female_only_attribute'] = 1
>>> session.add(trillian)
>>> session.commit()

Note

The information we supply via the details column isn’t defined in the original SQLAlchemy table definition schema. These details can be specified as object column policy when you create the column in CrateDB, you can either use the STRICT column policy, or the DYNAMIC column policy.

Note

Behind the scenes, if you update an ObjectType property, and commit that change, the UPDATE statement sent to CrateDB will only include the data necessary to update the changed sub-columns.

`ObjectArray`¶

In addition to the ObjectType type, the CrateDB SQLAlchemy dialect also provides an ObjectArray type, which is structured as a list of dictionaries.

Here’s how you might set the value of an ObjectArray column:

>>> arthur.more_details = [{'foo': 1, 'bar': 10}, {'foo': 2}]
>>> session.commit()

If you append an object, like this:

>>> arthur.more_details.append({'foo': 3})
>>> session.commit()

The resulting object will look like this:

>>> arthur.more_details
[{'foo': 1, 'bar': 10}, {'foo': 2}, {'foo': 3}]

Caution

Behind the scenes, if you update an ObjectArray, and commit that change, the UPDATE statement sent to CrateDB will include all of the ObjectArray data.

`Geopoint` and `Geoshape`¶

The CrateDB SQLAlchemy dialect provides two geospatial types:

Geopoint, which represents a longitude and latitude coordinate
Geoshape, which is used to store geometric GeoJSON geometry objects

To use these types, you can create columns, like so:

>>> from sqlalchemy_cratedb import Geopoint, Geoshape

>>> class City(Base):
...
...    __tablename__ = 'cities'
...    name = sa.Column(sa.String, primary_key=True)
...    coordinate = sa.Column(Geopoint)
...    area = sa.Column(Geoshape)

A geopoint can be created in multiple ways. Firstly, you can define it as a tuple of (longitude, latitude):

>>> point = (139.76, 35.68)

Secondly, you can define it as a geojson Point object:

>>> from geojson import Point
>>> point = Point(coordinates=(139.76, 35.68))

To create a geoshape, you can use a geojson shape object, such as a Polygon:

>>> from geojson import Point, Polygon
>>> area = Polygon(
...     [
...         [
...             (139.806, 35.515),
...             (139.919, 35.703),
...             (139.768, 35.817),
...             (139.575, 35.760),
...             (139.584, 35.619),
...             (139.806, 35.515),
...         ]
...     ]
... )

You can then set the values of the Geopoint and Geoshape columns:

>>> tokyo = City(name="Tokyo", coordinate=point, area=area)
>>> session.add(tokyo)
>>> session.commit()

Querying¶

When the commit method is called, two INSERT statements are sent to CrateDB. However, the newly inserted rows aren’t immediately available for querying because the table index is only updated periodically (one second, by default, which is a short time for me and you, but a long time for your code).

You can request a table refresh to update the index manually:

>>> connection = engine.connect()
>>> _ = connection.execute(text("REFRESH TABLE characters"))

Note

Newly inserted rows can still be queried immediately if a lookup by primary key is done.

Here’s what a regular select might look like:

>>> query = session.query(Character).order_by(Character.name)
>>> [(c.name, c.details['gender']) for c in query]
[('Arthur Dent', 'male'), ('Tricia McMillan', 'female')]

You can also select a portion of each record, and this even works inside ObjectType columns:

>>> sorted(session.query(Character.details['gender']).all())
[('female',), ('male',)]

You can also filter on attributes inside the ObjectType column:

>>> query = session.query(Character.name)
>>> query.filter(Character.details['gender'] == 'male').all()
[('Arthur Dent',)]

To filter on an ObjectArray, you have to do something like this:

>>> from sqlalchemy.sql import operators

>>> query = session.query(Character.name)
>>> query.filter(Character.more_details['foo'].any(1, operator=operators.eq)).all()
[(u'Arthur Dent',)]

Here, we’re using SQLAlchemy’s any method along with Python’s operator.eq() function, in order to match the value 1 against the key foo of any dictionary in the more_details list.

Only one of the keys has to match for the row to be returned.

This works, because ObjectArray keys return a list of all values for that key, like so:

>>> arthur.more_details['foo']
[1, 2, 3]

Querying a key of an ObjectArray column will return all values for that key for all matching rows:

>>> query = session.query(Character.more_details['foo']).order_by(Character.name)
>>> query.all()
[([1, 2, 3],), (None,)]

Aggregate functions¶

SQLAlchemy supports different ways to count result rows. However, because CrateDB doesn’t support subqueries, counts must be written in one of the following two ways.

This counts the number of character records by counting the number of id values in the table:

>>> session.query(sa.func.count(Character.id)).scalar()
2

Note

If you’re doing it like this, the column you select must be the primary key.

And this counts the number of character records by selecting all columns, and then counting the number of rows:

>>> session.query(sa.func.count('*')).select_from(Character).scalar()
2

You can layer in calls to group_by and order_by when you use one of these methods, like so:

>>> session.query(sa.func.count(Character.id), Character.name) \
...     .group_by(Character.name) \
...     .order_by(sa.desc(sa.func.count(Character.id))) \
...     .order_by(Character.name).all()
[(1, u'Arthur Dent'), (1, u'Tricia McMillan')]

Fulltext search¶

Matching¶

Fulltext Search in CrateDB is done with the MATCH Predicate.

The CrateDB SQLAlchemy dialect provides a match function, which can be used to search one or multiple fields.

Here’s an example use of the match function:

>>> from sqlalchemy_cratedb import match

>>> session.query(Character.name) \
...     .filter(match(Character.name_ft, 'Arthur')) \
...     .all()
[('Arthur Dent',)]

In this example, we’re selecting character name values, and returning all rows where the name_ft index matches the string Arthur.

Note

To use fulltext searches on a column, an explicit fulltext index with an analyzer must be created on the column. Consult the documentation about Fulltext indices for more information.

The match function takes the following options:

match(column, term, match_type=None, options=None)

column:

A reference to a column or an index:

match(Character.name_ft, 'Trillian')

Or a subcolumn:

match(Character.details['name']['first'], 'Trillian')

Or a dictionary of the same, with boost values:

match({Character.name_ft: 0.5,
       Character.details['name']['first']: 0.8,
       Character.details['name']['last']: 0.2},
      'Trillian')

See also

The arguments reference of the MATCH Predicate has more in-depth information.

term:

The term to match against.

This string is analyzed and the resulting tokens are compared to the index.

match_type:

(optional)

The Match Types.

Determine how the term is applied and the _score gets calculated. See also score usage.

Here’s an example:

match({Character.name_ft: 0.5,
       Character.details['name']['first']: 0.8,
       Character.details['name']['last']: 0.2},
      'Trillian',
      match_type='phrase')

options:

(optional)

The match options.

Specify match type behaviour. (Not possible without a specified match type.)

Match options must be supplied as a dictionary:

match({Character.name_ft: 0.5,
       Character.details['name']['first']: 0.8,
       Character.details['name']['last']: 0.2},
      'Trillian',
      match_type='phrase'
      options={
          'fuzziness': 3,
          'analyzer': 'english'})

Relevance¶

To get the relevance of a matching row, the row _score can be used. See also score usage.

The score is relative to other result rows produced by your query. The higher the score, the more relevant the result row.

The score is made available via the _score column, which is a virtual column, meaning that it doesn’t exist on the source table, and in most cases, should not be included in your table definition.

You can select _score as part of a query, like this:

>>> session.query(Character.name, '_score') \
...     .filter(match(Character.quote_ft, 'space')) \
...     .all()
[('Tricia McMillan', ...)]

Here, we’re matching the term space against the quote_ft fulltext index. And we’re selecting the name column of the character by using the table definition But notice that we select the associated score by passing in the virtual column name as a string (_score) instead of using a defined column on the Character class.

Features Overview¶

Introduction¶

Connecting¶

Database URLs¶

Getting a connection¶

Create an engine¶

Get a session¶

Connecting to CrateDB Cloud¶

Tables¶

Table definition¶

Additional __table_args__¶

_id as primary key¶

Auto-generated identifiers¶

gen_random_text_uuid¶

Timestamp-based Autoincrement¶

Extension types¶

ObjectType¶

ObjectArray¶

Geopoint and Geoshape¶

Querying¶

Aggregate functions¶

Fulltext search¶

Matching¶

Relevance¶

Additional `__table_args__`¶

`_id` as primary key¶

`gen_random_text_uuid`¶

`ObjectType`¶

`ObjectArray`¶

`Geopoint` and `Geoshape`¶