Features Overview¶
Introduction¶
SQLAlchemy is the most popular Object-Relational Mapping (ORM) library for Python.
The sqlalchemy-cratedb package provides the SQLAlchemy dialect for CrateDB. It is registered at
installation time, and can be used without further configuration.
The CrateDB SQLAlchemy dialect is validated to work with SQLAlchemy versions
1.3, 1.4, and 2.0.
See also
For general help using SQLAlchemy, consult the SQLAlchemy tutorial or the SQLAlchemy library.
Supplementary information about the CrateDB SQLAlchemy dialect can be found in the data types appendix.
Code examples for using the CrateDB SQLAlchemy dialect can be found at Examples.
Connecting¶
Database URLs¶
In an SQLAlchemy context, database addresses are represented by Uniform Resource Locators (URL) called Database URLs.
The simplest database URL for CrateDB looks like this:
crate://<HOST>/[?option=value]
Here, <HOST> is the node host string. After the host, additional query
parameters can be specified to adjust some connection settings.
A host string looks like this:
[<USERNAME>:<PASSWORD>@]<HOST_ADDR>:<PORT>
Here, <HOST_ADDR> is the hostname or IP address of the CrateDB node and
<PORT> is a valid psql.port number.
When authentication is needed, the credentials can be optionally supplied using
<USERNAME>:<PASSWORD>@. For connecting to an SSL-secured HTTP endpoint, you
can add the query parameter ?sslmode=require to the database URI.
Example database URIs:
crate://localhost:4200crate://crate-1.vm.example.com:4200crate://username:password@crate-2.vm.example.com:4200/?sslmode=requirecrate://198.51.100.1:4200
Tip
If <HOST> is blank (i.e. the database URI is just crate://), then
localhost:4200 will be assumed.
Getting a connection¶
Create an engine¶
You can connect to CrateDB using the create_engine method. This method
takes a database URL.
Import the sa module, like so:
>>> import sqlalchemy as sa
To connect to localhost:4200, you can do this:
>>> engine = sa.create_engine('crate://')
To connect to crate-1.vm.example.com:4200, you would do this:
>>> engine = sa.create_engine('crate://crate-1.vm.example.com:4200')
If your CrateDB cluster has multiple nodes, however, we recommend that you
configure all of them. You can do that by specifying the crate:// database
URL and passing in a list of host strings passed using
the connect_args argument, like so:
>>> engine = sa.create_engine('crate://', connect_args={
... 'servers': ['198.51.100.1:4200', '198.51.100.2:4200']
... })
When you do this, the Database API layer will use its round-robin implementation.
The client validates SSL server certificates
by default. For further adjusting this behaviour, SSL verification options can
be passed in by using the connect_args dictionary.
For example, use ca_cert for providing a path to the CA certificate used
for signing the server certificate:
>>> engine = sa.create_engine(
... 'crate://',
... connect_args={
... 'servers': ['198.51.100.1:4200', '198.51.100.2:4200'],
... 'ca_cert': '<PATH_TO_CA_CERT>',
... }
... )
In order to disable SSL verification, use verify_ssl_cert = False, like:
>>> engine = sa.create_engine(
... 'crate://',
... connect_args={
... 'servers': ['198.51.100.1:4200', '198.51.100.2:4200'],
... 'verify_ssl_cert': False,
... }
... )
Get a session¶
Once you have an CrateDB engine set up, you can create and use an SQLAlchemy
Session object to execute queries:
>>> from sqlalchemy.orm import sessionmaker
>>> Session = sessionmaker(bind=engine)
>>> session = Session()
See also
SQLAlchemy has more documentation about this topic on Session Basics.
Connecting to CrateDB Cloud¶
Connecting to CrateDB Cloud works like this. Please note the ?sslmode=require
query parameter at the end of the database URI.
>>> import sqlalchemy as sa
>>> dburi = "crate://admin:<PASSWORD>@example.aks1.westeurope.azure.cratedb.net:4200?sslmode=require"
>>> engine = sa.create_engine(dburi, echo=True)
Tables¶
Table definition¶
Here is an example SQLAlchemy table definition using the declarative system:
>>> from sqlalchemy.ext import declarative
>>> from uuid import uuid4
>>> from sqlalchemy_cratedb import ObjectType, ObjectArray
>>> def gen_key():
... return str(uuid4())
>>> Base = declarative.declarative_base(bind=engine)
>>> class Character(Base):
...
... __tablename__ = 'characters'
... __table_args__ = {
... 'crate_number_of_shards': 3
... }
...
... id = sa.Column(sa.String, primary_key=True, default=gen_key)
... name = sa.Column(sa.String, crate_index=False)
... name_normalized = sa.Column(sa.String, sa.Computed("lower(name)"))
... quote = sa.Column(sa.String, nullable=False)
... details = sa.Column(ObjectType)
... more_details = sa.Column(ObjectArray)
... name_ft = sa.Column(sa.String)
... quote_ft = sa.Column(sa.String)
... even_more_details = sa.Column(sa.String, crate_columnstore=False)
... created_at = sa.Column(sa.DateTime, server_default=sa.func.now())
...
... __mapper_args__ = {
... 'exclude_properties': ['name_ft', 'quote_ft']
... }
In this example, we:
Define a
gen_keyfunction that producesUUIDsSet up a
Baseclass for the tableCreate the
Charactersclass for thecharacterstableUse the
gen_keyfunction to provide a default value for theidcolumn (which is also the primary key)Use standard SQLAlchemy types for the
id,name, andquotecolumnsUse
nullable=Falseto define aNOT NULLconstraintDisable indexing of the
namecolumn usingcrate_index=FalseDefine a computed column
name_normalized(based onname) that translates into a generated columnUse the ObjectType extension type for the
detailscolumnUse the ObjectArray extension type for the
more_detailscolumnSet up the
name_ftandquote_ftfulltext indexes, but exclude them from the mapping (so SQLAlchemy doesn’t try to update them as if they were columns)Disable the columnstore of the
even_more_detailscolumn usingcrate_columnstore=FalseAdd a
created_atcolumn whose default value is set by CrateDB’snow()function.
Tip
This example table is used throughout the rest of this document.
See also
The SQLAlchemy documentation has more information about Describing Databases with MetaData.
Additional __table_args__¶
The example also shows the optional usage of __table_args__ to configure
table-wide attributes. The following attributes can optionally be configured:
crate_number_of_shards: The number of primary shards the table will be split intocrate_clustered_by: The routing column to use for shardingcrate_number_of_replicas: The number of replicas to allocate for each primary shardcrate_partitioned_by: One or more columns to use as a partition key
See also
The CREATE TABLE documentation contains more information on each of the attributes.
_id as primary key¶
As with version 4.2 CrateDB supports the RETURNING clause, which makes it
possible to use the _id column as fetched value for the PRIMARY KEY
constraint, since the SQLAlchemy ORM always requires a primary key.
A table schema like this
CREATE TABLE "doc"."logs" (
"ts" TIMESTAMP WITH TIME ZONE NOT NULL,
"level" TEXT,
"message" TEXT
)
would translate into the following declarative model:
>>> from sqlalchemy.schema import FetchedValue
>>> class Log(Base):
...
... __tablename__ = 'logs'
... __mapper_args__ = {
... 'exclude_properties': ['id']
... }
...
... id = sa.Column("_id", sa.String, server_default=FetchedValue(), primary_key=True)
... ts = sa.Column(sa.DateTime, server_default=sa.func.current_timestamp())
... level = sa.Column(sa.String)
... message = sa.Column(sa.String)
>>> log = Log(level="info", message="Hello World")
>>> session.add(log)
>>> session.commit()
>>> log.id
...
Auto-generated identifiers¶
CrateDB does not provide traditional sequences or SERIAL data type support,
which enable automatically assigning incremental values when inserting records.
However, it offers server-side support by providing an SQL function to generate
random identifiers of STRING type, and client-side support for generating
INTEGER-based identifiers, when using the SQLAlchemy dialect.
gen_random_text_uuid¶
CrateDB 4.5.0 added the gen_random_text_uuid() scalar function, which can also be used within an SQL DDL statement, in order to automatically assign random identifiers to newly inserted records on the server side.
In this spirit, it is suitable to be used as a PRIMARY KEY constraint for SQLAlchemy.
It works on SQLAlchemy-defined columns of type sa.String.
A table schema like this
CREATE TABLE "doc"."items" (
"id" STRING DEFAULT gen_random_text_uuid() NOT NULL PRIMARY KEY,
"name" STRING
)
would translate into the following declarative model:
>>> class Item(Base):
...
... __tablename__ = 'items'
...
... id = sa.Column("id", sa.String, server_default=func.gen_random_text_uuid(), primary_key=True)
... name = sa.Column("name", sa.String)
>>> item = Item(name="Foobar")
>>> session.add(item)
>>> session.commit()
>>> item.id
...
Timestamp-based Autoincrement¶
By using SQLAlchemy’s sa.func.now(), you can assign automatically generated
identifiers to SQLAlchemy columns of types sa.BigInteger, sa.DateTime,
and sa.String.
This emulates autoincrement / sequential ID behavior for designated columns, based on assigning timestamps on record insertion.
>>> class Item(Base):
... id = sa.Column("id", sa.BigInteger, default=func.now(), primary_key=True)
... name = sa.Column("name", sa.String)
>>> item = Item(name="Foobar")
>>> session.add(item)
>>> session.commit()
>>> item.id
...
There is a support utility which emulates autoincrement / sequential ID behavior for designated columns, based on assigning timestamps on record insertion. See Synthetic Autoincrement using Timestamps.
Extension types¶
In the example SQLAlchemy table definition above, we are making use of the two extension data types that the CrateDB SQLAlchemy dialect provides.
See also
The appendix has a full data types reference.
ObjectType¶
Objects are a common, and useful, data type when using CrateDB, so the CrateDB
SQLAlchemy dialect provides a custom Object type extension for working with
these values.
Here’s how you use the SQLAlchemy Session to insert two records:
>>> # use the crate engine from earlier examples
>>> Session = sessionmaker(bind=crate)
>>> session = Session()
>>> arthur = Character(name='Arthur Dent')
>>> arthur.details = {}
>>> arthur.details['gender'] = 'male'
>>> arthur.details['species'] = 'human'
>>> session.add(arthur)
>>> trillian = Character(name='Tricia McMillan')
>>> trillian.details = {}
>>> trillian.quote = "We're on a space ship Arthur. In space."
>>> trillian.details['gender'] = 'female'
>>> trillian.details['species'] = 'human'
>>> trillian.details['female_only_attribute'] = 1
>>> session.add(trillian)
>>> session.commit()
Note
The information we supply via the details column isn’t defined in the
original SQLAlchemy table definition schema.
These details can be specified as object column policy when you create
the column in CrateDB, you can either use the STRICT column policy, or the DYNAMIC column
policy.
Note
Behind the scenes, if you update an ObjectType property, and commit that
change, the UPDATE statement sent
to CrateDB will only include the data necessary to update the changed
sub-columns.
ObjectArray¶
In addition to the ObjectType type, the CrateDB SQLAlchemy dialect also provides
an ObjectArray type, which is structured as a list of
dictionaries.
Here’s how you might set the value of an ObjectArray column:
>>> arthur.more_details = [{'foo': 1, 'bar': 10}, {'foo': 2}]
>>> session.commit()
If you append an object, like this:
>>> arthur.more_details.append({'foo': 3})
>>> session.commit()
The resulting object will look like this:
>>> arthur.more_details
[{'foo': 1, 'bar': 10}, {'foo': 2}, {'foo': 3}]
Caution
Behind the scenes, if you update an ObjectArray, and commit that
change, the UPDATE statement
sent to CrateDB will include all of the ObjectArray data.
Geopoint and Geoshape¶
The CrateDB SQLAlchemy dialect provides two geospatial types:
Geopoint, which represents a longitude and latitude coordinateGeoshape, which is used to store geometric GeoJSON geometry objects
To use these types, you can create columns, like so:
>>> from sqlalchemy_cratedb import Geopoint, Geoshape
>>> class City(Base):
...
... __tablename__ = 'cities'
... name = sa.Column(sa.String, primary_key=True)
... coordinate = sa.Column(Geopoint)
... area = sa.Column(Geoshape)
A geopoint can be created in multiple ways. Firstly, you can define it as a
tuple of (longitude, latitude):
>>> point = (139.76, 35.68)
Secondly, you can define it as a geojson Point object:
>>> from geojson import Point
>>> point = Point(coordinates=(139.76, 35.68))
To create a geoshape, you can use a geojson shape object, such as a Polygon:
>>> from geojson import Point, Polygon
>>> area = Polygon(
... [
... [
... (139.806, 35.515),
... (139.919, 35.703),
... (139.768, 35.817),
... (139.575, 35.760),
... (139.584, 35.619),
... (139.806, 35.515),
... ]
... ]
... )
You can then set the values of the Geopoint and Geoshape columns:
>>> tokyo = City(name="Tokyo", coordinate=point, area=area)
>>> session.add(tokyo)
>>> session.commit()
Querying¶
When the commit method is called, two INSERT statements are sent to
CrateDB. However, the newly inserted rows aren’t immediately available for
querying because the table index is only updated periodically (one second, by
default, which is a short time for me and you, but a long time for your code).
You can request a table refresh to update the index manually:
>>> connection = engine.connect()
>>> _ = connection.execute(text("REFRESH TABLE characters"))
Note
Newly inserted rows can still be queried immediately if a lookup by primary key is done.
Here’s what a regular select might look like:
>>> query = session.query(Character).order_by(Character.name)
>>> [(c.name, c.details['gender']) for c in query]
[('Arthur Dent', 'male'), ('Tricia McMillan', 'female')]
You can also select a portion of each record, and this even works inside ObjectType columns:
>>> sorted(session.query(Character.details['gender']).all())
[('female',), ('male',)]
You can also filter on attributes inside the ObjectType column:
>>> query = session.query(Character.name)
>>> query.filter(Character.details['gender'] == 'male').all()
[('Arthur Dent',)]
To filter on an ObjectArray, you have to do something like this:
>>> from sqlalchemy.sql import operators
>>> query = session.query(Character.name)
>>> query.filter(Character.more_details['foo'].any(1, operator=operators.eq)).all()
[(u'Arthur Dent',)]
Here, we’re using SQLAlchemy’s any
method along with Python’s operator.eq() function, in order to
match the value 1 against the key foo of any dictionary in the
more_details list.
Only one of the keys has to match for the row to be returned.
This works, because ObjectArray keys return a list of all values for that
key, like so:
>>> arthur.more_details['foo']
[1, 2, 3]
Querying a key of an ObjectArray column will return all values for that key
for all matching rows:
>>> query = session.query(Character.more_details['foo']).order_by(Character.name)
>>> query.all()
[([1, 2, 3],), (None,)]
Aggregate functions¶
SQLAlchemy supports different ways to count result rows. However, because CrateDB doesn’t support subqueries, counts must be written in one of the following two ways.
This counts the number of character records by counting the number of id
values in the table:
>>> session.query(sa.func.count(Character.id)).scalar()
2
Note
If you’re doing it like this, the column you select must be the primary key.
And this counts the number of character records by selecting all columns, and then counting the number of rows:
>>> session.query(sa.func.count('*')).select_from(Character).scalar()
2
You can layer in calls to group_by and order_by when you use one of
these methods, like so:
>>> session.query(sa.func.count(Character.id), Character.name) \
... .group_by(Character.name) \
... .order_by(sa.desc(sa.func.count(Character.id))) \
... .order_by(Character.name).all()
[(1, u'Arthur Dent'), (1, u'Tricia McMillan')]
Fulltext search¶
Matching¶
Fulltext Search in CrateDB is done with the MATCH Predicate.
The CrateDB SQLAlchemy dialect provides a match function, which can be used
to search one or multiple fields.
Here’s an example use of the match function:
>>> from sqlalchemy_cratedb import match
>>> session.query(Character.name) \
... .filter(match(Character.name_ft, 'Arthur')) \
... .all()
[('Arthur Dent',)]
In this example, we’re selecting character name values, and returning all
rows where the name_ft index matches the string Arthur.
Note
To use fulltext searches on a column, an explicit fulltext index with an analyzer must be created on the column. Consult the documentation about Fulltext indices for more information.
The match function takes the following options:
match(column, term, match_type=None, options=None)
column:A reference to a column or an index:
match(Character.name_ft, 'Trillian')
Or a subcolumn:
match(Character.details['name']['first'], 'Trillian')
Or a dictionary of the same, with boost values:
match({Character.name_ft: 0.5, Character.details['name']['first']: 0.8, Character.details['name']['last']: 0.2}, 'Trillian')
See also
The arguments reference of the MATCH Predicate has more in-depth information.
term:The term to match against.
This string is analyzed and the resulting tokens are compared to the index.
match_type:(optional)
The Match Types.
Determine how the
termis applied and the _score gets calculated. See also score usage.Here’s an example:
match({Character.name_ft: 0.5, Character.details['name']['first']: 0.8, Character.details['name']['last']: 0.2}, 'Trillian', match_type='phrase')
options:(optional)
The match options.
Specify match type behaviour. (Not possible without a specified match type.)
Match options must be supplied as a dictionary:
match({Character.name_ft: 0.5, Character.details['name']['first']: 0.8, Character.details['name']['last']: 0.2}, 'Trillian', match_type='phrase' options={ 'fuzziness': 3, 'analyzer': 'english'})
Relevance¶
To get the relevance of a matching row, the row _score can be used. See also score usage.
The score is relative to other result rows produced by your query. The higher the score, the more relevant the result row.
The score is made available via the _score column, which is a virtual
column, meaning that it doesn’t exist on the source table, and in most cases,
should not be included in your table definition.
You can select _score as part of a query, like this:
>>> session.query(Character.name, '_score') \
... .filter(match(Character.quote_ft, 'space')) \
... .all()
[('Tricia McMillan', ...)]
Here, we’re matching the term space against the quote_ft fulltext
index. And we’re selecting the name column of the character by using the
table definition But notice that we select the associated score by passing in
the virtual column name as a string (_score) instead of using a defined
column on the Character class.