Advanced querying¶
This section of the documentation demonstrates running queries using a fulltext
index with an analyzer, queries using counting and aggregations, and support for
the INSERT...FROM SELECT
and INSERT...RETURNING
constructs, all using the
CrateDB SQLAlchemy dialect.
Table of Contents
Introduction¶
Import the relevant symbols:
>>> import sqlalchemy as sa
>>> from sqlalchemy.orm import sessionmaker
>>> try:
... from sqlalchemy.orm import declarative_base
... except ImportError:
... from sqlalchemy.ext.declarative import declarative_base
>>> from uuid import uuid4
Establish a connection to the database, see also Engine Configuration and Connectivity:
>>> engine = sa.create_engine(f"crate://{crate_host}")
>>> connection = engine.connect()
Create an SQLAlchemy Session:
>>> session = sessionmaker(bind=engine)()
>>> Base = declarative_base()
Introduction to fulltext indexes¶
Fulltext indices take the contents of one or more fields and split it up into tokens that are used for fulltext-search. The transformation from a text to separate tokens is done by an analyzer. In order to conduct fulltext search queries, we need to create a table with a fulltext index with an analyzer.
CREATE TABLE characters (
id STRING PRIMARY KEY,
name STRING,
quote STRING,
INDEX name_ft USING fulltext(name) WITH (analyzer = 'english'),
INDEX quote_ft USING fulltext(quote) WITH (analyzer = 'english')
)
We have to create this table using SQL because it is currently not possible to
create INDEX
fields using SQLAlchemy’s Declarative Mapping.
However, we can define the table to use all other operations:
>>> def gen_key():
... return str(uuid4())
>>> class Character(Base):
... __tablename__ = 'characters'
... id = sa.Column(sa.String, primary_key=True, default=gen_key)
... name = sa.Column(sa.String)
... quote = sa.Column(sa.String)
... name_ft = sa.Column(sa.String)
... quote_ft = sa.Column(sa.String)
... __mapper_args__ = {
... 'exclude_properties': ['name_ft', 'quote_ft']
... }
We define name_ft
and quote_ft
as regular columns, but add them under
__mapper_args__.exclude_properties
to ensure they’re excluded from insert
or update operations.
In order to support fulltext query operations, the CrateDB SQLAlchemy dialect
provides the MATCH Predicate through its match
function.
Let’s add two records we use for testing.
>>> arthur = Character(name='Arthur Dent')
>>> arthur.quote = "Let's go somewhere."
>>> session.add(arthur)
>>> trillian = Character(name='Tricia McMillan')
>>> trillian.quote = "We're on a space ship Arthur. In space."
>>> session.add(trillian)
>>> session.commit()
After INSERT
statements are submitted to the database, the newly inserted
records aren’t immediately available for retrieval, because the index is only
updated periodically (default: each second). In order to synchronize that,
explicitly refresh the table:
>>> _ = connection.execute(sa.text("REFRESH TABLE characters"))
Fulltext search with MATCH predicate¶
Fulltext search in CrateDB is performed using MATCH Predicate.
The CrateDB SQLAlchemy dialect comes with a match
function, which can be used to
search on one or multiple fields.
>>> from sqlalchemy_cratedb import match
>>> session.query(Character.name) \
... .filter(match(Character.name_ft, 'Arthur')) \
... .all()
[('Arthur Dent',)]
To get the relevance of a matching row, you can select the _score
system
column. It is a numeric value which is relative to the other rows.
The higher the score value, the more relevant the row.
In most cases, _score
is not part of the SQLAlchemy table definition,
so it must be passed as a verbatim string, using literal_column
:
>>> session.query(Character.name, sa.literal_column('_score')) \
... .filter(match(Character.quote_ft, 'space')) \
... .all()
[('Tricia McMillan', ...)]
To search multiple columns, use a dictionary where the keys are the columns and
the values are a boost
. A boost
is a factor that increases the relevance
of a column in respect to the other columns:
>>> session.query(Character.name) \
... .filter(match({Character.name_ft: 1.5, Character.quote_ft: 0.1},
... 'Arthur')) \
... .order_by(sa.desc(sa.literal_column('_score'))) \
... .all()
[('Arthur Dent',), ('Tricia McMillan',)]
The match_type
argument determines how a single query_term
is applied,
and how the resulting _score
is computed. Thus, it influences which
documents are considered more relevant. The default selection is best_fields
.
For more information, see Match Types.
If you want to sort the results by _score
, you can use the order_by()
function.
>>> session.query(Character.name) \
... .filter(
... match(Character.name_ft, 'Arth',
... match_type='phrase',
... options={'fuzziness': 3})
... ) \
... .all()
[('Arthur Dent',)]
It is not possible to specify options without the match_type
argument:
>>> session.query(Character.name) \
... .filter(
... match(Character.name_ft, 'Arth',
... options={'fuzziness': 3})
... ) \
... .all()
Traceback (most recent call last):
ValueError: missing match_type. It's not allowed to specify options without match_type
Aggregates: Counting and grouping¶
SQLAlchemy supports different approaches to issue a query with a count aggregate function. Take a look at the count result rows documentation for a full overview.
CrateDB currently does not support all variants as it can not handle the sub-queries yet.
This means that queries using count()
have to be written in one of the
following ways:
>>> session.query(sa.func.count(Character.id)).scalar()
2
>>> session.query(sa.func.count('*')).select_from(Character).scalar()
2
Using the group_by
clause is similar:
>>> session.query(sa.func.count(Character.id), Character.name) \
... .group_by(Character.name) \
... .order_by(sa.desc(sa.func.count(Character.id))) \
... .order_by(Character.name).all()
[(1, 'Arthur Dent'), (1, 'Tricia McMillan')]
INSERT...FROM SELECT
¶
In SQLAlchemy, the insert().from_select()
function returns a new Insert
construct, which represents an INSERT...FROM SELECT
statement. This
functionality is supported by the SQLAlchemy CrateDB dialect. Here is an example
that uses insert().from_select()
.
First, let’s define and create the tables:
>>> from sqlalchemy import select, insert
>>> class Todos(Base):
... __tablename__ = 'todos'
... __table_args__ = {
... 'crate_number_of_replicas': '0'
... }
... id = sa.Column(sa.String, primary_key=True, default=gen_key)
... content = sa.Column(sa.String)
... status = sa.Column(sa.String)
>>> class ArchivedTasks(Base):
... __tablename__ = 'archived_tasks'
... __table_args__ = {
... 'crate_number_of_replicas': '0'
... }
... id = sa.Column(sa.String, primary_key=True)
... content = sa.Column(sa.String)
>>> Base.metadata.create_all(bind=engine)
Let’s add a task to the Todo
table:
>>> task = Todos(content='Write Tests', status='done')
>>> session.add(task)
>>> session.commit()
>>> _ = connection.execute(sa.text("REFRESH TABLE todos"))
Now, let’s use insert().from_select()
to archive the task into the
ArchivedTasks
table:
>>> sel = select(Todos.id, Todos.content).where(Todos.status == "done")
>>> ins = insert(ArchivedTasks).from_select(['id', 'content'], sel)
>>> result = session.execute(ins)
>>> session.commit()
This will emit the following INSERT
statement to the database:
- INSERT INTO archived_tasks (id, content)
(SELECT todos.id, todos.content FROM todos WHERE todos.status = ‘done’)
Now, verify that the data is present in the database:
>>> _ = connection.execute(sa.text("REFRESH TABLE archived_tasks"))
>>> pprint([str(r) for r in session.execute(sa.text("SELECT content FROM archived_tasks"))])
["('Write Tests',)"]
INSERT...RETURNING
¶
The RETURNING
clause can be used to retrieve the result rows of an INSERT
operation. It may be specified using the Insert.returning()
method.
The first step is to define the table:
>>> from sqlalchemy import insert
>>> class User(Base):
... __tablename__ = 'user'
... __table_args__ = {
... 'crate_number_of_replicas': '0'
... }
... id = sa.Column(sa.String, primary_key=True, default=gen_key)
... username = sa.Column(sa.String)
... email = sa.Column(sa.String)
>>> Base.metadata.create_all(bind=engine)
Now, let’s use the returning clause on our insert to retrieve the values inserted:
>>> stmt = insert(User).values(username='Crate', email='crate@crate.io').returning(User.username, User.email)
>>> result = session.execute(stmt)
>>> session.commit()
>>> print([str(r) for r in result])
["('Crate', 'crate@crate.io')"]
The following INSERT...RETURNING
statement was issued to the database:
INSERT INTO user (id, username, email)
VALUES (:id, :username, :email)
RETURNING user.id, user.username, user.email
UPDATE...RETURNING
The RETURNING
clause can also be used with an UPDATE
operation to return
specified rows to be returned on execution. It can be specified using the
Update.returning()
method.
We can reuse the user table previously created in the INSERT...RETURNING
section.
Insert a user and get the user id:
>>> from sqlalchemy import insert, update
>>> stmt = insert(User).values(username='Arthur Dent', email='arthur_dent@crate.io').returning(User.id, User.username, User.email)
>>> result = session.execute(stmt)
>>> session.commit()
>>> uid = [r[0] for r in result][0]
Now let’s update the user:
>>> stmt = update(User).where(User.id == uid).values(username='Tricia McMillan', email='tricia_mcmillan@crate.io').returning(User.username, User.email)
>>> res = session.execute(stmt)
>>> session.commit()
>>> print([str(r) for r in res])
["('Tricia McMillan', 'tricia_mcmillan@crate.io')"]
The following UPDATE...RETURNING
statement was issued to the database:
UPDATE user SET username=:username, email=:email
WHERE user.id = :id_1
RETURNING user.username, user.email