As a distributed database system with support for the standard SQL query language, CrateDB is great to run aggregations server-side, working on huge datasets, and getting summarized results back; there are however cases where we may still want to retrieve lots of data from CrateDB.
Getting a large number of results in a single operation can put stress on the network and other resources, both client and server-side, so we need a mechanism to fetch results in manageable batches as we are ready to process them.
A common requirement is also what is called pagination, which is to present results to users with pages with a set number of results each, allowing them to move between these pages. In this case, it is common that many users will only look at the first few pages of results, so we want to implement that in the most efficient way possible.
This can be done using cursors, a cursor is like having a bookmark pointing to a specific record in the result set of a query.
In CrateDB we were already able to use cursors at the protocol level, but with version 5.1.0 we have introduced the ability to do it with standard SQL commands.
First, we issue this command:
BEGIN;
This is a SQL command that would normally start a transaction, there are no transactions as such in CrateDB, but this will create a scope on which cursors can be created.
DECLARE observations_cursor NO SCROLL CURSOR FOR
SELECT ts,device,reading
FROM doc.observations
WHERE ts BETWEEN '2022-01-01 00:00' AND '2022-10-01 00:00';
This associates a cursor name with a query and determines the point in time at which data is “frozen”, this is because cursors in CrateDB are INSENSITIVE
, meaning that the client can take all the time it needs to retrieve the results and the data will always reflect the status of the tables as it was at the time the cursor was declared, ignoring any records that are updated, deleted, or newly inserted.
FETCH 10 FROM observations_cursor;
This retrieves 10 rows from the query, and when issued again it will retrieve the next 10 rows and so on. We can retrieve a different number of records each time and we know we have reached the end of the result set when FETCH
returns zero rows.
Once the cursor is not needed anymore it can be closed with either END
, CLOSE
, or COMMIT
.
Take a look at this short animation showing an example of how this works: Mathias Fußenegger on Twitter
We hope you find this useful, and we will be happy to hear about your experience in the Community.