Primary key strategies and autogenerated sequences¶
Introduction
As you begin working with CrateDB, you might be puzzled why CrateDB does not have a built-in, auto-incrementing “serial” data type, like PostgreSQL or MySQL.
This page explains why that is and walks you through five common alternatives to generate unique primary key values in CrateDB, including a recipe to implement your own auto-incrementing sequence mechanism when needed.
Why auto-increment sequences don’t exist in CrateDB
In traditional RDBMS systems, auto-increment fields rely on a central counter. In a distributed system like CrateDB, maintaining a global auto-increment value would require that a node checks with other nodes before allocating a new value. This would create a global coordination bottleneck, limit insert throughput, and reduce scalability.
CrateDB is designed for horizontal scalability and high ingestion throughput.
To achieve this, operations must complete independently on each node—without
central coordination. This design choice means CrateDB does not support
traditional auto-incrementing primary key types like SERIAL in PostgreSQL
or MySQL.
Solutions
CrateDB provides flexibility: You can choose a primary key strategy tailored to your use case, whether for strict uniqueness, time ordering, or external system integration. You can also implement true consistent/synchronized sequences if you want to.
Using a timestamp as a primary key¶
This option involves declaring a column using DEFAULT now().
CREATE TABLE example (
id BIGINT DEFAULT now() PRIMARY KEY
);
- Pros:
Auto-generated, always-increasing value
Useful when records are timestamped anyway
- Cons:
Can result in gaps
Collisions possible if multiple records are created in the same millisecond
Using elasticflake identifiers¶
This option involves declaring a column using DEFAULT gen_random_text_uuid().
CREATE TABLE example2 (
id TEXT DEFAULT gen_random_text_uuid() PRIMARY KEY
);
- Pros:
Universally unique
No conflicts when merging from multiple environments or sources
- Cons:
Not ordered
Harder to read/debug
No efficient range queries
Using UUIDv7 identifiers¶
UUIDv7 is a new format that preserves temporal ordering, making UUIDs better suited for inserts and range queries in distributed databases.
You can use UUIDv7 for CrateDB via a User-Defined Function (UDF) in JavaScript, or use a UUIDv7 library in your application layer.
- Pros:
Globally unique and almost sequential
Efficient range queries possible
- Cons:
Not as human-friendly as integer numbers
Slight overhead due to UDF use
Using IDs from external systems¶
If you are importing data from a source system that already generates unique IDs, you can reuse those by inserting primary key values as-is from the source system.
In this case, CrateDB does not need to generate any identifier values, and consistency is ensured across systems.
See also
An example for that is Replicating data from other databases to CrateDB with Debezium and Kafka.
Implementing a custom sequence table¶
If you must have an auto-incrementing numeric ID (e.g., for compatibility or legacy reasons), you can implement a simple sequence generator using a dedicated table and client-side logic.
This approach involves a table to keep the latest values that have been consumed and client side code to keep it up-to-date in a way that guarantees unique values even when many ingestion processes run in parallel.
- Pros:
Fully customizable (you can add prefixes, adjust increment size, etc.)
Sequential IDs possible
- Cons:
Additional client logic about optimistic updates is required for writing
The sequence table may become a bottleneck at very high ingestion rates
Step 1: Create a sequence tracking table¶
Create a table to keep the latest values for the sequences.
CREATE TABLE sequences (
name TEXT PRIMARY KEY,
last_value BIGINT
) CLUSTERED INTO 1 SHARDS;
Step 2: Initialize your sequence¶
Initialize the table with one new sequence at 0.
INSERT INTO sequences (name,last_value)
VALUES ('mysequence',0);
Step 3: Create a target table¶
Start an example with a newly defined table.
CREATE TABLE mytable (
id BIGINT PRIMARY KEY,
field1 TEXT
);
Step 4: Generate and use sequence values in Python¶
Use optimistic concurrency control to generate unique, incrementing values even in parallel ingestion scenarios.
The Python code below reads the last value used from the sequences table, and
then attempts an optimistic UPDATE with a RETURNING clause. If a
contending process already consumed the identity nothing will be returned so our
process will retry until a value is returned. Then it uses that value as the new
ID for the record we are inserting into the mytable table.
# Requires: records, sqlalchemy-cratedb
#
# /// script
# requires-python = ">=3.8"
# dependencies = [
# "records",
# "sqlalchemy-cratedb",
# ]
# ///
import time
import records
db = records.Database("crate://")
sequence_name = "mysequence"
max_retries = 5
base_delay = 0.1 # 100 milliseconds
for attempt in range(max_retries):
select_query = """
SELECT last_value, _seq_no, _primary_term
FROM sequences
WHERE name = :sequence_name;
"""
row = db.query(select_query, sequence_name=sequence_name).first()
new_value = row.last_value + 1
update_query = """
UPDATE sequences
SET last_value = :new_value
WHERE name = :sequence_name
AND _seq_no = :seq_no
AND _primary_term = :primary_term
RETURNING last_value;
"""
if (
str(
db.query(
update_query,
new_value=new_value,
sequence_name=sequence_name,
seq_no=row._seq_no,
primary_term=row._primary_term,
).all()
)
!= "[]"
):
break
delay = base_delay * (2**attempt)
print(f"Attempt {attempt + 1} failed. Retrying in {delay:.1f} seconds...")
time.sleep(delay)
else:
raise Exception(f"Failed after {max_retries} retries with exponential backoff")
insert_query = "INSERT INTO mytable (id, field1) VALUES (:id, :field1)"
db.query(insert_query, id=new_value, field1="abc")
db.close()
Summary¶
Strategy |
Ordered |
Unique |
Scalable |
Human-friendly |
Range queries |
Notes |
|---|---|---|---|---|---|---|
Timestamp |
✅ |
⚠️ |
✅ |
✅ |
✅ |
Potential collisions |
Elasticflake |
❌ |
✅ |
✅ |
❌ |
❌ |
Default UUIDs |
UUIDv7 |
✅ |
✅ |
✅ |
❌ |
✅ |
Requires UDF |
External system IDs |
✅/❌ |
✅ |
✅ |
✅ |
✅ |
Depends on source |
Sequence table |
✅ |
✅ |
⚠️ |
✅ |
✅ |
Manual retry logic |