Primary key strategies and autogenerated sequences¶

Introduction

As you begin working with CrateDB, you might be puzzled why CrateDB does not have a built-in, auto-incrementing “serial” data type, like PostgreSQL or MySQL.

This page explains why that is and walks you through five common alternatives to generate unique primary key values in CrateDB, including a recipe to implement your own auto-incrementing sequence mechanism when needed.

Why auto-increment sequences don’t exist in CrateDB

In traditional RDBMS systems, auto-increment fields rely on a central counter. In a distributed system like CrateDB, maintaining a global auto-increment value would require that a node checks with other nodes before allocating a new value. This would create a global coordination bottleneck, limit insert throughput, and reduce scalability.

CrateDB is designed for horizontal scalability and high ingestion throughput. To achieve this, operations must complete independently on each node—without central coordination. This design choice means CrateDB does not support traditional auto-incrementing primary key types like SERIAL in PostgreSQL or MySQL.

Solutions

CrateDB provides flexibility: You can choose a primary key strategy tailored to your use case, whether for strict uniqueness, time ordering, or external system integration. You can also implement true consistent/synchronized sequences if you want to.

Using a timestamp as a primary key¶

This option involves declaring a column using DEFAULT now().

CREATE TABLE example (
  id BIGINT DEFAULT now() PRIMARY KEY
);

Pros:

Auto-generated, always-increasing value
Useful when records are timestamped anyway

Cons:

Can result in gaps
Collisions possible if multiple records are created in the same millisecond

Using elasticflake identifiers¶

This option involves declaring a column using DEFAULT gen_random_text_uuid().

CREATE TABLE example2 (
  id TEXT DEFAULT gen_random_text_uuid() PRIMARY KEY
);

Pros:

Universally unique
No conflicts when merging from multiple environments or sources

Cons:

Not ordered
Harder to read/debug
No efficient range queries

Using UUIDv7 identifiers¶

UUIDv7 is a new format that preserves temporal ordering, making UUIDs better suited for inserts and range queries in distributed databases.

You can use UUIDv7 for CrateDB via a User-Defined Function (UDF) in JavaScript, or use a UUIDv7 library in your application layer.

Pros:

Globally unique and almost sequential
Efficient range queries possible

Cons:

Not as human-friendly as integer numbers
Slight overhead due to UDF use

Using IDs from external systems¶

If you are importing data from a source system that already generates unique IDs, you can reuse those by inserting primary key values as-is from the source system.

In this case, CrateDB does not need to generate any identifier values, and consistency is ensured across systems.

Implementing a custom sequence table¶

If you must have an auto-incrementing numeric ID (e.g., for compatibility or legacy reasons), you can implement a simple sequence generator using a dedicated table and client-side logic.

This approach involves a table to keep the latest values that have been consumed and client side code to keep it up-to-date in a way that guarantees unique values even when many ingestion processes run in parallel.

Pros:

Fully customizable (you can add prefixes, adjust increment size, etc.)
Sequential IDs possible

Cons:

Additional client logic about optimistic updates is required for writing
The sequence table may become a bottleneck at very high ingestion rates

Step 1: Create a sequence tracking table¶

Create a table to keep the latest values for the sequences.

CREATE TABLE sequences (
  name TEXT PRIMARY KEY,
  last_value BIGINT
) CLUSTERED INTO 1 SHARDS;

Step 2: Initialize your sequence¶

Initialize the table with one new sequence at 0.

INSERT INTO sequences (name,last_value)
VALUES ('mysequence',0);

Step 3: Create a target table¶

Start an example with a newly defined table.

CREATE TABLE mytable (
  id BIGINT PRIMARY KEY,
  field1 TEXT
);

Step 4: Generate and use sequence values in Python¶

Use optimistic concurrency control to generate unique, incrementing values even in parallel ingestion scenarios.

The Python code below reads the last value used from the sequences table, and then attempts an optimistic UPDATE with a RETURNING clause. If a contending process already consumed the identity nothing will be returned so our process will retry until a value is returned. Then it uses that value as the new ID for the record we are inserting into the mytable table.

# Requires: records, sqlalchemy-cratedb
#
# /// script
# requires-python = ">=3.8"
# dependencies = [
#     "records",
#     "sqlalchemy-cratedb",
# ]
# ///

import time
import records

db = records.Database("crate://")
sequence_name = "mysequence"

max_retries = 5
base_delay = 0.1  # 100 milliseconds

for attempt in range(max_retries):
    select_query = """
        SELECT last_value, _seq_no, _primary_term
        FROM sequences
        WHERE name = :sequence_name;
    """
    row = db.query(select_query, sequence_name=sequence_name).first()
    new_value = row.last_value + 1

    update_query = """
        UPDATE sequences
        SET last_value = :new_value
        WHERE name = :sequence_name
        AND _seq_no = :seq_no
        AND _primary_term = :primary_term
        RETURNING last_value;
    """
    if (
        str(
            db.query(
                update_query,
                new_value=new_value,
                sequence_name=sequence_name,
                seq_no=row._seq_no,
                primary_term=row._primary_term,
            ).all()
        )
        != "[]"
    ):
        break

    delay = base_delay * (2**attempt)
    print(f"Attempt {attempt + 1} failed. Retrying in {delay:.1f} seconds...")
    time.sleep(delay)
else:
    raise Exception(f"Failed after {max_retries} retries with exponential backoff")

insert_query = "INSERT INTO mytable (id, field1) VALUES (:id, :field1)"
db.query(insert_query, id=new_value, field1="abc")
db.close()

Summary¶

Strategy	Ordered	Unique	Scalable	Human-friendly	Range queries	Notes
Timestamp	✅	⚠️	✅	✅	✅	Potential collisions
Elasticflake	❌	✅	✅	❌	❌	Default UUIDs
UUIDv7	✅	✅	✅	❌	✅	Requires UDF
External system IDs	✅/❌	✅	✅	✅	✅	Depends on source
Sequence table	✅	✅	⚠️	✅	✅	Manual retry logic