High Cardinality Database for Real-Time Analytics

Written by CrateDB | 2026-01-21

Time series analytics has changed dramatically. What used to be simple metric tracking has evolved into event-driven analytics at massive scale, where every data point carries rich context: device IDs, users, tenants, locations, versions, features, and more.

This shift has exposed a hard truth: cardinality and dimensionality, not raw data volume, are the real challenges of modern time series analytics.

A high cardinality database is designed to handle this reality. In this article, we explore what high-cardinality time series really means, why many systems struggle with it, and how CrateDB is purpose-built to handle high and continuously growing cardinality using SQL, without pre-aggregation or architectural complexity.

What Is a High Cardinality Database?

A high cardinality database is designed to efficiently store and query data with millions or billions of unique values across one or more dimensions.

Low-cardinality dimensions include values such as:

status = OK, WARN, ERROR
region = EU, US, APAC

High-cardinality dimensions include:

device_id
user_id
session_id
tenant_id
vehicle_id

In real-world systems, cardinality is not static. It grows continuously as new users, devices, tenants, and events are added. A true high cardinality database maintains predictable query performance even as this growth accelerates.

Why High Cardinality Breaks Traditional Time Series Databases

Traditional time series databases were designed for scenarios like infrastructure monitoring:

A small, fixed set of metrics
A limited number of dimensions
Predictable queries defined in advance

Modern applications look very different. Today’s time series data often represents events, not just metrics:

IoT sensor readings enriched with device metadata
Application events tagged with user, tenant, and feature flags
Operational telemetry tied to assets, locations, and versions
Usage analytics where every customer interaction adds new dimensions

Each new dimension multiplies the number of unique combinations that must be indexed, filtered, and aggregated. This is where cardinality explodes.

Why "Unlimited Cardinality" Is the Wrong Promise

Some platforms claim "unlimited cardinality". In practice, this is neither accurate nor helpful.

Every system has physical limits, and experienced engineers know this. Over-promising here reduces credibility.

That's why we say CrateDB is designed for very high and continuously growing cardinality, without requiring pre-aggregation, rigid schemas, or query redesign.

What matters is not whether cardinality is theoretically unlimited, but whether:

New dimensions can be added freely
Cardinality can grow without operational pain
Query performance remains predictable
Analytics remain flexible as questions evolve

This is where CrateDB differentiates itself.

Dimensions Are the Hard Part of Time Series Analytics

Many time series systems are optimized for:

Few metrics
Few dimensions
Known queries in advance

This works well for infrastructure monitoring, but fails for:

IoT and sensor platforms
Customer-facing analytics
Usage analytics
Operational intelligence
Multi-tenant SaaS analytics

In these use cases:

Queries are ad hoc
Dimensions are dynamic
New filters appear constantly
Aggregations are exploratory

CrateDB treats time series as fully dimensional data, not as a specialized metrics format.

CrateDB’s Unique Strengths for High-Cardinality Time Series

1. SQL-First Analytics on Rich Dimensions

CrateDB exposes time series data through standard SQL, allowing:

GROUP BY on high-cardinality columns
Multi-dimensional filtering
Complex aggregations
Joins with reference data
Queries across structured and semi-structured fields

Dimensions can be strings, numbers, geo types, or nested JSON attributes. All are first-class query citizens.

There is no custom query language and no need to reshape data to fit a metrics model.

2. No Pre-Aggregation or Rollup Pipelines

Many systems rely on:

Downsampling
Rollup tables
Predefined metrics
Batch aggregation pipelines

These approaches reduce flexibility and increase operational complexity. CrateDB ingests raw events and makes them queryable in near real time, enabling:

New questions without reprocessing data
Dashboard changes without rebuilding pipelines
Multiple analytical views on the same dataset
High cardinality becomes a design feature, not a limitation.

3. High Ingest Rates with Instant Queryability

High-cardinality systems are useless if fresh data is slow to analyze.

CrateDB is designed for:

New data is typically available for analytics within milliseconds, even at scale. This makes it suitable for real-time dashboards, alerts, and operational decision-making.

4. Distributed Architecture That Scales with Cardinality

High cardinality stresses:

Index sizes
Memory usage
Query planning

CrateDB distributes data, indexes, and query execution across nodes automatically. Scaling cardinality follows the same model as scaling volume: add nodes, not complexity. No manual sharding strategies. No application-side routing.

5. Built for Multi-Tenant and Customer-Facing Analytics

High-cardinality dimensions like tenant_id or customer_id are common in SaaS and platform use cases.

CrateDB supports:

Tenant-based schemas
Tenant identifiers as query dimensions
Logical isolation on shared infrastructure

This enables customer-facing analytics where each new tenant adds both data volume and cardinality without breaking the system.

Real-World Use Cases That Depend on High Cardinality

CrateDB is particularly strong in scenarios where dimensions grow continuously and queries are unpredictable:

IoT and industrial telemetry
Fleet and asset tracking
Real-time operational dashboards
Usage and behavior analytics
Manufacturing and sensor platforms
Event-driven applications

In all of these cases, questions evolve faster than schemas.

Conclusion

High-cardinality time series analytics is no longer a niche requirement. It is the norm for modern, data-driven platforms.

CrateDB was built for this reality: rich dimensions, continuous growth, real-time queries, and full analytical flexibility using SQL.

If your analytics workload is constrained not by how much data you store, but by how many questions you can ask, CrateDB offers a fundamentally different approach.

Read more about CrateDB for time series data >

View full post