High Cardinality & High Dimensionality Analytics

Why It's Hard

High cardinality introduces millions of unique values per dimension.
High dimensionality adds dozens or hundreds of attributes per event.

Together, they create combinatorial pressure that breaks traditional databases through index explosion, unstable query planning, and rising infrastructure costs. Teams are forced to downsample, pre-aggregate, or discard data just to keep systems running.
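To make the combinatorial pressure concrete, here is a minimal arithmetic sketch. The dimension names and cardinalities are illustrative assumptions, not figures from a real workload.

```python
from math import prod

# Hypothetical per-dimension cardinalities (illustrative numbers only).
cardinalities = {
    "device_id": 1_000_000,
    "firmware_version": 50,
    "region": 20,
    "customer_id": 10_000,
}


def potential_combinations(cards):
    """Upper bound on distinct value combinations across all dimensions."""
    return prod(cards.values())


total = potential_combinations(cardinalities)
print(f"{total:,} potential combinations")
```

The number of combinations actually observed is capped by the number of events, but any structure keyed on combinations of dimensions (a composite index, one series per tag set) can grow toward this product rather than toward the sum of the individual cardinalities.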

Why CrateDB is different

When high cardinality and high dimensionality combine, many systems hit a performance cliff.

CrateDB’s columnar approach stores and scans each dimension independently, so the cardinality of one attribute does not inflate memory usage or index structures for others. New dimensions can be added without destabilizing ingestion or query performance.
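The idea of storing and scanning each dimension independently can be sketched with a toy column-oriented table. This is a simplified illustration, not CrateDB's storage engine; the `ColumnStore` class and its methods are hypothetical names.

```python
class ColumnStore:
    """Toy column-oriented table: one Python list per column."""

    def __init__(self, column_names):
        self.columns = {name: [] for name in column_names}

    def append(self, row):
        # Each value goes into its own column; a missing value is stored as None.
        for name, values in self.columns.items():
            values.append(row.get(name))

    def distinct_count(self, name):
        # Reads only the requested column; other columns are never touched.
        return len({v for v in self.columns[name] if v is not None})


store = ColumnStore(["device_id", "region"])
store.append({"device_id": "dev-1", "region": "eu"})
store.append({"device_id": "dev-2", "region": "eu"})
store.append({"device_id": "dev-3", "region": "us"})

print(store.distinct_count("device_id"))  # high-cardinality column
print(store.distinct_count("region"))     # low-cardinality column
```

In this layout, the cost of scanning `region` depends on the row count and on the values in `region` alone, regardless of how many distinct `device_id` values exist.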

This enables:

Granular analytics across millions of users, devices, or assets
Fast root-cause and anomaly analysis at full data resolution
Ad hoc analytics without pre-aggregation or schema redesigns
Analytics-ready data for real-time dashboards and AI pipelines

Distributed, columnar architecture

CrateDB uses a shared-nothing, distributed design with columnar storage. Data is automatically distributed and queried in parallel, keeping performance stable as cardinality grows.
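A rough sketch of the scatter-gather pattern behind a shared-nothing design: rows are routed to shards by a hash of a key, each shard computes a partial aggregate on its own data, and the partial results are merged. The shard count, the routing key, and all function names are assumptions made for illustration. Threads are used only to show the shape of the pattern, not to demonstrate real parallel speedup.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32

NUM_SHARDS = 4  # hypothetical shard count


def shard_for(key):
    """Route a key to a shard using a stable hash."""
    return crc32(key.encode("utf-8")) % NUM_SHARDS


def ingest(events):
    """Distribute events across shards by device_id."""
    shards = [[] for _ in range(NUM_SHARDS)]
    for event in events:
        shards[shard_for(event["device_id"])].append(event)
    return shards


def local_count_by_region(shard):
    """Partial aggregate computed from one shard's own data only."""
    return Counter(event["region"] for event in shard)


def count_by_region(shards):
    """Run the partial aggregates concurrently, then merge them."""
    with ThreadPoolExecutor(max_workers=NUM_SHARDS) as pool:
        partials = pool.map(local_count_by_region, shards)
        return sum(partials, Counter())


events = [
    {"device_id": f"dev-{i}", "region": "eu" if i % 3 else "us"}
    for i in range(1000)
]
shards = ingest(events)
result = count_by_region(shards)
print(dict(result))
```

Because each shard only needs its own rows to produce a partial result, adding shards spreads both storage and aggregation work without any shard depending on another.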

Automatic indexing

High-cardinality columns and frequently queried dimensions are indexed automatically at ingestion time. No index planning, no reindexing jobs, and no manual tuning as data evolves.
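Indexing at ingestion time can be pictured as updating a per-column lookup structure on every insert, so no separate index build is needed later. The following is a toy inverted index under that assumption; `IndexedTable` is a hypothetical class and does not reflect CrateDB's actual index implementation.

```python
from collections import defaultdict


class IndexedTable:
    """Toy table that indexes every column value as rows are inserted."""

    def __init__(self):
        self.rows = []
        # column name -> value -> list of row ids
        self.index = defaultdict(lambda: defaultdict(list))

    def insert(self, row):
        row_id = len(self.rows)
        self.rows.append(row)
        # The index is updated as part of the write itself.
        for column, value in row.items():
            self.index[column][value].append(row_id)
        return row_id

    def lookup(self, column, value):
        # Resolve matching row ids through the index instead of scanning all rows.
        row_ids = self.index.get(column, {}).get(value, [])
        return [self.rows[i] for i in row_ids]


table = IndexedTable()
table.insert({"device_id": "dev-1", "region": "eu"})
table.insert({"device_id": "dev-2", "region": "us"})
table.insert({"device_id": "dev-1", "region": "us"})

print(table.lookup("device_id", "dev-1"))
```

A new column that appears in a later row is indexed the same way on first sight, which is why this style of indexing needs no up-front index planning.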

SQL across all dimensions

Query any combination of dimensions using standard SQL:

Filter by millions of unique values
Group by high-cardinality attributes
Combine structured and semi-structured data
Run aggregations, joins, and search in one query

Execution plans adapt automatically as query patterns change.
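As a sketch of what filtering and grouping across several dimensions looks like in standard SQL, the example below runs one such query. It uses Python's built-in SQLite driver purely as a local stand-in so that the snippet is self-contained; it does not connect to CrateDB, and the `events` table, its columns, and the sample rows are invented for illustration.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in SQL engine.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events ("
    "device_id TEXT, region TEXT, firmware TEXT, latency_ms REAL)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        ("dev-1", "eu", "1.0", 120.0),
        ("dev-2", "eu", "1.1", 80.0),
        ("dev-3", "us", "1.1", 200.0),
        ("dev-1", "eu", "1.0", 100.0),
    ],
)

# Filter on a measure, group by two dimensions, and count distinct
# values of a high-cardinality column in the same statement.
query = """
SELECT region,
       firmware,
       COUNT(DISTINCT device_id) AS devices,
       AVG(latency_ms) AS avg_latency
FROM events
WHERE latency_ms >= ?
GROUP BY region, firmware
ORDER BY region, firmware
"""
result = conn.execute(query, (90.0,)).fetchall()
for row in result:
    print(row)
```

The statement relies only on widely supported SQL features (`WHERE`, `GROUP BY`, `COUNT(DISTINCT ...)`, `AVG`), so the same shape of query applies to any number of dimension columns.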

Schema flexibility for evolving data

Add new dimensions instantly using dynamic columns and JSON support. Evolving data models do not require schema migrations or downtime.
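One way to picture dynamic columns is a schema that grows whenever an incoming JSON event contains a field it has not seen before. The sketch below flattens nested objects into dotted paths and records each new path with the type of its first value. `flatten` and `DynamicSchema` are hypothetical helpers, and the type inference is deliberately simplistic.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts into {'a.b': value} style paths."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


class DynamicSchema:
    """Toy schema that adds a column the first time a field appears."""

    def __init__(self):
        self.columns = {}  # path -> type name of the first value seen

    def observe(self, event):
        added = []
        for path, value in flatten(event).items():
            if path not in self.columns:
                self.columns[path] = type(value).__name__
                added.append(path)
        return added


schema = DynamicSchema()
first = schema.observe({"device_id": "dev-1", "tags": {"site": "vienna"}})
second = schema.observe({"device_id": "dev-2", "tags": {"site": "graz", "line": 7}})

print(first)
print(second)
print(schema.columns)
```

Existing rows are unaffected when `tags.line` appears; only the schema gains an entry, which is the property that lets a data model evolve without a migration step.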


Log & event analytics

Bitmovin is a leading video streaming company. They use CrateDB to store 140 terabytes of data, covering both user events and user interactions. Every day, one billion new rows are added, and the largest tables contain around 60 billion playback events.

"It is through the use of CrateDB that we are able to offer our large-scale video analytics component in the first place. Comparable products are either not capable of handling the large flood of data or they are simply too expensive."

Daniel Hölbling-Inzko
Senior Director of Engineering - Analytics
Bitmovin

High cardinality refers to dimensions with a very large number of unique values, such as device IDs, user IDs, transaction IDs, or session identifiers. High-cardinality data enables granular analytics but often causes performance and cost issues in traditional databases.

High dimensionality describes datasets with many attributes or columns per event. This includes wide tables, dynamic tags, and semi-structured data such as JSON. High dimensionality allows flexible analysis but can make schema management and query planning difficult in rigid systems.

When many high-cardinality dimensions are combined, traditional databases suffer from combinatorial explosion. Index sizes grow rapidly, ingestion slows, and query planning becomes unstable. As a result, teams are forced to pre-aggregate or drop data.

CrateDB is designed to absorb both high cardinality and high dimensionality without hitting these performance cliffs.

CrateDB uses a distributed, columnar architecture with automatic indexing. Data is stored and queried in parallel, and indexing adapts automatically to high-cardinality dimensions without manual tuning.

This keeps query latency predictable even as the number of unique values grows into the billions.

CrateDB supports full SQL for real-time analytics. You can filter, group, aggregate, and join across millions of unique values using standard SQL, without pre-aggregation or query rewrites.

Downsampling and pre-aggregation are not required. CrateDB tolerates high cardinality natively, allowing you to retain raw, full-resolution data for longer. You can choose to downsample or aggregate later for cost or retention reasons, not because the database cannot handle the data.

CrateDB supports dynamic columns and JSON data types. New dimensions can be added instantly without schema migrations or downtime, making it well suited for fast-changing data models.

CrateDB is commonly used for high-cardinality time series and event data, including metrics, logs, and sensor data. It supports time-based partitioning, fast ingestion, and sub-second queries across recent and historical data.

Many traditional TSDBs struggle with high cardinality and multiple dimensions, relying on rollups or aggressive data retention policies. CrateDB avoids these limitations by combining columnar storage, distributed execution, and SQL analytics, enabling flexible queries at scale.

CrateDB is well suited for:

IoT and sensor analytics
SaaS analytics backends
Observability platforms
Industrial and manufacturing analytics

Any workload requiring granular, real-time analytics across many dimensions benefits from this capability.
