High Cardinality and High Dimensionality
Modern analytics workloads operate at extreme granularity. Data is analyzed across millions of users, devices, sensors, transactions, and dynamic attributes. As uniqueness and dimensionality grow, many systems slow down or force early aggregation.
CrateDB is built to handle high cardinality and high dimensionality natively, so analytics remain fast, flexible, and predictable as data complexity increases.
Why it's hard
High cardinality introduces millions of unique values per dimension.
High dimensionality adds dozens or hundreds of attributes per event.
Together, they create combinatorial pressure that breaks traditional databases through index explosion, unstable query planning, and rising infrastructure costs. Teams are forced to downsample, pre-aggregate, or discard data just to keep systems running.
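The combinatorial pressure described above can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names and the example figures (50 attributes, 1 million devices, 200 regions, 24 hours) are assumptions chosen for the example and do not come from CrateDB or from any measured workload.

```python
from math import comb, prod


def composite_index_count(num_dimensions: int, max_columns: int) -> int:
    """Count the column subsets of size 1..max_columns.

    Each subset stands for one composite index a system would need if every
    filter combination had to be indexed ahead of time (an illustrative
    assumption, not a statement about any particular product).
    """
    return sum(comb(num_dimensions, k) for k in range(1, max_columns + 1))


def max_group_count(cardinalities: list[int]) -> int:
    """Upper bound on distinct GROUP BY keys: the product of the cardinalities."""
    return prod(cardinalities)


# 50 attributes, filters on up to 3 of them at once
print(composite_index_count(50, 3))           # 20875
# 1M devices x 200 regions x 24 hours of the day
print(max_group_count([1_000_000, 200, 24]))  # 4800000000
```

Even with modest inputs, the number of possible index combinations and group keys grows far faster than the number of dimensions, which is why pre-planning indexes or rollups for every combination does not scale.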
Why CrateDB is different
When high cardinality and high dimensionality combine, many systems hit a performance cliff.
CrateDB’s columnar approach stores and scans each dimension independently, so the cardinality of one attribute does not inflate memory usage or index structures for others. New dimensions can be added without destabilizing ingestion or query performance.
This enables:
- Granular analytics across millions of users, devices, or assets
- Fast root-cause and anomaly analysis at full data resolution
- Ad hoc analytics without pre-aggregation or schema redesigns
- Analytics-ready data for real-time dashboards and AI pipelines
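To illustrate the idea of storing and scanning each dimension independently, here is a minimal toy column store in Python. It is a sketch of the general columnar principle only, not CrateDB's storage engine; the `ColumnStore` class and the sample field names are hypothetical.

```python
class ColumnStore:
    """Toy column store: one independent list of values per column."""

    def __init__(self) -> None:
        self.columns: dict[str, list] = {}
        self.row_count = 0

    def insert(self, row: dict) -> None:
        # A column seen for the first time is backfilled with None for
        # earlier rows; existing columns are left untouched.
        for name in row:
            if name not in self.columns:
                self.columns[name] = [None] * self.row_count
        # Append one value per known column; absent attributes become None.
        for name, values in self.columns.items():
            values.append(row.get(name))
        self.row_count += 1

    def distinct_count(self, name: str) -> int:
        # Only the requested column is scanned; other columns are never read.
        return len({v for v in self.columns.get(name, []) if v is not None})


store = ColumnStore()
store.insert({"device_id": "d-1", "temp": 21.5})
store.insert({"device_id": "d-2", "temp": 22.0})
store.insert({"device_id": "d-3", "temp": 21.5, "firmware": "1.4"})

print(store.distinct_count("device_id"))  # 3
print(store.columns["firmware"])          # [None, None, '1.4']
```

In this toy model, counting distinct `device_id` values never reads `temp` or `firmware`, and adding the `firmware` column does not rewrite the columns that already exist.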
Distributed, columnar architecture
CrateDB uses a shared-nothing, distributed design with columnar storage. Data is automatically distributed and queried in parallel, keeping performance stable as cardinality grows.
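The general pattern behind distributing data and querying it in parallel can be sketched in a few lines of Python. This is a simplified model under stated assumptions: rows are routed to shards by a hash of a key, each shard computes a partial aggregate on its own, and the partial results are merged. The function names and the CRC32-based routing are illustrative choices, not a description of CrateDB internals.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def shard_for(key: str, num_shards: int) -> int:
    # Deterministic hash routing (CRC32 is an arbitrary choice for this sketch).
    return zlib.crc32(key.encode("utf-8")) % num_shards


def distribute(rows: list[dict], key: str, num_shards: int) -> list[list[dict]]:
    # Each row lands on exactly one shard; shards share no state.
    shards: list[list[dict]] = [[] for _ in range(num_shards)]
    for row in rows:
        shards[shard_for(row[key], num_shards)].append(row)
    return shards


def local_count(shard: list[dict], group_by: str) -> Counter:
    # Partial aggregate computed from one shard's rows only.
    return Counter(row[group_by] for row in shard)


def parallel_group_count(shards: list[list[dict]], group_by: str) -> Counter:
    # Run the partial aggregates concurrently, then merge them.
    total: Counter = Counter()
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        for partial in pool.map(lambda s: local_count(s, group_by), shards):
            total.update(partial)
    return total
```

Because every shard aggregates only its own rows, adding shards spreads both the data and the work, and the merge step handles one small partial result per shard rather than the raw rows.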
Automatic indexing
High-cardinality columns and frequently queried dimensions are indexed automatically at ingestion time. No index planning, no reindexing jobs, and no manual tuning as data evolves.
SQL across all dimensions
Query any combination of dimensions using standard SQL:
- Filter by millions of unique values
- Group by high-cardinality attributes
- Combine structured and semi-structured data
- Run aggregations, joins, and search in one query
Execution plans adapt automatically as query patterns change.
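As a rough illustration of filtering and grouping across several dimensions in one statement, the sketch below runs a standard SQL query from Python. The SQL text is the point here. Python's built-in `sqlite3` module is used only as a local stand-in so the example runs without a cluster; the `events` table, its columns, and the generated sample data are hypothetical, and CrateDB's own client library and SQL dialect details may differ.

```python
import sqlite3

# Filter on two dimensions, group by two others, aggregate, and rank.
QUERY = """
SELECT device_id, region, COUNT(*) AS events, AVG(latency_ms) AS avg_latency
FROM events
WHERE region = ? AND status = ?
GROUP BY device_id, region
ORDER BY avg_latency DESC
LIMIT 5
"""


def run_demo() -> list[tuple]:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events "
        "(device_id TEXT, region TEXT, status TEXT, latency_ms REAL)"
    )
    # Synthetic sample rows, generated only for this illustration.
    rows = [
        (
            f"dev-{i % 100}",
            "eu" if i % 2 else "us",
            "error" if i % 5 == 0 else "ok",
            float(i % 50),
        )
        for i in range(1000)
    ]
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY, ("eu", "error")).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in run_demo():
        print(row)
```

The same query shape (filter, group, aggregate, rank) is what the list above describes; only the connection layer would change when pointing it at a real deployment.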
Schema flexibility for evolving data
Add new dimensions instantly using dynamic columns and JSON support. Evolving data models do not require schema migrations or downtime.
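The behaviour of dynamic columns can be modelled with a small Python sketch: when an incoming JSON document carries a field that has not been seen before, the schema grows to include it, and existing fields are left alone. This is a toy model of the idea only. The `evolve_schema` helper and the type names it records are hypothetical and do not reflect how CrateDB infers or stores column types.

```python
import json


def evolve_schema(schema: dict[str, str], document: str) -> list[str]:
    """Add unseen top-level keys from a JSON document to the schema.

    Returns the names of the columns that were newly added.
    """
    record = json.loads(document)
    added = []
    for name, value in record.items():
        if name not in schema:
            # Record the Python type name as a stand-in for a column type.
            schema[name] = type(value).__name__
            added.append(name)
    return added


schema: dict[str, str] = {}
print(evolve_schema(schema, '{"device_id": "d-1", "temp": 21.5}'))
# ['device_id', 'temp']
print(evolve_schema(schema, '{"device_id": "d-2", "temp": 22.0, "firmware": "1.4"}'))
# ['firmware']
print(schema)
# {'device_id': 'str', 'temp': 'float', 'firmware': 'str'}
```

The second document introduces `firmware` without any separate migration step, which is the property the paragraph above describes.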
Additional resources
User story
"It is through the use of CrateDB that we are able to offer our large-scale video analytics component in the first place. Comparable products are either not capable of handling the large flood of data or they are simply too expensive."
Daniel Hölbling-Inzko
Senior Director of Engineering - Analytics
Bitmovin
FAQ
High cardinality refers to dimensions with a very large number of unique values, such as device IDs, user IDs, transaction IDs, or session identifiers. High-cardinality data enables granular analytics but often causes performance and cost issues in traditional databases.
High dimensionality describes datasets with many attributes or columns per event. This includes wide tables, dynamic tags, and semi-structured data such as JSON. High dimensionality allows flexible analysis but can make schema management and query planning difficult in rigid systems.
CrateDB is designed to absorb both high cardinality and high dimensionality without hitting the performance cliff that their combination causes in many systems.
This keeps query latency predictable even as uniqueness grows into the billions.
Downsampling and pre-aggregation are not required. CrateDB tolerates high cardinality natively, allowing you to retain raw, full-resolution data for longer. You can choose to downsample or aggregate later for cost or retention reasons, not because the database cannot handle the data.
CrateDB supports dynamic columns and JSON data types. New dimensions can be added instantly without schema migrations or downtime, making it well suited for fast-changing data models.
CrateDB is commonly used for high-cardinality time series and event data, including metrics, logs, and sensor data. It supports time-based partitioning, fast ingestion, and sub-second queries across recent and historical data.
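Time-based partitioning, mentioned above, can be pictured with a short Python sketch: rows are grouped into monthly partitions, and a time-range query reads only the partitions that overlap the requested range. This is a conceptual model under assumed names (`PartitionedTable`, monthly `YYYY-MM` keys), not CrateDB's partitioning implementation.

```python
from collections import defaultdict
from datetime import datetime


def partition_key(ts: datetime) -> str:
    # One partition per calendar month; "YYYY-MM" keys sort chronologically.
    return ts.strftime("%Y-%m")


class PartitionedTable:
    """Toy table that groups rows into monthly partitions."""

    def __init__(self) -> None:
        self.partitions: dict[str, list] = defaultdict(list)

    def insert(self, ts: datetime, value: float) -> None:
        self.partitions[partition_key(ts)].append((ts, value))

    def query(self, start: datetime, end: datetime) -> tuple[list, list]:
        """Return (values, scanned partition keys) for start <= ts < end."""
        lo, hi = partition_key(start), partition_key(end)
        # Partition pruning: skip every partition outside the key range.
        scanned = [k for k in sorted(self.partitions) if lo <= k <= hi]
        values = [
            v
            for k in scanned
            for ts, v in self.partitions[k]
            if start <= ts < end
        ]
        return values, scanned
```

A query for a single month touches at most two partitions in this model (the month itself and the one containing the exclusive end bound), regardless of how much history the table holds.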
Many traditional TSDBs struggle with high cardinality and multiple dimensions, relying on rollups or aggressive data retention policies. CrateDB avoids these limitations by combining columnar storage, distributed execution, and SQL analytics, enabling flexible queries at scale.
CrateDB is well suited for:
- IoT and sensor analytics
- SaaS analytics backends
- Observability platforms
- Digital and e-commerce analytics
- Industrial and manufacturing analytics
Any workload requiring granular, real-time analytics across many dimensions benefits from this capability.