Download the latest version of the CrateDB Architecture Guide

Download Now
Skip to content
Blog

High-Cardinality Database for Time Series Analytics: Why Dimensions Matter and Where CrateDB Excels

Time series analytics has changed dramatically. What used to be simple metric tracking has evolved into event-driven analytics at massive scale, where every data point carries rich context: device IDs, users, tenants, locations, versions, features, and more.

This shift has exposed a hard truth: cardinality and dimensionality, not raw data volume, are the real challenges of modern time series analytics.

A high cardinality database is designed to handle this reality. In this article, we explore what high-cardinality time series really means, why many systems struggle with it, and how CrateDB is purpose-built to handle high and continuously growing cardinality using SQL, without pre-aggregation or architectural complexity.

What Is a High Cardinality Database?

A high cardinality database is designed to efficiently store and query data with millions or billions of unique values across one or more dimensions.

Low-cardinality dimensions include values such as:

  • status = OK, WARN, ERROR
  • region = EU, US, APAC

High-cardinality dimensions include:

  • device_id
  • user_id
  • session_id
  • tenant_id
  • vehicle_id

In real-world systems, cardinality is not static. It grows continuously as new users, devices, tenants, and events are added. A true high cardinality database maintains predictable query performance even as this growth accelerates.

high cardinality database explained

Why High Cardinality Breaks Traditional Time Series Databases

Traditional time series databases were designed for scenarios like infrastructure monitoring:

  • A small, fixed set of metrics
  • A limited number of dimensions
  • Predictable queries defined in advance

Modern applications look very different.  Today’s time series data often represents events, not just metrics:

  • IoT sensor readings enriched with device metadata
  • Application events tagged with user, tenant, and feature flags
  • Operational telemetry tied to assets, locations, and versions
  • Usage analytics where every customer interaction adds new dimensions

Each new dimension multiplies the number of unique combinations that must be indexed, filtered, and aggregated. This is where cardinality explodes.

Why "Unlimited Cardinality" Is the Wrong Promise

Some platforms claim "unlimited cardinality". In practice, this is neither accurate nor helpful.

Every system has physical limits, and experienced engineers know this. Over-promising here reduces credibility.

That's why we say CrateDB is designed for very high and continuously growing cardinality, without requiring pre-aggregation, rigid schemas, or query redesign.

What matters is not whether cardinality is theoretically unlimited, but whether:

  • New dimensions can be added freely
  • Cardinality can grow without operational pain
  • Query performance remains predictable
  • Analytics remain flexible as questions evolve

This is where CrateDB differentiates itself.

Dimensions Are the Hard Part of Time Series Analytics

Many time series systems are optimized for:

  • Few metrics
  • Few dimensions
  • Known queries in advance

This works well for infrastructure monitoring, but fails for:

  • IoT and sensor platforms
  • Customer-facing analytics
  • Usage analytics
  • Operational intelligence
  • Multi-tenant SaaS analytics

In these use cases:

  • Queries are ad hoc
  • Dimensions are dynamic
  • New filters appear constantly
  • Aggregations are exploratory

CrateDB treats time series as fully dimensional data, not as a specialized metrics format.

CrateDB’s Unique Strengths for High-Cardinality Time Series

1. SQL-First Analytics on Rich Dimensions

CrateDB exposes time series data through standard SQL, allowing:

  • GROUP BY on high-cardinality columns
  • Multi-dimensional filtering
  • Complex aggregations
  • Joins with reference data
  • Queries across structured and semi-structured fields

Dimensions can be strings, numbers, geo types, or nested JSON attributes. All are first-class query citizens.

There is no custom query language and no need to reshape data to fit a metrics model.

2. No Pre-Aggregation or Rollup Pipelines

Many systems rely on:

  • Downsampling
  • Rollup tables
  • Predefined metrics
  • Batch aggregation pipelines

These approaches reduce flexibility and increase operational complexity. CrateDB ingests raw events and makes them queryable in near real time, enabling:

  • New questions without reprocessing data
  • Dashboard changes without rebuilding pipelines
  • Multiple analytical views on the same dataset
  • High cardinality becomes a design feature, not a limitation.

3. High Ingest Rates with Instant Queryability

High-cardinality systems are useless if fresh data is slow to analyze.

CrateDB is designed for:

New data is typically available for analytics within milliseconds, even at scale. This makes it suitable for real-time dashboards, alerts, and operational decision-making.

4. Distributed Architecture That Scales with Cardinality

High cardinality stresses:

  • Index sizes
  • Memory usage
  • Query planning

CrateDB distributes data, indexes, and query execution across nodes automatically. Scaling cardinality follows the same model as scaling volume: add nodes, not complexity. No manual sharding strategies. No application-side routing.

5. Built for Multi-Tenant and Customer-Facing Analytics

High-cardinality dimensions like tenant_id or customer_id are common in SaaS and platform use cases.

CrateDB supports:

  • Tenant-based schemas
  • Tenant identifiers as query dimensions
  • Logical isolation on shared infrastructure

This enables customer-facing analytics where each new tenant adds both data volume and cardinality without breaking the system.

Real-World Use Cases That Depend on High Cardinality

CrateDB is particularly strong in scenarios where dimensions grow continuously and queries are unpredictable:

  • IoT and industrial telemetry
  • Fleet and asset tracking
  • Real-time operational dashboards
  • Usage and behavior analytics
  • Manufacturing and sensor platforms
  • Event-driven applications

In all of these cases, questions evolve faster than schemas.

Conclusion

High-cardinality time series analytics is no longer a niche requirement. It is the norm for modern, data-driven platforms.

CrateDB was built for this reality: rich dimensions, continuous growth, real-time queries, and full analytical flexibility using SQL.

If your analytics workload is constrained not by how much data you store, but by how many questions you can ask, CrateDB offers a fundamentally different approach.

Read more about CrateDB for time series data >