Independent Time Series Benchmark Confirms CrateDB’s Top-Tier Performance

CrateDB continues to deliver impressive results in the latest TSBS benchmark conducted by Nyrkio. Compared to MongoDB and InfluxDB, CrateDB excels in both ingestion capabilities and complex ad hoc query execution.

Methodology used

Independent performance testing company

When evaluating database performance, standardized benchmarks provide valuable insights into product capabilities and help you decide which database best suits your requirements. However, when a database vendor runs the benchmarks itself, the results often spark public debate about their comparability and the validity of the chosen setup, especially for competing products with which the vendor may lack direct experience. To ensure objectivity, we took a different approach and commissioned a set of benchmarks from an impartial party. We selected Nyrkio, a performance testing company founded by database experts Henrik Ingo and Matt Fleming, who have held various engineering roles at MySQL, MariaDB, MongoDB, Cassandra, and PostgreSQL. We covered the infrastructure costs and Nyrkio's time, agreed on the scope of work, and retained the right to publish or withhold the final report based on the findings. We chose to publish.

TSBS benchmark

For our benchmark, we chose TSBS (Time Series Benchmark Suite), originally developed by Timescale, the company behind TimescaleDB, and still widely supported by the database community. TSBS is designed specifically for time series databases and simulates realistic time series workloads, so the results reflect practical use cases.

Data workload scenario

We used the TSBS "Devops" dataset, which simulates a datacenter with 4,000 different processes (databases, disk subsystems, compute instances, etc.), each reporting its monitoring data every 10 seconds.

The total size of the dataset is 1 billion rows, stored in a 31GB gzip file (278GB uncompressed).
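
A quick back-of-the-envelope check of these dataset figures. The sketch assumes decimal units (1 GB = 10^9 bytes) and exactly 1 billion rows; both are assumptions, since the report's rounding is not stated.

```python
# Dataset figures quoted above; decimal units (1 GB = 10**9 bytes) are an assumption.
ROWS = 1_000_000_000
GZIP_BYTES = 31 * 10**9
RAW_BYTES = 278 * 10**9


def dataset_stats(rows: int, gzip_bytes: int, raw_bytes: int) -> dict:
    """Return the gzip compression ratio and the average size of one row."""
    return {
        "compression_ratio": raw_bytes / gzip_bytes,
        "raw_bytes_per_row": raw_bytes / rows,
        "gzip_bytes_per_row": gzip_bytes / rows,
    }


if __name__ == "__main__":
    stats = dataset_stats(ROWS, GZIP_BYTES, RAW_BYTES)
    print(f"compression ratio:  {stats['compression_ratio']:.1f}x")
    print(f"raw bytes per row:  {stats['raw_bytes_per_row']:.0f}")
    print(f"gzip bytes per row: {stats['gzip_bytes_per_row']:.0f}")
```

That works out to roughly 9x compression and about 278 bytes per uncompressed row. These are properties of the generated input file, not of how any of the three databases stores the data.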

Comparison against MongoDB and InfluxDB

CrateDB is a distributed SQL database purpose-built for real-time analytics on large datasets containing structured, semi-structured, and unstructured data. It leverages the Lucene search engine to automatically index all data and stores it in a columnar format, enabling hyper-fast aggregations. CrateDB supports a wide range of data types, from time series to nested JSON documents, and provides full-text, vector, and geospatial search capabilities.

Given this broad scope, we narrowed the benchmark to MongoDB and InfluxDB for the following reasons:

  • Market Dominance: Both MongoDB and InfluxDB are widely used, well-established databases: MongoDB is the leading document database and InfluxDB the leading time series database. Their popularity makes the comparison relevant to a broad audience.
  • Similar Use Cases: MongoDB and InfluxDB are often used for the same purposes as CrateDB, such as time series data, IoT data, and analytics. This makes the comparison more meaningful and applicable to real-world scenarios.
  • Diverse Features: While MongoDB and InfluxDB have different strengths and weaknesses, both offer a wide range of features that can be compared directly with CrateDB's, providing a comprehensive evaluation of its capabilities.
  • Objectivity: One of Nyrkiö's founders was a long-time engineer at MongoDB, known in the community for developing an internal testing framework and advocating for numerous performance improvements. This expertise guards against claims that the setup was misconfigured, at least for MongoDB.

It is worth noting that we deliberately chose to benchmark against InfluxDB 2.x (version 2.7.8 in these tests). While the InfluxDB team has publicly cited performance issues as a reason for embarking on a third complete rewrite of the database in a new programming language, InfluxDB 3.0 was not yet available as a downloadable binary at the time of writing. Moreover, database technology takes time to mature, and adopting a brand-new version for mission-critical production workloads is rarely advisable. Here, CrateDB offers a significant advantage, having been a proven and reliable technology for over a decade.

Time series benchmark results

CrateDB vs MongoDB 

Source: benchmark report

Ingest performance: Benchmark results show that CrateDB significantly outperforms MongoDB in write-heavy time series workloads: MongoDB ingested data roughly 20 times more slowly (110k vs. 2M metrics/s with a time index). This performance gap is primarily due to MongoDB's use of a B-tree structure, which is not optimized for write-intensive operations.

Test | CrateDB 5.7.1 | MongoDB 7.0.11 (time series, index: time) | MongoDB 7.0.11 (time series, index: hostname, time)
Load | 2M metrics/s (180k rows/s, 22.5k rows/s per CPU) | 110k metrics/s | <100k metrics/s
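
The "roughly 20 times" figure can be checked against the table. MongoDB's compound-index configuration is reported only as an upper bound (<100k metrics/s), so this sketch treats 100k as a bound and reports a lower bound for that speedup.

```python
# Ingest rates in metrics per second, copied from the table above.
CRATEDB = 2_000_000
MONGODB_TIME_INDEX = 110_000
MONGODB_HOST_TIME_UPPER_BOUND = 100_000  # reported only as "<100k"


def speedup(faster: float, slower: float) -> float:
    """How many times higher the first throughput is than the second."""
    return faster / slower


if __name__ == "__main__":
    print(f"vs MongoDB (index: time): {speedup(CRATEDB, MONGODB_TIME_INDEX):.1f}x")
    # MongoDB's true rate is below 100k, so the real speedup exceeds this bound.
    bound = speedup(CRATEDB, MONGODB_HOST_TIME_UPPER_BOUND)
    print(f"vs MongoDB (index: hostname, time): more than {bound:.0f}x")
```

So the gap is about 18x with the time-only index and at least 20x with the compound index.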

 

Query performance for single-column grouping: Benchmark results show that CrateDB delivers strong performance by default, with median latencies under 0.2 s for every query. MongoDB, in contrast, needs indexes on all columns used in the WHERE and GROUP BY clauses: with only a time index its medians range from 4 to 36 s, while a compound index on hostname and time brings them below CrateDB's. That is manageable for simple queries, but it becomes a challenge for more complex or ad hoc queries, where the right index cannot always be planned in advance.

Query (median - max latency) | CrateDB 5.7.1 | MongoDB 7.0.11 (time series, index: time) | MongoDB 7.0.11 (time series, index: hostname, time)
tsbs_single-groupby-1-1-1 | 0.15 - 3.9 s | 11 - 106 s | 0.004 - 0.05 s
tsbs_single-groupby-1-1-12 | 0.18 - 1.4 s | 33 - 129 s | 0.08 - 0.3 s
tsbs_single-groupby-5-1-12 | 0.19 - 1.3 s | 36 - 52 s | 0.08 - 0.2 s
tsbs_single-groupby-5-8-1 | 0.17 - 1.3 s | 4 - 36 s | 0.07 - 0.56 s
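
The query names follow the TSBS devops conventions. As we read the TSBS documentation, single-groupby-M-H-T aggregates M CPU metrics for H hosts over T hours in 5-minute buckets, and double-groupby-N averages N CPU metrics per host per hour over 24 hours, with "all" meaning all 10 CPU metrics. The hypothetical `decode` helper below is a sketch of that reading, not part of TSBS; verify it against the TSBS README before relying on it.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: the TSBS devops "cpu" measurement has 10 usage_* metrics,
# which is what double-groupby-all aggregates.
CPU_METRIC_COUNT = 10


@dataclass(frozen=True)
class TsbsQuery:
    kind: str             # "single-groupby" or "double-groupby"
    metrics: int          # number of CPU metrics aggregated
    hosts: Optional[int]  # number of hosts filtered on; None means all hosts
    hours: int            # length of the queried time window


def decode(name: str) -> TsbsQuery:
    """Decode a TSBS devops query name such as 'tsbs_single-groupby-5-8-1'."""
    if name.startswith("tsbs_"):
        name = name[len("tsbs_"):]
    if name.startswith("single-groupby-"):
        # single-groupby-<metrics>-<hosts>-<hours>
        metrics, hosts, hours = (int(p) for p in name[len("single-groupby-"):].split("-"))
        return TsbsQuery("single-groupby", metrics, hosts, hours)
    if name.startswith("double-groupby-"):
        # double-groupby-<metrics|all>, averaged per host per hour over 24 hours
        suffix = name[len("double-groupby-"):]
        metrics = CPU_METRIC_COUNT if suffix == "all" else int(suffix)
        return TsbsQuery("double-groupby", metrics, None, 24)
    raise ValueError(f"unrecognized TSBS query name: {name!r}")


if __name__ == "__main__":
    for q in ("tsbs_single-groupby-1-1-1", "tsbs_single-groupby-5-8-1", "tsbs_double-groupby-all"):
        print(q, decode(q))
```

Under this reading, tsbs_single-groupby-1-1-1 is the lightest query in the table (one metric, one host, one hour), which is why it shows the largest relative differences between setups.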

 

Query performance for two-column grouping: For these more advanced queries, CrateDB significantly outperforms MongoDB, with median latencies roughly 6 to 8 times lower. Even with an optimal indexing strategy, MongoDB fails to deliver reasonable performance in this scenario.

Query (median - max latency) | CrateDB 5.7.1 | MongoDB 7.0.11 (time series, index: time) | MongoDB 7.0.11 (time series, index: hostname, time)
tsbs_double-groupby-1 | 20 - 28 s | 154 - 292 s | 167 - 466 s
tsbs_double-groupby-5 | 31 - 46 s | 203 - 506 s | 219 - 560 s
tsbs_double-groupby-all | 44 - 60 s | 266 - 549 s | not reported
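
The "6 to 8 times" range comes from comparing medians in the table above. This sketch recomputes it; `None` marks the cell with no reported result.

```python
# Median latencies in seconds, copied from the two-column grouping table above.
MEDIANS = {
    "tsbs_double-groupby-1": {"cratedb": 20, "mongo_time": 154, "mongo_host_time": 167},
    "tsbs_double-groupby-5": {"cratedb": 31, "mongo_time": 203, "mongo_host_time": 219},
    "tsbs_double-groupby-all": {"cratedb": 44, "mongo_time": 266, "mongo_host_time": None},
}


def median_speedups(medians: dict) -> dict:
    """Per query, how many times lower CrateDB's median is than each MongoDB setup's."""
    result = {}
    for query, times in medians.items():
        crate = times["cratedb"]
        result[query] = {
            setup: (t / crate if t is not None else None)
            for setup, t in times.items()
            if setup != "cratedb"
        }
    return result


if __name__ == "__main__":
    for query, ratios in median_speedups(MEDIANS).items():
        cells = ", ".join(
            f"{setup}: {r:.1f}x" if r is not None else f"{setup}: n/a"
            for setup, r in ratios.items()
        )
        print(f"{query}: {cells}")
```

This compares medians only; the maximum latencies show the same ordering, with a much wider spread across MongoDB's runs.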

 

CrateDB vs InfluxDB

Source: benchmark report

Ingest performance: Benchmark results show that CrateDB significantly outperforms InfluxDB in write-heavy time series workloads. Although InfluxDB was expected to excel at writes thanks to its LSM-like storage engine, CrateDB ingested data about 50% faster (2M vs. 1.3M metrics/s).

Test | CrateDB 5.7.1 | InfluxDB 2.7.8
Load | 2M metrics/s (180k rows/s, 22.5k rows/s per CPU) | 1.3M metrics/s (117k rows/s, 14.7k rows/s per CPU)
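
The "about 50% faster" figure and the per-CPU column can be cross-checked. The implied CPU count assumes that the per-CPU figure is simply total rows/s divided by the number of CPUs, which the table does not state explicitly.

```python
# Ingest figures copied from the table above.
CRATEDB = {"metrics_per_s": 2_000_000, "rows_per_s": 180_000, "rows_per_s_per_cpu": 22_500}
INFLUXDB = {"metrics_per_s": 1_300_000, "rows_per_s": 117_000, "rows_per_s_per_cpu": 14_700}


def implied_cpus(stats: dict) -> float:
    """CPU count implied by total and per-CPU row rates (assumes a plain division)."""
    return stats["rows_per_s"] / stats["rows_per_s_per_cpu"]


def percent_faster(a: float, b: float) -> float:
    """How much higher throughput a is than b, in percent."""
    return (a / b - 1) * 100


if __name__ == "__main__":
    adv = percent_faster(CRATEDB["metrics_per_s"], INFLUXDB["metrics_per_s"])
    print(f"CrateDB ingest advantage: {adv:.0f}%")
    print(f"implied CPUs: CrateDB {implied_cpus(CRATEDB):.1f}, InfluxDB {implied_cpus(INFLUXDB):.1f}")
    print(f"metrics per row: {CRATEDB['metrics_per_s'] / CRATEDB['rows_per_s']:.1f}")
```

CrateDB's advantage comes out at about 54%, in line with the text, and both databases imply roughly 8 CPUs, which suggests the per-CPU column normalizes to the same hardware.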

Query performance for single-column grouping: Benchmark results show that both CrateDB and InfluxDB perform very well for this type of query, with median latencies well under one second. InfluxDB is faster on all four queries, so CrateDB does not stand out in this scenario; it remains a strong solution, however, especially for use cases that involve more complex queries, as the next section demonstrates.

Query (median - max latency) | CrateDB 5.7.1 | InfluxDB 2.7.8
tsbs_single-groupby-1-1-1 | 0.15 - 3.9 s | 0.003 - 0.025 s
tsbs_single-groupby-1-1-12 | 0.18 - 1.4 s | 0.014 - 0.15 s
tsbs_single-groupby-5-1-12 | 0.19 - 1.3 s | 0.074 - 0.37 s
tsbs_single-groupby-5-8-1 | 0.17 - 1.3 s | 0.041 - 0.16 s
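
"Under one second" holds for the medians; the maxima tell a slightly different story. This small check over the table makes the distinction explicit, using a one-second threshold as the assumed cut-off.

```python
# (median, max) latencies in seconds, copied from the table above.
LATENCIES = {
    "CrateDB 5.7.1": {
        "tsbs_single-groupby-1-1-1": (0.15, 3.9),
        "tsbs_single-groupby-1-1-12": (0.18, 1.4),
        "tsbs_single-groupby-5-1-12": (0.19, 1.3),
        "tsbs_single-groupby-5-8-1": (0.17, 1.3),
    },
    "InfluxDB 2.7.8": {
        "tsbs_single-groupby-1-1-1": (0.003, 0.025),
        "tsbs_single-groupby-1-1-12": (0.014, 0.15),
        "tsbs_single-groupby-5-1-12": (0.074, 0.37),
        "tsbs_single-groupby-5-8-1": (0.041, 0.16),
    },
}


def within_threshold(latencies: dict, threshold_s: float = 1.0) -> dict:
    """Per database: are all medians and all maxima below threshold_s, and what is the worst case?"""
    summary = {}
    for db, queries in latencies.items():
        summary[db] = {
            "all_medians_below": all(median < threshold_s for median, _ in queries.values()),
            "all_maxima_below": all(worst < threshold_s for _, worst in queries.values()),
            "worst_case_s": max(worst for _, worst in queries.values()),
        }
    return summary


if __name__ == "__main__":
    for db, result in within_threshold(LATENCIES).items():
        print(db, result)
```

CrateDB's medians all stay below 0.2 s, but its slowest runs take 1.3 to 3.9 s, while InfluxDB stays below 0.4 s even at the maximum.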

Query performance for two-column grouping: CrateDB trails InfluxDB when aggregating a few columns (about 4 times slower at the median for one column, comparable for five). CrateDB distinguishes itself as the number of aggregated columns grows: in the test aggregating all 10 columns (tsbs_double-groupby-all), InfluxDB stalled for minutes and eventually ran out of memory, while CrateDB consistently delivered results in under 60 seconds.

Query (median - max latency) | CrateDB 5.7.1 | InfluxDB 2.7.8
tsbs_double-groupby-1 | 20 - 28 s | 5.3 - 6.6 s
tsbs_double-groupby-5 | 31 - 46 s | 25 - 33 s
tsbs_double-groupby-all | 44 - 60 s | out of memory (median 54 s, max unknown)
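
A double-groupby query groups by two columns, a time bucket and the host, and averages N CPU metrics per group. The sketch below builds an SQL statement of roughly that shape. The table name `cpu`, the columns `ts` and `hostname`, and the metric list are assumptions based on the TSBS devops schema; the statement TSBS actually generates for CrateDB may differ.

```python
from datetime import datetime, timedelta

# Assumption: the 10 CPU fields of the TSBS devops "cpu" measurement.
CPU_METRICS = [
    "usage_user", "usage_system", "usage_idle", "usage_nice", "usage_iowait",
    "usage_irq", "usage_softirq", "usage_steal", "usage_guest", "usage_guest_nice",
]


def double_groupby_sql(n_metrics: int, start: datetime) -> tuple:
    """Build an illustrative double-groupby query: average n_metrics CPU metrics
    per (hour, host) over a 24-hour window.

    The table name "cpu" and the columns "ts" and "hostname" are hypothetical.
    """
    if not 1 <= n_metrics <= len(CPU_METRICS):
        raise ValueError(f"n_metrics must be between 1 and {len(CPU_METRICS)}")
    aggregates = ",\n       ".join(f"avg({m}) AS mean_{m}" for m in CPU_METRICS[:n_metrics])
    sql = (
        "SELECT date_trunc('hour', ts) AS hour, hostname,\n"
        f"       {aggregates}\n"
        "FROM cpu\n"
        "WHERE ts >= %s AND ts < %s\n"
        "GROUP BY date_trunc('hour', ts), hostname\n"
        "ORDER BY hour, hostname"
    )
    # %s placeholders match the parameter style used by psycopg.
    return sql, (start, start + timedelta(hours=24))


if __name__ == "__main__":
    sql, params = double_groupby_sql(len(CPU_METRICS), datetime(2016, 1, 1))
    print(sql)
    print(params)
```

With a PostgreSQL driver such as psycopg, the pair could be passed as `cursor.execute(sql, params)`, since CrateDB accepts clients over its PostgreSQL wire protocol.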

 

Conclusion

Overall, CrateDB stands out as a great competitor to established leaders in the time-series category, such as InfluxDB. Independent benchmark results affirm its viability as a high-performing, flexible database solution for time series data and analytical workloads.

CrateDB's innovative indexing approach automatically indexes all columns, including nested structures, during data ingestion. This provides exceptional flexibility and supports ad hoc analysis. As the report shows, CrateDB performed consistently across query types without per-query index tuning, which improves the developer experience by removing the need to plan indexing strategies in advance.
 
Additionally, CrateDB provides a PostgreSQL wire protocol-compatible SQL interface, which simplifies integration with a wide range of third-party tools and avoids the need to learn proprietary query languages such as MQL and Flux. Its fully distributed SQL query engine, built on top of Apache Lucene, supports full-text, vector, and hybrid search without a separate vector database or the cost of moving data between systems.

We invite you to experience CrateDB by deploying a free cloud cluster with 8GB of storage.