Live Stream: Turbocharge your aggregations, search & AI models & get real-time insights

Register now
Skip to content
Data models > Time series

Optimizing Time Series Queries

Query optimization is fundamental for dealing with large-scale, time-series data, ensuring efficient data retrieval, storage, and improved overall performance.

Index Use for Query Performance

Indexes are crucial for enhancing query performance, especially when dealing with large datasets. The use of indexes can effectively expedite the retrieval of relevant records and avoid full-table scans. This becomes particularly beneficial when the query incorporates specific comparison operators or when sorting data, where an index significantly boosts performance over in-memory sorting.
 

Query Execution Plan Analysis

Efficient querying is vital when handling time-series data due to its large volume and temporal nature. CrateDB offers a command EXPLAIN ANALYZE, which helps optimize these queries. When applied to your SQL query, CrateDB generates an execution plan and captures detailed runtime statistics such as time spent in parsing, planning, and actual execution. This command can provide insights into the execution process, pinpointing potential bottlenecks and areas for optimization.
 
explain_analyze_query

Parallel Execution with Partitioning and Sharding

To optimize performance and efficiently manage large-scale time-series datasets, it's crucial to use partitioning and sharding. This approach optimizes performance of full table scans, which are CPU and I/O intensive. By partitioning and sharding data, not all data needs to be scanned, and aggregations can be executed in parallel across different shards and partitions. This approach significantly streamlines query execution, reduces time, and optimizes resource usage.
 weather_data_table

Arrays for Storage Optimization

In CrateDB, arrays are an effective solution for optimizing storage. They allow storing multiple values in a single structure and are particularly useful for handling large-scale time-series data. Arrays allow for efficient representation of data sets and can significantly reduce the storage footprint. They can store sequences of measurements, reducing the need for individual records for each data point. By restructuring data into arrays, applying compression algorithms, and turning off indexes on array fields, storage requirements can be significantly reduced. Depending on the particular data model, we see a reduction in disk space used by more than 80%.query_with_arrays

JOIN Optimization with Common Table Expressions

Common Table Expressions (CTEs) enhance both the performance and readability of queries. Particularly useful in complex scenarios needing many filtering and aggregation steps, or when a data subset is repeatedly used within one query, CTEs can offer considerable performance benefits over traditional JOIN operations. Each subsequent JOIN operation is refined to the smallest possible dataset, leading to faster query performance and efficient resource use.

Furthermore, CTEs enhance readability and maintainability of queries, and can retain intermediate results, reducing the number of scans over the original table.

query_with_cte

Want to read more?

Whitepaper" Guide for Time Series Data Projects