In brief: CrateDB 4.1 adds more SQL features for time series use cases, including interval-based data management. We also improved PostgreSQL wire protocol compatibility with the addition of string scalars.
The intersection of time series and big data
Many Internet of Things (IoT) use cases deal with massive amounts of machine data that is best modeled as time series data. That time series data must then be processed (filtered, enriched, etc.), recorded, queried, and analyzed at scale.
Traditional relational databases struggle with substantial time series workloads and are difficult and expensive to scale. And most time series databases are easy to scale but ditch SQL in favor of a proprietary query language.
We are building CrateDB to provide you with a database that is well suited for time series data, scales up easily for big data, and uses standard SQL.
Specifically:
- CrateDB is a masterless distributed database that is trivial to scale both horizontally and vertically.
- A shared-nothing architecture means CrateDB can be containerized and deployed using tools like Docker or Kubernetes (on-premises or in the cloud).
- The CrateDB distributed query execution engine uses parallel computing to leverage the processing power of your whole database cluster to maximize query performance.
- We are always adding new SQL features so that you can get more out of your time series data and use more third-party tools.
What’s in the release?
Some highlights:
- Improved window functions
  Window functions now include lead and lag, which let a row reference values from the rows that follow or precede it. This helps you fill in gaps when working with time series data. We also added named windows and row-based frame definitions for more flexibility when defining windows.
- Interval type and timezone improvements
  Interval-based sensor readings are often mixed with sensor readings that have been triggered by events. To help you normalize this kind of data, we added the interval type, which you can use with the generate_series function. Additionally, we improved timezone support.
- Improved PostgreSQL compatibility with string scalars
  PostgreSQL tools often make heavy use of string operations. To improve compatibility with those tools, we added support for padding, trimming, and lots of other string scalars.
- Improved speed for SELECT DISTINCT with LIMIT
  A lot of people use SELECT DISTINCT when retrieving entity records (e.g., users). In 4.1, SELECT DISTINCT queries with a LIMIT clause use less memory and can execute up to 200% faster.
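To make the window function additions more concrete, here is a minimal sketch of lag, lead, a named window, and a row-based frame. It is an illustration rather than CrateDB-specific code: it uses Python's built-in sqlite3 module (SQLite 3.28 or later is assumed) as a stand-in so it runs without a cluster, and the readings table, its columns, and the sample values are made up for the example. The query sticks to common SQL window syntax, but check the CrateDB documentation before relying on it there.

```python
import sqlite3

# Assumption: SQLite stands in for CrateDB so the sketch runs locally.
# The table name, columns, and values are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, ts INTEGER, temp REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("s1", 1, 20.0),
        ("s1", 2, None),  # a gap: the sensor reported no value
        ("s1", 3, 22.0),
        ("s1", 4, 23.0),
    ],
)

QUERY = """
SELECT ts,
       temp,
       lag(temp)  OVER w AS prev_temp,               -- value from the previous row
       lead(temp) OVER w AS next_temp,               -- value from the next row
       coalesce(temp, lag(temp) OVER w) AS filled,   -- fill a gap with the previous value
       avg(temp) OVER (
           PARTITION BY sensor ORDER BY ts
           ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING  -- row-based frame
       ) AS neighbor_avg
FROM readings
WINDOW w AS (PARTITION BY sensor ORDER BY ts)        -- named window, reused above
ORDER BY ts
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

In this sketch the missing reading at ts 2 is filled with the previous value (20.0), while neighbor_avg shows an alternative that averages the surrounding rows and skips the NULL.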
For more in-depth information, including breaking changes, check out the 4.1 release notes.