In practice, time series data management faces several challenges, often referred
to as the four V’s: volume, velocity, variety, and veracity.
Volume: Time series workloads generate vast amounts of data. For tasks like root
cause analysis or precise forecasting, it is crucial to keep the data at full
granularity. Downsampling is not an option here, as it would discard key details.
Velocity: The speed at which data is generated and processed is another
significant factor. High-performance queries and aggregations are necessary, and
we need to be able to ingest and analyze incoming data in real time.
Variety: Time series data comes in several types and formats: structured,
semi-structured, and unstructured. This variety needs to be handled effectively by
the technology we use.
Veracity: Finally, the quality and reliability of the data are critical. Given the high
volume and velocity of time series data, ensuring data veracity can become
difficult.
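The Volume point about downsampling can be made concrete with a small sketch. It assumes a hypothetical sensor that reports one reading per second and a simple mean-based downsampling to one value per minute; the numbers are invented for illustration and do not come from a real system.

```python
from statistics import mean


def downsample_mean(values, window):
    """Average consecutive, non-overlapping windows of `window` samples."""
    return [mean(values[i:i + window]) for i in range(0, len(values), window)]


# Hypothetical data: 60 one-second readings at 20.0 with one short spike.
readings = [20.0] * 60
readings[37] = 95.0

# Downsample to a single per-minute average.
per_minute = downsample_mean(readings, 60)

raw_peak = max(readings)            # the spike is visible in the raw data
downsampled_peak = max(per_minute)  # the spike is averaged away

print(f"raw peak: {raw_peak}, downsampled peak: {downsampled_peak}")
```

In the raw series the spike is easy to find, while the per-minute average barely moves. This is the kind of detail that root cause analysis depends on.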
These challenges can result in complex data architectures and technical debt,
which become increasingly hard to deal with. Managing diverse types of data in
separate data stores requires data integration and synchronization, adding to the
project's complexity.
CrateDB addresses these challenges by maintaining a single source of truth that
is kept updated in near real-time. It supports multiple types of data with a single
technology using native SQL and offers a dynamic schema. Its high performance,
scalability, and availability let the system handle high data volume and velocity,
even with years of historical data.
Issues
Growing complexity and technical debt over time.
- One data store per use case
- Data synchronization and duplication
- Difficult to scale multiple technologies up/down or out/in
Implications
People, time and money.
- People: multiple people and specialized skillsets required
- Time: slow time to value and changes take a long time
- Money: high TCO and cost of change
Solutions
Single source of truth kept updated in near real-time.
- Native support of multiple data types on one technology
- Dynamic data schema for rapid evolution
- High performance, availability, and scalability
Beyond its capability of storing tabular and time series data, CrateDB also
supports JSON payloads, full-text search, vector search and similarity search,
as well as geospatial data handling. All these data types can be combined in a
single database record if needed and be easily accessed via standard SQL,
making CrateDB a robust and versatile solution for time series data management.
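As a rough illustration of combining these data types in one record, the sketch below defines a table with time series, JSON, geospatial, full-text, and vector columns, then queries it with plain SQL. The table name, column names, sample values, and connection URL are all hypothetical, and the statements assume a recent CrateDB version with the `crate` Python client installed. Treat it as a starting point rather than a reference schema.

```python
# Sketch only: table, columns, values, and URL are assumptions for illustration.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS sensor_readings (
    ts TIMESTAMP WITH TIME ZONE,
    sensor_id TEXT,
    value DOUBLE PRECISION,
    payload OBJECT(DYNAMIC) AS (firmware TEXT),
    location GEO_POINT,
    note TEXT INDEX USING FULLTEXT,
    embedding FLOAT_VECTOR(3)
)
"""

INSERT_ROW = """
INSERT INTO sensor_readings
    (ts, sensor_id, value, payload, location, note, embedding)
VALUES (?, ?, ?, ?, ?, ?, ?)
"""

# Full-text match and geo distance filter in a single SQL statement.
QUERY = """
SELECT ts, sensor_id, value, payload['firmware'] AS firmware
FROM sensor_readings
WHERE MATCH(note, ?)
  AND distance(location, ?) < ?
ORDER BY ts DESC
LIMIT 10
"""


def example_row():
    """Return one hypothetical record that mixes several data types."""
    return (
        "2024-01-01T00:00:00Z",                     # timestamp
        "pump-7",                                   # sensor identifier
        81.5,                                       # measured value
        {"firmware": "1.4.2", "tags": ["line-a"]},  # JSON payload
        [13.4, 52.5],                               # [longitude, latitude]
        "bearing overheating after restart",        # free text
        [0.1, 0.2, 0.3],                            # embedding vector
    )


def run(url="http://localhost:4200"):
    """Execute the statements against a CrateDB node (assumed reachable)."""
    from crate import client  # lazy import; needs the `crate` package

    conn = client.connect(url)
    try:
        cur = conn.cursor()
        cur.execute(CREATE_TABLE)
        cur.execute(INSERT_ROW, example_row())
        # Make the new row visible to the following query.
        cur.execute("REFRESH TABLE sensor_readings")
        cur.execute(QUERY, ("overheating", "POINT (13.4 52.5)", 10000))
        return cur.fetchall()
    finally:
        conn.close()
```

Calling `run()` requires a reachable CrateDB node. The statement strings can be inspected and adapted without one.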
The dynamic schema and user-defined functions provide the flexibility for
efficient application development and maintenance. The columnar storage and
advanced indexing techniques enable complex queries and aggregations, with
no need to define all indexes manually. The distributed architecture
provides high availability and scalability, with a query engine capable of handling
high volumes of concurrent reads and writes.
CrateDB can be deployed at the edge, in your data centers, and in cloud
environments. It is also available as a fully managed service. A synchronization
mechanism, called Logical Replication, helps centralize relevant information for
holistic analysis.