Time series data is a collection of data points ordered by time. It is characterized by the sequential nature of its data points and by patterns such as trends, seasonality, and irregularities.
Time series data is ubiquitous across industries. In manufacturing, it is used to monitor and optimize production processes, forecast equipment maintenance, and analyze supply chain trends. In transportation and logistics, it is used to monitor traffic patterns and optimize delivery routes of managed fleets; intralogistics also offers considerable potential for optimization. In the energy sector, it supports predicting energy demand and optimizing energy production. In monitoring and security, metrics, events, and logs can be collected and analyzed in real time, correlated with each other, and used to automatically detect failures or intrusions. Regardless of the industry, time series data plays a crucial role in enabling businesses to make data-driven decisions.
What do all of these use cases have in common when it comes to data management?
Time series data is often more complex than a collection of timestamps, tags, and values. It can arrive in various formats, including tabular data or JSON, it often includes textual data, and its schema might change without notice.
Users often request very long retention periods so they can analyze patterns over time. To interpret these data points properly, contextual information is required, and this is usually not time series data. It is mainly metadata and master data of sensors or assets, enterprise data about customers or production orders, and external data such as weather forecasts. In typical applications, about 90% of the data is time series data and about 10% is contextual information.
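To make the role of contextual information concrete, here is a minimal sketch in plain Python that attaches sensor master data to raw readings. The sensor IDs, asset names, and units are hypothetical and only serve as an illustration. In a database, this would typically be expressed as a join between a readings table and a master data table.

```python
from datetime import datetime, timezone

# Hypothetical sensor master data (contextual information), keyed by sensor ID.
SENSORS = {
    "s1": {"asset": "pump-A", "unit": "bar"},
    "s2": {"asset": "oven-B", "unit": "degC"},
}

# Hypothetical time series readings: (timestamp, sensor ID, value).
READINGS = [
    (datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc), "s1", 4.2),
    (datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc), "s2", 180.0),
    (datetime(2024, 1, 1, 12, 1, tzinfo=timezone.utc), "s1", 4.3),
]


def enrich(readings, sensors):
    """Attach asset and unit to each reading; unknown sensors get None."""
    enriched = []
    for ts, sensor_id, value in readings:
        context = sensors.get(sensor_id, {})
        enriched.append({
            "ts": ts,
            "sensor_id": sensor_id,
            "value": value,
            "asset": context.get("asset"),
            "unit": context.get("unit"),
        })
    return enriched


if __name__ == "__main__":
    for row in enrich(READINGS, SENSORS):
        print(row["ts"].isoformat(), row["asset"], row["value"], row["unit"])
```

Without the lookup, a value such as 4.2 has no meaning on its own. With it, the reading becomes a pressure of 4.2 bar on a specific asset.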
There are many use cases for time series data that go beyond just querying and visualizing the data. These include applying statistics and machine learning for anomaly detection, forecasting, and predictions.
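As one simple illustration of anomaly detection, the following sketch flags values that deviate strongly from the recent past using a rolling z-score. This is just one basic statistical technique chosen for the example, not the method used later in this series. The window size, the threshold, and the sample values are assumptions.

```python
from statistics import mean, stdev


def rolling_zscore_anomalies(values, window=5, threshold=3.0):
    """Return indices whose value deviates from the mean of the preceding
    `window` values by more than `threshold` sample standard deviations."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # A perfectly flat history gives no scale to score against; skip it.
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


if __name__ == "__main__":
    # Hypothetical sensor values with one obvious spike at index 6.
    data = [10.0, 10.1, 9.9, 10.2, 10.0, 10.1, 25.0, 10.0, 9.9, 10.1]
    print(rolling_zscore_anomalies(data))
```

Because each value is compared only against the values before it, the same logic can be applied to streaming data as new points arrive.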
A rich user experience is necessary for these applications. This includes performant ad-hoc queries that join and correlate time series with each other and with their contextual data, full-text search capabilities such as fuzzy search, and embeddings with similarity search to power modern natural language applications such as chatbots.
We want to demonstrate how to build time series solutions with CrateDB. This includes working with other types of data in addition to traditional time series data, which is essential for providing the full context. Such additional data can be, for example, master data from other systems or logs.
Data can be ingested in a variety of ways, such as batch uploads or real-time streaming. Once the data is loaded, you need to model it. CrateDB allows you to create tables and schemas that are well suited to time series data. When it comes to storing, querying, and analyzing data, CrateDB is a good fit because it is a distributed database built for handling large volumes of granular data in multiple formats: tables, time series, JSON-based documents, geospatial, full-text, and vector embeddings.
With CrateDB, you can perform complex queries and aggregations in real time, which is crucial for time series analysis.
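To show what such an aggregation computes, here is a minimal sketch in plain Python that groups readings into fixed five-minute buckets and calculates the average and maximum per bucket. It does not use CrateDB. In SQL, the same idea is usually expressed by grouping on a truncated timestamp. The function names, the bucket width, and the sample values are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_start(ts, width):
    """Floor a timezone-aware timestamp to the start of its bucket."""
    return EPOCH + ((ts - EPOCH) // width) * width


def aggregate(readings, width=timedelta(minutes=5)):
    """Group (timestamp, value) pairs into buckets of `width` and
    return the average and maximum value per bucket, ordered by time."""
    buckets = defaultdict(list)
    for ts, value in readings:
        buckets[bucket_start(ts, width)].append(value)
    return {
        start: {"avg": sum(vals) / len(vals), "max": max(vals)}
        for start, vals in sorted(buckets.items())
    }


if __name__ == "__main__":
    base = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    readings = [
        (base + timedelta(minutes=1), 1.0),
        (base + timedelta(minutes=4), 3.0),
        (base + timedelta(minutes=7), 10.0),
    ]
    for start, stats in aggregate(readings).items():
        print(start.isoformat(), stats)
```

Downsampling like this is a common first step before visualizing long time ranges, because it reduces many raw points to a few values per bucket.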
We will show in the next videos how to visualize data and how to efficiently train machine learning models. These can be used, for example, to detect anomalies or forecast time series into the future.