Advanced Time Series Analysis
Learn how to conduct advanced data analysis on large time series datasets with CrateDB.
Anomaly detection, Forecasting / Prediction, Time series decomposition, Exploratory data analysis, Metadata integration
Anomaly Detection and Forecasting
To gain insights from your data using machine learning techniques, either as a one-off analysis or on a recurring basis, consider applying anomaly detection and forecasting methods.
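To make the two terms concrete, here is a minimal sketch that uses only the Python standard library. It is not the approach taken by the MLflow, Merlion, or PyCaret examples listed in this section: it swaps in a plain z-score rule for anomaly detection and a seasonal-naive rule for forecasting. The function names, the threshold, and the sample readings are all hypothetical.

```python
from statistics import mean, stdev


def zscore_anomalies(values, threshold=3.0):
    """Return indices of points more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # a constant series has no outliers under this rule
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


def seasonal_naive_forecast(values, period, horizon):
    """Forecast `horizon` steps by repeating the last observed cycle of length `period`."""
    last_cycle = values[-period:]
    return [last_cycle[i % period] for i in range(horizon)]


# Hypothetical sensor readings with one obvious spike at index 5.
readings = [20.1, 20.3, 19.8, 20.0, 20.2, 35.0, 20.1, 19.9, 20.0, 20.2, 19.7, 20.1]

print(zscore_anomalies(readings, threshold=2.5))  # -> [5]
print(seasonal_naive_forecast(readings, period=4, horizon=6))
```

Simple rules like these are mostly useful as a baseline. The linked examples show how to train and track real models on data stored in CrateDB.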
Examples
Use MLflow for time series anomaly detection and time series forecasting
Guidelines and runnable code to get started with MLflow and CrateDB, exercising time series anomaly detection and time series forecasting / prediction using NumPy, Merlion, and Matplotlib.
Anomaly Detection, Forecasting / Prediction
Python MLflow
Use PyCaret to train time series forecasting models
This notebook explores the PyCaret framework and shows how to use it to train various time series forecasting models.
Forecasting / Prediction
Python PyCaret MLflow
Time Series Decomposition
Decomposition of time series is a statistical task that deconstructs a time series into several components, each representing one of the underlying categories of patterns.
There are two principal types of decomposition, one based on rates of change, the other based on predictability.
You can use this method to dissect a time series into multiple components, typically including trend, seasonal, and random (or irregular) components.
This process helps in understanding the underlying patterns of the time series data, such as identifying any long term direction (trend), recurring patterns at fixed intervals (seasonality), and randomness (irregular fluctuations) in the data.
Decomposition is crucial for analyzing how these components change over time, improving forecasts, and developing strategies for addressing each element effectively.
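The split into trend, seasonal, and irregular components can be sketched with a classical additive decomposition in plain Python. This is an illustration under stated assumptions, not what PyCaret does internally: the trend is estimated with a centered moving average, the seasonal component is the average detrended value per position in the cycle, and the remainder is the residual. The function names and the synthetic series are hypothetical.

```python
def moving_average_trend(values, period):
    """Centered moving average; None where the window does not fit."""
    n = len(values)
    half = period // 2
    trend = [None] * n
    for i in range(half, n - half):
        window = values[i - half:i + half + 1]
        if period % 2:
            # Odd period: the window holds exactly one full cycle.
            trend[i] = sum(window) / period
        else:
            # Even period: the window holds period + 1 points,
            # so the two end points get half weight each.
            trend[i] = (sum(window) - 0.5 * (window[0] + window[-1])) / period
    return trend


def decompose_additive(values, period):
    """Split a series into trend, seasonal, and residual parts (additive model)."""
    if len(values) < 2 * period:
        raise ValueError("need at least two full cycles")
    trend = moving_average_trend(values, period)
    detrended = [v - t if t is not None else None for v, t in zip(values, trend)]

    # Average the detrended values for each position within the cycle.
    seasonal_means = []
    for pos in range(period):
        samples = [d for i, d in enumerate(detrended) if d is not None and i % period == pos]
        seasonal_means.append(sum(samples) / len(samples))

    # Center the seasonal component so that it sums to zero over one cycle.
    offset = sum(seasonal_means) / period
    seasonal_means = [s - offset for s in seasonal_means]

    seasonal = [seasonal_means[i % period] for i in range(len(values))]
    residual = [
        v - t - s if t is not None else None
        for v, t, s in zip(values, trend, seasonal)
    ]
    return trend, seasonal, residual


# Synthetic series: linear trend plus a repeating pattern of length 4, no noise.
pattern = [2.0, -1.0, -2.0, 1.0]
series = [10.0 + 0.5 * i + pattern[i % 4] for i in range(12)]

trend, seasonal, residual = decompose_additive(series, period=4)
print(seasonal[:4])
```

Because the synthetic series contains no noise, the recovered seasonal component matches the pattern and the residuals are zero wherever the trend is defined. On real data, the residual is where unexplained fluctuations end up.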
Examples
Analyze trend, seasonality, and fluctuations with PyCaret and CrateDB
Learn how to extract data from CrateDB for analysis in PyCaret, how to preprocess it further, and how to use PyCaret to plot a time series decomposition that breaks the series down into its basic components: trend, seasonality, and residual (or irregular) fluctuations.
Time series decomposition
Python PyCaret
Exploratory Data Analysis (EDA)
Exploratory data analysis (EDA) is an approach to analyzing data sets in order to summarize their main characteristics, often using statistical graphics and other data visualization methods.
EDA involves visualizing, summarizing, and analyzing data, to uncover patterns, anomalies, or relationships within the dataset.
The objective of this step is to gain an understanding and intuition of the data, identify potential issues, and, in machine learning, guide feature engineering and model building.
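As a small illustration of what a first EDA pass can look like, the following sketch computes summary statistics and checks for missing timestamps using only the Python standard library. The linked notebook uses PyCaret instead. The helper names, the expected one-hour sampling interval, and the sample values are assumptions made for this example.

```python
from datetime import datetime, timedelta
from statistics import mean, median, quantiles, stdev


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    q1, _, q3 = quantiles(values, n=4)
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),
        "q1": q1,
        "q3": q3,
    }


def find_gaps(timestamps, expected_step):
    """Return (previous, next) pairs whose spacing exceeds the expected step."""
    return [(a, b) for a, b in zip(timestamps, timestamps[1:]) if b - a > expected_step]


# Hypothetical hourly temperature readings; the sample for hour 4 is missing.
start = datetime(2024, 1, 1)
timestamps = [start + timedelta(hours=h) for h in (0, 1, 2, 3, 5, 6)]
temperatures = [21.0, 21.5, 22.0, 35.0, 21.8, 21.2]

summary = summarize(temperatures)
gaps = find_gaps(timestamps, expected_step=timedelta(hours=1))

print(summary)
print(gaps)
```

A large difference between the mean and the median, as in this sample, hints at outliers worth a closer look. The gap list tells you whether the series needs resampling or interpolation before modeling.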
Examples
Exploratory data analysis (EDA) with PyCaret and CrateDB
Learn how to access time series data from CrateDB using SQL, and how to apply exploratory data analysis (EDA) with PyCaret.
The notebook shows how to generate various plots and charts for EDA, helping you understand data distributions and relationships between variables, and identify patterns.
EDA on time series
Python PyCaret
Metadata Integration
CrateDB is particularly effective when you need to combine time series data with metadata, for instance, in scenarios where data such as sensor readings or log entries needs to be augmented with additional context for more insightful analysis. See also Document Store.
CrateDB supports effective time series analysis with fast aggregations, a rich set of built-in functions, and JOIN operations.
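The pattern of joining readings with device metadata and then aggregating can be sketched as follows. To keep the example runnable without a cluster, it uses an in-memory SQLite database from the Python standard library as a stand-in for CrateDB. The table names, column names, and sample rows are hypothetical and do not come from the tutorial's dataset.

```python
import sqlite3

# Stand-in database: SQLite in memory, not CrateDB.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE devices (device_id TEXT PRIMARY KEY, model TEXT, location TEXT);
CREATE TABLE readings (ts TEXT, device_id TEXT, battery_level REAL);

INSERT INTO devices VALUES
  ('d1', 'pinto', 'warehouse'),
  ('d2', 'focus', 'office');

INSERT INTO readings VALUES
  ('2024-01-01T00:00:00', 'd1', 90.0),
  ('2024-01-01T01:00:00', 'd1', 80.0),
  ('2024-01-01T00:00:00', 'd2', 60.0),
  ('2024-01-01T01:00:00', 'd2', 50.0);
""")

# Enrich each reading with its device metadata, then aggregate per location and model.
query = """
SELECT d.location,
       d.model,
       AVG(r.battery_level) AS avg_battery,
       COUNT(*) AS samples
FROM readings AS r
JOIN devices AS d ON d.device_id = r.device_id
GROUP BY d.location, d.model
ORDER BY d.location
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The query itself uses only standard SQL (JOIN, GROUP BY, AVG, COUNT), so a similar statement should carry over to CrateDB, but treat that as an assumption to verify against your own schema.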
Examples
Analyzing Device Readings with Metadata Integration
This tutorial illustrates how to augment time series data with metadata, in order to enable more comprehensive analysis. It uses a time series dataset that captures various device readings, such as battery, CPU, and memory information.
Rich Time Series Metadata
SQL
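SQL Analysis
CrateDB's SQL dialect offers features that are well suited to analyzing time series data, such as common table expressions, window functions, and specialized aggregate functions.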
Examples
Analyzing Weather Data
Run aggregations with gap filling / interpolation, using common table expressions (CTEs) and LAG / LEAD window functions.
Find maximum values using the MAX_BY aggregate function, which returns the value of one column from the row where another column reaches its maximum within a group. Its counterpart, MIN_BY, does the same for the minimum.
Aggregations Time series
SQL
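The gap-filling idea behind the weather example can be sketched like this: a CTE exposes each row's neighbors through LAG and LEAD, and a missing value is replaced by the average of the two. As a stand-in for CrateDB, the sketch runs against an in-memory SQLite database (window functions require SQLite 3.25 or later). The table and its contents are hypothetical, and the query only handles isolated single gaps. The last step emulates the semantics of MAX_BY in Python, because SQLite has no such function.

```python
import sqlite3

# Stand-in database: SQLite in memory, not CrateDB.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE weather (ts TEXT, temperature REAL);
INSERT INTO weather VALUES
  ('2024-01-01T00:00', 10.0),
  ('2024-01-01T01:00', NULL),
  ('2024-01-01T02:00', 14.0),
  ('2024-01-01T03:00', 15.0);
""")

# The CTE attaches the previous and next reading to every row.
# COALESCE keeps existing values and fills a NULL with the neighbor average.
query = """
WITH neighbors AS (
  SELECT ts,
         temperature,
         LAG(temperature)  OVER (ORDER BY ts) AS prev_temp,
         LEAD(temperature) OVER (ORDER BY ts) AS next_temp
  FROM weather
)
SELECT ts,
       COALESCE(temperature, (prev_temp + next_temp) / 2) AS temperature_filled
FROM neighbors
ORDER BY ts
"""

filled = conn.execute(query).fetchall()
for row in filled:
    print(row)

# Python emulation of what MAX_BY(ts, temperature) expresses:
# return the timestamp of the row with the highest temperature.
ts_of_max = max(filled, key=lambda row: row[1])[0]
print(ts_of_max)  # -> 2024-01-01T03:00
```

For longer gaps, averaging the direct neighbors is not enough, because the neighbors themselves may be missing. The linked tutorial covers gap filling and interpolation in CrateDB itself.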
Visualization
As with EDA, visualizing your data can by itself yield significant insights into its characteristics. Initial exploration with best-of-breed data visualization tools is often your first encounter with the data.
Examples
CrateDB for Time Series Modeling, Exploration, and Visualization
Access time series data from CrateDB via SQL, load it into pandas DataFrames, and visualize it using Plotly.
It also covers advanced time series operations in SQL, such as aggregations, window functions, interpolation of missing data, common table expressions, moving averages, relational JOINs, and the handling of JSON data.
Time series visualization
Python pandas Plotly Dash
Display millions of data points using hvPlot, Datashader, and CrateDB
The HoloViews and Datashader frameworks make it possible to channel millions of data points from your backend systems to the browser’s glass.
This notebook plots the venerable NYC Taxi dataset after importing it into a CrateDB Cloud database cluster.
🚧 Please note this notebook is a work in progress. 🚧
Time series visualization
Python HoloViews hvPlot Datashader
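Plotting millions of points works because they are aggregated into a fixed-size grid before rendering, so the browser receives one value per pixel rather than one mark per point. The following sketch illustrates that binning idea in plain Python. It is a conceptual illustration only and does not use or reproduce the Datashader API. The function name and parameters are hypothetical.

```python
import random


def rasterize(points, x_range, y_range, width, height):
    """Count how many (x, y) points fall into each cell of a width x height grid."""
    x_min, x_max = x_range
    y_min, y_max = y_range
    grid = [[0] * width for _ in range(height)]
    for x, y in points:
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # skip points outside the viewport
        # Map the coordinate to a cell index; clamp the upper edge into the last cell.
        col = min(int((x - x_min) / (x_max - x_min) * width), width - 1)
        row = min(int((y - y_min) / (y_max - y_min) * height), height - 1)
        grid[row][col] += 1
    return grid


# Hypothetical data: 10,000 random points in the unit square, reproducible via a fixed seed.
rng = random.Random(42)
points = [(rng.random(), rng.random()) for _ in range(10_000)]

grid = rasterize(points, x_range=(0.0, 1.0), y_range=(0.0, 1.0), width=8, height=4)

# However many points go in, the output is always width * height counts.
print(len(grid), len(grid[0]), sum(map(sum, grid)))  # -> 4 8 10000
```

The size of the result depends only on the grid dimensions, not on the number of input points, which is what keeps the amount of data sent to the browser bounded.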