Advanced Time Series Analysis

Learn how to conduct advanced data analysis on large time series datasets with CrateDB.

Anomaly detection, Forecasting / Prediction, Time series decomposition, Exploratory data analysis, Metadata integration

Anomaly Detection and Forecasting

To gain insights from your data with machine learning techniques, either as a one-off analysis or on a recurring basis, consider applying anomaly detection and/or forecasting methods.
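As a rough illustration of what anomaly detection means in practice, the following sketch flags readings that deviate strongly from a trailing window. It is a generic rolling z-score check written with the Python standard library, not the method used in the notebooks listed here; the function name, window size, threshold, and sample readings are all assumptions made for this example.

```python
from statistics import mean, stdev


def rolling_zscore_anomalies(values, window=5, threshold=3.0):
    """Return indices whose value deviates from the trailing window mean
    by more than `threshold` standard deviations."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: a z-score is undefined, so skip this point.
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical sensor readings with one obvious spike at index 6.
readings = [20.1, 20.3, 19.9, 20.0, 20.2, 20.1, 35.0, 20.0, 20.2, 19.8]
anomalous = rolling_zscore_anomalies(readings, window=5, threshold=3.0)
print(anomalous)
```

A trailing window keeps the check usable on streaming data, at the cost of letting a spike inflate the statistics of the windows that follow it.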

Examples

Use MLflow for time series anomaly detection and time series forecasting

Guidelines and runnable code to get started with MLflow and CrateDB, exercising time series anomaly detection and time series forecasting / prediction using NumPy, Merlion, and Matplotlib.

README Notebook on GitHub Notebook on Colab

Anomaly Detection Forecasting / Prediction

Python MLflow

Use PyCaret to train time series forecasting models

This notebook explores the PyCaret framework and shows how to use it to train various time series forecasting models.

README Notebook on GitHub Notebook on Colab

Forecasting / Prediction

Python PyCaret MLflow

Time Series Decomposition

Decomposition of time series is a statistical task that deconstructs a time series into several components, each representing one of the underlying categories of patterns.

There are two principal types of decomposition, one based on rates of change, the other based on predictability.

You can use decomposition to dissect a time series into multiple components, typically including trend, seasonal, and random (or irregular) components.

This process helps in understanding the underlying patterns of the time series data, such as identifying any long term direction (trend), recurring patterns at fixed intervals (seasonality), and randomness (irregular fluctuations) in the data.

Decomposition is crucial for analyzing how these components change over time, improving forecasts, and developing strategies for addressing each element effectively.
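The split into trend, seasonal, and irregular components can be sketched with a classical additive decomposition. This is a minimal standard-library version under stated assumptions: the seasonal period is known and odd, the model is additive, and the function name and synthetic series are invented for illustration. It is not how PyCaret implements decomposition.

```python
from statistics import mean


def decompose_additive(values, period):
    """Classical additive decomposition: value = trend + seasonal + residual.
    `period` must be odd so the moving average is centered on each point."""
    if period % 2 == 0:
        raise ValueError("this sketch only handles odd periods")
    half = period // 2
    n = len(values)
    # Trend: centered moving average; undefined (None) at both edges.
    trend = [None] * n
    for i in range(half, n - half):
        trend[i] = mean(values[i - half:i + half + 1])
    # Seasonal: average detrended value per position in the cycle,
    # shifted so the seasonal component averages to zero over one period.
    by_phase = [[] for _ in range(period)]
    for i in range(n):
        if trend[i] is not None:
            by_phase[i % period].append(values[i] - trend[i])
    phase_means = [mean(p) for p in by_phase]
    offset = mean(phase_means)
    seasonal = [phase_means[i % period] - offset for i in range(n)]
    # Residual: whatever trend and seasonality do not explain.
    residual = [
        None if trend[i] is None else values[i] - trend[i] - seasonal[i]
        for i in range(n)
    ]
    return trend, seasonal, residual


# Synthetic series: linear trend plus a repeating 3-step pattern.
pattern = [2.0, -1.0, -1.0]
series = [10.0 + 0.5 * t + pattern[t % 3] for t in range(12)]
trend, seasonal, residual = decompose_additive(series, period=3)
```

On this noise-free series the seasonal component recovers the repeating pattern and the residual is numerically zero; on real data the residual is where irregular fluctuations show up.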

Examples

Analyze trend, seasonality, and fluctuations with PyCaret and CrateDB

Learn how to extract data from CrateDB for analysis in PyCaret, how to preprocess it further, and how to use PyCaret to plot a time series decomposition that breaks the series down into its basic components: trend, seasonality, and residual (or irregular) fluctuations.

Notebook on GitHub Notebook on Colab

Time series decomposition

Python PyCaret

Exploratory Data Analysis (EDA)

Exploratory data analysis (EDA) is an approach to analyzing data sets in order to summarize their main characteristics, often using statistical graphics and other data visualization methods.

EDA involves visualizing, summarizing, and analyzing data to uncover patterns, anomalies, or relationships within the dataset.

The objective of this step is to gain an understanding and intuition of the data, identify potential issues, and, in machine learning, guide feature engineering and model building.
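The summarizing part of EDA can be as simple as a per-column statistical profile. The sketch below uses only the Python standard library; the `summarize` helper and the temperature values are hypothetical, and the PyCaret notebook listed below produces far richer output.

```python
from statistics import mean, median, stdev


def summarize(values):
    """Describe one numeric column; None marks a missing reading."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "mean": mean(present),
        "stdev": stdev(present),
        "min": min(present),
        "median": median(present),
        "max": max(present),
    }


# Hypothetical temperature column with one missing reading.
temperatures = [21.5, 22.0, None, 23.5, 22.0, 21.0]
summary = summarize(temperatures)
print(summary)
```

Counting missing values alongside the statistics helps surface data quality issues before any modeling starts.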

Examples

Exploratory data analysis (EDA) with PyCaret and CrateDB

Learn how to access time series data from CrateDB using SQL, and how to apply exploratory data analysis (EDA) with PyCaret.

The notebook shows how to generate various plots and charts for EDA, helping you to understand data distributions, relationships between variables, and to identify patterns.

Notebook on GitHub Notebook on Colab

EDA on time series

Python PyCaret

Metadata Integration

CrateDB is particularly effective when you need to combine time series data with metadata, for instance in scenarios where data such as sensor readings or log entries needs to be augmented with additional context for more insightful analysis. See also Document Store.

CrateDB supports effective time series analysis with fast aggregations, a rich set of built-in functions, and JOIN operations.
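To show the shape of such a query, here is a small sketch that joins readings with device metadata and aggregates per site. It runs against an in-memory SQLite database from the Python standard library so that it works without a cluster; SQLite is only a stand-in, the `devices` and `readings` tables and their columns are hypothetical, and column types in CrateDB would differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: one metadata table, one time series table.
    CREATE TABLE devices (device_id TEXT PRIMARY KEY, model TEXT, site TEXT);
    CREATE TABLE readings (device_id TEXT, ts TEXT, battery_level REAL);
    INSERT INTO devices VALUES
        ('d1', 'alpha', 'berlin'),
        ('d2', 'beta', 'vienna');
    INSERT INTO readings VALUES
        ('d1', '2024-01-01T00:00:00', 80.0),
        ('d1', '2024-01-01T01:00:00', 70.0),
        ('d2', '2024-01-01T00:00:00', 50.0),
        ('d2', '2024-01-01T01:00:00', 40.0);
""")

# Enrich each reading with its device metadata, then aggregate per site.
rows = conn.execute("""
    SELECT d.site, d.model, AVG(r.battery_level) AS avg_battery
    FROM readings r
    JOIN devices d ON d.device_id = r.device_id
    GROUP BY d.site, d.model
    ORDER BY d.site
""").fetchall()
conn.close()
print(rows)
```

Keeping metadata in its own table avoids repeating it on every reading, and the JOIN brings the context back at query time.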

Examples

Analyzing Device Readings with Metadata Integration

This tutorial illustrates how to augment time series data with metadata, in order to enable more comprehensive analysis. It uses a time series dataset that captures various device readings, such as battery, CPU, and memory information.

Navigate to Tutorial

Rich Time Series Metadata

SQL

SQL Analysis

CrateDB offers SQL features for analyzing time series data, including aggregations, window functions, and common table expressions (CTEs).

Examples

Analyzing Weather Data

Run aggregations with gap filling / interpolation, using common table expressions (CTEs) and LAG / LEAD window functions.
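A minimal sketch of the gap-filling idea: a CTE exposes each row's neighbours via LAG and LEAD, and a missing value is replaced by their average. The example runs on in-memory SQLite (version 3.25 or later is assumed, for window function support) as a stand-in for CrateDB; the `weather` table and its values are hypothetical, and the approach only fills isolated single-row gaps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical hourly measurements with one missing temperature.
    CREATE TABLE weather (ts TEXT, temperature REAL);
    INSERT INTO weather VALUES
        ('2024-01-01T00:00', 10.0),
        ('2024-01-01T01:00', NULL),
        ('2024-01-01T02:00', 14.0),
        ('2024-01-01T03:00', 15.0);
""")

rows = conn.execute("""
    WITH neighbours AS (
        SELECT ts,
               temperature,
               LAG(temperature)  OVER (ORDER BY ts) AS prev_temp,
               LEAD(temperature) OVER (ORDER BY ts) AS next_temp
        FROM weather
    )
    SELECT ts,
           -- Keep measured values; otherwise average the two neighbours.
           COALESCE(temperature, (prev_temp + next_temp) / 2.0) AS temperature
    FROM neighbours
    ORDER BY ts
""").fetchall()
conn.close()

filled = [temperature for _, temperature in rows]
print(filled)
```

Longer runs of missing values stay NULL with this query, because the neighbouring rows are missing as well; those need a different strategy.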

Find maximum values using the MAX_BY aggregate function, which returns the value of one column from the row where another column reaches its maximum within a group. Its counterpart, MIN_BY, does the same for the minimum.
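The semantics of this kind of aggregate can be emulated in a few lines of plain Python. The helpers below are illustrative only and say nothing about how CrateDB implements the function; the observation data and key names are hypothetical.

```python
def max_by(rows, value_key, order_key):
    """Return row[value_key] from the row with the largest row[order_key]."""
    best = max(rows, key=lambda row: row[order_key])
    return best[value_key]


def min_by(rows, value_key, order_key):
    """Return row[value_key] from the row with the smallest row[order_key]."""
    best = min(rows, key=lambda row: row[order_key])
    return best[value_key]


# Hypothetical observations for one group, e.g. one weather station.
observations = [
    {"ts": "2024-01-01T00:00", "temperature": 10.0},
    {"ts": "2024-01-01T12:00", "temperature": 18.5},
    {"ts": "2024-01-01T18:00", "temperature": 14.0},
]

# "At what time was it warmest?" rather than "what was the highest temperature?"
warmest_ts = max_by(observations, "ts", "temperature")
coldest_ts = min_by(observations, "ts", "temperature")
print(warmest_ts, coldest_ts)
```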

Navigate to Tutorial

Aggregations Time series

SQL

Visualization

Similar to EDA, applying data and information visualization alone can yield significant insights into the characteristics of your data. With best-of-breed data visualization tools, this initial exploration is often your first encounter with the data.

Examples

CrateDB for Time Series Modeling, Exploration, and Visualization

Access time series data from CrateDB via SQL, load it into pandas DataFrames, and visualize it using Plotly.

The notebook also covers advanced time series operations in SQL, such as aggregations, window functions, interpolation of missing data, common table expressions, moving averages, relational JOINs, and the handling of JSON data.

Notebook on GitHub Notebook on Colab

Time series visualization

Python pandas Plotly Dash

Display millions of data points using hvPlot, Datashader, and CrateDB

The HoloViews and Datashader frameworks make it possible to channel millions of data points from your backend systems to the browser.

This notebook plots the venerable NYC Taxi dataset after importing it into a CrateDB Cloud database cluster.

🚧 Please note this notebook is a work in progress. 🚧

Notebook on GitHub Notebook on Colab

Time series visualization

Python HoloViews hvPlot Datashader