Time series decomposition is crucial for understanding the underlying patterns and behaviors of time series data. It can also improve the accuracy and effectiveness of our forecasting models by isolating and analyzing the main components: trend, seasonality, and noise.
In this video, we will leverage the power of PyCaret and CrateDB for time series decomposition, unlocking valuable insights for forecasting and analysis.
Time series decomposition is a technique that involves breaking down a time series into its three core components. This method is essential for understanding the different factors that influence the behavior of the dataset over time. First, we have the 'Trend' component, which shows us the long-term direction or movement in the data, representing the gradual increase or decrease over time. Then we have 'Seasonality', which refers to patterns that repeat at regular intervals within the data. These could be influenced by various seasonal factors such as holidays, weather patterns, or business cycles, and are critical for understanding periodic fluctuations. Finally, 'Residuals' represent the random variations, also known as noise, that remain after the trend and seasonal components have been accounted for. These are the unpredictable parts of the series that often hold unique insights into unforeseen events or anomalies within the data.
By separating a time series into these components, we gain a clearer understanding of the underlying structure and are better positioned to make accurate predictions and strategic decisions based on the data.
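To make the three components concrete, here is a minimal sketch of a classical additive decomposition written in plain Python. It is not how PyCaret computes its plots; it only illustrates the idea, assuming the series equals trend plus seasonality plus residual. The function names and the synthetic series are made up for illustration.

```python
def centered_moving_average(values, period):
    """Trend estimate: centered moving average, None where the window does not fit."""
    half = period // 2
    trend = [None] * len(values)
    for i in range(half, len(values) - half):
        window = values[i - half : i + half + 1]
        total = sum(window)
        if period % 2 == 0:
            # Even period: the window has period + 1 points, so the two
            # end points get half weight to keep the total weight at `period`.
            total -= 0.5 * (window[0] + window[-1])
        trend[i] = total / period
    return trend


def decompose_additive(values, period):
    """Split a series into (trend, seasonal, residual), assuming value = T + S + R."""
    trend = centered_moving_average(values, period)
    detrended = [v - t if t is not None else None for v, t in zip(values, trend)]

    # Average the detrended values for each position within the cycle.
    phase_means = []
    for phase in range(period):
        points = [d for i, d in enumerate(detrended)
                  if d is not None and i % period == phase]
        phase_means.append(sum(points) / len(points))

    # Center the seasonal pattern so it sums to zero over one cycle.
    offset = sum(phase_means) / period
    phase_means = [m - offset for m in phase_means]

    seasonal = [phase_means[i % period] for i in range(len(values))]
    residual = [v - t - s if t is not None else None
                for v, t, s in zip(values, trend, seasonal)]
    return trend, seasonal, residual


# Synthetic example: a linear trend plus a repeating 5-step pattern, no noise.
pattern = [2, -1, 0, 1, -2]
series = [0.5 * i + pattern[i % 5] for i in range(30)]
trend, seasonal, residual = decompose_additive(series, period=5)
```

Because the synthetic series contains no noise, the residuals come out as zero wherever the trend is defined. On real data, whatever the trend and seasonal terms cannot explain ends up in the residual.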
This notebook demonstrates how to use SQLAlchemy to load time series data from CrateDB, preprocess it, and plot time series decomposition with PyCaret, an efficient low-code machine learning library in Python that was introduced in a previous session.
All dependencies needed to run this notebook are listed in the requirements.txt file, and they can be installed by running the pip install command. If you are running this notebook in an environment like Google Colab, use the absolute path as illustrated here.
As a first step, we connect to a CrateDB instance using SQLAlchemy. We start by defining a connection string, and we consider two scenarios: one for connecting to a local CrateDB instance, which is often used for development and testing, and another for connecting to a CrateDB Cloud instance, which is typically used for production databases hosted in the cloud.
Once the connection string is defined, we create an engine object in SQLAlchemy. It is a common interface to the database that we are querying. The following code demonstrates fetching data from the CrateDB instance and loading it into a pandas DataFrame. We execute a simple SQL query to select all records from weather_data. After successfully fetching the data, we preview the first five rows of our DataFrame.
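As a rough sketch of these connection steps, the snippet below builds the two connection strings and defines a loader for weather_data. It assumes the CrateDB SQLAlchemy dialect is installed so that the crate:// scheme is recognized. The host name, user names, and password are placeholders, and the helper names cratedb_url and load_weather_data are invented here rather than taken from the notebook.

```python
from urllib.parse import quote_plus


def cratedb_url(host="localhost", port=4200, user="crate", password=None, ssl=False):
    """Build a SQLAlchemy URL for CrateDB (assumes the `crate://` dialect is installed)."""
    credentials = user if password is None else f"{user}:{quote_plus(password)}"
    suffix = "?ssl=true" if ssl else ""
    return f"crate://{credentials}@{host}:{port}{suffix}"


# Scenario 1: local instance for development and testing.
LOCAL_URL = cratedb_url()

# Scenario 2: CrateDB Cloud (placeholder host and credentials, not real values).
CLOUD_URL = cratedb_url(
    host="my-cluster.example.com", user="admin", password="secret", ssl=True
)


def load_weather_data(url):
    """Create an engine, select all rows from weather_data, return a DataFrame."""
    # Imported here so the URL helper above can be used without these packages.
    import pandas as pd
    import sqlalchemy as sa

    engine = sa.create_engine(url)
    with engine.connect() as connection:
        return pd.read_sql(sa.text("SELECT * FROM weather_data"), connection)
```

With a reachable database, `load_weather_data(LOCAL_URL).head(5)` would give the five-row preview mentioned above.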
As illustrated in the notebook, the next few steps preprocess the data for analysis by setting the index column, interpolating missing data, and computing daily averages. We refer to our previous video on exploratory data analysis for more details on these steps. We also set up the forecasting experiment for the time series data by passing the temperature as the target variable along with three seasonal periods. As the model suggests, there are possible seasonalities at 5 and 20 days, with the primary seasonality detected at 5 days.
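The preprocessing and setup steps could look roughly like the sketch below. The column names timestamp and temperature, the candidate seasonal periods 1, 5, and 20, and the forecast horizon are assumptions chosen for illustration; the notebook may use different values. The small DataFrame at the end is made-up data standing in for the query result.

```python
import pandas as pd


def preprocess(df):
    """Index by time, fill gaps by interpolation, and average to one row per day."""
    out = df.copy()
    # Assumes `timestamp` holds datetime-like values (strings or datetimes).
    out["timestamp"] = pd.to_datetime(out["timestamp"])
    out = out.set_index("timestamp").sort_index()
    # Keep numeric columns only, since interpolation and averaging need numbers.
    out = out.select_dtypes("number")
    out = out.interpolate(method="time")
    return out.resample("D").mean()


def setup_experiment(daily, seasonal_periods=(1, 5, 20), horizon=10):
    """Set up a PyCaret time series experiment with temperature as the target."""
    # Imported here so the preprocessing above can run without PyCaret installed.
    from pycaret.time_series import TSForecastingExperiment

    exp = TSForecastingExperiment()
    exp.setup(
        data=daily,
        target="temperature",
        fh=horizon,  # hypothetical forecast horizon
        seasonal_period=list(seasonal_periods),  # candidate periods to test
    )
    return exp


# Tiny made-up sample with one missing reading.
raw = pd.DataFrame(
    {
        "timestamp": [
            "2023-01-01 00:00",
            "2023-01-01 12:00",
            "2023-01-02 00:00",
            "2023-01-02 12:00",
        ],
        "temperature": [10.0, None, 14.0, 16.0],
    }
)
daily = preprocess(raw)
```

In this sample the missing noon reading is filled in from its neighbors before the daily mean is taken, so the gap does not bias the average for that day.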
Now let’s visualize the time series decomposition with PyCaret. PyCaret automates much of the analytical process, and it has already detected the recurring pattern in our data. To explore this further, we can call the plot_model function with the plot='decomp' parameter. This instructs PyCaret to generate decomposition graphs, which display the trend, seasonality, and residuals of our time series, as illustrated in the first graph.
We can also pass specific parameters to change the seasonality period. For instance, we can use the same method with a seasonal period of 20 days. The second graph illustrates the result. Besides classical additive decomposition, we can use more advanced decomposition methods. The STL decomposition allows us to estimate both the trend and seasonal components with greater flexibility. One of the key advantages of STL is its ability to let the seasonal component evolve, which is particularly useful for datasets where seasonality may vary from one cycle to another. The third graph in the notebook visualizes the STL decomposition.
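The three plots described here can be sketched as a small loop over plot_model calls. The function below accepts any experiment object that exposes plot_model, such as a PyCaret TSForecastingExperiment after setup. The helper name and the list layout are our own; the data_kwargs key seasonal_period reflects our understanding of the PyCaret time series API and should be checked against the installed version.

```python
# Keyword arguments for the three decomposition plots discussed above.
DECOMPOSITION_PLOTS = [
    {"plot": "decomp"},  # classical decomposition, detected period
    {"plot": "decomp", "data_kwargs": {"seasonal_period": 20}},  # 20-day period
    {"plot": "decomp_stl"},  # STL decomposition
]


def plot_decompositions(exp, plots=DECOMPOSITION_PLOTS):
    """Call exp.plot_model once per entry and return how many plots were requested."""
    for kwargs in plots:
        exp.plot_model(**kwargs)
    return len(plots)
```

After the experiment has been set up, calling `plot_decompositions(exp)` would render the three graphs in order.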
Based on the results we can make a couple of observations: there is an upward trend of temperature in spring and summer and a downward trend in autumn. Seasonality is also present in the series. The residuals in classical decompositions are also interesting, showing periods of high variability.
Moving forward, let’s take a closer look at how PyCaret enables us to understand the statistical properties of our time series data. PyCaret's check_stats function is a comprehensive tool that provides a snapshot of statistics and performs a suite of statistical tests on our data or on the residuals of our model. For instance, the summary test gives us descriptive statistics, offering insights into the central tendency, dispersion, and shape of the dataset's distribution. As you can see, other tests give us deeper insights into our time series.
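As a small sketch, the individual tests can be requested one at a time and collected by name. The helper below works with any experiment object that exposes check_stats, such as a PyCaret TSForecastingExperiment after setup. The helper name is our own, and the test names are the ones we believe check_stats accepts, so treat them as an assumption to verify against the PyCaret documentation.

```python
# Test names we assume check_stats accepts; "summary" is the one named above.
STAT_TESTS = ("summary", "white_noise", "stationarity", "normality")


def run_stat_checks(exp, tests=STAT_TESTS):
    """Run each named test through exp.check_stats and collect the results by name."""
    return {name: exp.check_stats(test=name) for name in tests}
```

Calling `run_stat_checks(exp)["summary"]` would then return the descriptive statistics, and the other entries hold the results of the remaining tests.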
With this overview of time series decomposition with PyCaret and CrateDB, you should have a solid understanding of the data’s statistical properties, which is essential for reliable and insightful anomaly detection and forecasting.