Anomaly detection in time series data is a technique for identifying patterns or outliers that deviate significantly from the norm within a given dataset over time. It employs statistical and machine learning algorithms to analyze large datasets over specific time intervals, examining patterns, trends, and seasonality to pinpoint anomalies. Detected deviations can signify errors, faults, or other significant events requiring attention.
This technique finds wide-ranging applications across various fields and industries:
- In cybersecurity, it helps identify unusual network activity patterns, potentially indicating a breach;
- In the finance sector, anomaly detection is pivotal for identifying fraudulent activities in credit card transactions;
- In the realm of IoT, it is employed for detecting malfunctioning sensors and machines;
- In healthcare, anomaly detection is useful for monitoring unusual patient vital signs;
- In predictive maintenance, it aids in the early identification of abnormal machine behavior to prevent system failures.
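Before turning to the tooling, the core idea can be illustrated in a few lines of plain Python: flag any point that strays too far from a rolling mean. This is a minimal, self-contained sketch on synthetic data; the function name, window size, and threshold are illustrative choices of ours, not part of the tutorial's pipeline.

```python
import numpy as np
import pandas as pd

def rolling_zscore_anomalies(series, window=10, threshold=2.5):
    """Flag points whose distance from the rolling mean exceeds
    `threshold` rolling standard deviations."""
    mean = series.rolling(window, min_periods=2).mean()
    std = series.rolling(window, min_periods=2).std()
    z = (series - mean) / std
    return z.abs() > threshold

# A smooth signal with one injected spike at position 50
values = pd.Series(np.sin(np.linspace(0, 6, 100)))
values.iloc[50] += 10.0

mask = rolling_zscore_anomalies(values)
flagged = values[mask].index.tolist()
print(flagged)  # the injected spike stands out
```

Real detectors such as the one used below are more robust than this sketch, but the principle is the same: model "normal" behavior and score each point's distance from it.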
Practical Application of Anomaly Detection with PyCaret and CrateDB
Step 1. Setting Up the CrateDB Connection
Repeat Step 1 from the previous chapters to set up the CrateDB connection.
Step 2. Loading Time Series Data with SELECT DATE_BIN
We import the data into CrateDB and aggregate it for anomaly detection, focusing on evenly spaced time intervals. The DATE_BIN function in CrateDB groups the rows into 5-minute intervals, and the average value within each interval is calculated. The timestamps, stored as milliseconds since the epoch, are then converted into Python datetime objects.
query = """
SELECT DATE_BIN('5 min'::INTERVAL, timestamp, 0) AS timestamp,
       AVG(value) AS avg_value
FROM machine_data
GROUP BY timestamp
ORDER BY timestamp ASC;
"""

with engine.connect() as conn:
    df = pd.read_sql(sql=sa.text(query), con=conn)

# Convert epoch milliseconds to Python datetime objects
df['timestamp'] = df['timestamp'].transform(lambda x: datetime.fromtimestamp(x / 1000))
df = df.set_index('timestamp')
df.head()
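For reference, the same 5-minute binning can be reproduced client-side with pandas' resample. The snippet below is an illustrative stand-in on synthetic data, not part of the CrateDB pipeline.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for raw machine_data readings at 1-minute resolution
rng = pd.date_range("2013-12-15 00:00", periods=20, freq="1min")
raw = pd.DataFrame({"value": np.arange(20.0)}, index=rng)

# Equivalent of DATE_BIN('5 min', ...) + AVG(value):
# left-aligned 5-minute buckets, averaged per bucket
binned = raw["value"].resample("5min").mean()
print(binned)
```

Doing the aggregation in the database, as the tutorial does, avoids transferring every raw reading to the client.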
Step 3. Plotting Time Series Data
To understand our time series data and identify any apparent anomalies, we plot the temperature readings over time. Notable anomalies, such as periods of planned shutdowns and instances of catastrophic machine failure, are recorded and shown as blue-shaded areas in the plot.
# Known anomalies in the data
anomalies = [
    ["2013-12-15 17:50:00.000000", "2013-12-17 17:00:00.000000"],
    ["2014-01-27 14:20:00.000000", "2014-01-29 13:30:00.000000"],
    ["2014-02-07 14:55:00.000000", "2014-02-09 14:05:00.000000"]
]

plt.figure(figsize=(12, 7))
plt.plot(df.index, df['avg_value'], linestyle='solid', color='black', label='Temperature')

# Highlight known anomalies as shaded areas
for ctr, timeframe in enumerate(anomalies, start=1):
    plt.axvspan(pd.to_datetime(timeframe[0]),
                pd.to_datetime(timeframe[1]),
                color='blue', alpha=0.3,
                label=f'Anomaly {ctr}')
# Formatting x-axis for better readability
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%Y/%m/%d'))
plt.gca().xaxis.set_major_locator(mdates.DayLocator(interval=7))
plt.gcf().autofmt_xdate()  # rotate & align the x labels for better readability
plt.title('Temperature Over Time', fontsize=20, fontweight='bold', pad=30)
plt.ylabel('Temperature')
# Add legend to the right
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()
Step 4. Defining the Anomaly Detection Model
Next, we set up the environment for the machine learning workflow in PyCaret. We create a transformation pipeline with the setup() function and specify a session_id so that results can be reproduced. From the available algorithms, we choose the Minimum Covariance Determinant (MCD) model for its effectiveness in identifying outliers.
from pycaret.anomaly import setup, models, create_model, assign_model

s = setup(df, session_id=123)
models()  # list the available anomaly detection algorithms
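For intuition about what the MCD detector does: it fits a robust estimate of the data's covariance and flags points with a large Mahalanobis distance from the bulk of the data. The sketch below illustrates the same idea with scikit-learn's EllipticEnvelope, which is built on the MCD estimator; the synthetic data and parameter values are ours, and its contamination argument plays a role analogous to PyCaret's fraction parameter.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

rng = np.random.default_rng(123)

# Tight inlier cloud plus a few far-away outliers
inliers = rng.normal(loc=20.0, scale=1.0, size=(200, 1))
outliers = np.array([[80.0], [90.0], [100.0]])
X = np.vstack([inliers, outliers])

# Fit a robust (MCD-based) covariance model; ~2.5% of points expected anomalous
detector = EllipticEnvelope(contamination=0.025, random_state=123)
labels = detector.fit_predict(X)  # -1 marks outliers, +1 inliers

flagged_idx = np.where(labels == -1)[0]
print(flagged_idx)  # the three injected outliers are among the flagged points
```

Because the covariance estimate is robust, the gross outliers barely influence the fitted "normal" region, which is what makes MCD effective for contaminated data.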
Step 5. Running the Unsupervised Anomaly Detection Model
The create_model() function is used to instantiate and train an MCD model, with the fraction parameter adjusted to specify the expected proportion of outliers. After training, the model is applied to label the anomalies. The assign_model() function enriches our DataFrame with anomaly labels and scores.
# Train an MCD model, expecting ~2.5% of the points to be outliers
mcd = create_model('mcd', fraction=0.025)

# Enrich the DataFrame: 'Anomaly' = 1 marks an outlier, 'Anomaly_Score' gives its score
mcd_results = assign_model(mcd)
mcd_results[mcd_results['Anomaly'] == 1].head()
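One way to sanity-check the flagged rows is to test whether their timestamps fall inside the known anomaly windows from Step 3. Below is a sketch of such a check on stand-in timestamps; the helper name in_known_window is ours, not part of PyCaret.

```python
import pandas as pd

# Known anomaly windows (same format as in Step 3)
known = [
    ("2013-12-15 17:50:00", "2013-12-17 17:00:00"),
    ("2014-01-27 14:20:00", "2014-01-29 13:30:00"),
]

def in_known_window(ts, windows):
    """True if timestamp ts falls inside any known anomaly window."""
    ts = pd.Timestamp(ts)
    return any(pd.Timestamp(start) <= ts <= pd.Timestamp(end)
               for start, end in windows)

# Stand-in for mcd_results[mcd_results['Anomaly'] == 1].index
flagged = pd.to_datetime(["2013-12-16 03:00:00",   # inside a window -> true positive
                          "2014-01-01 00:00:00"])  # outside         -> false positive
hits = [in_known_window(ts, known) for ts in flagged]
print(hits)
```

Counting hits versus misses this way gives a rough precision estimate when ground-truth windows are available, as they are in this dataset.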
Step 6. Plotting the Results
Lastly, we use the Plotly library to plot all readings and highlight the anomalies. The anomalies flagged by the model are shown as red markers.
# plot value on y-axis and date on x-axis
pio.renderers.default = 'png'
fig = px.line(mcd_results, x=mcd_results.index, y="avg_value",
              title='MACHINE DATA - UNSUPERVISED ANOMALY DETECTION',
              template='plotly_dark')

# create list of outlier dates
outlier_dates = mcd_results[mcd_results['Anomaly'] == 1].index

# obtain y values of the anomalies to plot
y_values = [mcd_results.loc[i]['avg_value'] for i in outlier_dates]

fig.add_trace(go.Scatter(x=outlier_dates, y=y_values,
                         mode='markers', name='Anomaly',
                         marker=dict(color='red', size=10)))
fig.show()
The model effectively identified several anomalies, despite some false positives. Its performance could be improved with further tuning or by using a different model.
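One concrete tuning knob is the expected outlier share (the fraction parameter in PyCaret). The sketch below sweeps the analogous contamination parameter of scikit-learn's MCD-based EllipticEnvelope on synthetic data to show how the number of flagged points scales with that choice; it is an illustration of the trade-off, not part of the tutorial's pipeline.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

rng = np.random.default_rng(0)
# 400 normal readings plus two gross outliers
X = np.vstack([rng.normal(20.0, 1.0, size=(400, 1)),
               np.array([[60.0], [70.0]])])

# Sweep the expected outlier share; larger values flag more points
counts = {}
for frac in (0.005, 0.025, 0.1):
    labels = EllipticEnvelope(contamination=frac, random_state=0).fit_predict(X)
    counts[frac] = int((labels == -1).sum())
print(counts)
```

Setting the share too low misses real anomalies; setting it too high inflates false positives, so checking the flagged points against known anomaly windows (as in Step 3) is a practical way to pick a value.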