Machine Learning

Integrate CrateDB with machine learning frameworks and tools, for MLOps and vector database operations.

Machine Learning Operations

Training a machine learning model, running it in production, and maintaining it require a significant amount of data processing and bookkeeping.

CrateDB, as a universal SQL database, supports this process through adapters to best-of-breed software components for MLOps procedures.

MLOps is a paradigm that aims to deploy and maintain machine learning models in production reliably and efficiently. It includes experiment tracking and follows the spirit of continuous development and DevOps.

Vector Store

CrateDB implements a vector store through its FLOAT_VECTOR data type, and supports k-nearest neighbour (kNN) search to find stored vectors that are similar to a query vector.

These feature vectors may be computed from raw data using machine learning methods such as feature extraction algorithms, word embeddings, or deep learning networks.

Vector databases can be used for similarity search, multi-modal search, recommendation engines, large language models (LLMs), retrieval-augmented generation (RAG), and other applications.
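To make the kNN idea from this section concrete, here is a minimal sketch. The two SQL strings show how a FLOAT_VECTOR column and a KNN_MATCH query might look in CrateDB; the table name, column names, and vector values are hypothetical. The Python function is an exact brute-force search using Euclidean distance, meant only as a reference for what "nearest" means. It does not reproduce CrateDB's internal index or scoring.

```python
import math

# Hypothetical table and query, shown as plain strings (not executed here).
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS word_embeddings (
    text STRING,
    embedding FLOAT_VECTOR(4)
)
"""

KNN_QUERY = """
SELECT text, _score
FROM word_embeddings
WHERE KNN_MATCH(embedding, [0.8, 0.2, 0.0, 0.0], 2)
ORDER BY _score DESC
"""


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn(rows, query, k):
    """Return the k (label, vector) rows closest to the query vector."""
    return sorted(rows, key=lambda row: euclidean(row[1], query))[:k]


# Illustrative data only; real feature vectors would come from an embedding model.
rows = [
    ("alpha", [1.0, 0.0, 0.0, 0.0]),
    ("beta", [0.9, 0.1, 0.0, 0.0]),
    ("gamma", [0.0, 1.0, 0.0, 0.0]),
    ("delta", [0.0, 0.0, 1.0, 1.0]),
]
query = [0.8, 0.2, 0.0, 0.0]

nearest = knn(rows, query, k=2)
print([label for label, _ in nearest])
```

A brute-force scan like this compares the query against every row, which is fine for a handful of vectors. Vector stores typically use an index to avoid that cost on large tables.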

Anomaly Detection and Forecasting

MLflow

Tutorials and notebooks about using MLflow together with CrateDB.

Blog: Running Time Series Models in Production using CrateDB

Part 1: Introduction to Time Series Modeling using Machine Learning

The article introduces the concept of time series modeling and discusses the main obstacles to running it in production. It also introduces CrateDB, highlighting its key features and benefits, why it stands out in managing time series data, and why it is an especially good fit for supporting machine learning models in production.

Fundamentals
Time Series Modeling

Notebook: Create a Time Series Anomaly Detection Model

Guidelines and runnable code to get started with MLflow and CrateDB, exercising time series anomaly detection and time series forecasting / prediction using NumPy, Salesforce Merlion, and Matplotlib.

README | Notebook on GitHub | Notebook on Colab

Fundamentals
Time Series
Anomaly Detection
Prediction / Forecasting
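As a rough illustration of what time series anomaly detection means, the sketch below flags readings that deviate strongly from the series mean. It deliberately swaps in a plain z-score rule using only the Python standard library. It is not the Merlion-based approach from the notebook, and the function name, threshold, and sample readings are assumptions made for this example.

```python
import statistics


def zscore_anomalies(values, threshold=2.5):
    """Return indices of values whose z-score exceeds the threshold.

    Uses the population standard deviation of the whole series.
    """
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        # A constant series has no outliers under this rule.
        return []
    return [
        index
        for index, value in enumerate(values)
        if abs(value - mean) / stdev > threshold
    ]


# Hypothetical sensor readings with one obvious spike.
readings = [20.1, 19.8, 20.3, 20.0, 19.9, 35.0, 20.2, 20.1, 19.7, 20.0]

anomalies = zscore_anomalies(readings)
print(anomalies)
```

A global z-score ignores trend and seasonality, so it only suits roughly stationary data. Dedicated libraries handle those cases with more capable models.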

PyCaret

Tutorials and notebooks about using PyCaret together with CrateDB.

Notebook: AutoML classification with PyCaret

Explore the PyCaret framework and learn how to use it to train different classification models.

README | Notebook on GitHub | Notebook on Colab

Fundamentals
Time Series
Anomaly Detection
Prediction / Forecasting

Notebook: Train time series forecasting models

How to train time series forecasting models using PyCaret and CrateDB.

README | Notebook on GitHub | Notebook on Colab

Fundamentals
Time Series
Training
Classification
Forecasting

scikit-learn

Use scikit-learn with CrateDB.

Fundamentals
Regression Analysis

TensorFlow

Use TensorFlow with CrateDB.

Predictive Maintenance

Build a machine learning model that predicts whether a machine will fail within a specified future time window.

Fundamentals
Prediction

LLMs / RAG

One of the most powerful applications enabled by LLMs is the sophisticated question-answering (Q&A) chatbot: an application that can answer questions about specific sources of information. Such chatbots rely on Retrieval-Augmented Generation (RAG), a technique for augmenting LLM knowledge with additional data.
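The retrieval step of RAG can be sketched in a few lines: find the stored text most similar to the question, then place it in the prompt sent to the LLM. Everything below is a toy under stated assumptions. The bag-of-words "embedding" stands in for a learned embedding model, the in-memory list stands in for a vector store, and all function names and the prompt wording are hypothetical. No LLM is called.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: word counts. A real system would use a learned model."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, documents, k=1):
    """Return the k documents most similar to the question."""
    query = embed(question)
    ranked = sorted(documents, key=lambda doc: cosine(query, embed(doc)), reverse=True)
    return ranked[:k]


def build_prompt(question, documents, k=1):
    """Augment the question with retrieved context before it goes to an LLM."""
    context = "\n".join(retrieve(question, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Stand-in for private data held in a vector store.
documents = [
    "CrateDB stores vectors in columns of type FLOAT_VECTOR.",
    "MLflow tracks machine learning experiments.",
    "PyCaret automates model training.",
]

prompt = build_prompt("Which type stores vectors?", documents)
print(prompt)
```

The generation step would pass `prompt` to an LLM of your choice. Frameworks such as LangChain and LlamaIndex wrap both steps behind their own interfaces.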

Video Tutorials

How to Use Private Data in Generative AI

In this video, recorded at FOSDEM 2024, we explain how to leverage private data in generative AI by means of an end-to-end Retrieval-Augmented Generation (RAG) solution.

Fundamentals
Generative AI
RAG

LangChain

LlamaIndex