Machine Learning
Integrate CrateDB with machine learning frameworks and tools, for MLOps and vector database operations.
Machine Learning Operations
Training a machine learning model, running it in production, and maintaining it require a significant amount of data processing and bookkeeping.
CrateDB, as a universal SQL database, supports this process through adapters to best-of-breed software components for MLOps procedures.
MLOps is a paradigm for deploying and maintaining machine learning models in production reliably and efficiently. It includes experiment tracking and follows the spirit of continuous development and DevOps.
Vector Store
CrateDB provides a vector store through its FLOAT_VECTOR data type, together with k-nearest neighbour (kNN) search to find vectors that are similar to a query vector.
These feature vectors may be computed from raw data using machine learning methods such as feature extraction algorithms, word embeddings, or deep learning networks.
Vector databases can be used for similarity search, multi-modal search, recommendation engines, large language models (LLMs), retrieval-augmented generation (RAG), and other applications.
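To make the idea of similarity search concrete, the following is a minimal brute-force kNN sketch in plain Python. It is not how CrateDB implements the search; the function name `knn_search`, the Euclidean distance metric, and the sample embeddings are assumptions chosen for illustration only.

```python
import math


def knn_search(vectors, query, k):
    """Return the k (key, distance) pairs closest to `query`.

    Brute-force illustration using Euclidean distance; a vector store
    would typically use an index instead of scanning every vector.
    """
    scored = []
    for key, vec in vectors.items():
        if len(vec) != len(query):
            raise ValueError("all vectors must have the query's dimension")
        scored.append((math.dist(vec, query), key))
    scored.sort()  # smallest distance first
    return [(key, dist) for dist, key in scored[:k]]


# Hypothetical 4-dimensional feature vectors.
embeddings = {
    "apple": [0.9, 0.1, 0.0, 0.2],
    "pear": [0.8, 0.2, 0.1, 0.2],
    "truck": [0.0, 0.9, 0.8, 0.1],
}

nearest = knn_search(embeddings, [0.9, 0.1, 0.0, 0.25], k=2)
for key, distance in nearest:
    print(key, round(distance, 3))
```

In CrateDB the search runs inside the database rather than in application code. It is exposed through the `KNN_MATCH` SQL function on a FLOAT_VECTOR column; consult the CrateDB reference for its exact arguments.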
Anomaly Detection and Forecasting
MLflow
Tutorials and notebooks about using MLflow together with CrateDB.
Blog: Running Time Series Models in Production using CrateDB
Part 1: Introduction to Time Series Modeling using Machine Learning
The article introduces the concept of time series modeling and discusses the main obstacles to running it in production. It also introduces CrateDB, highlighting its key features and benefits, why it stands out in managing time series data, and why it is an especially good fit for supporting machine learning models in production.
Fundamentals
Time Series Modeling
Notebook: Create a Time Series Anomaly Detection Model
Guidelines and runnable code to get started with MLflow and CrateDB, exercising time series anomaly detection and time series forecasting / prediction using NumPy, Salesforce Merlion, and Matplotlib.
Fundamentals
Time Series
Anomaly Detection
Prediction / Forecasting
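As a rough illustration of what time series anomaly detection means, here is a minimal z-score sketch in plain Python. It is a stand-in, not the Merlion-based approach the notebook uses; the function name `zscore_anomalies`, the threshold, and the sample readings are assumptions.

```python
import statistics


def zscore_anomalies(values, threshold=2.5):
    """Return indices of values whose z-score exceeds `threshold`.

    Uses the population standard deviation of the whole series, so it
    suits short illustrative series only; note that with n points the
    largest possible z-score is sqrt(n - 1).
    """
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # a constant series has no outliers
    return [i for i, v in enumerate(values) if abs(v - mean) / stdev > threshold]


# Hypothetical sensor readings with one obvious spike at index 5.
readings = [20.1, 20.3, 19.9, 20.0, 20.2, 35.0, 20.1, 19.8, 20.0, 20.2]
anomalies = zscore_anomalies(readings)
print(anomalies)
```

Dedicated libraries handle seasonality, trend, and streaming data, which this sketch ignores.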
PyCaret
Tutorials and notebooks about using PyCaret together with CrateDB.
scikit-learn
Use scikit-learn with CrateDB.
Regression analysis with pandas and scikit-learn
Use pandas and scikit-learn to run a regression analysis within a Jupyter Notebook.
Fundamentals
Regression Analysis
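For readers new to regression analysis, the sketch below fits a straight line by ordinary least squares using only the Python standard library. It is a simplified stand-in for the pandas and scikit-learn workflow in the notebook; the function name `fit_line` and the sample data are assumptions.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

In practice, the rows would be loaded from CrateDB into a DataFrame and passed to a library estimator rather than computed by hand.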
TensorFlow
Use TensorFlow with CrateDB.
Predictive Maintenance
Build a machine learning model that will predict whether a machine will fail within a specified time window in the future.
Fundamentals
Prediction
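Predicting failure "within a specified time window" is commonly framed as binary classification, where each sensor reading is labeled by whether a failure follows within the window. The sketch below shows only that labeling step, under assumed names (`label_failures`, `horizon`) and made-up timestamps; it does not reproduce the tutorial's TensorFlow model.

```python
def label_failures(reading_times, failure_times, horizon):
    """Label each reading 1 if a failure occurs within `horizon` after it.

    A reading at time t is positive when some failure time f satisfies
    t < f <= t + horizon; otherwise it is labeled 0.
    """
    labels = []
    for t in reading_times:
        will_fail = any(t < f <= t + horizon for f in failure_times)
        labels.append(1 if will_fail else 0)
    return labels


# Hypothetical timestamps in minutes; the machine fails at minute 35.
reading_times = [0, 10, 20, 30, 40, 50]
failure_times = [35]
labels = label_failures(reading_times, failure_times, horizon=15)
print(labels)
```

The resulting labels, paired with the sensor features at each reading, form the training set for a classifier.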
LLMs / RAG
One of the most powerful applications enabled by LLMs is sophisticated question-answering (Q&A) chatbots. These are applications that can answer questions about specific sources of information using Retrieval Augmented Generation (RAG), a technique for augmenting LLM knowledge with additional data.
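The two RAG steps, retrieving relevant text and adding it to the prompt, can be sketched in plain Python as follows. Everything here is illustrative: the word-count "embedding" stands in for a learned embedding model, the helper names (`embed`, `retrieve`, `build_prompt`) are hypothetical, and no LLM is called.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a real system uses a model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, documents, k=1):
    """Return the k documents most similar to the question."""
    query = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(query, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(question, context):
    """Augment the question with the retrieved context."""
    joined = "\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


# Hypothetical private documents.
documents = [
    "CrateDB stores vectors in FLOAT_VECTOR columns",
    "The cafeteria opens at nine",
]
question = "Where does CrateDB store vectors"
context = retrieve(question, documents, k=1)
prompt = build_prompt(question, context)
print(prompt)
```

In a real pipeline, the retrieval step would query a vector store and the resulting prompt would be sent to an LLM.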
Video Tutorials
How to Use Private Data in Generative AI
In this video, recorded at FOSDEM 2024, we explain how to leverage private data in generative AI using an end-to-end Retrieval Augmented Generation (RAG) solution.
Fundamentals
Generative AI
RAG