Machine Learning with CrateDB

This section lists machine learning applications and frameworks that can be used together with CrateDB.


LangChain

LangChain is a framework for developing applications powered by language models. It is written in Python and has a strong focus on composability. As a language model integration framework, LangChain’s use cases largely overlap with those of language models in general, including document analysis and summarization, chatbots, and code analysis.

LangChain supports retrieval-augmented generation (RAG), a technique for augmenting LLM knowledge with additional data that is often private or real-time. RAG is typically combined with “prompt engineering”, the process of structuring text so that a generative AI model can interpret and understand it. A prompt is natural language text describing the task that an AI should perform.
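
To make the retrieval and prompt-building steps concrete, here is a minimal sketch in plain Python. It does not use the LangChain or CrateDB APIs: the bag-of-words `embed` function is a toy stand-in for a real embedding model, and all function names (`embed`, `retrieve`, `build_prompt`) are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the question."""
    query = embed(question)
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query, embed(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Structure the retrieved context and the question into one prompt."""
    context_block = "\n".join(f"- {item}" for item in context)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {question}"
    )


documents = [
    "CrateDB is a distributed SQL database.",
    "MLflow tracks machine learning experiments.",
    "PyCaret automates machine learning workflows.",
]
question = "Which tool tracks experiments?"
context = retrieve(question, documents, k=1)
prompt = build_prompt(question, context)
print(prompt)
```

In a real setup, the in-memory list would be replaced by a vector store, and the resulting prompt would be passed to a language model.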

The LangChain adapter for CrateDB lets you use CrateDB as a vector store database and load documents using LangChain’s DocumentLoader. It also supports LangChain’s conversational memory subsystem.


MLflow

MLflow is an open source platform to manage the whole ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry.

The MLflow adapter for CrateDB, available through the mlflow-cratedb package, lets you use CrateDB as the storage database for the MLflow Tracking subsystem, which records and queries experiments across code, data, configuration, and results.
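
As an illustration of what "recording and querying experiments" means in practice, the following sketch stores runs, parameters, and metrics in a SQL database and then asks for the best run. It deliberately uses Python's built-in `sqlite3` module as a stand-in and does not use the MLflow or mlflow-cratedb APIs. The table layout and the `log_run` and `best_run` helpers are hypothetical, not MLflow's actual schema.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the tracking database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (run_id INTEGER PRIMARY KEY, experiment TEXT)")
conn.execute("CREATE TABLE params (run_id INTEGER, key TEXT, value TEXT)")
conn.execute("CREATE TABLE metrics (run_id INTEGER, key TEXT, value REAL)")


def log_run(experiment: str, params: dict, metrics: dict) -> int:
    """Record one run with its parameters and metrics; return the run id."""
    cursor = conn.execute("INSERT INTO runs (experiment) VALUES (?)", (experiment,))
    run_id = cursor.lastrowid
    conn.executemany(
        "INSERT INTO params VALUES (?, ?, ?)",
        [(run_id, key, str(value)) for key, value in params.items()],
    )
    conn.executemany(
        "INSERT INTO metrics VALUES (?, ?, ?)",
        [(run_id, key, float(value)) for key, value in metrics.items()],
    )
    return run_id


def best_run(experiment: str, metric: str):
    """Return (run_id, value) of the run with the highest value for a metric."""
    return conn.execute(
        "SELECT r.run_id, m.value FROM runs r "
        "JOIN metrics m ON m.run_id = r.run_id "
        "WHERE r.experiment = ? AND m.key = ? "
        "ORDER BY m.value DESC LIMIT 1",
        (experiment, metric),
    ).fetchone()


first = log_run("demo", {"learning_rate": 0.1}, {"accuracy": 0.81})
second = log_run("demo", {"learning_rate": 0.01}, {"accuracy": 0.93})
best = best_run("demo", "accuracy")
print(best)
```

Keeping runs in a SQL database means experiments can be compared with ordinary queries, which is the idea behind using a database as a tracking backend.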


PyCaret

PyCaret is an open-source, low-code machine learning library for Python that automates machine learning workflows.

It is a high-level interface and AutoML wrapper on top of popular machine learning libraries such as scikit-learn, xgboost, ray, lightgbm, and many more. PyCaret provides a universal interface to these libraries, so you can use them without needing to know the details of the underlying model architectures and parameters.
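
The idea of a universal interface can be sketched in plain Python: every model exposes the same `fit` and `predict` methods, so a single helper can train and rank them without knowing their internals. This is only a conceptual illustration and does not use PyCaret's API. The two toy classifiers and the `compare` helper are hypothetical.

```python
class MajorityClassifier:
    """Always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.label for _ in X]


class NearestNeighborClassifier:
    """Predicts the label of the closest training point (squared Euclidean distance)."""

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X):
        def closest_label(point):
            distances = [
                sum((a - b) ** 2 for a, b in zip(point, other)) for other in self.X
            ]
            return self.y[distances.index(min(distances))]

        return [closest_label(point) for point in X]


def compare(models, X_train, y_train, X_test, y_test):
    """Train every model through the shared interface and rank by accuracy."""
    scores = {}
    for name, model in models.items():
        predictions = model.fit(X_train, y_train).predict(X_test)
        correct = sum(p == t for p, t in zip(predictions, y_test))
        scores[name] = correct / len(y_test)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


X_train = [(0, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
y_train = ["a", "a", "b", "b", "b"]
X_test = [(0.5, 0.5), (5.5, 5.5)]
y_test = ["a", "b"]

leaderboard = compare(
    {"majority": MajorityClassifier(), "nearest_neighbor": NearestNeighborClassifier()},
    X_train, y_train, X_test, y_test,
)
print(leaderboard)
```

A low-code library applies the same pattern at a larger scale, wrapping many real model implementations behind one consistent set of calls.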

See also

  • AutoML with PyCaret and CrateDB

  • The automl_classification_with_pycaret.ipynb example notebook explores the PyCaret framework and shows how to use it to train different classification models.


  • The automl_timeseries_forecasting_with_pycaret.ipynb example notebook explores the PyCaret framework and shows how to use it to train various time series forecasting models.
