PyCaret


About

PyCaret is an open-source, low-code machine learning library for Python that automates machine learning workflows.

It is a high-level interface and AutoML wrapper on top of popular machine learning libraries such as scikit-learn, XGBoost, Ray, LightGBM, and many more. PyCaret provides a universal interface to these libraries, so you can use them without needing to know the details of the underlying model architectures and parameters.
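To illustrate what a "universal interface" over several libraries means, here is a minimal sketch of the adapter idea in plain Python. It assumes two hypothetical backend classes with deliberately different method names. Each backend is wrapped so that the calling code only ever uses `fit` and `predict`. The class names, `REGISTRY`, and `make_estimator` are illustrative assumptions. They are not PyCaret's API or internals.

```python
from statistics import mean
from typing import Callable, Protocol, Sequence


class Estimator(Protocol):
    """The one interface the calling code relies on (illustrative, not PyCaret's)."""

    def fit(self, X: Sequence[float], y: Sequence[float]) -> "Estimator": ...

    def predict(self, X: Sequence[float]) -> list[float]: ...


# Two hypothetical "backend libraries" with deliberately different APIs.
class MeanBackend:
    def train(self, targets):
        self.value = mean(targets)

    def infer(self, x):
        return self.value


class LineBackend:
    def learn(self, pairs):
        # Ordinary least-squares fit of a straight line y = intercept + slope * x.
        xs, ys = zip(*pairs)
        mx, my = mean(xs), mean(ys)
        self.slope = sum((x - mx) * (y - my) for x, y in pairs) / sum((x - mx) ** 2 for x in xs)
        self.intercept = my - self.slope * mx

    def evaluate(self, x):
        return self.intercept + self.slope * x


# Adapters translate each backend's API into fit/predict.
class MeanAdapter:
    def __init__(self):
        self._backend = MeanBackend()

    def fit(self, X, y):
        self._backend.train(y)  # this backend ignores the features
        return self

    def predict(self, X):
        return [self._backend.infer(x) for x in X]


class LineAdapter:
    def __init__(self):
        self._backend = LineBackend()

    def fit(self, X, y):
        self._backend.learn(list(zip(X, y)))
        return self

    def predict(self, X):
        return [self._backend.evaluate(x) for x in X]


# Hypothetical registry: callers choose a model by name only.
REGISTRY: dict[str, Callable[[], Estimator]] = {"mean": MeanAdapter, "line": LineAdapter}


def make_estimator(name: str) -> Estimator:
    return REGISTRY[name]()


X, y = [1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]
for name in REGISTRY:
    model = make_estimator(name).fit(X, y)
    print(name, model.predict([5.0]))
```

With this design, supporting another library only requires a new adapter and a registry entry. The code that trains and compares models stays unchanged.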

Concept

The general concept of PyCaret, and of AutoML in general, is straightforward: take raw data, split it into training and test sets, train multiple models on the training set, evaluate them on the test set, and select the best-performing model.
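The workflow just described can be sketched in plain Python. This is a conceptual illustration under stated assumptions and not PyCaret's implementation. The toy dataset is generated on the spot. The three candidate models (a majority-class baseline, a nearest-centroid classifier, and k-nearest neighbors) are hand-written stand-ins for real library models. Accuracy is assumed as the selection metric.

```python
import random
from statistics import mean


def make_data(n=200, seed=0):
    """Toy two-class dataset: label 1 points centred at (2, 2), label 0 at (0, 0)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = 2.0 if label else 0.0
        rows.append(((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)), label))
    return rows


def train_test_split(rows, test_size=0.25, seed=0):
    """Shuffle a copy of the rows and split it into training and test sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_size))
    return rows[:cut], rows[cut:]


def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def most_common(labels):
    return max(set(labels), key=labels.count)


class MajorityClass:
    """Baseline: always predicts the most frequent training label."""
    def fit(self, rows):
        self.label = most_common([y for _, y in rows])
        return self

    def predict(self, x):
        return self.label


class NearestCentroid:
    """Predicts the label whose mean feature vector is closest to x."""
    def fit(self, rows):
        self.centroids = {}
        for label in {y for _, y in rows}:
            points = [x for x, y in rows if y == label]
            self.centroids[label] = tuple(mean(col) for col in zip(*points))
        return self

    def predict(self, x):
        return min(self.centroids, key=lambda label: sq_dist(x, self.centroids[label]))


class KNearestNeighbors:
    """Majority vote among the k closest training points."""
    def __init__(self, k=5):
        self.k = k

    def fit(self, rows):
        self.rows = rows
        return self

    def predict(self, x):
        nearest = sorted(self.rows, key=lambda row: sq_dist(x, row[0]))[: self.k]
        return most_common([y for _, y in nearest])


def accuracy(model, rows):
    return mean(1.0 if model.predict(x) == y else 0.0 for x, y in rows)


def select_best(candidates, train, test):
    """Fit every candidate on the training set, score it on the test set, keep the best."""
    scores = {name: accuracy(model.fit(train), test) for name, model in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores


train, test = train_test_split(make_data())
candidates = {
    "majority": MajorityClass(),
    "centroid": NearestCentroid(),
    "knn": KNearestNeighbors(k=5),
}
best, scores = select_best(candidates, train, test)
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<10} accuracy={score:.3f}")
print("selected:", best)
```

Swapping in real models only changes the `candidates` dictionary. The split, train, evaluate, and select loop stays the same.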

Hyperparameter tuning

The same process is then repeated to tune the hyperparameters of the best models. Like model selection, tuning is highly empirical. The parameters are changed, the model is retrained, and it is evaluated again. This loop continues until the best-performing parameters are found.

Common approaches include Grid Search, Random Search, and Bayesian Optimization. For a quick introduction to these methods, see Introduction to hyperparameter tuning.
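The tune, retrain, and evaluate loop can be sketched for the first two strategies as follows. The objective `validation_score` and its two hyperparameters (`learning_rate`, `depth`) are hypothetical. They stand in for "retrain the model with these parameters and score it on held-out data". Bayesian Optimization is not shown, because it needs a surrogate model and is usually taken from a dedicated library.

```python
import itertools
import random


def validation_score(learning_rate, depth):
    """Hypothetical stand-in for 'retrain with these hyperparameters and score on held-out data'.
    Higher is better; the maximum of 1.0 is at learning_rate=0.1, depth=4."""
    return 1 - 100 * (learning_rate - 0.1) ** 2 - 0.05 * (depth - 4) ** 2


def grid_search(objective, grid):
    """Evaluate every combination of the listed values and keep the best."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = objective(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def random_search(objective, space, n_iter=20, seed=0):
    """Evaluate n_iter parameter sets sampled at random from the search space."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {name: sample(rng) for name, sample in space.items()}
        score = objective(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


grid = {"learning_rate": [0.01, 0.05, 0.1, 0.3], "depth": [2, 4, 6, 8]}
space = {
    "learning_rate": lambda rng: 10 ** rng.uniform(-3, 0),  # log-uniform between 0.001 and 1
    "depth": lambda rng: rng.randint(2, 8),  # integer between 2 and 8 inclusive
}

grid_best = grid_search(validation_score, grid)
random_best = random_search(validation_score, space)
print("grid search:  ", grid_best)
print("random search:", random_best)
```

Grid search evaluates every combination (16 here). Random search spends a fixed budget (20 samples here) drawn from the ranges, which tends to scale better when there are many hyperparameters.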

Benefits

In the past, all these trial-and-error experiments had to be done manually, which was tedious and time-consuming. PyCaret automates this process and provides a simple interface for running all of these experiments. The notebooks referenced below demonstrate how this works.

Learn

The following notebooks show how to use PyCaret together with CrateDB.

Notebook: AutoML classification with PyCaret

Explores the PyCaret framework and shows how to use it to train different classification models.

Resources: README | Notebook on GitHub | Notebook on Colab

Tags: Fundamentals, Classification

Notebook: Train time series forecasting models

Shows how to train time series forecasting models using PyCaret and CrateDB.

Resources: README | Notebook on GitHub | Notebook on Colab

Tags: Fundamentals, Time Series, Prediction / Forecasting