PyCaret
About
PyCaret is an open-source, low-code machine learning library for Python that automates machine learning workflows.
It is a high-level interface and AutoML wrapper on top of popular machine learning libraries such as scikit-learn, XGBoost, Ray, LightGBM, and many more. PyCaret provides a uniform interface to these libraries, so you can use them without knowing the details of the underlying model architectures and parameters.
Concept
The concept behind PyCaret, and behind AutoML in general, is straightforward: take raw data, split it into training and test sets, train multiple models on the training set, evaluate them on the test set, and select the best-performing model.
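The loop described above can be sketched in plain Python. This is a minimal illustration, not how PyCaret is implemented: the synthetic data, the two candidate "models", and all helper names (`train_test_split`, `fit_mean`, `fit_linear`, `mean_squared_error`) are assumptions made up for this example.

```python
import random
import statistics


def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of the rows and split it into training and test sets."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_mean(train):
    """Baseline model: always predict the mean target of the training set."""
    mean_y = statistics.fmean(y for _, y in train)
    return lambda x: mean_y


def fit_linear(train):
    """Least-squares fit for a single feature: y = slope * x + intercept."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def mean_squared_error(model, rows):
    """Average squared difference between predictions and true targets."""
    return statistics.fmean((model(x) - y) ** 2 for x, y in rows)


# Synthetic raw data (assumption for illustration): y = 3x + 2 plus noise.
noise = random.Random(0)
data = [(x, 3 * x + 2 + noise.gauss(0, 1)) for x in range(100)]

# 1. Split, 2. train every candidate, 3. evaluate on the test set, 4. select.
train, test = train_test_split(data)
candidates = {"mean baseline": fit_mean, "linear regression": fit_linear}
scores = {
    name: mean_squared_error(fit(train), test)
    for name, fit in candidates.items()
}
best_name = min(scores, key=scores.get)
print(f"scores: {scores}")
print(f"best model: {best_name}")
```

An AutoML library applies the same pattern to a much larger catalog of models and metrics, and typically adds cross-validation and preprocessing on top.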
Hyperparameter tuning
The same process is then repeated to tune the hyperparameters of the best models. This step is also highly empirical: the parameters are changed, and the model is retrained and evaluated again, until the best-performing parameters are found.
Common approaches include Grid Search, Random Search, and Bayesian Optimization. For a quick introduction to these methods, see Introduction to hyperparameter tuning.
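To make the difference between the first two approaches concrete, here is a small sketch that tunes a single hyperparameter, the number of neighbors `k` of a toy k-nearest-neighbors regressor. The data, the model, and all function names are assumptions for illustration only. Bayesian optimization is left out because it needs a surrogate model that would not fit in a short example.

```python
import math
import random
import statistics


def knn_predict(train, x, k):
    """Predict y for x as the mean target of the k nearest training points."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    return statistics.fmean(y for _, y in nearest)


def validation_error(train, valid, k):
    """Mean squared error of the k-NN regressor on the validation set."""
    return statistics.fmean(
        (knn_predict(train, x, k) - y) ** 2 for x, y in valid
    )


def grid_search(train, valid, grid):
    """Grid search: evaluate every value in a fixed list, keep the best."""
    return min(grid, key=lambda k: validation_error(train, valid, k))


def random_search(train, valid, low, high, n_trials, seed=0):
    """Random search: evaluate n_trials values drawn uniformly from [low, high]."""
    rng = random.Random(seed)
    trials = [rng.randint(low, high) for _ in range(n_trials)]
    return min(trials, key=lambda k: validation_error(train, valid, k))


# Synthetic data (assumption for illustration): a noisy sine curve.
rng = random.Random(1)
points = [(i / 10, math.sin(i / 10) + rng.gauss(0, 0.3)) for i in range(200)]
rng.shuffle(points)
train, valid = points[:150], points[150:]

best_grid_k = grid_search(train, valid, grid=[1, 3, 5, 10, 20, 50])
best_random_k = random_search(train, valid, low=1, high=50, n_trials=10)
print(f"grid search picked k={best_grid_k}")
print(f"random search picked k={best_random_k}")
```

Grid search is exhaustive over the values you list, so its cost grows quickly with the number of hyperparameters. Random search spends a fixed budget of trials regardless of how many hyperparameters there are, at the price of possibly missing the best value.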
Benefits
In the past, all these trial-and-error experiments had to be done manually, which is a tedious and time-consuming task. PyCaret automates this process and provides a simple interface to execute all these experiments in a straightforward way. The notebooks referenced below demonstrate how this works.
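As a rough orientation, the following sketch shows what such an experiment can look like with PyCaret's functional classification API. It assumes PyCaret is installed and that you already have a pandas DataFrame with a target column. The wrapper `run_experiment` and its parameter names are hypothetical; `setup`, `compare_models`, `tune_model`, and `predict_model` are functions from `pycaret.classification`.

```python
def run_experiment(data, target, session_id=123):
    """Run a PyCaret classification experiment on a pandas DataFrame.

    Hypothetical helper for illustration; returns the tuned model and the
    predictions on the hold-out set.
    """
    # Imported lazily so this module can be loaded without PyCaret installed.
    from pycaret.classification import (
        compare_models,
        predict_model,
        setup,
        tune_model,
    )

    # Initialize the experiment: split the data and prepare preprocessing.
    setup(data=data, target=target, session_id=session_id)

    # Train and cross-validate the available models, return the best one.
    best = compare_models()

    # Tune the hyperparameters of the best model.
    tuned = tune_model(best)

    # Evaluate the tuned model on the hold-out set.
    holdout_predictions = predict_model(tuned)
    return tuned, holdout_predictions
```

A call could look like `run_experiment(df, target="label")`, where `df` and `"label"` stand in for your own data. For regression tasks, the `pycaret.regression` module offers functions with the same names.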
Learn
About using PyCaret together with CrateDB.