LightAutoML documentation
LightAutoML is an open-source Python library for automated machine learning. It is designed to be lightweight and efficient for a range of tasks on tabular and text data. LightAutoML provides easy-to-use pipeline creation that enables:
Automatic hyperparameter tuning, data processing.
Automatic typing, feature selection.
Automatic time utilization.
Automatic report creation.
Graphical profiling system.
Easy-to-use modular scheme to create your own pipelines.
lightautoml.automl
The main module, which includes the AutoML class, blenders and ready-made presets.
Class for compiling the full pipeline of an AutoML task.
Presets
Presets for end-to-end model training for special tasks.
Base class for AutoML presets.
Preset for AutoWoE: logistic regression over binned features (scorecard).
Blenders
Base class for blending.
Selects the best single model from a level.
Simple average of level predictions.
Weighted blender based on coordinate descent; optimizes the task metric directly.
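The weighted blender described above can be illustrated with a small, library-independent sketch. It is not LightAutoML's implementation: the function names (`blend`, `coordinate_descent_weights`), the grid search over each weight, and the use of MSE as the metric are all assumptions made for illustration.

```python
from typing import List, Sequence


def blend(preds: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Weighted average of per-model predictions (one inner sequence per model)."""
    n_rows = len(preds[0])
    return [sum(w * p[i] for w, p in zip(weights, preds)) for i in range(n_rows)]


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error; lower is better."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def _with_weight(weights: Sequence[float], j: int, w: float) -> List[float]:
    """Set weight j to w and rescale the other weights so that all sum to 1."""
    rest = sum(x for k, x in enumerate(weights) if k != j)
    n_other = len(weights) - 1
    out = []
    for k, x in enumerate(weights):
        if k == j:
            out.append(w)
        elif rest > 0:
            out.append(x * (1.0 - w) / rest)
        else:
            out.append((1.0 - w) / n_other)
    return out


def coordinate_descent_weights(preds, y_true, metric=mse, grid_size=11, n_rounds=3):
    """Tune one weight at a time on a grid, keeping any change that improves the metric."""
    n_models = len(preds)
    if n_models == 1:
        return [1.0]
    weights = [1.0 / n_models] * n_models  # start from the simple average
    best = metric(y_true, blend(preds, weights))
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    for _ in range(n_rounds):
        for j in range(n_models):
            for w in grid:
                candidate = _with_weight(weights, j, w)
                score = metric(y_true, blend(preds, candidate))
                if score < best:
                    best, weights = score, candidate
    return weights
```

Starting from equal weights makes the simple average the baseline, so the search can only match or improve on it for the chosen metric.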
lightautoml.addons
Extensions of core functionality.
Utilization
Class that helps to utilize given time to
lightautoml.dataset
Provides an internal interface for working with data.
Dataset Interfaces
Base class for a (column, role) pair.
Base class for creating a dataset.
Dataset that contains info in np.ndarray format.
Dataset that contains pd.DataFrame features and pd.Series targets.
Dataset that contains sparse features and np.ndarray targets.
Roles
Role contains information about the column, which determines how it is processed.
Abstract class for column role.
Numeric role.
Category role.
Text role.
Datetime role.
Target role.
Group role.
Drop role.
Weights role.
Folds role.
Path role.
Utils
Utilities for working with the structure of a dataset.
Parser of roles.
Get the concatenation function for datasets of different types.
Concatenation of NumPy and pandas datasets.
Dataset concatenation function.
lightautoml.image
Provides an internal interface for working with image features.
Image Feature Extractors
Image feature extractors based on color histograms and CNN embeddings.
Class for parallel histogram computation.
Class to compute EfficientNet embeddings.
PyTorch Image Datasets
Image dataset class.
Transformer for image embeddings.
Utils
Load images from paths.
lightautoml.ml_algo
Models used for machine learning pipelines.
Base Classes
Abstract class for machine learning algorithm.
Machine learning algorithm that accepts NumPy arrays as input.
Linear Models
LBFGS L2 regression based on torch.
Coordinate descent based on sklearn implementation.
Boosted Trees
Gradient boosting on decision trees from the LightGBM library.
Gradient boosting on decision trees from the CatBoost library.
WhiteBox
WhiteBox - scorecard model.
lightautoml.ml_algo.tuning
A collection of classes for hyperparameter tuning.
Base Classes
Abstract base class for hyperparameter tuners.
Default implementation of ParamsTuner: just takes the algorithm's defaults.
Tuning with Optuna
Wrapper for the Optuna tuner.
lightautoml.pipelines
Pipelines for solving different tasks.
Utils
Pipelines create feature names in the form 'prefix__feature_name'.
Searches for columns with a specific role and attributes when building a pipeline.
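The 'prefix__feature_name' convention can be sketched with two small helpers. These are hypothetical functions written for illustration, not part of LightAutoML; only the double-underscore separator comes from the description above, and the `ord` and `oof` prefixes in the usage note are made up.

```python
from typing import List, Tuple

SEPARATOR = "__"  # taken from the 'prefix__feature_name' pattern


def add_prefix(prefix: str, feature_names: List[str]) -> List[str]:
    """Prepend a transformer prefix to every feature name."""
    return [f"{prefix}{SEPARATOR}{name}" for name in feature_names]


def split_prefix(name: str) -> Tuple[str, str]:
    """Split off the outermost prefix; the remainder may itself be prefixed."""
    prefix, _, feature = name.partition(SEPARATOR)
    return prefix, feature
```

Under this scheme, chained steps stack their prefixes: applying `add_prefix("oof", ...)` to `ord__city` yields `oof__ord__city`, so the name records the order of the steps that produced the feature.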
lightautoml.pipelines.selection
Feature selection module for ML pipelines.
Base Classes
Abstract class that estimates feature importances.
Abstract class that performs feature selection.
Importance Based Selectors
Base class for performing feature selection using model feature importances.
Selector based on importance threshold.
Permutation importance based estimator.
Select features sequentially using chunks to find the best combination of chunks.
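Permutation importance followed by a threshold cut-off, as listed above, can be sketched in plain Python. This is a generic illustration rather than the library's estimator: the `predict` callable, the MSE metric, and all function names are assumptions.

```python
import random
from typing import Callable, List, Sequence


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(
    predict: Callable[[List[List[float]]], List[float]],
    rows: List[List[float]],
    y_true: Sequence[float],
    metric: Callable[[Sequence[float], Sequence[float]], float],
    n_features: int,
    seed: int = 42,
) -> List[float]:
    """Importance of feature j = how much the error grows when column j is shuffled."""
    rng = random.Random(seed)
    baseline = metric(y_true, predict(rows))
    importances = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        rng.shuffle(column)
        # Rebuild the rows with only column j replaced by its shuffled version.
        shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(rows, column)]
        importances.append(metric(y_true, predict(shuffled)) - baseline)
    return importances


def select_by_threshold(importances: Sequence[float], names: Sequence[str],
                        threshold: float = 0.0) -> List[str]:
    """Keep the features whose importance is strictly above the threshold."""
    return [name for name, imp in zip(names, importances) if imp > threshold]
```

A feature the model never reads gets an importance of exactly zero, because shuffling it cannot change the predictions.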
Other Selectors
Selector to remove highly correlated features.
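One common way to remove highly correlated features is a greedy pass that keeps a column only if it is not too correlated with any column already kept. The sketch below assumes Pearson correlation and a hypothetical `threshold` parameter; the criterion and defaults used by the library's selector may differ.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def drop_correlated(columns: Dict[str, Sequence[float]], threshold: float = 0.9) -> List[str]:
    """Return the names of the columns kept, in their original order."""
    kept: List[str] = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept
```

Because the pass is greedy, the result depends on column order: the first column of a correlated group is the one that survives.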
lightautoml.pipelines.features
Pipelines for feature generation.
Base Classes
Abstract class.
Dummy feature pipeline -
Helper class that contains basic feature transformations for tabular data.
Feature Pipelines for Boosting Models
Creates a simple pipeline for tree-based models.
Creates an advanced pipeline for tree-based models.
Feature Pipelines for Linear Models
Creates a pipeline for linear models and neural networks.
Feature Pipelines for WhiteBox
Simple WhiteBox pipeline.
Image Feature Pipelines
Class that contains basic feature transformations for image data.
Class that contains simple color histogram features for image data.
Class that contains EfficientNet embedding features for image data.
Text Feature Pipelines
Class that contains basic feature transformations for text data.
Class that contains embedding features for text data.
Class that contains TF-IDF features for text data.
Features pipeline for BERT.
lightautoml.pipelines.ml
Pipelines that merge together single model training steps.
Base Classes
Single ML pipeline.
Pipeline for Nested Cross-Validation
Wrapper for MLAlgo to make it trainable over nested folds.
Wrapper for MLPipeline to make it trainable over nested folds.
Pipeline for WhiteBox
Special pipeline to handle WhiteBox model.
lightautoml.reader
Utils for reading, training and analysing data.
Readers
Abstract class for analyzing input data and creating inner
Reader to convert
Tabular Batch Generators
Batch Handler Classes
Data Read Functions
lightautoml.report
Report generators and templates.
lightautoml.tasks
Task Class
Specifies the task (binary classification, multiclass classification, regression), metrics, and losses.
Common Metrics
Classes
Wrapper for
Metric wrapper to get the best class prediction instead of probabilities.
Metric wrapper to get the best class prediction instead of probabilities for multiclass.
Functions
Computes Mean Quantile Error.
Computes Mean Huber Error.
Computes Mean Fair Error.
Computes Mean Absolute Percentage Error.
ROC-AUC One-Versus-Rest.
Root mean squared log error.
Computes the multiclass metric AUC-Mu.
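For reference, the regression metrics listed above can be written out using their common textbook definitions. This is a sketch only: the parameter names (`q`, `a`, `c`), their defaults, and the sign convention of the quantile error are assumptions and may not match the library's functions.

```python
import math
from typing import Sequence


def mean_quantile_error(y_true: Sequence[float], y_pred: Sequence[float], q: float = 0.9) -> float:
    """Pinball loss: under-prediction is weighted by q, over-prediction by 1 - q."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        diff = t - p
        total += q * diff if diff >= 0 else (q - 1.0) * diff
    return total / len(y_true)


def mean_huber_error(y_true: Sequence[float], y_pred: Sequence[float], a: float = 0.9) -> float:
    """Quadratic for small errors (|e| <= a), linear for large ones."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        err = abs(t - p)
        total += 0.5 * err ** 2 if err <= a else a * (err - 0.5 * a)
    return total / len(y_true)


def mean_fair_error(y_true: Sequence[float], y_pred: Sequence[float], c: float = 0.9) -> float:
    """Fair loss: c^2 * (|e|/c - log(1 + |e|/c))."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        x = abs(t - p) / c
        total += c * c * (x - math.log(x + 1.0))
    return total / len(y_true)


def mean_absolute_percentage_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean of |y - p| / |y|; undefined when a true value is zero."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmsle(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between log1p of predictions and log1p of targets."""
    sq = [(math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(sq) / len(sq))
```

The Huber and Fair errors both grow linearly for large residuals, which makes them less sensitive to outliers than squared error.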
lightautoml.tasks.losses
Wrappers of loss and metric functions for different machine learning algorithms.
Base Classes
Wrapper for metric.
Loss function with target transformation.
Wrappers for LightGBM
Classes
Wrapper of metric function for LightGBM.
Loss used for LightGBM.
Functions
Softmax columnwise.
Custom loss for optimizing f1.
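The softmax helper listed above turns raw multiclass scores into probabilities. The sketch below reads "columnwise" as normalizing across the class columns of each row, which is an assumption; `softmax_rows` is a hypothetical name, and the max-subtraction step is a standard numerical-stability trick rather than something stated in the source.

```python
import math
from typing import List, Sequence


def softmax_rows(scores: Sequence[Sequence[float]]) -> List[List[float]]:
    """Softmax over the class columns of every row."""
    result = []
    for row in scores:
        shift = max(row)  # subtracting the row maximum avoids overflow in exp
        exps = [math.exp(v - shift) for v in row]
        total = sum(exps)
        result.append([e / total for e in exps])
    return result
```

Each output row sums to one and preserves the ordering of the input scores.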
Wrappers for CatBoost
Classes
Loss used for CatBoost.
Metric wrapper class for CatBoost.
Regression metric wrapper for CatBoost.
Classification metric wrapper for CatBoost.
Multiclass classification metric wrapper for CatBoost.
Functions
CatBoost loss name wrapper for losses that take keyword arguments.
Wrappers for Sklearn
Classes
Loss used for scikit-learn.
Wrappers for Torch
Classes
Customize PyTorch-based loss.
Loss used for PyTorch.
Functions
Computes Root Mean Squared Logarithmic Error.
Computes Mean Quantile Error.
Computes Mean Fair Error.
Computes Mean Huber Error.
Computes F1 macro.
Computes Mean Absolute Percentage Error.
lightautoml.text
Provides an internal interface for working with text features.
Sentence Embedders
Deep learning based sentence embeddings.
Class to compute Bag of Random Embedding Projections sentence embeddings from word embeddings.
Class to compute Random LSTM sentence embeddings from word embeddings.
Class to compute HuggingFace transformers word or sentence embeddings.
Weighted average of word embeddings.
Torch Datasets for Text
Dataset class with transformers tokenization.
Dataset class for extracting word embeddings.
Tokenizers
Base class for tokenizer method.
Russian tokenizer.
English tokenizer.
Pooling Strategies
Abstract pooling class.
CLS token pooling.
Max value pooling.
Sum value pooling.
Mean value pooling.
Identity pooling.
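The pooling strategies listed above reduce a sequence of token vectors to a single sentence vector. The following plain-Python sketch shows the idea under stated assumptions: tokens are lists of floats, `mask` marks real tokens with 1 and padding with 0, at least one token is unmasked, and the function names are hypothetical. The library's classes operate on torch tensors and may treat the mask differently.

```python
from typing import List, Sequence

Vector = List[float]


def _active(tokens: Sequence[Vector], mask: Sequence[int]) -> List[Vector]:
    """Drop padded positions (mask == 0)."""
    return [vec for vec, m in zip(tokens, mask) if m]


def cls_pool(tokens: Sequence[Vector], mask: Sequence[int]) -> Vector:
    """Use the first token (the CLS position) as the sentence vector."""
    return list(tokens[0])


def sum_pool(tokens: Sequence[Vector], mask: Sequence[int]) -> Vector:
    return [sum(col) for col in zip(*_active(tokens, mask))]


def mean_pool(tokens: Sequence[Vector], mask: Sequence[int]) -> Vector:
    active = _active(tokens, mask)
    return [sum(col) / len(active) for col in zip(*active)]


def max_pool(tokens: Sequence[Vector], mask: Sequence[int]) -> Vector:
    return [max(col) for col in zip(*_active(tokens, mask))]
```

Masking matters most for mean and max pooling, where padded positions would otherwise distort the result.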
Utils
Set random seed and cudnn params.
Parse devices and convert first to the torch device.
Puts each data field into a tensor with outer dimension batch size.
Get text hash.
Get hash of array with texts.
lightautoml.transformers
Basic feature generation steps and helper utils.
Base Classes
Base class for transformers (similar to scikit-learn, but works with datasets).
Transformer that contains a list of transformers and applies them one by one in sequence.
Transformer that applies a sequence of transformers to the dataset in parallel and concatenates the results.
Selects columns to pass to other transformers (or to feature selection).
Applies a single-column transformer to every column.
Applies multiple transformers and selects the best.
Converts a dataset to the given type.
Changes data roles (including dtypes, etc.).
Numeric
Create NaN flags.
Fill NaN values with the median.
Replace inf with NaN so it is handled as a NaN value.
Convert probabilities to log-odds.
Classic StandardScaler.
Discretization of numeric features by quantiles.
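Two of the numeric steps above, quantile discretization and the probability-to-log-odds conversion, are easy to sketch with the standard library. The function names, the clipping constant `eps`, and the choice of `statistics.quantiles` for the bin edges are assumptions made for illustration, not the transformers' actual implementation.

```python
import math
from bisect import bisect_right
from statistics import quantiles
from typing import List, Sequence


def fit_quantile_edges(values: Sequence[float], n_bins: int = 4) -> List[float]:
    """Compute the n_bins - 1 cut points that split the data into equal-frequency bins."""
    return quantiles(values, n=n_bins)


def discretize(values: Sequence[float], edges: Sequence[float]) -> List[int]:
    """Map each value to the index of its bin (0 .. len(edges))."""
    return [bisect_right(edges, v) for v in values]


def to_logodds(probs: Sequence[float], eps: float = 1e-7) -> List[float]:
    """log(p / (1 - p)), with p clipped away from 0 and 1 to keep the result finite."""
    clipped = [min(max(p, eps), 1.0 - eps) for p in probs]
    return [math.log(p / (1.0 - p)) for p in clipped]
```

The edges are fitted once on training data and then reused on new data, so unseen values outside the training range simply fall into the first or last bin.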
Categorical
Simple LabelEncoder in order of frequency.
Simple OneHotEncoder over label encoded categories.
Labels are encoded with frequency in train data.
Encoding ordinal categories into numbers.
Out-of-fold target encoding.
Out-of-fold target encoding for multiclass task.
Build label-encoded intersections of categorical variables.
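Out-of-fold target encoding, listed above, replaces each category with a target statistic computed only from rows in other folds, so a row never sees its own target. The sketch below is a generic version: the smoothing parameter `alpha`, the fallback to the out-of-fold prior, and the function name are assumptions, not the library's exact scheme.

```python
from collections import defaultdict
from typing import Hashable, List, Sequence


def oof_target_encode(
    categories: Sequence[Hashable],
    targets: Sequence[float],
    folds: Sequence[int],
    alpha: float = 10.0,
) -> List[float]:
    """Encode each row with a smoothed target mean computed without its own fold."""
    encoded = [0.0] * len(categories)
    for fold in sorted(set(folds)):
        sums = defaultdict(float)
        counts = defaultdict(int)
        total, n_out = 0.0, 0
        # Collect statistics from every row that is NOT in the current fold.
        for c, t, f in zip(categories, targets, folds):
            if f != fold:
                sums[c] += t
                counts[c] += 1
                total += t
                n_out += 1
        prior = total / n_out if n_out else 0.0
        # Encode the rows of the current fold with those statistics.
        for i, (c, f) in enumerate(zip(categories, folds)):
            if f == fold:
                denom = counts.get(c, 0) + alpha
                encoded[i] = (sums.get(c, 0.0) + alpha * prior) / denom if denom else prior
    return encoded
```

A larger `alpha` pulls rare categories toward the global out-of-fold mean, which reduces overfitting on categories with few rows.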
Datetime
Basic conversion strategy, used in selection one-to-one transformers.
Basic conversion strategy, used in selection one-to-one transformers.
Basic conversion strategy, used in selection one-to-one transformers.
Decompositions
PCA.
TruncatedSVD.
Text
Base class for ML transformers.
Simple TF-IDF vectorizer.
Simple tokenizer transformer.
Out-of-fold SGD model prediction to reduce the dimensionality of encoded text data.
Concat text features transformer.
Calculate text embeddings.
Image
Simple image histogram.
Calculate image embeddings.
lightautoml.utils
Common util tools.
Timer
Timer to limit the duration of tasks.
Timer used to control time over the full AutoML run.
Timer used to control time over a single ML task run.
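The relationship between a run-level timer and a per-task timer can be sketched as a parent budget that hands out shares of its remaining time. `SimpleTimer` and its methods are hypothetical names chosen for this illustration; the library's timer classes have their own interfaces and budgeting rules.

```python
import time
from typing import Callable, Optional


class SimpleTimer:
    """Tracks a time budget; the clock is injectable so the logic can be tested."""

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic):
        self.timeout = timeout
        self._clock = clock
        self._started_at: Optional[float] = None

    def start(self) -> "SimpleTimer":
        self._started_at = self._clock()
        return self

    @property
    def time_spent(self) -> float:
        if self._started_at is None:
            return 0.0
        return self._clock() - self._started_at

    @property
    def time_left(self) -> float:
        return self.timeout - self.time_spent

    def time_limit_exceeded(self) -> bool:
        return self.time_left <= 0

    def child(self, share: float) -> "SimpleTimer":
        """Budget for one sub-task: a share of whatever time is left right now."""
        return SimpleTimer(max(self.time_left, 0.0) * share, self._clock).start()
```

Using a monotonic clock avoids surprises when the system time changes during a long run.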
lightautoml.validation
This module provides classes and functions for model validation.
Iterators
Abstract class for train/validation iteration.
Simple iterator that uses the training data for validation.
Iterator for classic holdout - just predefined train and valid samples.
Iterator that uses a function to create fold indexes.
Classic cross-validation iterator.
Time series iterator.
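The holdout, cross-validation, and time series schemes listed above differ only in how they produce (train, validation) index pairs. The generators below are a minimal sketch with hypothetical names; the interleaved fold assignment and the expanding-window layout are assumptions, and the library's iterators work on its own dataset objects rather than raw indexes.

```python
from typing import Iterator, List, Tuple

Split = Tuple[List[int], List[int]]  # (train indexes, validation indexes)


def holdout_split(n_rows: int, valid_size: int) -> Split:
    """Classic holdout: the last valid_size rows are used for validation."""
    cut = n_rows - valid_size
    return list(range(cut)), list(range(cut, n_rows))


def kfold_splits(n_rows: int, n_folds: int) -> Iterator[Split]:
    """K-fold CV with interleaved fold assignment (row i goes to fold i % n_folds)."""
    for fold in range(n_folds):
        valid = [i for i in range(n_rows) if i % n_folds == fold]
        train = [i for i in range(n_rows) if i % n_folds != fold]
        yield train, valid


def time_series_splits(n_rows: int, n_folds: int) -> Iterator[Split]:
    """Expanding window over rows sorted by time: train always precedes validation."""
    block = n_rows // (n_folds + 1)
    for fold in range(1, n_folds + 1):
        train_end = block * fold
        valid_end = n_rows if fold == n_folds else block * (fold + 1)
        yield list(range(train_end)), list(range(train_end, valid_end))
```

The time series variant never validates on rows that come before the training data, which is the property that distinguishes it from ordinary cross-validation.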
Iterators Getters and Utils
Creates a train-validation iterator.
Get an iterator for a NumPy or sparse dataset.