LightAutoML documentation

LightAutoML is an open-source Python library for automated machine learning. It is designed to be lightweight and efficient across a range of tasks on tabular and text data. LightAutoML provides easy-to-use pipeline creation, which enables:

  • Automatic hyperparameter tuning and data processing.

  • Automatic typing and feature selection.

  • Automatic time utilization.

  • Automatic report creation.

  • Graphical profiling system.

  • Easy-to-use modular scheme to create your own pipelines.

lightautoml.automl

The main module, which includes the AutoML class, blenders and ready-made presets.

AutoML

Class for compiling the full pipeline of an AutoML task.

Presets

Presets for end-to-end model training for special tasks.

base.AutoMLPreset

Basic class for an AutoML preset.

whitebox_presets.WhiteBoxPreset

Preset for AutoWoE - logistic regression over binned features (scorecard).

Blenders

Blender

Basic class for blending.

BestModelSelector

Selects the best single model from a level.

MeanBlender

Simple average of level predictions.

WeightedBlender

Weighted blender based on coordinate descent; optimizes the task metric directly.
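
As a rough illustration of the two blending ideas listed above (a plain average and weights found by coordinate descent), the following standard-library sketch may help. It is not LightAutoML's implementation: the function names, the fixed weight grid, and the use of mean squared error as the quantity to minimize are all assumptions made for the example.

```python
from typing import Callable, List, Sequence

Matrix = Sequence[Sequence[float]]  # one inner sequence of predictions per model


def mean_blend(preds: Matrix) -> List[float]:
    """Plain average of the models' predictions for every sample."""
    return [sum(row) / len(row) for row in zip(*preds)]


def weighted_blend(preds: Matrix, weights: Sequence[float]) -> List[float]:
    """Weighted sum of the models' predictions for every sample."""
    return [sum(w * p for w, p in zip(weights, row)) for row in zip(*preds)]


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error, used here as a stand-in for the task metric."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def coordinate_descent_weights(
    preds: Matrix,
    y_true: Sequence[float],
    loss: Callable[[Sequence[float], Sequence[float]], float] = mse,
    n_rounds: int = 5,
    grid_size: int = 21,
) -> List[float]:
    """Tune one weight at a time on a grid, renormalizing so weights sum to 1."""
    n_models = len(preds)
    weights = [1.0 / n_models] * n_models
    best = loss(y_true, weighted_blend(preds, weights))
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    for _ in range(n_rounds):
        improved = False
        for k in range(n_models):  # one coordinate (model weight) at a time
            for candidate in grid:
                trial = list(weights)
                trial[k] = candidate
                total = sum(trial)
                if total == 0:
                    continue  # all-zero weights cannot be normalized
                trial = [w / total for w in trial]
                score = loss(y_true, weighted_blend(preds, trial))
                if score < best - 1e-12:
                    best, weights, improved = score, trial, True
        if not improved:
            break  # a full pass changed nothing, so stop early
    return weights


preds = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]  # model 0 is perfect, model 1 is not
target = [1.0, 2.0, 3.0]
print(mean_blend(preds))
print(coordinate_descent_weights(preds, target))
```

In this toy case the search moves all of the weight onto the first model, because its predictions match the target exactly.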

lightautoml.addons

Extensions of core functionality.

Utilization

TimeUtilization

Class that helps an AutoMLPreset make use of the time it is given.

lightautoml.dataset

Provides an internal interface for working with data.

Dataset Interfaces

base.LAMLColumn

Basic class for a (column, role) pair.

base.LAMLDataset

Basic class to create dataset.

np_pd_dataset.NumpyDataset

Dataset that contains info in np.ndarray format.

np_pd_dataset.PandasDataset

Dataset that contains pd.DataFrame features and pd.Series targets.

np_pd_dataset.CSRSparseDataset

Dataset that contains sparse features and np.ndarray targets.

Roles

Role contains information about the column, which determines how it is processed.

ColumnRole

Abstract class for column role.

NumericRole

Numeric role.

CategoryRole

Category role.

TextRole

Text role.

DatetimeRole

Datetime role.

TargetRole

Target role.

GroupRole

Group role.

DropRole

Drop role.

WeightsRole

Weights role.

FoldsRole

Folds role.

PathRole

Path role.

Utils

Utilities for working with the structure of a dataset.

roles_parser

Parser of roles.

get_common_concat

Get concatenation function for datasets of different types.

numpy_and_pandas_concat

Concat of numpy and pandas dataset.

concatenate

Dataset concatenation function.

lightautoml.image

Provides an internal interface for working with image features.

Image Feature Extractors

Image feature extractors based on color histograms and CNN embeddings.

CreateImageFeatures

Class for parallel histogram computation.

EffNetImageEmbedder

Class to compute EfficientNet embeddings.

PyTorch Image Datasets

ImageDataset

Image Dataset Class.

DeepImageEmbedder

Transformer for image embeddings.

Utils

pil_loader

Load an image from a path.

lightautoml.ml_algo

Models used for machine learning pipelines.

Base Classes

MLAlgo

Abstract class for machine learning algorithm.

TabularMLAlgo

Machine learning algorithms that accept numpy arrays as input.

Linear Models

LinearLBFGS

LBFGS L2 regression based on torch.

LinearL1CD

Coordinate descent based on sklearn implementation.

Boosted Trees

BoostLGBM

Gradient boosting on decision trees from LightGBM library.

BoostCB

Gradient boosting on decision trees from catboost library.

WhiteBox

WbMLAlgo

WhiteBox - scorecard model.

lightautoml.ml_algo.tuning

A set of classes for hyperparameter tuning.

Base Classes

ParamsTuner

Base abstract class for hyperparameter tuners.

DefaultTuner

Default implementation of ParamsTuner: it just takes the algorithm's defaults.

Tuning with Optuna

OptunaTuner

Wrapper for optuna tuner.

lightautoml.pipelines

Pipelines for solving different tasks.

Utils

map_pipeline_names

Pipelines create names in the form 'prefix__feature_name'.

get_columns_by_role

Search for columns with a specific role and attributes when building a pipeline.
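
The two utilities above deal with the 'prefix__feature_name' naming scheme and with looking up columns by role. A minimal sketch of both ideas follows. The helper names, the role strings, and the rule that the original name is whatever follows the last double underscore are assumptions made for illustration; they are not the signatures of map_pipeline_names or get_columns_by_role.

```python
from typing import Dict, List


def add_prefix(prefix: str, feature_names: List[str]) -> List[str]:
    """Name pipeline outputs as 'prefix__feature_name'."""
    return [f"{prefix}__{name}" for name in feature_names]


def original_name(output_name: str) -> str:
    """Recover the input feature name from a (possibly stacked) prefixed name.

    Assumption: the original feature name contains no double underscore.
    """
    return output_name.split("__")[-1]


def columns_with_role(roles: Dict[str, str], role_name: str) -> List[str]:
    """Return the columns whose role equals role_name, in insertion order."""
    return [column for column, role in roles.items() if role == role_name]


# Hypothetical column-to-role mapping used only for this example.
roles = {"age": "numeric", "city": "category", "income": "numeric", "y": "target"}

numeric_columns = columns_with_role(roles, "numeric")
outputs = add_prefix("scaled", numeric_columns)
print(outputs)
print({name: original_name(name) for name in outputs})
```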

lightautoml.pipelines.selection

Feature selection module for ML pipelines.

Base Classes

ImportanceEstimator

Abstract class that estimates feature importances.

SelectionPipeline

Abstract class that performs feature selection.

Importance Based Selectors

ModelBasedImportanceEstimator

Base class for performing feature selection using model feature importances.

ImportanceCutoffSelector

Selector based on importance threshold.

NpPermutationImportanceEstimator

Permutation importance based estimator.

NpIterativeFeatureSelector

Select features sequentially using chunks to find the best combination of chunks.
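
Permutation importance, named in the estimators above, can be sketched in a few lines: shuffle one feature at a time and record how much the error grows. The version below uses only the standard library; the function names, the list-of-rows data layout, and the choice of mean squared error are assumptions for the example and do not reflect how NpPermutationImportanceEstimator is written.

```python
import random
from typing import Callable, List, Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(
    predict: Callable[[List[float]], float],
    X: List[List[float]],
    y: Sequence[float],
    seed: int = 42,
) -> List[float]:
    """Importance of feature j = error after shuffling column j - baseline error."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)  # break the link between feature j and the target
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        permuted = mse(y, [predict(row) for row in X_perm])
        importances.append(permuted - baseline)
    return importances


# Toy data: the target equals feature 0, feature 1 is unrelated.
X = [[float(i), float((i * 7) % 5)] for i in range(8)]
y = [row[0] for row in X]


def model(row: List[float]) -> float:
    """A stand-in 'fitted model' that reads only feature 0."""
    return row[0]


print(permutation_importance(model, X, y))
```

A threshold on such scores is one way to arrive at a cutoff-style selector: features whose shuffling does not hurt the metric are candidates for removal.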

Other Selectors

HighCorrRemoval

Selector to remove highly correlated features.

lightautoml.pipelines.features

Pipelines for feature generation.

Base Classes

FeaturesPipeline

Abstract class.

EmptyFeaturePipeline

Dummy feature pipeline - .fit_transform and transform do nothing.

TabularDataFeatures

Helper class containing basic feature transformations for tabular data.

Feature Pipelines for Boosting Models

LGBSimpleFeatures

Creates a simple pipeline for tree-based models.

LGBAdvancedPipeline

Creates an advanced pipeline for tree-based models.

Feature Pipelines for Linear Models

LinearFeatures

Creates a pipeline for linear models and neural networks.

Feature Pipelines for WhiteBox

WBFeatures

Simple WhiteBox pipeline.

Image Feature Pipelines

ImageDataFeatures

Class containing basic feature transformations for image data.

ImageSimpleFeatures

Class containing simple color histogram features for image data.

ImageAutoFeatures

Class containing EfficientNet embedding features for image data.

Text Feature Pipelines

NLPDataFeatures

Class containing basic feature transformations for text data.

TextAutoFeatures

Class containing embedding features for text data.

NLPTFiDFFeatures

Class containing TF-IDF features for text data.

TextBertFeatures

Features pipeline for BERT.

lightautoml.pipelines.ml

Pipelines that merge together single model training steps.

Base Classes

MLPipeline

Single ML pipeline.

Pipeline for Nested Cross-Validation

NestedTabularMLAlgo

Wrapper for MLAlgo to make it trainable over nested folds.

NestedTabularMLPipeline

Wrapper for MLPipeline to make it trainable over nested folds.

Pipeline for WhiteBox

WBPipeline

Special pipeline to handle WhiteBox model.

lightautoml.reader

Utils for reading, training and analysing data.

Readers

Reader

Abstract class for analyzing input data and creating inner LAMLDataset from raw data.

PandasToPandasReader

Reader to convert DataFrame to AutoML's PandasDataset.

Tabular Batch Generators

Batch Handler Classes

Data Read Functions

lightautoml.report

Report generators and templates.

lightautoml.tasks

Task Class

Task

Specify task (binary classification, multiclass classification, regression), metrics, losses.

Common Metrics

Classes

F1Factory

Wrapper for f1_score function.

BestClassBinaryWrapper

Metric wrapper to get best class prediction instead of probs.

BestClassMulticlassWrapper

Metric wrapper to get best class prediction instead of probs for multiclass.

Functions

mean_quantile_error

Computes Mean Quantile Error.

mean_huber_error

Computes Mean Huber Error.

mean_fair_error

Computes Mean Fair Error.

mean_absolute_percentage_error

Computes Mean Absolute Percentage error.

roc_auc_ovr

ROC-AUC One-Versus-Rest.

rmsle

Root mean squared log error.

auc_mu

Compute multi-class metric AUC-Mu.
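
For orientation, textbook definitions of several of the regression metrics listed above are sketched below with the standard library only. The function names, parameter names, and default values are assumptions for this example, and the sign and parameter conventions used by the LightAutoML functions may differ.

```python
import math
from typing import Sequence


def rmsle(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between log(1 + y) values."""
    sq = [(math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(sq) / len(sq))


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error as a fraction; targets must be non-zero."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def huber(y_true: Sequence[float], y_pred: Sequence[float], delta: float = 1.0) -> float:
    """Quadratic for small errors, linear for errors larger than delta."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        e = abs(t - p)
        total += 0.5 * e * e if e <= delta else delta * (e - 0.5 * delta)
    return total / len(y_true)


def fair(y_true: Sequence[float], y_pred: Sequence[float], c: float = 1.0) -> float:
    """Fair loss: c^2 * (|e| / c - log(1 + |e| / c))."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        x = abs(t - p) / c
        total += c * c * (x - math.log1p(x))
    return total / len(y_true)


def pinball(y_true: Sequence[float], y_pred: Sequence[float], q: float = 0.5) -> float:
    """Quantile (pinball) loss: under-prediction costs q, over-prediction 1 - q."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        e = t - p
        total += q * e if e >= 0 else (q - 1.0) * e
    return total / len(y_true)


print(mape([100.0, 200.0], [110.0, 180.0]))
print(huber([0.0, 0.0], [0.5, 3.0], delta=1.0))
print(pinball([10.0, 10.0], [8.0, 12.0], q=0.9))
```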

lightautoml.tasks.losses

Wrappers of loss and metric functions for different machine learning algorithms.

Base Classes

MetricFunc

Wrapper for metric.

Loss

Loss function with target transformation.

Wrappers for LightGBM

Classes

LGBFunc

Wrapper of metric function for LightGBM.

LGBLoss

Loss used for LightGBM.

Functions

softmax_ax1

Softmax columnwise.

lgb_f1_loss_multiclass

Custom loss for optimizing f1.
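
A softmax over class scores is the usual way to turn raw multiclass outputs into probabilities before a metric or custom loss is applied. The sketch below reads the `ax1` suffix as "along axis 1", meaning each row is normalized across its columns; that reading, and the function name, are assumptions. It works on plain lists rather than arrays.

```python
import math
from typing import List, Sequence


def softmax_rows(scores: Sequence[Sequence[float]]) -> List[List[float]]:
    """Apply a numerically stable softmax to every row independently."""
    result = []
    for row in scores:
        shift = max(row)  # subtracting the row maximum avoids overflow in exp
        exps = [math.exp(v - shift) for v in row]
        total = sum(exps)
        result.append([e / total for e in exps])
    return result


raw_scores = [[2.0, 1.0, 0.1], [1000.0, 1000.0, 1000.0]]
for probs in softmax_rows(raw_scores):
    print(probs, sum(probs))
```

Each output row sums to 1 (up to rounding), and the shift by the row maximum keeps large scores such as 1000.0 from overflowing.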

Wrappers for CatBoost

Classes

CBLoss

Loss used for CatBoost.

CBCustomMetric

Metric wrapper class for CatBoost.

CBRegressionMetric

Regression metric wrapper for CatBoost.

CBClassificationMetric

Classification metric wrapper for CatBoost.

CBMulticlassMetric

Multiclassification metric wrapper for CatBoost.

Functions

cb_str_loss_wrapper

CatBoost loss name wrapper, if it has keyword args.

Wrappers for Sklearn

Classes

SKLoss

Loss used for scikit-learn.

Wrappers for Torch

Classes

TorchLossWrapper

Customize PyTorch-based loss.

TORCHLoss

Loss used for PyTorch.

Functions

torch_rmsle

Computes Root Mean Squared Logarithmic Error.

torch_quantile

Computes Mean Quantile Error.

torch_fair

Computes Mean Fair Error.

torch_huber

Computes Mean Huber Error.

torch_f1

Computes F1 macro.

torch_mape

Computes Mean Absolute Percentage Error.

lightautoml.text

Provides an internal interface for working with text features.

Sentence Embedders

DLTransformer

Deep Learning based sentence embeddings.

BOREP

Class to compute Bag of Random Embedding Projections sentence embeddings from word embeddings.

RandomLSTM

Class to compute Random LSTM sentence embeddings from word embeddings.

BertEmbedder

Class to compute HuggingFace transformers word or sentence embeddings.

WeightedAverageTransformer

Weighted average of word embeddings.

Torch Datasets for Text

BertDataset

Dataset class with transformers tokenization.

EmbedDataset

Dataset class for extracting word embeddings.

Tokenizers

BaseTokenizer

Base class for tokenizer method.

SimpleRuTokenizer

Russian tokenizer.

SimpleEnTokenizer

English tokenizer.

Pooling Strategies

SequenceAbstractPooler

Abstract pooling class.

SequenceClsPooler

CLS token pooling.

SequenceMaxPooler

Max value pooling.

SequenceSumPooler

Sum value pooling.

SequenceAvgPooler

Mean value pooling.

SequenceIndentityPooler

Identity pooling.
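
The pooling strategies above reduce a sequence of token vectors to a single vector (or, for identity pooling, leave it unchanged). A plain-Python sketch of the four reducing variants is given below. It ignores padding masks, which a real implementation working on batched tensors would need to handle, and the function names are chosen for the example only.

```python
from typing import List, Sequence

Tokens = Sequence[Sequence[float]]  # one embedding vector per token


def cls_pool(tokens: Tokens) -> List[float]:
    """Use the vector of the first token (the CLS position)."""
    return list(tokens[0])


def max_pool(tokens: Tokens) -> List[float]:
    """Element-wise maximum over all tokens."""
    return [max(dim) for dim in zip(*tokens)]


def sum_pool(tokens: Tokens) -> List[float]:
    """Element-wise sum over all tokens."""
    return [sum(dim) for dim in zip(*tokens)]


def avg_pool(tokens: Tokens) -> List[float]:
    """Element-wise mean over all tokens."""
    return [sum(dim) / len(tokens) for dim in zip(*tokens)]


sentence = [[1.0, 4.0], [3.0, 2.0], [2.0, 0.0]]  # 3 tokens, 2 dimensions
print(cls_pool(sentence))  # [1.0, 4.0]
print(max_pool(sentence))  # [3.0, 4.0]
print(sum_pool(sentence))  # [6.0, 6.0]
print(avg_pool(sentence))  # [2.0, 2.0]
```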

Utils

seed_everything

Set random seed and cudnn params.

parse_devices

Parse devices and convert first to the torch device.

custom_collate

Puts each data field into a tensor with outer dimension batch size.

single_text_hash

Get text hash.

get_textarr_hash

Get hash of array with texts.

lightautoml.transformers

Basic feature generation steps and helper utils.

Base Classes

LAMLTransformer

Base class for transformer method (like sklearn, but works with datasets).

SequentialTransformer

Transformer that holds a list of transformers and applies them one by one, sequentially.

UnionTransformer

Transformer that applies a sequence of transformers to the dataset in parallel and concatenates the results.

ColumnsSelector

Select columns to pass to other transformers (or to feature selection).

ColumnwiseUnion

Apply a single-column transformer to all columns.

BestOfTransformers

Apply multiple transformers and select best.

ConvertDataset

Convert dataset to given type.

ChangeRoles

Change data roles (including dtypes, etc.).

Numeric

NaNFlags

Create NaN flags.

FillnaMedian

Fillna with median.

FillInf

Replace inf with NaN so that it is handled as a NaN value.

LogOdds

Convert probabilities to log-odds.

StandardScaler

Classic StandardScaler.

QuantileBinning

Discretization of numeric features by quantiles.
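
Several of the numeric steps above are simple enough to sketch on a single column held as a Python list. The functions below are illustrative only: their names and the clipping constant are assumptions, and each one computes its statistics on the column it is given, whereas a fitted transformer would learn them on training data and reuse them later.

```python
import math
import statistics
from typing import List, Sequence


def fill_inf(column: Sequence[float]) -> List[float]:
    """Replace +/-inf with NaN so later steps treat both the same way."""
    return [float("nan") if math.isinf(v) else v for v in column]


def nan_flags(column: Sequence[float]) -> List[int]:
    """1 where the value is missing, 0 elsewhere."""
    return [1 if math.isnan(v) else 0 for v in column]


def fillna_median(column: Sequence[float]) -> List[float]:
    """Fill missing values with the median of the observed values."""
    median = statistics.median(v for v in column if not math.isnan(v))
    return [median if math.isnan(v) else v for v in column]


def log_odds(p: float, eps: float = 1e-7) -> float:
    """log(p / (1 - p)), with p clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def standard_scale(column: Sequence[float]) -> List[float]:
    """Subtract the mean and divide by the population standard deviation."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    if std == 0:
        return [0.0 for _ in column]  # a constant column carries no signal
    return [(v - mean) / std for v in column]


raw = [1.0, float("nan"), 3.0, float("inf")]
cleaned = fill_inf(raw)
print(nan_flags(cleaned))       # [0, 1, 0, 1]
print(fillna_median(cleaned))   # [1.0, 2.0, 3.0, 2.0]
print(standard_scale(fillna_median(cleaned)))
```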

Categorical

LabelEncoder

Simple LabelEncoder in order of frequency.

OHEEncoder

Simple OneHotEncoder over label encoded categories.

FreqEncoder

Labels are encoded with frequency in train data.

OrdinalEncoder

Encoding ordinal categories into numbers.

TargetEncoder

Out-of-fold target encoding.

MultiClassTargetEncoder

Out-of-fold target encoding for multiclass task.

CatIntersectstions

Build label-encoded intersections of categorical variables.
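
To make three of the categorical encodings above concrete, here is a small standard-library sketch of label encoding in order of frequency, frequency encoding, and out-of-fold target encoding. The function names, the choice to reserve code 0 for unseen categories, and the additive smoothing parameter `alpha` are assumptions for the example, not a description of the LightAutoML classes.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def fit_label_encoder(values: Sequence[Hashable]) -> Dict[Hashable, int]:
    """Most frequent category gets code 1, the next gets 2, and so on."""
    counts = Counter(values)
    ordered = sorted(counts, key=lambda c: (-counts[c], str(c)))
    return {category: code for code, category in enumerate(ordered, start=1)}


def label_encode(values: Sequence[Hashable], mapping: Dict[Hashable, int]) -> List[int]:
    """Apply a fitted mapping; unseen categories map to 0 (an assumption)."""
    return [mapping.get(v, 0) for v in values]


def freq_encode(values: Sequence[Hashable]) -> List[int]:
    """Replace each category by how often it occurs in the data."""
    counts = Counter(values)
    return [counts[v] for v in values]


def oof_target_encode(
    cats: Sequence[Hashable],
    target: Sequence[float],
    folds: Sequence[int],
    alpha: float = 1.0,
) -> List[float]:
    """Encode each row with target statistics computed on the other folds only."""
    encoded = [0.0] * len(cats)
    for fold in sorted(set(folds)):
        sums: Dict[Hashable, float] = {}
        counts: Dict[Hashable, int] = {}
        total, n = 0.0, 0
        for c, y, f in zip(cats, target, folds):
            if f != fold:  # statistics come from out-of-fold rows
                sums[c] = sums.get(c, 0.0) + y
                counts[c] = counts.get(c, 0) + 1
                total += y
                n += 1
        prior = total / n
        for i, (c, f) in enumerate(zip(cats, folds)):
            if f == fold:  # smoothed mean, shrunk toward the prior
                encoded[i] = (sums.get(c, 0.0) + alpha * prior) / (counts.get(c, 0) + alpha)
    return encoded


cities = ["b", "a", "b", "c", "b", "a"]
mapping = fit_label_encoder(cities)
print(mapping)
print(label_encode(["a", "z"], mapping))
print(freq_encode(cities))
print(oof_target_encode(["a", "a", "b", "b"], [1, 0, 1, 1], [0, 1, 0, 1]))
```

Computing the statistics out of fold keeps a row's own target value from leaking into its encoded feature.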

Datetime

TimeToNum

Basic conversion strategy, used in selection one-to-one transformers.

BaseDiff

Basic conversion strategy, used in selection one-to-one transformers.

DateSeasons

Basic conversion strategy, used in selection one-to-one transformers.

Decompositions

PCATransformer

PCA.

SVDTransformer

TruncatedSVD.

Text

TunableTransformer

Base class for ML transformers.

TfidfTextTransformer

Simple Tfidf vectorizer.

TokenizerTransformer

Simple tokenizer transformer.

OneToOneTransformer

Out-of-fold sgd model prediction to reduce dimension of encoded text data.

ConcatTextTransformer

Concat text features transformer.

AutoNLPWrap

Calculate text embeddings.

Image

ImageFeaturesTransformer

Simple image histogram.

AutoCVWrap

Calculate image embeddings.

lightautoml.utils

Common util tools.

Timer

Timer

Timer to limit the duration of tasks.

PipelineTimer

Timer used to control time over the full AutoML run.

TaskTimer

Timer used to control time over a single ML task run.
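
The general idea behind a time-budget timer can be sketched as follows. `BudgetTimer` and its attributes are hypothetical names invented for this example; they are not the interface of Timer, PipelineTimer, or TaskTimer.

```python
import time
from typing import Optional


class BudgetTimer:
    """Tracks how much of a fixed time budget (in seconds) has been used."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self._start: Optional[float] = None

    def start(self) -> "BudgetTimer":
        # A monotonic clock is not affected by system clock adjustments.
        self._start = time.monotonic()
        return self

    @property
    def time_spent(self) -> float:
        if self._start is None:
            return 0.0  # not started yet
        return time.monotonic() - self._start

    @property
    def time_left(self) -> float:
        return self.timeout - self.time_spent

    def limit_exceeded(self) -> bool:
        return self.time_left <= 0


timer = BudgetTimer(timeout=3600.0).start()
for step in ["read data", "select features", "train model"]:
    if timer.limit_exceeded():
        break  # stop scheduling work once the budget is used up
    print(f"{step}: {timer.time_left:.1f} s left")
```

A run-level timer and per-task timers could be built from the same pattern, with each task receiving a share of whatever the run-level timer has left.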

lightautoml.validation

The module provides classes and functions for model validation.

Iterators

TrainValidIterator

Abstract class for train/validation iteration.

DummyIterator

Simple iterator that uses the train data as validation.

HoldoutIterator

Iterator for classic holdout - just predefined train and valid samples.

CustomIterator

Iterator that uses a function to create fold indices.

FoldsIterator

Classic cv iterator.

TimeSeriesIterator

Time Series Iterator.
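
The iterators above differ mainly in how they split row indices into train and validation parts. The sketch below shows three common splitting schemes with plain lists of indices: a holdout split, k-fold cross-validation, and an expanding-window split for time-ordered data. The function names and the exact splitting rules are assumptions for illustration and may not match what the LightAutoML iterators do.

```python
from typing import List, Tuple

Split = Tuple[List[int], List[int]]  # (train indices, validation indices)


def holdout_indices(n_samples: int, valid_fraction: float = 0.2) -> Split:
    """Keep the last valid_fraction of the rows for validation."""
    n_valid = max(1, int(round(n_samples * valid_fraction)))
    cut = n_samples - n_valid
    return list(range(cut)), list(range(cut, n_samples))


def kfold_indices(n_samples: int, n_folds: int) -> List[Split]:
    """Each row is used for validation in exactly one fold."""
    splits = []
    for k in range(n_folds):
        valid = [i for i in range(n_samples) if i % n_folds == k]
        train = [i for i in range(n_samples) if i % n_folds != k]
        splits.append((train, valid))
    return splits


def expanding_window_indices(n_samples: int, n_folds: int) -> List[Split]:
    """Train on everything before a chunk, validate on that chunk."""
    chunk = n_samples // (n_folds + 1)
    splits = []
    for k in range(1, n_folds + 1):
        start = k * chunk
        stop = (k + 1) * chunk if k < n_folds else n_samples
        splits.append((list(range(start)), list(range(start, stop))))
    return splits


print(holdout_indices(10))
for train, valid in kfold_indices(6, 3):
    print("k-fold", train, valid)
for train, valid in expanding_window_indices(10, 4):
    print("time series", train, valid)
```

In the expanding-window scheme every validation index is later than every training index, which avoids training on the future when rows are ordered by time.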

Iterators Getters and Utils

create_validation_iterator

Creates train-validation iterator.

get_numpy_iterator

Get iterator for np/sparse dataset.
