AutoML
- class lightautoml.automl.base.AutoML(reader, levels, timer=None, blender=None, skip_conn=False, return_all_predictions=False, debug=False)
Bases: object
Class for compiling the full pipeline of an AutoML task.
AutoML steps:
- Read and analyze the data and build the inner LAMLDataset from the input dataset: performed by reader.
- Create the validation scheme.
- Compute the ML pipelines passed in levels. Each element of levels is a list of MLPipelines; predictions from the current level are passed to the next level's pipelines as features.
- Time monitoring: check whether there is enough time left to compute a new pipeline.
- Blend the last-level models and prune useless pipelines to speed up inference: performed by blender.
- Return predictions on validation data. If a cross-validation scheme is used, out-of-fold predictions are returned. If validation data is passed, predictions on the validation dataset are returned. In a CV scheme where some train points were never used for validation (e.g. the timeout was exceeded, or a custom CV iterator such as TimeSeriesIterator was used), NaN is returned for those points.
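The NaN behaviour in the last step can be illustrated with a minimal out-of-fold sketch. It assumes a hypothetical expanding-window splitter (`expanding_window_folds`, not LightAutoML's `TimeSeriesIterator`) and a toy mean-predicting model; `out_of_fold_predict`, `fit_mean` and `predict_mean` are likewise illustrative names, not library API. Rows that only ever land in training folds keep their NaN placeholder.

```python
import numpy as np


def expanding_window_folds(n_samples, n_folds):
    """Hypothetical time-series-style splitter (not LightAutoML's
    TimeSeriesIterator): fold k trains on every row before a cut point and
    validates on the next block, so the first block is never validated."""
    block = n_samples // (n_folds + 1)
    for k in range(1, n_folds + 1):
        # The last fold absorbs any remainder rows.
        end = n_samples if k == n_folds else (k + 1) * block
        yield np.arange(0, k * block), np.arange(k * block, end)


def out_of_fold_predict(X, y, folds, fit, predict):
    """Collect out-of-fold predictions; rows never used for validation stay NaN."""
    oof = np.full(len(y), np.nan)
    for train_idx, valid_idx in folds:
        model = fit(X[train_idx], y[train_idx])
        oof[valid_idx] = predict(model, X[valid_idx])
    return oof


def fit_mean(X_train, y_train):
    # Toy "model": remember the mean training target.
    return y_train.mean()


def predict_mean(model, X_valid):
    # Predict the remembered mean for every validation row.
    return np.full(len(X_valid), model)


X = np.arange(10, dtype=float).reshape(-1, 1)
y = np.arange(10, dtype=float)
oof = out_of_fold_predict(X, y, expanding_window_folds(10, 4), fit_mean, predict_mean)
print(oof)  # rows 0 and 1 are NaN: they only ever appear in training folds
```

With a standard K-fold splitter every row is validated exactly once, so no NaN would remain; the gap appears only when the iterator leaves some rows out of every validation fold.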
Example
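Common use case: create custom pipelines or presets.
>>> reader = SomeReader()  # placeholder for a concrete Reader subclass
>>> pipe = MLPipeline([SomeAlgo()])  # SomeAlgo is a placeholder ML algorithm
>>> levels = [[pipe]]  # a single level containing one pipeline
>>> automl = AutoML(reader, levels)
>>> automl.fit_predict(data, roles={'target': 'TARGET'})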
- Parameters:
reader (Reader) – Instance of the Reader class that creates a LAMLDataset from input data.
levels (Sequence[Sequence[MLPipeline]]) – List of lists of MLPipelines.
timer (Optional[PipelineTimer]) – Timer instance of PipelineTimer. Default: unlimited timer.
blender (Optional[Blender]) – Instance of Blender. Default: BestModelSelector.
skip_conn (bool) – True if first-level input features should be passed to the next levels.
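How levels, skip_conn and the blender fit together can be sketched with plain NumPy. This is a data-flow illustration under stated assumptions, not LightAutoML's implementation: each "pipeline" here is a hypothetical function mapping a feature matrix to one prediction column (real MLPipelines are fitted and produce out-of-fold predictions), and `select_best` assumes BestModelSelector keeps the single best last-level model by validation score, as its name suggests. `run_levels`, `select_best`, `row_mean`, `row_max` and `first_col` are illustrative names.

```python
from typing import Callable, Sequence, Tuple

import numpy as np

# Hypothetical stand-in for an MLPipeline: maps a feature matrix to one
# prediction column. No fitting happens here; only the data flow is shown.
Pipeline = Callable[[np.ndarray], np.ndarray]


def run_levels(X: np.ndarray, levels: Sequence[Sequence[Pipeline]],
               skip_conn: bool = False) -> np.ndarray:
    features = X
    preds = X
    for level in levels:
        # One column per pipeline in the current level.
        preds = np.column_stack([pipe(features) for pipe in level])
        # The next level sees these predictions as features, plus the
        # original input features when skip_conn is True.
        features = np.hstack([preds, X]) if skip_conn else preds
    return preds


def select_best(last_preds: np.ndarray, y: np.ndarray) -> Tuple[int, np.ndarray]:
    # Keep the single last-level column with the lowest mean squared error.
    mse = ((last_preds - y[:, None]) ** 2).mean(axis=0)
    best = int(np.argmin(mse))
    return best, last_preds[:, best]


def row_mean(F: np.ndarray) -> np.ndarray:
    return F.mean(axis=1)


def row_max(F: np.ndarray) -> np.ndarray:
    return F.max(axis=1)


def first_col(F: np.ndarray) -> np.ndarray:
    return F[:, 0]


X = np.array([[1.0, 3.0], [2.0, 6.0], [3.0, 9.0]])
y = np.array([2.0, 4.0, 6.0])
levels = [[row_mean, row_max], [first_col, row_mean]]

last = run_levels(X, levels, skip_conn=True)
best, blended = select_best(last, y)
```

With skip_conn=True the second level's `row_mean` averages four columns (two level-one predictions plus the two original features) instead of two, which is the only difference the flag makes in this sketch.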
Note
There are several verbosity levels:
0: No messages.
1: Warnings.
2: Info.
3: Debug.
- fit_predict(train_data, roles, train_features=None, cv_iter=None, valid_data=None, valid_features=None, verbose=0)
Fit on input data and make prediction on validation part.
- Parameters:
train_data (Any) – Dataset to train on.
roles (dict) – Roles dict.
train_features (Optional[Sequence[str]]) – Optional feature names, if they cannot be inferred from train_data.
cv_iter (Optional[Iterable]) – Custom CV iterator, for example TimeSeriesIterator.
valid_data (Optional[Any]) – Optional validation dataset.
valid_features (Optional[Sequence[str]]) – Optional validation dataset feature names, if they cannot be inferred from valid_data.
verbose (int) – Controls the verbosity: the higher, the more messages.
<1: messages are not displayed;
>=1: the computation process for layers is displayed;
>=2: the information about fold processing is also displayed;
>=3: the hyperparameter optimization process is also displayed;
>=4: the training process for every algorithm is displayed.
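The verbose thresholds are cumulative: each value shows everything lower values show, plus one more group. A small hypothetical helper (`displayed_messages`, not part of LightAutoML) makes that explicit; the group labels paraphrase the list above.

```python
from typing import List

# Message groups from the fit_predict verbose description, keyed by the
# minimum verbose value at which each group appears.
VERBOSITY_THRESHOLDS = [
    (1, "layer computation"),
    (2, "fold processing"),
    (3, "hyperparameter optimization"),
    (4, "per-algorithm training"),
]


def displayed_messages(verbose: int) -> List[str]:
    """Return the message groups shown for a given verbose value.

    Thresholds are cumulative: each level adds to what lower levels show.
    """
    return [name for level, name in VERBOSITY_THRESHOLDS if verbose >= level]


print(displayed_messages(0))  # []
print(displayed_messages(2))  # ['layer computation', 'fold processing']
```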
- Return type:
- Returns:
Predicted values.
- predict(data, features_names=None, return_all_predictions=None)
Predict with automl on new dataset.
- Parameters:
- Return type:
- Returns:
Dataset with predictions.