HighCorrRemoval

class lightautoml.pipelines.selection.linear_selector.HighCorrRemoval(corr_co=0.98, subsample=100000, random_state=42, **kwargs)[source]

Bases: lightautoml.pipelines.selection.base.SelectionPipeline

Selector to remove highly correlated features.

Removes almost perfectly correlated features to speed up L1-regularized regression models. For sparse data, cosine similarity is used instead of correlation. This is not exact, but it is adequate for removing very highly correlated features.
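The idea of filtering by a similarity threshold can be sketched with NumPy on dense data. This is a minimal illustration, not LightAutoML's actual implementation: the helper name drop_high_corr, the greedy keep-first order, and the use of absolute Pearson correlation are all assumptions made for the example.

```python
import numpy as np


def drop_high_corr(X, names, corr_co=0.98):
    """Hypothetical helper: keep a feature only if it is not too similar
    to any feature that was already kept (greedy, in column order)."""
    # Absolute Pearson correlation between columns.
    # Assumption: the sign of the correlation is ignored.
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= corr_co for k in kept):
            kept.append(j)
    return [names[j] for j in kept]


rng = np.random.default_rng(42)
a = rng.normal(size=1000)
b = rng.normal(size=1000)
# The second column is an exact linear function of the first one.
X = np.column_stack([a, 2.0 * a + 1.0, b])

selected = drop_high_corr(X, ["a", "a_copy", "b"])
print(selected)
```

In this sketch the first feature of each highly correlated group is kept and the later ones are dropped; which member of a group the real selector keeps is not stated here.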

__init__(corr_co=0.98, subsample=100000, random_state=42, **kwargs)[source]
Parameters
  • corr_co (float) – Similarity threshold above which features are considered highly correlated.

  • subsample (Union[int, float]) – Number of samples (int) or fraction of the full dataset (float) to use.

  • random_state (int) – Random seed for subsampling.

  • **kwargs – Additional parameters. Used for initialization of the parent class.
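One plausible reading of the subsample and random_state parameters is sketched below. The function subsample_indices is hypothetical, and the details (capping an int at the dataset size, truncating a fractional row count, sampling without replacement) are assumptions rather than documented behavior.

```python
import numpy as np


def subsample_indices(n_rows, subsample=100000, random_state=42):
    """Hypothetical helper: pick the row indices used for the correlation estimate."""
    # Assumption: a float is a fraction of the rows, an int is an absolute row count.
    if isinstance(subsample, float):
        size = int(n_rows * subsample)
    else:
        size = subsample
    # Never ask for more rows than the dataset has.
    size = min(size, n_rows)
    rng = np.random.default_rng(random_state)
    # Sample without replacement; sorting keeps the original row order.
    return np.sort(rng.choice(n_rows, size=size, replace=False))


idx_frac = subsample_indices(1000, subsample=0.1)     # 10% of the rows
idx_int = subsample_indices(1000, subsample=100000)   # capped at all 1000 rows
print(len(idx_frac), len(idx_int))
```

With a fixed random_state the same rows are drawn on every call, which keeps the selection reproducible between runs.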

perform_selection(train_valid)[source]

Select the features to keep in the dataset.

Performs selection based on feature correlation. The method should store the result in the _selected_features attribute when it finishes.

Parameters

train_valid (Optional[TrainValidIterator]) – Classic cv-iterator.