LinearFeatures

class lightautoml.pipelines.features.linear_pipeline.LinearFeatures(feats_imp=None, top_intersections=5, max_bin_count=10, max_intersection_depth=3, subsample=None, sparse_ohe='auto', auto_unique_co=50, output_categories=True, multiclass_te_co=3, **kwargs)

Bases: lightautoml.pipelines.features.base.FeaturesPipeline, lightautoml.pipelines.features.base.TabularDataFeatures

Creates a feature pipeline for linear models and neural networks.

Includes:

  • Creation of categorical intersections.

  • One-hot encoding (OHE) or embedding index encoding for categories.

  • Other category-to-number encodings, if defined in the role parameters.

  • Standardization and NaN handling for numeric features.

  • Discretization of numeric features, if needed.

  • Date handling.

  • Handling of probabilities (outputs of lower-level models).
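
Two of the steps above, categorical intersections and standardization with NaN handling, can be illustrated with a minimal standard-library sketch. It is not LightAutoML's implementation: the helper names `cat_intersections` and `standardize_fill_nan`, the `__` and `_` separators, and the choice to fill NaN with 0 after scaling are all assumptions made for illustration.

```python
from itertools import combinations
import math


def cat_intersections(rows, cat_cols, max_depth=3):
    """Hypothetical helper: join category values for every column
    combination of size 2..max_depth into a new feature."""
    out = {}
    for depth in range(2, max_depth + 1):
        for cols in combinations(cat_cols, depth):
            name = "__".join(cols)
            out[name] = ["_".join(str(row[c]) for c in cols) for row in rows]
    return out


def standardize_fill_nan(values):
    """Hypothetical helper: scale with statistics of the non-NaN values,
    then replace NaN with 0.0 (the mean after scaling)."""
    present = [v for v in values if not math.isnan(v)]
    mean = sum(present) / len(present)
    var = sum((v - mean) ** 2 for v in present) / len(present)
    std = math.sqrt(var) or 1.0  # guard against constant columns
    return [0.0 if math.isnan(v) else (v - mean) / std for v in values]


rows = [{"city": "A", "plan": "x"}, {"city": "B", "plan": "y"}]
inter = cat_intersections(rows, ["city", "plan"], max_depth=2)
scaled = standardize_fill_nan([1.0, float("nan"), 3.0])
```

Here `inter` holds a single combined feature named `city__plan`, and the missing numeric value ends up at the center of the scaled distribution, which is a common neutral choice for linear models.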

__init__(feats_imp=None, top_intersections=5, max_bin_count=10, max_intersection_depth=3, subsample=None, sparse_ohe='auto', auto_unique_co=50, output_categories=True, multiclass_te_co=3, **kwargs)
Parameters
  • feats_imp (Optional[ImportanceEstimator]) – Feature importances mapping.

  • top_intersections (int) – Maximum number of categorical features used to generate intersections.

  • max_bin_count (int) – Maximum number of bins used to discretize numeric features.

  • max_intersection_depth (int) – Maximum depth of categorical intersections.

  • subsample (Union[float, int, None]) – Subsample used to compute data statistics.

  • sparse_ohe (Union[str, bool]) – Whether to output a sparse matrix when OHE is used for category handling.

  • auto_unique_co (int) – Unique-value cutoff; high-cardinality categories switch to target encoding.

  • output_categories (bool) – Whether to output encoded categories or embedding indices.

  • multiclass_te_co (int) – Cutoff for using target encoding in category handling on multiclass tasks when the number of classes is high.
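
As a rough illustration of how `auto_unique_co` and `max_bin_count` might act, the sketch below picks an encoding from a column's cardinality and derives bin edges from evenly spaced quantiles. The functions `choose_cat_encoding` and `quantile_bin_edges` are hypothetical, and the strict `>` comparison and the quantile rule are assumptions rather than the library's documented behavior.

```python
def choose_cat_encoding(n_unique, auto_unique_co=50):
    """Hypothetical rule: use target encoding when the number of unique
    values exceeds the cutoff, otherwise one-hot encoding."""
    return "target" if n_unique > auto_unique_co else "ohe"


def quantile_bin_edges(values, max_bin_count=10):
    """Hypothetical helper: return at most max_bin_count - 1 distinct
    inner edges taken at evenly spaced quantiles of the data."""
    ordered = sorted(values)
    n = len(ordered)
    edges = []
    for i in range(1, max_bin_count):
        edge = ordered[min(n - 1, (i * n) // max_bin_count)]
        # Skip repeated edges so heavily tied data yields fewer bins.
        if not edges or edge > edges[-1]:
            edges.append(edge)
    return edges


low_card = choose_cat_encoding(10)
high_card = choose_cat_encoding(500)
edges = quantile_bin_edges(list(range(100)), max_bin_count=4)
```

With the default cutoff of 50, a 10-value column stays one-hot encoded while a 500-value column is routed to target encoding; the 100 evenly spread values are split into four bins by three inner edges.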

create_pipeline(train)

Creates the linear feature pipeline.

Parameters

train (Union[PandasDataset, NumpyDataset]) – Dataset with train features.

Return type

LAMLTransformer

Returns

Transformer.