TorchSimpleFeatures

class lightautoml.pipelines.features.torch_pipeline.TorchSimpleFeatures(use_te=False, top_intersections=5, max_bin_count=10, max_intersection_depth=3, te_subsample=None, sparse_ohe='auto', auto_unique_co=50, output_categories=True, multiclass_te_co=3, use_qnt=True, n_quantiles=None, subsample=1000000000.0, output_distribution='normal', noise=0.001, qnt_factor=30, **kwargs)

Bases: FeaturesPipeline, TabularDataFeatures

Creates simple pipeline for neural network models.

__init__(use_te=False, top_intersections=5, max_bin_count=10, max_intersection_depth=3, te_subsample=None, sparse_ohe='auto', auto_unique_co=50, output_categories=True, multiclass_te_co=3, use_qnt=True, n_quantiles=None, subsample=1000000000.0, output_distribution='normal', noise=0.001, qnt_factor=30, **kwargs)

Initialize the TorchSimpleFeatures pipeline.

Parameters:
  • use_qnt (bool) – Use quantile transformer for numerical columns.

  • n_quantiles (Optional[int]) – Number of quantiles to be computed.

  • subsample (int) – Maximum number of samples used to estimate the quantiles for computational efficiency.

  • output_distribution (str) – Marginal distribution for the transformed data. The choices are ‘uniform’ or ‘normal’.

  • noise (float) – Standard deviation of the noise added to the dataset before the quantile transformation to make the data smoother.

  • qnt_factor (int) – If n_quantiles is None, the number of quantiles is set to dataset size / qnt_factor.

  • use_te (bool) – Use target encoding for categorical columns.

  • top_intersections (int) – Max number of categories to generate intersections.

  • max_bin_count (int) – Maximum number of bins for categorical columns.

  • max_intersection_depth (int) – Maximum depth of categorical intersections.

  • te_subsample (Union[float, int, None]) – Subsample used to calculate data statistics.

  • sparse_ohe (Union[str, bool]) – Whether to produce sparse output if one-hot encoding was used when handling categorical columns.

  • auto_unique_co (int) – Cardinality cutoff; switch to target encoding if a column has high cardinality.

  • output_categories (bool) – Whether to output encoded categories or embedding indices.

  • multiclass_te_co (int) – Cutoff on the number of classes for using target encoding on categorical columns in a multiclass task.

  • kwargs – Other params.
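The quantile-related parameters above (n_quantiles, qnt_factor, noise, output_distribution) can be illustrated with a small standard-library sketch. This is not the library's implementation: the helper names resolve_n_quantiles and quantile_transform are hypothetical, and the integer division with a floor of 1 is an assumption about how "dataset size / factor" is rounded.

```python
import bisect
import random
from statistics import NormalDist


def resolve_n_quantiles(n_samples, n_quantiles=None, qnt_factor=30):
    """Pick the number of quantiles; fall back to n_samples / qnt_factor."""
    if n_quantiles is not None:
        return n_quantiles
    # Assumption: integer division with a floor of 1; the library may round differently.
    return max(1, n_samples // qnt_factor)


def quantile_transform(values, n_quantiles=None, qnt_factor=30,
                       output_distribution="normal", noise=0.001, seed=0):
    """Map values to an approximately uniform or normal marginal distribution."""
    rng = random.Random(seed)
    # Small Gaussian noise breaks ties so the empirical CDF is smoother.
    noisy = [v + rng.gauss(0.0, noise) for v in values]
    k = resolve_n_quantiles(len(noisy), n_quantiles, qnt_factor)
    ordered = sorted(noisy)
    # Reference points: k + 1 evenly spaced order statistics, minimum to maximum.
    refs = [ordered[round(i * (len(ordered) - 1) / k)] for i in range(k + 1)]
    eps = 1e-6  # keeps the inverse normal CDF away from 0 and 1
    out = []
    for v in noisy:
        # Fraction of reference points not exceeding v approximates the CDF.
        u = bisect.bisect_right(refs, v) / len(refs)
        u = min(max(u, eps), 1.0 - eps)
        if output_distribution == "uniform":
            out.append(u)
        else:
            out.append(NormalDist().inv_cdf(u))
    return out
```

With 3000 rows and the default qnt_factor=30, this sketch resolves to 100 quantiles; passing n_quantiles explicitly overrides the fallback.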

create_pipeline(train)

Create the feature pipeline for neural network models.

Parameters:

train (Union[PandasDataset, NumpyDataset]) – Dataset with train features.

Return type:

LAMLTransformer

Returns:

Composite datetime, categorical, numeric transformer.