TorchSimpleFeatures

class lightautoml.pipelines.features.torch_pipeline.TorchSimpleFeatures(use_te=False, top_intersections=5, max_bin_count=10, max_intersection_depth=3, te_subsample=None, sparse_ohe='auto', auto_unique_co=50, output_categories=True, multiclass_te_co=3, use_qnt=True, n_quantiles=None, subsample=1000000000.0, output_distribution='normal', noise=0.001, qnt_factor=30, **kwargs)

Bases: FeaturesPipeline, TabularDataFeatures

Creates simple pipeline for neural network models.

__init__(use_te=False, top_intersections=5, max_bin_count=10, max_intersection_depth=3, te_subsample=None, sparse_ohe='auto', auto_unique_co=50, output_categories=True, multiclass_te_co=3, use_qnt=True, n_quantiles=None, subsample=1000000000.0, output_distribution='normal', noise=0.001, qnt_factor=30, **kwargs)

TorchSimpleFeatures.

Parameters:

use_qnt (bool) – Use quantile transformer for numerical columns.
n_quantiles (Optional[int]) – Number of quantiles to be computed.
subsample (int) – Maximum number of samples used to estimate the quantiles for computational efficiency.
output_distribution (str) – Marginal distribution for the transformed data. The choices are ‘uniform’ or ‘normal’.
noise (float) – Standard deviation of the noise added to the data before the quantile transformation to smooth it.
qnt_factor (int) – If n_quantiles is None, the number of quantiles equals dataset size / qnt_factor.
use_te (bool) – Use target encoding for categorical columns.
top_intersections (int) – Max number of categories to generate intersections.
max_bin_count (int) – Max number of bins for cat columns.
max_intersection_depth (int) – Max depth of cat intersection.
te_subsample (Union[float, int, None]) – Subsample used to calculate data statistics.
sparse_ohe (Union[str, bool]) – Whether to output a sparse matrix if one-hot encoding was used during categorical handling.
auto_unique_co (int) – Cardinality cutoff for switching to target encoding on high-cardinality categorical columns.
output_categories (bool) – Output encoded categories or embedding indices.
multiclass_te_co (int) – Cutoff on the number of classes for using target encoding in categorical handling on multiclass tasks.
kwargs – Other params.
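
The quantile-related parameters above (n_quantiles, qnt_factor, noise, output_distribution) can be illustrated with a small standard-library sketch. This is not LightAutoML's implementation: the helper names resolve_n_quantiles and quantile_transform are hypothetical, and the rounding rule (integer division with a floor of 1) and the clipping constant are assumptions made only for this example.

```python
import bisect
import random
from statistics import NormalDist


def resolve_n_quantiles(n_rows, n_quantiles=None, qnt_factor=30):
    """Fall back to dataset size / qnt_factor when n_quantiles is None."""
    if n_quantiles is not None:
        return n_quantiles
    # Assumption: integer division with a floor of 1; the library may round differently.
    return max(n_rows // qnt_factor, 1)


def quantile_transform(values, n_quantiles, output_distribution="normal",
                       noise=0.001, seed=0):
    """Map values to a uniform or normal marginal via their empirical quantiles."""
    rng = random.Random(seed)
    # Add Gaussian noise with the given std so that ties are smoothed out.
    noisy = [v + rng.gauss(0.0, noise) for v in values]
    ordered = sorted(noisy)
    n = len(ordered)
    # Reference quantiles: n_quantiles order statistics at evenly spaced ranks.
    refs = [ordered[min(n - 1, (i * n) // n_quantiles)] for i in range(n_quantiles)]
    eps = 1e-6  # assumption: keep probabilities strictly inside (0, 1)
    normal = NormalDist()
    out = []
    for x in noisy:
        # Fraction of reference quantiles that are <= x, clipped away from 0 and 1.
        u = bisect.bisect_right(refs, x) / n_quantiles
        u = min(max(u, eps), 1.0 - eps)
        out.append(u if output_distribution == "uniform" else normal.inv_cdf(u))
    return out
```

With the defaults, a dataset of 3000 rows would resolve to 100 quantiles under this rule, and passing n_quantiles explicitly bypasses qnt_factor entirely.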

create_pipeline(train)

Create the feature pipeline.

Parameters: train (Union[PandasDataset, NumpyDataset]) – Dataset with train features.
Return type: LAMLTransformer
Returns: Composite datetime, categorical, numeric transformer.
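
A minimal usage sketch, assuming LightAutoML is installed and that train is an already constructed PandasDataset or NumpyDataset. The helper names torch_feature_params and build_transformer are hypothetical; the import path, the keyword arguments, and the create_pipeline call come from the signature documented on this page.

```python
def torch_feature_params():
    """Constructor arguments taken from the documented signature (defaults shown)."""
    return {
        "use_qnt": True,                  # quantile-transform numerical columns
        "n_quantiles": None,              # None -> dataset size / qnt_factor
        "qnt_factor": 30,
        "output_distribution": "normal",  # or "uniform"
        "noise": 0.001,
        "use_te": False,                  # no target encoding for categorical columns
        "output_categories": True,
    }


def build_transformer(train):
    """Build the composite transformer for `train` (hypothetical helper)."""
    # Imported lazily so torch_feature_params() works without LightAutoML installed.
    from lightautoml.pipelines.features.torch_pipeline import TorchSimpleFeatures

    pipeline = TorchSimpleFeatures(**torch_feature_params())
    # Returns a LAMLTransformer combining datetime, categorical and numeric steps.
    return pipeline.create_pipeline(train)
```

Keeping the arguments in a plain dictionary makes it easy to override a single setting, for example switching output_distribution to "uniform", before constructing the pipeline.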