TorchSimpleFeatures
- class lightautoml.pipelines.features.torch_pipeline.TorchSimpleFeatures(use_te=False, top_intersections=5, max_bin_count=10, max_intersection_depth=3, te_subsample=None, sparse_ohe='auto', auto_unique_co=50, output_categories=True, multiclass_te_co=3, use_qnt=True, n_quantiles=None, subsample=1000000000.0, output_distribution='normal', noise=0.001, qnt_factor=30, **kwargs)[source]
Bases: FeaturesPipeline, TabularDataFeatures
Creates simple pipeline for neural network models.
- __init__(use_te=False, top_intersections=5, max_bin_count=10, max_intersection_depth=3, te_subsample=None, sparse_ohe='auto', auto_unique_co=50, output_categories=True, multiclass_te_co=3, use_qnt=True, n_quantiles=None, subsample=1000000000.0, output_distribution='normal', noise=0.001, qnt_factor=30, **kwargs)[source]
TorchSimpleFeatures.
- Parameters:
- use_qnt (bool) – Use quantile transformer for numerical columns.
- n_quantiles (Optional[int]) – Number of quantiles to be computed.
- subsample (int) – Maximum number of samples used to estimate the quantiles for computational efficiency.
- output_distribution (str) – Marginal distribution for the transformed data. The choices are 'uniform' or 'normal'.
- noise (float) – Standard deviation of the noise added to the dataset before the quantile transformation to make the data smoother.
- qnt_factor (int) – If n_quantiles is None, the number of quantiles is set to dataset size / qnt_factor.
- use_te (bool) – Use target encoding for categorical columns.
- top_intersections (int) – Max number of categories to generate intersections.
- max_bin_count (int) – Max number of bins for categorical columns.
- max_intersection_depth (int) – Max depth of categorical intersections.
- te_subsample (Union[float, int, None]) – Subsample used to calculate data statistics.
- sparse_ohe (Union[str, bool]) – Whether to output a sparse matrix if one-hot encoding was used during categorical handling.
- auto_unique_co (int) – Switch to target encoding if cardinality is high.
- output_categories (bool) – Output encoded categories or embedding indices.
- multiclass_te_co (int) – Cutoff for using target encoding in categorical handling on a multiclass task when the number of classes is high.
- kwargs – Other params.
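The interaction of n_quantiles, qnt_factor, noise and output_distribution='normal' can be illustrated with a small standalone sketch. This is not the library's implementation: the helper names resolve_n_quantiles and quantile_to_normal are hypothetical, the integer division with a floor of 1 is an assumption, and the transform is a simplified rank-based version that ignores n_quantiles and subsample.

```python
import random
from statistics import NormalDist


def resolve_n_quantiles(n_rows, n_quantiles=None, qnt_factor=30):
    """Return n_quantiles, falling back to n_rows / qnt_factor when it is None."""
    if n_quantiles is not None:
        return n_quantiles
    # Assumption: integer division with a floor of 1 so tiny datasets still work.
    return max(1, n_rows // qnt_factor)


def quantile_to_normal(values, noise=0.001, seed=0):
    """Map a numeric column to an approximately standard normal distribution."""
    rng = random.Random(seed)
    # Jitter with Gaussian noise of std `noise` so tied values get distinct ranks.
    jittered = [v + rng.gauss(0.0, noise) for v in values]
    order = sorted(range(len(jittered)), key=jittered.__getitem__)
    n = len(jittered)
    out = [0.0] * n
    std_normal = NormalDist()
    for rank, idx in enumerate(order):
        # Mid-rank empirical CDF stays strictly inside (0, 1), so inv_cdf is finite.
        out[idx] = std_normal.inv_cdf((rank + 0.5) / n)
    return out


print(resolve_n_quantiles(n_rows=3000))          # 3000 // 30
column = [1.0, 1.0, 2.0, 50.0, 1000.0]           # heavy-tailed toy column
print(quantile_to_normal(column))
```

In the sketch, the outlier 1000.0 ends up only slightly further from the centre than 50.0, which is the usual motivation for applying a quantile transform to numeric inputs of a neural network.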
- create_pipeline(train)[source]
Create the feature pipeline for neural network models.
- Parameters:
train (Union[PandasDataset, NumpyDataset]) – Dataset with train features.
- Return type:
- Returns:
Composite datetime, categorical, numeric transformer.