TorchSimpleFeatures
- class lightautoml.pipelines.features.torch_pipeline.TorchSimpleFeatures(use_te=False, top_intersections=5, max_bin_count=10, max_intersection_depth=3, te_subsample=None, sparse_ohe='auto', auto_unique_co=50, output_categories=True, multiclass_te_co=3, use_qnt=True, n_quantiles=None, subsample=1000000000, output_distribution='normal', noise=0.001, qnt_factor=30, **kwargs)
Bases: FeaturesPipeline, TabularDataFeatures
Creates a simple feature pipeline for neural network models.
- __init__(use_te=False, top_intersections=5, max_bin_count=10, max_intersection_depth=3, te_subsample=None, sparse_ohe='auto', auto_unique_co=50, output_categories=True, multiclass_te_co=3, use_qnt=True, n_quantiles=None, subsample=1000000000, output_distribution='normal', noise=0.001, qnt_factor=30, **kwargs)
TorchSimpleFeatures.
- Parameters:
use_qnt (bool) – Use a quantile transformer for numerical columns.
n_quantiles (Optional[int]) – Number of quantiles to be computed.
subsample (int) – Maximum number of samples used to estimate the quantiles, for computational efficiency.
output_distribution (str) – Marginal distribution for the transformed data. The choices are 'uniform' or 'normal'.
noise (float) – Standard deviation of the noise added to the dataset before the quantile transformation to make the data smoother.
qnt_factor (int) – If n_quantiles is None, the number of quantiles is set to dataset size / qnt_factor.
use_te (bool) – Use target encoding for categorical columns.
top_intersections (int) – Maximum number of categories used to generate intersections.
max_bin_count (int) – Maximum number of bins for categorical columns.
max_intersection_depth (int) – Maximum depth of categorical intersections.
te_subsample (Union[float, int, None]) – Subsample used to calculate data statistics.
sparse_ohe (Union[str, bool]) – Whether to output a sparse matrix if one-hot encoding was used during categorical handling.
auto_unique_co (int) – Cardinality cutoff: switch to target encoding if a column has high cardinality.
output_categories (bool) – Output encoded categories or embedding indices.
multiclass_te_co (int) – Cutoff for using target encoding in categorical handling on a multiclass task when the number of classes is high.
kwargs – Other parameters.
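The interaction between n_quantiles, qnt_factor, noise and output_distribution can be illustrated with a small standalone sketch. It does not use LightAutoML and is not the library's implementation: the helper names `resolve_n_quantiles` and `rank_transform` are hypothetical, the integer division with a floor of 1 is an assumption about how "dataset size / factor" is rounded, and the rank-based mapping is a simplified stand-in for a real quantile transformer.

```python
import random
from statistics import NormalDist
from typing import List, Optional


def resolve_n_quantiles(n_rows: int, n_quantiles: Optional[int] = None,
                        qnt_factor: int = 30) -> int:
    """Hypothetical helper: fall back to n_rows / qnt_factor when n_quantiles is None."""
    if n_quantiles is not None:
        return n_quantiles
    # Assumption: integer division, with at least one quantile for tiny datasets.
    return max(1, n_rows // qnt_factor)


def rank_transform(values: List[float], output_distribution: str = "normal",
                   noise: float = 0.001, seed: int = 0) -> List[float]:
    """Illustrative rank-based quantile mapping (not the library's transformer)."""
    rng = random.Random(seed)
    # Small Gaussian noise with std `noise` breaks ties between repeated values.
    noisy = [v + rng.gauss(0.0, noise) for v in values]
    order = sorted(range(len(noisy)), key=noisy.__getitem__)
    out = [0.0] * len(noisy)
    for rank, idx in enumerate(order):
        u = (rank + 0.5) / len(noisy)  # empirical quantile, strictly inside (0, 1)
        out[idx] = u if output_distribution == "uniform" else NormalDist().inv_cdf(u)
    return out
```

Under these assumptions, a 3000-row dataset with the default qnt_factor=30 would resolve to 100 quantiles, and the 'normal' option maps the median of a column to 0.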
- create_pipeline(train)
Create the feature pipeline.
- Parameters:
train (Union[PandasDataset, NumpyDataset]) – Dataset with train features.
- Return type:
- Returns:
Composite datetime, categorical, numeric transformer.