TfidfTextTransformer

class lightautoml.transformers.text.TfidfTextTransformer(default_params=None, freeze_defaults=True, subs=None, random_state=42)

Bases: lightautoml.transformers.text.TunableTransformer

Simple TF-IDF vectorizer.

property features

Features list.

Return type

List[str]

__init__(default_params=None, freeze_defaults=True, subs=None, random_state=42)
Parameters
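  • default_params (Optional[dict]) – Algorithm hyperparameters.

  • freeze_defaults (bool) – Flag that controls whether the default parameters may be rewritten; see the note below.

  • subs (Optional[int]) – Subsample size used to calculate frequencies. If None, the full data is used.

  • random_state (int) – Random state used to draw the subsample.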

Note

The behaviour of freeze_defaults:

  • True : params may be rewritten depending on dataset.

  • False: params may be changed only manually or with tuning.
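
The role of subs and random_state can be illustrated with a minimal sketch. It is not the library's code: the helper name take_subsample is hypothetical, and returning the full data when subs is not smaller than the number of rows is an assumption made for the illustration.

```python
import random


def take_subsample(texts, subs=None, random_state=42):
    """Hypothetical helper: pick the rows used to calculate frequencies.

    Assumption: when subs is None, or not smaller than the data, all rows are used.
    """
    texts = list(texts)
    if subs is None or subs >= len(texts):
        return texts
    # A seeded generator makes the subsample reproducible across runs.
    rng = random.Random(random_state)
    return rng.sample(texts, subs)


corpus = [f"document number {i}" for i in range(10)]
full = take_subsample(corpus)             # subs=None: full data
small = take_subsample(corpus, subs=3)    # three rows, fixed by random_state
```

Fixing random_state means the same rows are drawn on every call, so the fitted vocabulary does not change between runs on the same data.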

init_params_on_input(dataset)

Get transformer parameters depending on dataset parameters.

Parameters

dataset (Union[NumpyDataset, PandasDataset]) – Dataset used for model parameters initialization.

Return type

dict

Returns

Parameters of model.

fit(dataset)

Fit TF-IDF vectorizer.

Parameters

dataset (Union[NumpyDataset, PandasDataset]) – Pandas or Numpy dataset of text features.

Returns

self.

transform(dataset)

Transform text dataset to sparse TF-IDF representation.

Parameters

dataset (Union[NumpyDataset, PandasDataset]) – Pandas or Numpy dataset of text features.

Return type

CSRSparseDataset

Returns

Sparse dataset with encoded text.
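
The fit/transform contract described above (fit returns self, transform returns a sparse encoding) can be sketched in plain Python. This is an illustrative stand-in, not the LightAutoML implementation: the class TinyTfidf is hypothetical, whitespace tokenization is a simplification, and the smoothed IDF formula with L2 row normalization is an assumed variant. The output is a (data, indices, indptr) triple in the usual CSR layout rather than a CSRSparseDataset.

```python
import math
from collections import Counter


class TinyTfidf:
    """Hypothetical stand-in used only to illustrate fit/transform."""

    def fit(self, texts):
        docs = [t.lower().split() for t in texts]
        n_docs = len(docs)
        # Document frequency: number of documents that contain each token.
        df = Counter(tok for doc in docs for tok in set(doc))
        self.vocab_ = {tok: i for i, tok in enumerate(sorted(df))}
        # Assumed smoothed IDF: ln((1 + n) / (1 + df)) + 1.
        self.idf_ = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in df}
        return self

    def transform(self, texts):
        data, indices, indptr = [], [], [0]
        for text in texts:
            # Tokens unseen during fit are ignored.
            counts = Counter(t for t in text.lower().split() if t in self.vocab_)
            weights = {self.vocab_[t]: c * self.idf_[t] for t, c in counts.items()}
            # L2-normalize each row; an empty row keeps norm 1.0 to avoid dividing by zero.
            norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
            for col in sorted(weights):
                indices.append(col)
                data.append(weights[col] / norm)
            # indptr marks where each row ends in data/indices.
            indptr.append(len(indices))
        return data, indices, indptr


train = ["the cat sat", "the dog barked", "a cat and a dog"]
model = TinyTfidf().fit(train)
data, indices, indptr = model.transform(["the cat", "unseen words only"])
```

Only non-zero weights are stored, so a document with no known tokens contributes an empty row: its start and end offsets in indptr are equal.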