TokenizerTransformer

class lightautoml.transformers.text.TokenizerTransformer(tokenizer=<lightautoml.text.tokenizer.SimpleEnTokenizer object>)

Bases: LAMLTransformer

Transformer that tokenizes text features with the given tokenizer.

Parameters:

tokenizer (BaseTokenizer) – Text tokenizer to apply. Defaults to a SimpleEnTokenizer instance.

transform(dataset)

Transform a text dataset into a tokenized text dataset.

Parameters:

dataset (Union[NumpyDataset, PandasDataset]) – Pandas or NumPy dataset of text features.

Return type:

PandasDataset

Returns:

Pandas dataset with tokenized text.
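
Example:

The following is a minimal, library-free sketch of the idea behind transform: a tokenizer is applied to every value of every text column. It is not LightAutoML's implementation. The names simple_tokenize and tokenize_columns are hypothetical, the regex-based tokenizer is only a stand-in for a BaseTokenizer, and treating missing values as empty strings is an assumption.

```python
import re
from typing import Dict, List, Optional


def simple_tokenize(text: str) -> List[str]:
    """Hypothetical stand-in tokenizer: lowercase, keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tokenize_columns(
    columns: Dict[str, List[Optional[str]]]
) -> Dict[str, List[List[str]]]:
    """Apply the tokenizer to every value of every text column.

    Assumption: missing values (None) are treated as empty strings.
    """
    result = {}
    for name, values in columns.items():
        result[name] = [
            simple_tokenize("" if value is None else str(value))
            for value in values
        ]
    return result


# One text column with a missing value in the second row.
data = {"review": ["Great product, works well!", None, "Bad. Returned it."]}
tokens = tokenize_columns(data)
```

The sketch keeps each column's row count and order unchanged, so the tokenized output stays aligned with the input rows.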