TokenizerTransformer

class lightautoml.transformers.text.TokenizerTransformer(tokenizer=<lightautoml.text.tokenizer.SimpleEnTokenizer object>)

Bases: lightautoml.transformers.base.LAMLTransformer

Transformer that tokenizes text features with the given tokenizer.

__init__(tokenizer=<lightautoml.text.tokenizer.SimpleEnTokenizer object>)
Parameters

tokenizer (BaseTokenizer) – Text tokenizer to apply. Defaults to a SimpleEnTokenizer instance.

transform(dataset)

Transform a text dataset into a tokenized text dataset.

Parameters

dataset (Union[NumpyDataset, PandasDataset]) – Pandas or Numpy dataset of text features.

Return type

PandasDataset

Returns

Pandas dataset with tokenized text.
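
Example

The following is a minimal, library-free sketch of the idea behind transform: a tokenizer is applied to every value of every text column. It does not call LightAutoML. ToyTokenizer, tokenize_columns, and the "review" column are hypothetical names introduced only for illustration, and the real transformer's token format and output container (a PandasDataset) may differ from the plain dict used here.

```python
import re
from typing import Dict, List


class ToyTokenizer:
    """Hypothetical stand-in for a BaseTokenizer; not LightAutoML code."""

    def tokenize(self, text: str) -> List[str]:
        # Lowercase the text, then keep runs of letters and digits as tokens.
        return re.findall(r"[a-z0-9]+", text.lower())


def tokenize_columns(
    columns: Dict[str, List[str]], tokenizer: ToyTokenizer
) -> Dict[str, List[List[str]]]:
    """Apply the tokenizer to every value of every text column."""
    return {
        name: [tokenizer.tokenize(value) for value in values]
        for name, values in columns.items()
    }


# A tiny "dataset" with a single text feature.
columns = {"review": ["Great product!", "Would not buy again."]}
tokenized = tokenize_columns(columns, ToyTokenizer())
print(tokenized)
```

Passing the tokenizer in as an argument mirrors the tokenizer parameter of __init__: the tokenization rule can be swapped without touching the column-wise loop.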