TokenizerTransformer
- class lightautoml.transformers.text.TokenizerTransformer(tokenizer=<lightautoml.text.tokenizer.SimpleEnTokenizer object>)[source]
Bases: lightautoml.transformers.base.LAMLTransformer
Simple tokenizer transformer.
- __init__(tokenizer=<lightautoml.text.tokenizer.SimpleEnTokenizer object>)[source]
- Parameters
tokenizer (BaseTokenizer) – text tokenizer.
- transform(dataset)[source]
Transform a text dataset into a tokenized text dataset.
- Parameters
dataset (Union[NumpyDataset, PandasDataset]) – Pandas or Numpy dataset of text features.
- Return type
PandasDataset
- Returns
Pandas dataset with tokenized text.
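
The idea behind transform can be illustrated with a small stand-alone sketch. It does not use LightAutoML and is not the library's implementation: `simple_tokenize` and `tokenize_columns` are hypothetical helpers, the dataset is modeled as a plain dict of column name to list of strings, and joining tokens with spaces is an assumption made for readability.

```python
import re
from typing import Dict, List, Optional


def simple_tokenize(text: str) -> List[str]:
    """Hypothetical stand-in tokenizer: lowercase, keep alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tokenize_columns(
    columns: Dict[str, List[Optional[str]]],
) -> Dict[str, List[str]]:
    """Tokenize every text column; each cell becomes space-joined tokens.

    Missing values (None) are treated as empty strings (an assumption).
    """
    return {
        name: [" ".join(simple_tokenize(cell or "")) for cell in cells]
        for name, cells in columns.items()
    }


# A toy "dataset" with a single text feature.
text_features = {
    "review": ["Great product, works well!", None, "Didn't like it."],
}
tokenized = tokenize_columns(text_features)
print(tokenized["review"])
```

The real transformer accepts a NumpyDataset or PandasDataset and delegates the per-text work to the tokenizer passed to `__init__`, so swapping the tokenizer changes the output without touching the transformer itself.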