TokenizerTransformer
- class lightautoml.transformers.text.TokenizerTransformer(tokenizer=<lightautoml.text.tokenizer.SimpleEnTokenizer object>)
Bases: LAMLTransformer
Simple tokenizer transformer.
- Parameters:
tokenizer (BaseTokenizer) – Text tokenizer.
- transform(dataset)
Transform a text dataset into a tokenized text dataset.
- Parameters:
dataset (Union[NumpyDataset, PandasDataset]) – Pandas or Numpy dataset of text features.
- Return type:
- Returns:
Pandas dataset with tokenized text.
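
The transformation described above can be pictured with a small, library-free sketch. It does not use LightAutoML and is not the library's implementation: `simple_tokenize` and `tokenize_columns` are hypothetical names, a plain dict of lists stands in for the dataset, and the choices to lowercase, to treat missing values as empty strings, and to return tokens joined by spaces are all assumptions made for illustration.

```python
import re
from typing import Callable, Dict, List, Optional


def simple_tokenize(text: str) -> List[str]:
    """Hypothetical tokenizer: lowercase, then keep alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tokenize_columns(
    columns: Dict[str, List[Optional[str]]],
    tokenizer: Callable[[str], List[str]] = simple_tokenize,
) -> Dict[str, List[str]]:
    """Return a new mapping where each text is replaced by its space-joined tokens.

    Missing values (None) are treated as empty strings (an assumption).
    The input mapping is left unchanged.
    """
    result: Dict[str, List[str]] = {}
    for name, texts in columns.items():
        result[name] = [
            " ".join(tokenizer(text if text is not None else ""))
            for text in texts
        ]
    return result


# A dict of text columns stands in for a dataset of text features.
data = {"review": ["Great product, works well!", None, "Would NOT buy again."]}
tokenized = tokenize_columns(data)
print(tokenized)
```

Passing the tokenizer as an argument mirrors the idea of the tokenizer parameter: the column-wise transformation stays the same while the tokenization rule can be swapped.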