TokenizerTransformer
- class lightautoml.transformers.text.TokenizerTransformer(tokenizer=<lightautoml.text.tokenizer.SimpleEnTokenizer object>)
Bases: LAMLTransformer
Simple tokenizer transformer.
- Parameters:
tokenizer (BaseTokenizer) – text tokenizer. Defaults to a SimpleEnTokenizer instance.
- transform(dataset)
Transform text dataset to tokenized text dataset.
- Parameters:
dataset (Union[NumpyDataset, PandasDataset]) – Pandas or Numpy dataset of text features.
- Return type:
PandasDataset
- Returns:
Pandas dataset with tokenized text.
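
The following is a minimal, library-free sketch of what "transform text dataset to tokenized text dataset" means in practice. It does not call LightAutoML: `simple_tokenize` and `tokenize_columns` are hypothetical stand-ins for the tokenizer and for `transform`, and the normalization shown (lowercasing, keeping alphanumeric runs, joining tokens with spaces) is an assumption, not necessarily what `SimpleEnTokenizer` does.

```python
import re
from typing import Callable, Dict, List


def simple_tokenize(text: str) -> List[str]:
    """Hypothetical stand-in tokenizer: lowercase, then keep alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tokenize_columns(
    columns: Dict[str, List[str]],
    tokenizer: Callable[[str], List[str]] = simple_tokenize,
) -> Dict[str, List[str]]:
    """Apply the tokenizer to every cell of every text column.

    Each cell is replaced by its tokens joined with single spaces
    (an assumed output format chosen for illustration).
    """
    return {
        name: [" ".join(tokenizer(text)) for text in texts]
        for name, texts in columns.items()
    }


# A toy "dataset": one text column with two rows.
data = {"review": ["Great product, works well!", "Not worth $20..."]}
tokenized = tokenize_columns(data)
print(tokenized)
```

Passing the tokenizer as an argument mirrors the `tokenizer` parameter above: the column-wise transformation stays the same while the tokenization rule can be swapped.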