SimpleRuTokenizer
- class lightautoml.text.tokenizer.SimpleRuTokenizer(n_jobs=4, to_string=True, stopwords=False, is_stemmer=True, **kwargs)
Bases: lightautoml.text.tokenizer.BaseTokenizer
Russian tokenizer.
- __init__(n_jobs=4, to_string=True, stopwords=False, is_stemmer=True, **kwargs)
Tokenizer for the Russian language.
Includes filtering of numbers, punctuation, and short words. Lowercases the text and applies a stemmer by default (is_stemmer=True).
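The filtering steps described above can be approximated in plain Python to show what the output of such a tokenizer looks like. This is a minimal sketch, not LightAutoML's implementation: the function name `simple_ru_tokenize`, the punctuation set, the `min_len` threshold, the order of the steps, and the pluggable `stemmer` callable are all hypothetical assumptions made for illustration.

```python
import string
from typing import Callable, Iterable, List, Optional, Union

# Hypothetical punctuation set: ASCII punctuation plus common Russian
# typographic marks (guillemets, dashes, ellipsis).
_PUNCT = string.punctuation + "«»—–…"
# Map every punctuation character to a space so it separates tokens.
_PUNCT_TABLE = str.maketrans({ch: " " for ch in _PUNCT})


def simple_ru_tokenize(
    text: str,
    to_string: bool = True,
    stopwords: Optional[Iterable[str]] = None,
    stemmer: Optional[Callable[[str], str]] = None,
    min_len: int = 2,  # hypothetical threshold for "short word" filtering
) -> Union[str, List[str]]:
    """Hypothetical sketch: lowercase -> strip punctuation -> filter -> stem."""
    stop = set(stopwords or ())
    # Lowercase first so the filters and the stemmer see normalized tokens.
    text = text.lower()
    # Turn punctuation into whitespace, then split on whitespace.
    tokens = text.translate(_PUNCT_TABLE).split()
    # Numeric filtering: drop tokens consisting only of digits.
    tokens = [t for t in tokens if not t.isdigit()]
    # Short-word filtering: drop tokens shorter than min_len characters.
    tokens = [t for t in tokens if len(t) >= min_len]
    # Stopword filtering is done on unstemmed words.
    tokens = [t for t in tokens if t not in stop]
    # Optional stemming; tokens are left unchanged when no stemmer is given.
    if stemmer is not None:
        tokens = [stemmer(t) for t in tokens]
    # Return a single space-joined string or the list of tokens.
    return " ".join(tokens) if to_string else tokens


print(simple_ru_tokenize("Привет, мир! В 2023 году."))  # привет мир году
```

The stemmer is passed in as a plain callable so the sketch stays dependency-free. The source does not say which stemmer the real class uses; one option, if NLTK is installed, would be `nltk.stem.SnowballStemmer("russian").stem`.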