Bases: BaseTokenizer
English tokenizer.
n_jobs (int) – Number of threads used for multiprocessing.
to_string (bool) – Whether to return a string or a list of tokens.
stopwords (Union[bool, Sequence[str], None]) – Whether to use stopwords; also accepts a sequence of stopword strings.
is_stemmer (bool) – Whether to apply a stemmer.
kwargs (Any) – Ignored.
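The class documentation lists the constructor parameters but not how they are interpreted. The following is a minimal sketch, assuming that `stopwords=True` selects a built-in list, `False`/`None` disables filtering, and a sequence supplies a custom list, and that `n_jobs` spreads per-sentence work over a thread pool. `DEFAULT_STOPWORDS`, `resolve_stopwords` and `run_parallel` are hypothetical names, not part of the tokenizer's API.

```python
from multiprocessing.pool import ThreadPool
from typing import Any, Callable, List, Sequence, Set, Union

# Hypothetical default list; the real tokenizer's stopword source is not documented here.
DEFAULT_STOPWORDS: Set[str] = {"a", "an", "the", "is", "of", "and", "to", "in"}


def resolve_stopwords(stopwords: Union[bool, Sequence[str], None]) -> Set[str]:
    """Map the documented `stopwords` argument onto a concrete set (assumed semantics)."""
    if stopwords is None or stopwords is False:
        return set()  # no stopword filtering
    if stopwords is True:
        return set(DEFAULT_STOPWORDS)  # built-in default list
    return {w.lower() for w in stopwords}  # caller-supplied custom list


def run_parallel(func: Callable[[str], Any], sentences: List[str], n_jobs: int = 1) -> List[Any]:
    """Apply `func` to every sentence, using `n_jobs` worker threads when n_jobs > 1."""
    if n_jobs <= 1:
        return [func(s) for s in sentences]
    with ThreadPool(processes=n_jobs) as pool:
        return pool.map(func, sentences)  # map() preserves input order
```

A thread pool is used here only because the parameter is described in terms of threads; for CPU-bound pure-Python work, a process pool would usually scale better.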
Preprocess a sentence string (lowercasing, etc.).
snt (str) – Sentence string.
Resulting string.
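The documentation names lowercasing but leaves the other preprocessing steps unspecified. A minimal sketch, assuming Unicode NFKC normalization and whitespace collapsing as the "etc." steps; `preprocess` here is a hypothetical stand-in, not the library's implementation.

```python
import re
import unicodedata


def preprocess(snt: str) -> str:
    """Hypothetical preprocessing: NFKC normalization, lowercasing, whitespace collapsing."""
    snt = unicodedata.normalize("NFKC", snt)  # e.g. the ligature "ﬁ" becomes "fi"
    snt = snt.lower()
    return re.sub(r"\s+", " ", snt).strip()  # collapse runs of whitespace, trim the ends
```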
Convert a sentence string into a list of tokens.
Resulting list of tokens (List[str]).
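The tokenization rule itself is not documented. A minimal regex-based sketch, assuming that a token is either a run of word characters (optionally with one internal apostrophe, as in "don't") or a single punctuation mark; `TOKEN_RE` and `tokenize` are hypothetical.

```python
import re
from typing import List

# Assumed token pattern: word runs with an optional internal apostrophe, or single punctuation marks.
TOKEN_RE = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def tokenize(snt: str) -> List[str]:
    """Split a (preprocessed) sentence string into word and punctuation tokens."""
    return TOKEN_RE.findall(snt)
```

Keeping punctuation as separate tokens lets a later cleaning step decide whether to drop it.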
Clean a list of sentence tokens.
snt (List[str]) – List of tokens.
Resulting list of filtered tokens.
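What "clean" filters out is not specified. A minimal sketch, assuming it drops tokens with no letters or digits (pure punctuation) and tokens found in the active stopword set; `clean` with a `stopwords` argument is a hypothetical signature, not the documented one.

```python
from typing import Iterable, List


def clean(snt: List[str], stopwords: Iterable[str] = ()) -> List[str]:
    """Keep tokens that contain at least one alphanumeric character and are not stopwords."""
    stop = set(stopwords)
    return [tok for tok in snt if any(ch.isalnum() for ch in tok) and tok not in stop]
```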
Apply additional processing steps (lemmatization, POS tagging, etc.).
Resulting list of processed tokens.
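As an illustration of a processing step gated by `is_stemmer`, the sketch below applies a toy suffix-stripping stemmer. This is a deliberately naive stand-in and not Porter stemming or lemmatization; `SUFFIXES`, `naive_stem` and `process` are hypothetical names.

```python
from typing import List

# Toy suffix list, checked in order; a real stemmer (e.g. Porter) uses far more careful rules.
SUFFIXES = ("ing", "edly", "ed", "ly", "es", "s")


def naive_stem(token: str) -> str:
    """Strip the first matching suffix, but only if a stem of at least 3 characters remains."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def process(snt: List[str], is_stemmer: bool = False) -> List[str]:
    """Return the tokens unchanged, or stemmed when is_stemmer is True."""
    if not is_stemmer:
        return list(snt)
    return [naive_stem(tok) for tok in snt]
```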
Postprocess the sentence string (merge words).
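One plausible reading of "merge words" is joining the token list back into a single string when `to_string` is enabled. A minimal sketch under that assumption; `postprocess` with a `to_string` argument is hypothetical.

```python
import re
from typing import List, Union


def postprocess(snt: List[str], to_string: bool = True) -> Union[str, List[str]]:
    """Join tokens into one string (if to_string) or return them as a list."""
    if not to_string:
        return list(snt)
    text = " ".join(snt)
    # Remove the space join() inserts before punctuation: "hello , world" -> "hello, world".
    return re.sub(r" ([.,!?;:])", r"\1", text)
```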