lightautoml.text
Provides an internal interface for working with text features.
Sentence Embedders
Deep learning-based sentence embeddings.
Class to compute Bag of Random Embedding Projections (BOREP) sentence embeddings from word embeddings.
Class to compute Random LSTM sentence embeddings from word embeddings.
Class to compute word or sentence embeddings with HuggingFace transformers.
Weighted average of word embeddings.
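Two of the embedders above reduce to a few lines of array arithmetic. The following is a minimal NumPy sketch, not the library's implementation. BOREP is shown as a fixed, never-trained random projection of each word vector followed by mean or max pooling, and the weighted average as a per-word weighted mean (for example with IDF weights). The function names, the uniform initialisation bound `1/sqrt(emb_dim)` and the skip-unknown-words rule are assumptions made for illustration.

```python
import numpy as np


def borep_embed(word_vectors, proj_dim=8, seed=0, pooling="mean"):
    """Hypothetical BOREP sketch: random projection of word vectors, then pooling."""
    word_vectors = np.asarray(word_vectors, dtype=np.float64)  # (n_words, emb_dim)
    emb_dim = word_vectors.shape[1]
    rng = np.random.default_rng(seed)
    # Assumed init: uniform in [-1/sqrt(emb_dim), 1/sqrt(emb_dim)]; never trained.
    bound = 1.0 / np.sqrt(emb_dim)
    proj = rng.uniform(-bound, bound, size=(emb_dim, proj_dim))
    projected = word_vectors @ proj  # (n_words, proj_dim)
    if pooling == "max":
        return projected.max(axis=0)
    return projected.mean(axis=0)


def weighted_average_embed(tokens, vectors, weights, default_weight=1.0):
    """Hypothetical weighted average: sum(w_i * v_i) / sum(w_i) over known tokens."""
    dim = len(next(iter(vectors.values())))
    total = np.zeros(dim)
    weight_sum = 0.0
    for tok in tokens:
        if tok not in vectors:
            continue  # assumption: out-of-vocabulary words are skipped
        w = weights.get(tok, default_weight)
        total += w * np.asarray(vectors[tok], dtype=np.float64)
        weight_sum += w
    return total / weight_sum if weight_sum > 0 else total
```

For example, with `vectors = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}` and weights `{"cat": 3.0, "dog": 1.0}`, the weighted average of `["cat", "dog"]` is `[0.75, 0.25]`. Fixing the seed keeps the BOREP projection identical across calls, which matters because the projection is never learned.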
Torch Datasets for Text
Dataset class with transformers tokenization.
Dataset class for extracting word embeddings.
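A dataset for word-embedding models typically maps each tokenized text to a fixed-length array of vocabulary indices. The embedding lookup itself then happens inside the model. The sketch below is hypothetical: the class name, the `pad_idx`/`unk_idx` convention and the returned field names are assumptions. It follows PyTorch's map-style `Dataset` protocol (`__len__` and `__getitem__`) without importing torch, so it could subclass `torch.utils.data.Dataset` unchanged.

```python
import numpy as np


class WordIndexDataset:
    """Hypothetical map-style dataset: token lists -> padded index arrays plus a mask."""

    def __init__(self, texts, vocab, max_length=16, pad_idx=0, unk_idx=1):
        self.texts = texts  # list of token lists
        self.vocab = vocab  # token -> integer row of the embedding matrix
        self.max_length = max_length
        self.pad_idx = pad_idx
        self.unk_idx = unk_idx

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, i):
        tokens = self.texts[i][: self.max_length]  # truncate long texts
        ids = [self.vocab.get(t, self.unk_idx) for t in tokens]
        n_pad = self.max_length - len(ids)
        return {
            "input_ids": np.array(ids + [self.pad_idx] * n_pad, dtype=np.int64),
            "mask": np.array([1] * len(ids) + [0] * n_pad, dtype=np.int64),
        }
```

Returning a dict of equally shaped fields per sample keeps batching simple: a collate function only needs to stack each field along a new batch axis.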
Tokenizers
Base class for tokenizers.
Russian tokenizer.
English tokenizer.
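A common way to structure such tokenizers is a base class that fixes the pipeline (normalise, split, filter stopwords) and language-specific subclasses that override only the splitting step. The sketch below illustrates that pattern with an English subclass. The class names, the regular expression and the stopword handling are all assumptions, not the library's behaviour. A Russian tokenizer would override `split` (and possibly add lemmatisation) in the same way.

```python
import re
from typing import Iterable, List, Optional, Set


class BaseTokenizerSketch:
    """Hypothetical base: lowercase, split into tokens, drop stopwords."""

    def __init__(self, stopwords: Optional[Iterable[str]] = None, lowercase: bool = True):
        self.stopwords: Set[str] = set(stopwords or ())
        self.lowercase = lowercase

    def preprocess(self, text: str) -> str:
        return text.lower() if self.lowercase else text

    def split(self, text: str) -> List[str]:
        raise NotImplementedError  # language-specific subclasses implement this

    def tokenize(self, text: str) -> List[str]:
        tokens = self.split(self.preprocess(text))
        return [t for t in tokens if t not in self.stopwords]


class EnglishTokenizerSketch(BaseTokenizerSketch):
    """Hypothetical English tokenizer: runs of Latin letters, digits and apostrophes."""

    _token_re = re.compile(r"[a-z0-9']+", re.IGNORECASE)

    def split(self, text: str) -> List[str]:
        return self._token_re.findall(text)
```

For example, `EnglishTokenizerSketch(stopwords={"the", "a"}).tokenize("The cat sat on a mat!")` returns `["cat", "sat", "on", "mat"]`.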
Pooling Strategies
Abstract pooling class.
CLS token pooling.
Max value pooling.
Sum value pooling.
Mean value pooling.
Identity pooling.
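The pooling strategies listed above turn a sequence of token vectors of shape `(batch, seq_len, dim)` into one vector per text. The key detail is that padded positions must be ignored through an attention mask. The NumPy function below is a hypothetical sketch of all five strategies in one place. Its name, its string-based `strategy` argument and the mask convention (1 for real tokens, 0 for padding) are assumptions.

```python
import numpy as np


def pool_sketch(hidden, mask, strategy="mean"):
    """hidden: (batch, seq_len, dim); mask: (batch, seq_len), 1 = real token."""
    hidden = np.asarray(hidden, dtype=np.float64)
    mask = np.asarray(mask, dtype=bool)[..., None]  # (batch, seq_len, 1) for broadcasting
    if strategy == "cls":
        return hidden[:, 0, :]  # first position, e.g. the [CLS] token of BERT-style models
    if strategy == "max":
        return np.where(mask, hidden, -np.inf).max(axis=1)  # padding can never win
    if strategy == "sum":
        return np.where(mask, hidden, 0.0).sum(axis=1)
    if strategy == "mean":
        counts = np.clip(mask.sum(axis=1), 1, None)  # (batch, 1); avoids division by zero
        return np.where(mask, hidden, 0.0).sum(axis=1) / counts
    if strategy == "identity":
        return hidden  # keep the full sequence unchanged
    raise ValueError(f"unknown pooling strategy: {strategy}")
```

Note that with max pooling, a row whose mask is entirely zero yields `-inf`. A real implementation would need to decide how to handle empty texts.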
Utils
Set random seeds and cuDNN parameters.
Parse devices and convert the first one to a torch device.
Put each data field into a tensor whose outer dimension is the batch size.
Compute the hash of a single text.
Compute the hash of an array of texts.
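Several of these utilities are small enough to sketch directly. The code below is illustrative only: the function names are hypothetical, MD5 is an assumed choice of hash, and torch is treated as optional. The seeding function seeds Python and NumPy and, if torch is installed, also seeds it and switches cuDNN to deterministic mode. The collate function stacks each field across samples so that the batch becomes the outer dimension. The array hash combines fixed-length per-text digests, so `["ab", "c"]` and `["a", "bc"]` hash differently.

```python
import hashlib
import random
from typing import Any, Dict, List, Sequence

import numpy as np


def seed_everything_sketch(seed: int = 42) -> None:
    """Seed Python and NumPy RNGs, plus torch/cuDNN when torch is available."""
    random.seed(seed)
    np.random.seed(seed)
    try:
        import torch
    except ImportError:
        return  # torch not installed: nothing more to seed
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # safe to call even without a GPU
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


def collate_sketch(batch: List[Dict[str, Any]]) -> Dict[str, np.ndarray]:
    """Stack each field across samples; the new leading axis is the batch size."""
    return {key: np.stack([np.asarray(sample[key]) for sample in batch]) for key in batch[0]}


def text_hash(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def texts_hash(texts: Sequence[str]) -> str:
    h = hashlib.md5()
    for t in texts:
        h.update(text_hash(t).encode("ascii"))  # fixed-length digests keep boundaries unambiguous
    return h.hexdigest()
```

Hashing texts this way gives a cheap cache key, for example to reuse previously computed embeddings for an identical text column.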