EmbedDataset

class lightautoml.text.embed_dataset.EmbedDataset(sentences, embedding_model, max_length, embed_size, **kwargs)

Bases: object

Dataset class for extracting word embeddings.

__init__(sentences, embedding_model, max_length, embed_size, **kwargs)

Transforms a list of tokens into a dict containing the token embeddings and the sentence length.

Parameters
  • sentences (Sequence[str]) – List of tokenized sentences.

  • embedding_model (Dict) – Pretrained word embeddings (word2vec, fastText, etc.). Must expose a dict interface {<word>: <embedding>}.

  • max_length (int) – Max sentence length.

  • embed_size (int) – Size of embedding.

  • **kwargs – Not used.
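The parameters above can be illustrated with a small stand-in class. This is a minimal sketch of the described behavior, not the LightAutoML implementation: the class name SketchEmbedDataset, the output keys "text" and "length", the zero padding, and the skipping of out-of-vocabulary tokens are all assumptions made for illustration. It uses plain Python lists so that it runs without third-party packages.

```python
from typing import Dict, List, Sequence


class SketchEmbedDataset:
    """Hypothetical stand-in that mirrors the documented constructor signature."""

    def __init__(
        self,
        sentences: Sequence[Sequence[str]],
        embedding_model: Dict[str, Sequence[float]],
        max_length: int,
        embed_size: int,
        **kwargs,  # accepted and ignored, as in the documented signature
    ):
        self.sentences = sentences
        self.embedding_model = embedding_model
        self.max_length = max_length
        self.embed_size = embed_size

    def __len__(self) -> int:
        return len(self.sentences)

    def __getitem__(self, idx: int) -> dict:
        # Start from an all-zero (max_length x embed_size) matrix; unused rows act as padding.
        matrix: List[List[float]] = [
            [0.0] * self.embed_size for _ in range(self.max_length)
        ]
        length = 0
        for token in self.sentences[idx]:
            if length >= self.max_length:
                break  # truncate sentences longer than max_length
            if token not in self.embedding_model:
                continue  # assumption: tokens without an embedding are skipped
            matrix[length] = list(self.embedding_model[token])
            length += 1
        # Assumed output keys; the real class may name them differently.
        return {"text": matrix, "length": length}


# Toy embedding "model": any mapping {<word>: <embedding>} satisfies the dict interface.
embedding_model = {"good": [1.0, 0.0], "movie": [0.0, 1.0]}
sentences = [
    ["good", "movie"],
    ["unknown", "good", "good", "movie", "movie"],
]

dataset = SketchEmbedDataset(sentences, embedding_model, max_length=3, embed_size=2)
first = dataset[0]
second = dataset[1]
```

In this sketch every item has the same max_length × embed_size shape regardless of sentence length, so items can be stacked into batches, and the stored length tells downstream code how many rows hold real embeddings rather than padding.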