Bases: object
Dataset class for extracting word embeddings.
Transforms each tokenized sentence into a dict holding its word embeddings and its sentence length.
sentences (Sequence[str]) – List of tokenized sentences.
embedding_model (Dict) – Pretrained word-embedding model such as word2vec or fastText. Must expose a dict interface {<word>: <embedding>}.
max_length (int) – Max sentence length.
embed_size (int) – Size (dimensionality) of each embedding vector.
**kwargs – Not used.
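The parameters above suggest how a sentence becomes a fixed-size embedding matrix plus a length. Below is a minimal plain-Python sketch of that idea, not this class's actual implementation. It rests on several assumptions that the documentation does not state: tokens in each sentence are whitespace-separated, out-of-vocabulary tokens are skipped, sentences are truncated or zero-padded to `max_length` rows, and the returned keys `"text"` and `"length"` are hypothetical. The class name `EmbedDatasetSketch` is also hypothetical.

```python
from typing import Dict, List, Sequence


class EmbedDatasetSketch:
    """Hypothetical stand-in: maps each sentence to padded embeddings plus its length."""

    def __init__(self, sentences: Sequence[str], embedding_model: Dict[str, List[float]],
                 max_length: int, embed_size: int, **kwargs):
        self.sentences = sentences
        self.embedding_model = embedding_model  # any {word: vector} mapping
        self.max_length = max_length
        self.embed_size = embed_size
        # kwargs are accepted but ignored, mirroring "**kwargs – Not used."

    def __len__(self) -> int:
        return len(self.sentences)

    def __getitem__(self, idx: int) -> Dict[str, object]:
        # Assumption: tokens of a tokenized sentence are separated by whitespace.
        tokens = self.sentences[idx].split()
        # Assumption: tokens missing from embedding_model are skipped.
        vectors = [list(self.embedding_model[t]) for t in tokens if t in self.embedding_model]
        vectors = vectors[: self.max_length]  # truncate to max_length rows
        length = len(vectors)
        # Pad with zero vectors so every item has exactly max_length rows.
        vectors += [[0.0] * self.embed_size for _ in range(self.max_length - length)]
        return {"text": vectors, "length": length}


if __name__ == "__main__":
    emb = {"hello": [1.0, 2.0], "world": [3.0, 4.0]}
    ds = EmbedDatasetSketch(["hello world", "hello unknown"], emb, max_length=3, embed_size=2)
    print(ds[0]["length"])  # 2
    print(ds[0]["text"])    # [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
    print(ds[1]["length"])  # 1 ("unknown" is out of vocabulary)
```

Returning the unpadded length alongside the padded matrix lets a downstream model (for example, a recurrent encoder) ignore the padding rows. A production version would more likely return arrays or tensors than nested lists.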