WeightedAverageTransformer

class lightautoml.text.weighted_average_transformer.WeightedAverageTransformer(embedding_model, embed_size, weight_type='idf', use_svd=True, alpha=0.001, verbose=False, **kwargs)[source]

Bases: sklearn.base.TransformerMixin

Weighted average of word embeddings.

__init__(embedding_model, embed_size, weight_type='idf', use_svd=True, alpha=0.001, verbose=False, **kwargs)[source]

Calculate sentence embedding as weighted average of word embeddings.

Parameters
  • embedding_model (Dict) – Word embedding model such as word2vec or fastText. Should have a dict interface {<word>: <embedding>}.

  • embed_size (int) – Size of embedding.

  • weight_type (str) – ‘idf’ for IDF weights, ‘sif’ for smoothed inverse frequency weights, ‘1’ for equal weights.

  • use_svd (bool) – Subtract the projection onto the first singular vector.

  • alpha (float) – Parameter for SIF weights.

  • verbose (bool) – Print progress messages.

  • **kwargs – Unused arguments.
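
The parameters above can be illustrated with a minimal NumPy sketch of a weighted average of word embeddings. This is not the LightAutoML implementation: the function name sentence_embeddings, the smoothed IDF formula, the SIF weight alpha / (alpha + p(word)), and the normalization by the sum of weights are all assumptions made for illustration. Documents are assumed to be already tokenized into lists of words.

```python
import math
from collections import Counter

import numpy as np


def sentence_embeddings(docs, embedding_model, embed_size,
                        weight_type="idf", use_svd=True, alpha=0.001):
    """Hypothetical sketch; names and formulas are assumptions."""
    n_docs = len(docs)
    # Document frequency (for 'idf') and corpus term counts (for 'sif').
    doc_freq = Counter(w for doc in docs for w in set(doc))
    term_count = Counter(w for doc in docs for w in doc)
    total = sum(term_count.values())

    def weight(word):
        if weight_type == "idf":
            # Smoothed IDF; one common variant, assumed here.
            return math.log((1 + n_docs) / (1 + doc_freq[word])) + 1.0
        if weight_type == "sif":
            # Smoothed inverse frequency: rare words get weights near 1.
            return alpha / (alpha + term_count[word] / total)
        return 1.0  # weight_type == "1": all weights are equal

    out = np.zeros((n_docs, embed_size))
    for i, doc in enumerate(docs):
        # Words missing from the embedding model are skipped.
        known = [w for w in doc if w in embedding_model]
        if not known:
            continue  # leave a zero vector for documents with no known words
        weights = np.array([weight(w) for w in known])
        vectors = np.array([embedding_model[w] for w in known], dtype=float)
        out[i] = weights @ vectors / weights.sum()

    if use_svd:
        # Subtract the projection onto the first right singular vector.
        _, _, vt = np.linalg.svd(out, full_matrices=False)
        u = vt[0]
        out = out - np.outer(out @ u, u)
    return out


docs = [["cat", "sat"], ["dog", "sat"], ["cat", "dog"]]
emb = {"cat": [1, 0, 0], "dog": [0, 1, 0], "sat": [0, 0, 1]}
print(sentence_embeddings(docs, emb, embed_size=3, weight_type="sif"))
```

In this sketch the output has one row per document and embed_size columns. With use_svd=True, every row is orthogonal to the dominant direction shared across the sentence vectors.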

get_name()[source]

Module name.

Return type

str

Returns

String with module name.

get_out_shape()[source]

Output shape.

Return type

int

Returns

Int with module output shape.

reset_statistic()[source]

Reset module statistics.

get_statistic()[source]

Get module statistics.