AutoNLPWrap
- class lightautoml.transformers.text.AutoNLPWrap(model_name, embedding_model=None, cache_dir='./cache_NLP', bert_model=None, transformer_params=None, subs=None, multigpu=False, random_state=42, train_fasttext=False, fasttext_params=None, fasttext_epochs=2, sent_scaler=None, verbose=False, device='0', **kwargs)[source]
Bases: lightautoml.transformers.base.LAMLTransformer
Calculate text embeddings.
- __init__(model_name, embedding_model=None, cache_dir='./cache_NLP', bert_model=None, transformer_params=None, subs=None, multigpu=False, random_state=42, train_fasttext=False, fasttext_params=None, fasttext_epochs=2, sent_scaler=None, verbose=False, device='0', **kwargs)[source]
- Parameters
  - model_name (str) – Method for aggregating word embeddings into a sentence embedding.
  - transformer_params (Optional[Dict]) – Aggregating model parameters.
  - embedding_model (Optional[str]) – Word-level embedding model with a dict interface, or a path to a gensim fasttext model.
  - cache_dir (str) – If None, do not cache transformed datasets.
  - bert_model (Optional[str]) – Name of a HuggingFace transformer model.
  - subs (Optional[int]) – Subsample size used to calculate frequencies. If None, use the full data.
  - multigpu (bool) – Use Data Parallel.
  - random_state (int) – Random state used to take the subsample.
  - train_fasttext (bool) – Train a fasttext model.
  - fasttext_epochs (int) – Number of epochs to train.
  - verbose (bool) – Verbosity.
  - device (Any) – Torch device or str.
  - **kwargs – Unused params.
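The model_name parameter selects how word-level vectors are aggregated into a single sentence embedding, and embedding_model may be any mapping from token to vector. A minimal sketch of the simplest such aggregation, mean pooling, in plain NumPy (the function name and toy vocabulary below are illustrative, not part of the LightAutoML API):

```python
import numpy as np

# Toy word-level embedding model with a dict interface,
# standing in for the real embedding_model argument.
word_vectors = {
    "fast": np.array([1.0, 0.0]),
    "car": np.array([0.0, 1.0]),
}

def mean_pool(tokens, vectors, dim=2):
    """Average the vectors of known tokens; zeros if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return np.zeros(dim)
    return np.mean(known, axis=0)

# Out-of-vocabulary tokens are simply skipped.
sentence_emb = mean_pool(["fast", "car", "oov"], word_vectors)  # -> [0.5, 0.5]
```

Other choices of model_name replace this unweighted average with a learned or weighted aggregation, but the input/output shape contract stays the same: a token sequence in, one fixed-size vector out.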
- fit(dataset)[source]
Fit chosen transformer and create feature names.
- Parameters
  - dataset (Union[NumpyDataset, PandasDataset]) – Pandas or Numpy dataset of text features.
- transform(dataset)[source]
Transform tokenized dataset to text embeddings.
- Parameters
  - dataset (Union[NumpyDataset, PandasDataset]) – Pandas or Numpy dataset of text features.
- Return type
  NumpyDataset
- Returns
  Numpy dataset with text embeddings.
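Taken together, fit and transform follow the usual transformer contract: fit learns from a dataset of text features, and transform maps texts to a numeric embedding matrix with one row per sample. A hypothetical stand-in sketching that contract (the class below mimics only the interface shape, not AutoNLPWrap's actual models or dataset wrappers):

```python
import numpy as np

class TinyTextEmbedder:
    """Illustrative fit/transform contract: texts in, embedding matrix out.

    This is a toy stand-in for AutoNLPWrap's interface; the real class
    consumes NumpyDataset/PandasDataset objects and fits actual models.
    """

    def __init__(self, dim=4):
        self.dim = dim
        self.features = None

    def fit(self, texts):
        # The real fit() trains the chosen transformer and creates
        # feature names; here we only build placeholder names.
        self.features = [f"emb_{i}" for i in range(self.dim)]
        return self

    def transform(self, texts):
        # Deterministic per-text vectors stand in for learned embeddings.
        rows = []
        for t in texts:
            rng = np.random.default_rng(abs(hash(t)) % (2 ** 32))
            rows.append(rng.standard_normal(self.dim))
        return np.vstack(rows)

emb = TinyTextEmbedder(dim=4).fit(["hello world"]).transform(["hello world", "fast car"])
# emb has shape (2, 4): one 4-dimensional embedding per input text.
```

The same pattern applies to AutoNLPWrap itself: call fit (or fit_transform) on the training dataset first, then transform any dataset with the same text columns to obtain a Numpy dataset of embeddings.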