AutoNLPWrap
- class lightautoml.transformers.text.AutoNLPWrap(model_name, embedding_model=None, cache_dir='./cache_NLP', bert_model=None, transformer_params=None, subs=None, multigpu=False, random_state=42, train_fasttext=False, fasttext_params=None, fasttext_epochs=2, sent_scaler=None, verbose=False, device='0', **kwargs)
Bases: LAMLTransformer
Calculate text embeddings.
- Parameters:
model_name (str) – Method for aggregating word embeddings into sentence embedding.
transformer_params (Optional[Dict]) – Aggregating model parameters.
embedding_model (Optional[str]) – Word level embedding model with dict interface or path to gensim fasttext model.
cache_dir (str) – If None - do not cache transformed datasets.
bert_model (Optional[str]) – Name of HuggingFace transformer model.
subs (Optional[int]) – Subsample to calculate freqs. If None - full data.
multigpu (bool) – Use Data Parallel.
random_state (int) – Random state to take subsample.
train_fasttext (bool) – Train fasttext.
fasttext_epochs (int) – Number of epochs to train.
verbose (bool) – Verbosity.
device (Any) – Torch device or str.
**kwargs (Any) – Unused params.
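To make the idea of aggregating word embeddings into a sentence embedding concrete, here is a minimal sketch using plain mean pooling over a word-level model with a dict interface. It is not one of AutoNLPWrap's aggregation methods: the toy vectors and the helper name `mean_pool_sentence` are assumptions made purely for illustration.

```python
from typing import Dict, List

# Toy word-level embedding model with a dict interface (hypothetical values).
word_vectors: Dict[str, List[float]] = {
    "good": [1.0, 0.0],
    "movie": [0.0, 1.0],
    "bad": [-1.0, 0.0],
}


def mean_pool_sentence(
    tokens: List[str], model: Dict[str, List[float]], dim: int
) -> List[float]:
    """Average the vectors of known tokens; return zeros if none are known."""
    known = [model[t] for t in tokens if t in model]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


sentence_embedding = mean_pool_sentence("good movie".split(), word_vectors, dim=2)
print(sentence_embedding)  # [0.5, 0.5]
```

Tokens missing from the dict are skipped, so a sentence with no known words maps to a zero vector in this sketch.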
- property features
Features list.
- fit(dataset)
Fit chosen transformer and create feature names.
- Parameters:
dataset (Union[NumpyDataset, PandasDataset]) – Pandas or Numpy dataset of text features.
- Returns:
self.
- transform(dataset)
Transform tokenized dataset to text embeddings.
- Parameters:
dataset (Union[NumpyDataset, PandasDataset]) – Pandas or Numpy dataset of text features.
- Return type:
- Returns:
Numpy dataset with text embeddings.
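The fit/transform contract described above (fit creates feature names and returns self; transform maps text columns to embedding columns) can be sketched with a toy stand-in. The class `ToyTextEmbedder`, its feature-naming scheme, and the use of a plain dict of columns instead of a NumpyDataset or PandasDataset are all assumptions for illustration, not LightAutoML's implementation.

```python
from typing import Dict, List


class ToyTextEmbedder:
    """Toy stand-in mimicking the fit/transform contract (hypothetical)."""

    def __init__(self, word_vectors: Dict[str, List[float]], dim: int):
        self.word_vectors = word_vectors
        self.dim = dim
        self._features: List[str] = []

    @property
    def features(self) -> List[str]:
        """Feature names created during fit."""
        return self._features

    def fit(self, columns: Dict[str, List[str]]) -> "ToyTextEmbedder":
        # One feature name per (text column, embedding dimension) pair.
        self._features = [
            f"emb_{i}__{name}" for name in columns for i in range(self.dim)
        ]
        return self

    def transform(self, columns: Dict[str, List[str]]) -> List[List[float]]:
        # Assumes at least one column; all columns have the same row count.
        n_rows = len(next(iter(columns.values())))
        rows: List[List[float]] = [[] for _ in range(n_rows)]
        for texts in columns.values():
            for row, text in zip(rows, texts):
                row.extend(self._embed(text))
        return rows

    def _embed(self, text: str) -> List[float]:
        # Mean-pool the vectors of known whitespace-separated tokens.
        known = [self.word_vectors[t] for t in text.split() if t in self.word_vectors]
        if not known:
            return [0.0] * self.dim
        return [sum(v[i] for v in known) / len(known) for i in range(self.dim)]


embedder = ToyTextEmbedder({"good": [1.0, 0.0], "movie": [0.0, 1.0]}, dim=2)
data = {"review": ["good movie", "unknown words"]}
embeddings = embedder.fit(data).transform(data)
print(embedder.features)  # ['emb_0__review', 'emb_1__review']
print(embeddings)  # [[0.5, 0.5], [0.0, 0.0]]
```

Because fit returns self, fitting and transforming can be chained in one expression, as in the last lines of the sketch.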