BertDataset

class lightautoml.text.embed_dataset.BertDataset(sentences, max_length, model_name, **kwargs)

Bases: object

Dataset class that tokenizes sentences with a transformers tokenizer and prepares them as transformers model input.

Parameters:
  • sentences (Sequence[str]) – List of tokenized sentences.

  • max_length (int) – Maximum sentence length.

  • model_name (str) – Name of transformer model.

  • **kwargs (Any) – Other keyword arguments.
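
The parameters above can be illustrated with a minimal sketch. This is not the LightAutoML implementation: `ToyBertDataset` is a hypothetical stand-in that only mirrors the documented constructor signature, and a whitespace tokenizer with a vocabulary built from the data replaces the real transformers tokenizer so the example runs without downloading a model. The output keys `input_ids` and `attention_mask` follow common transformers conventions and are an assumption; the keys returned by `BertDataset` itself are not stated here.

```python
from typing import Any, Dict, List, Sequence


class ToyBertDataset:
    """Hypothetical stand-in with the same constructor signature as BertDataset."""

    PAD_ID = 0  # id used to pad short sentences (assumption)
    UNK_ID = 1  # id for words missing from the toy vocabulary (assumption)

    def __init__(self, sentences: Sequence[str], max_length: int,
                 model_name: str, **kwargs: Any):
        self.sentences = sentences
        self.max_length = max_length
        self.model_name = model_name  # stored for illustration; no model is loaded
        # Toy vocabulary built from the data; a real tokenizer ships its own.
        words = sorted({w for s in sentences for w in s.lower().split()})
        self.vocab = {w: i + 2 for i, w in enumerate(words)}

    def __len__(self) -> int:
        return len(self.sentences)

    def __getitem__(self, idx: int) -> Dict[str, List[int]]:
        # Map each word to an id, then truncate to max_length.
        ids = [self.vocab.get(w, self.UNK_ID)
               for w in self.sentences[idx].lower().split()]
        ids = ids[: self.max_length]
        mask = [1] * len(ids)
        # Pad to max_length so every item has the same length.
        pad = self.max_length - len(ids)
        return {
            "input_ids": ids + [self.PAD_ID] * pad,
            "attention_mask": mask + [0] * pad,
        }


dataset = ToyBertDataset(
    ["a short sentence", "a much longer sentence than the limit allows"],
    max_length=5,
    model_name="toy-model",  # placeholder name, not a real checkpoint
)
item = dataset[0]
print(item["input_ids"], item["attention_mask"])
```

In this sketch, `max_length` both truncates long sentences and sets the padded length of short ones, so every item has the same shape. Whether `BertDataset` pads in the same way is not specified in this reference.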