LabelEncoder

class lightautoml.transformers.categorical.LabelEncoder(subs=None, random_state=42)

Bases: LAMLTransformer

Simple label encoder that assigns integer labels to categories in order of their frequency.

Labels are integers from 1 to n, one per known category. Unknown categories are encoded as 0. NaN is handled as a regular category value.
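The encoding rule described above can be sketched on a single column in plain Python. This is a minimal illustration, not the library's implementation. It assumes that "in order of frequency" means the most frequent category gets label 1 and that ties keep first-seen order. The helpers build_mapping, encode and the NAN_KEY sentinel are hypothetical names introduced only for this example.

```python
import math
from collections import Counter

# Hypothetical sentinel: NaN != NaN, so distinct NaN objects would not
# match as dict keys. Every float NaN is mapped to this single key instead.
NAN_KEY = "__nan__"


def _key(value):
    """Map any float NaN to the sentinel; return other values unchanged."""
    if isinstance(value, float) and math.isnan(value):
        return NAN_KEY
    return value


def build_mapping(values):
    """Assign labels 1..n in descending order of frequency (assumed order)."""
    counts = Counter(_key(v) for v in values)
    # most_common() sorts by count, highest first; ties keep first-seen order.
    return {cat: label for label, (cat, _) in enumerate(counts.most_common(), start=1)}


def encode(values, mapping):
    """Encode values with a fitted mapping; unseen categories become 0."""
    return [mapping.get(_key(v), 0) for v in values]


nan = float("nan")
train = ["b", "a", "b", nan, "b", "a", nan, "c"]
mapping = build_mapping(train)
print(mapping)                           # {'b': 1, 'a': 2, '__nan__': 3, 'c': 4}
print(encode(["a", "d", nan], mapping))  # [2, 0, 3]
```

Here NaN receives its own label like any other category, while the unseen value "d" falls back to 0.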

Parameters:
  • subs (Optional[int]) – Size of the subsample used to calculate frequencies. If None, the full data is used.

  • random_state (int) – Random seed used to draw the subsample.
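The interaction between subs and random_state can be illustrated with a small sketch. It assumes that rows are drawn without replacement and that a subs value at least as large as the data means "use everything". The library may sample differently, and take_subsample is a hypothetical helper, not part of LightAutoML.

```python
import random


def take_subsample(rows, subs=None, random_state=42):
    """Hypothetical helper: choose the rows used to count category frequencies.

    If subs is None (or not smaller than the data), all rows are used;
    otherwise subs rows are drawn without replacement with a seeded generator.
    """
    if subs is None or subs >= len(rows):
        return list(rows)
    rng = random.Random(random_state)  # same seed -> same subsample
    return rng.sample(list(rows), subs)


rows = list(range(1000))
first = take_subsample(rows, subs=100, random_state=42)
second = take_subsample(rows, subs=100, random_state=42)
print(len(first), first == second)  # 100 True
```

Fixing the seed makes the frequency estimate, and therefore the label order, reproducible across runs.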

fit(dataset)

Estimate label frequencies and create the encoding dictionaries.

Parameters:

dataset (Union[NumpyDataset, PandasDataset]) – Pandas or Numpy dataset of categorical features.

Returns:

self.
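A per-column version of the fit step might look like the sketch below. It builds one encoding dictionary per column and returns self so that calls can be chained. LabelEncoderSketch, its dicts attribute and the dict-of-lists input are hypothetical stand-ins for the real class and its NumpyDataset or PandasDataset input. The descending-frequency order is the same assumption as above.

```python
import math
from collections import Counter


class LabelEncoderSketch:
    """Hypothetical stand-in for the documented fit() step (not the library code)."""

    NAN_KEY = "__nan__"  # sentinel so that NaN can act as a dict key

    def __init__(self):
        self.dicts = {}

    @classmethod
    def _key(cls, value):
        # Collapse every float NaN onto one sentinel key.
        if isinstance(value, float) and math.isnan(value):
            return cls.NAN_KEY
        return value

    def fit(self, columns):
        """columns: dict mapping column name -> list of categorical values."""
        self.dicts = {}
        for name, values in columns.items():
            counts = Counter(self._key(v) for v in values)
            # Most frequent category -> 1, next -> 2, ... (assumed order).
            self.dicts[name] = {
                cat: label
                for label, (cat, _) in enumerate(counts.most_common(), start=1)
            }
        return self  # returning self enables chaining, e.g. enc.fit(data).dicts


enc = LabelEncoderSketch().fit(
    {"color": ["red", "blue", "red", float("nan")], "size": ["S", "M", "M"]}
)
print(enc.dicts)
```

Each column gets an independent dictionary, so the same value can receive different labels in different columns.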

transform(dataset)

Transform a categorical dataset into integer labels.

Parameters:

dataset (Union[NumpyDataset, PandasDataset]) – Pandas or Numpy dataset of categorical features.

Return type:

NumpyDataset

Returns:

Numpy dataset with encoded labels.
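The transform step can be sketched as a lookup into already-fitted dictionaries, with unseen values mapped to 0. In this minimal sketch, the hypothetical transform_sketch returns a plain 2-D NumPy array in place of a NumpyDataset. The hard-coded dicts stand in for the result of fit, and the int32 output dtype is a choice made for the example.

```python
import math

import numpy as np

NAN_KEY = "__nan__"  # hypothetical NaN sentinel, matching how the dicts were built


def transform_sketch(columns, dicts):
    """Encode each column with its fitted dict; unseen values become 0.

    Returns an (n_rows, n_columns) int32 array as a stand-in for NumpyDataset.
    """
    names = list(dicts)
    n_rows = len(columns[names[0]])
    out = np.zeros((n_rows, len(names)), dtype=np.int32)
    for j, name in enumerate(names):
        mapping = dicts[name]
        for i, value in enumerate(columns[name]):
            key = NAN_KEY if isinstance(value, float) and math.isnan(value) else value
            out[i, j] = mapping.get(key, 0)  # 0 marks an unknown category
    return out


dicts = {"color": {"red": 1, "blue": 2, NAN_KEY: 3}, "size": {"M": 1, "S": 2}}
new_data = {"color": ["blue", "green", float("nan")], "size": ["S", "M", "XL"]}
encoded = transform_sketch(new_data, dicts)
print(encoded)
```

The unseen values "green" and "XL" are encoded as 0, and NaN keeps the label it received during fitting.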