WhiteBoxPreset

class lightautoml.automl.presets.whitebox_presets.WhiteBoxPreset(task, timeout=3600, memory_limit=16, cpu_limit=4, gpu_ids=None, timing_params=None, config_path=None, general_params=None, reader_params=None, read_csv_params=None, whitebox_params=None)

Bases: lightautoml.automl.presets.base.AutoMLPreset

Preset for AutoWoE - logistic regression over binned features (scorecard).

Supported data roles - numbers, dates, categories.

Limitations:

  • Simple time management.

  • No memory management.

  • Working only with pandas.DataFrame.

  • No batch inference.

  • No text support.

  • No parallel execution.

  • No GPU usage.

  • No cross-validation scheme. Supports only holdout validation (CV is created inside AutoWoE, but no out-of-fold predictions are returned).

Common use case - fitting a lightweight, interpretable model for a binary classification task.

property whitebox

Get wrapped AutoWoE object.

Returns

Model.

__init__(task, timeout=3600, memory_limit=16, cpu_limit=4, gpu_ids=None, timing_params=None, config_path=None, general_params=None, reader_params=None, read_csv_params=None, whitebox_params=None)

The _params kwargs (e.g. timing_params) are usually set via a config file (the config_path argument). If you need to change just a few params, you can pass them as a dict of dicts, like JSON. For the available params and their descriptions, see the default config template. To generate a config template, call WhiteBoxPreset.get_config('config_path.yml').
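The idea of overriding only a few params with a nested dict can be sketched as a recursive merge over the defaults. This is a minimal illustration of the concept, not LightAutoML's actual implementation: merge_params is a hypothetical helper, and the section and option names are placeholders rather than real config keys (the real ones are listed in the config template).

```python
from copy import deepcopy


def merge_params(defaults, overrides):
    """Return a copy of `defaults` with `overrides` applied key by key.

    Nested dicts are merged recursively, so an override only needs to
    mention the keys it changes. Hypothetical helper for illustration.
    """
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: descend instead of replacing the section.
            merged[key] = merge_params(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Placeholder defaults standing in for values loaded from a config file.
defaults = {
    "section_a": {"option_x": 1, "option_y": 2},
    "section_b": {"flag": False},
}

# Only the changed option is given; everything else keeps its default.
overrides = {"section_a": {"option_y": 20}}

merged = merge_params(defaults, overrides)
```

Under this assumption, passing a small dict such as the overrides above leaves every unmentioned option at its configured default.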

Parameters
  • task (Task) – Task to solve.

  • timeout (int) – Timeout in seconds.

  • memory_limit (int) – Memory limit that is passed to each automl.

  • cpu_limit (int) – CPU limit that is passed to each automl.

  • gpu_ids (Optional[str]) – GPU IDs that are passed to each automl.

  • timing_params (Optional[dict]) – Timing param dict.

  • config_path (Optional[str]) – Path to config file.

  • general_params (Optional[dict]) – General param dict.

  • reader_params (Optional[dict]) – Reader param dict.

  • read_csv_params (Optional[dict]) – Params to pass pandas.read_csv (case of train/predict from file).

  • whitebox_params (Optional[dict]) – Params of WhiteBox algo (look at config file).

create_automl(*args, **kwargs)

Create basic WhiteBoxPreset instance from data.

Parameters
  • *args – Not used.

  • **kwargs – Everything passed to .fit_predict.

fit_predict(train_data, roles, train_features=None, cv_iter=None, valid_data=None, valid_features=None, verbose=0, **fit_params)

Fit and get prediction on validation dataset.

Almost the same as lightautoml.automl.base.AutoML.fit_predict.

Additional feature - support for different data formats. Currently supported:

  • Path to .csv, .parquet, .feather files.

  • ndarray, or dict of ndarray. For example, {'data': X...}. In this case, roles are optional, but train_features and valid_features are required.

  • pandas.DataFrame.

Parameters
  • train_data (Any) – Dataset to train.

  • roles (dict) – Roles dict.

  • train_features (Optional[Sequence[str]]) – Optional feature names, if they can't be inferred from train_data.

  • cv_iter (Optional[Iterable]) – Custom cv-iterator. For example, TimeSeriesIterator.

  • valid_data (Optional[Any]) – Optional validation dataset.

  • valid_features (Optional[Sequence[str]]) – Optional validation dataset feature names, if they can't be inferred from valid_data.

  • verbose (int) – Controls the verbosity: the higher the value, the more messages are displayed. <1: messages are not displayed; >=1: the computation process for layers is displayed; >=2: the information about folds processing is also displayed; >=3: the hyperparameters optimization process is also displayed; >=4: the training process for every algorithm is also displayed.

Return type

NumpyDataset

Returns

Dataset with predictions. Call .data to get predictions array.
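The verbose thresholds listed for fit_predict are cumulative: each level adds messages on top of the lower ones. A small sketch of that rule follows. The displayed_messages helper is hypothetical and only mirrors the thresholds in the parameter description; it is not the library's logging code.

```python
# Thresholds and descriptions copied from the `verbose` parameter description.
VERBOSE_LEVELS = [
    (1, "computation process for layers"),
    (2, "information about folds processing"),
    (3, "hyperparameters optimization process"),
    (4, "training process for every algorithm"),
]


def displayed_messages(verbose):
    """List the message groups shown at a given verbosity (illustrative helper)."""
    # A group is shown once `verbose` reaches its threshold, and stays shown above it.
    return [name for threshold, name in VERBOSE_LEVELS if verbose >= threshold]


quiet = displayed_messages(0)     # below 1: nothing is displayed
with_folds = displayed_messages(2)  # layers plus folds processing
```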

predict(data, features_names=None, report=False)

Almost the same as AutoML .predict, with additional features.

Additional feature - generating an extended WhiteBox report if report=True is passed.

Parameters
  • data (Any) – Dataset to perform inference.

  • features_names (Optional[Sequence[str]]) – Optional feature names, if they can't be inferred from data.

  • report (bool) – Whether to update the inner WhiteBox report (True is slow). Takes effect only if general_params['report'] = True.

Return type

NumpyDataset

Returns

Dataset with predictions.
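The methods above fit together roughly as follows. This is a minimal sketch under stated assumptions, not a verified recipe: it assumes lightautoml and its AutoWoE dependency are installed, that the three inputs are pandas.DataFrame objects, and that the target column is named 'target' (a placeholder name). The run_whitebox helper is illustrative and not part of the library.

```python
def run_whitebox(train_df, valid_df, test_df, target_col="target", timeout=600):
    """Fit a WhiteBoxPreset with a holdout set and predict on new data.

    Illustrative helper: the argument names and the default target column
    are assumptions made for this sketch.
    """
    # Imports live inside the function so the sketch can be loaded even
    # where lightautoml is not installed.
    from lightautoml.automl.presets.whitebox_presets import WhiteBoxPreset
    from lightautoml.tasks import Task

    # The preset targets binary classification.
    task = Task("binary")

    # general_params['report'] must be True for predict(report=True) to take effect.
    automl = WhiteBoxPreset(task, timeout=timeout, general_params={"report": True})

    # Holdout validation: the validation frame is passed explicitly.
    valid_pred = automl.fit_predict(
        train_df, roles={"target": target_col}, valid_data=valid_df
    )

    # report=True also updates the inner WhiteBox report (slower).
    test_pred = automl.predict(test_df, report=True)

    # .whitebox exposes the wrapped AutoWoE object; .data holds the prediction arrays.
    return automl.whitebox, valid_pred.data, test_pred.data
```

Passing valid_data explicitly matches the holdout-only limitation listed above; if the report is not needed, leaving report at its default of False avoids the slower report update.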