TimeUtilization

class lightautoml.addons.utilization.utilization.TimeUtilization(automl_factory, task, timeout=3600, memory_limit=16, cpu_limit=4, gpu_ids=None, timing_params=None, configs_list=None, inner_blend=None, outer_blend=None, drop_last=True, return_all_predictions=False, max_runs_per_config=5, random_state_keys=None, random_state=42, **kwargs)[source]

Bases: object

Class that helps to utilize a given time budget with an AutoMLPreset.

Useful for computing benchmarks and for competitions. It takes a list of config files as input and runs them until the time limit is exceeded. If time is left, it can perform multistart on the same configs with new random states. In the best case it blends different configurations of a single preset; in the worst case it averages multiple automls with different random states.
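The loop described above can be sketched in plain Python. This is a conceptual illustration only, not the library's implementation: `train_fn` is a hypothetical stand-in for fitting one automl from a config and a seed, and the inner/outer blends are replaced by simple averaging.

```python
import time

def utilize_time(configs, timeout, train_fn, max_runs_per_config=5):
    """Conceptual sketch of the time-utilization loop (not library code):
    cycle over config files, refitting with a fresh random seed on each
    multistart loop, while the time budget allows."""
    start = time.monotonic()
    runs = {cfg: [] for cfg in configs}          # results per config
    for loop in range(max_runs_per_config):      # multistart loops
        for cfg in configs:
            if time.monotonic() - start >= timeout:
                break                            # budget exhausted
            seed = 42 + loop                     # new random state per loop
            runs[cfg].append(train_fn(cfg, seed))
        else:
            continue
        break
    # "inner blend": average runs of the same config over random states
    inner = {cfg: sum(p) / len(p) for cfg, p in runs.items() if p}
    # "outer blend": combine the per-config averages
    return sum(inner.values()) / len(inner)
```

In the real class, `inner_blend` and `outer_blend` are Blender instances and the results are prediction datasets rather than scalars; the control flow (configs loop inside a multistart loop, bounded by `timeout` and `max_runs_per_config`) is the idea being shown.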

Note

Basic usage.

>>> ensembled_automl = TimeUtilization(TabularAutoML, Task('binary'),
...     timeout=3600, configs_list=['cfg0.yml', 'cfg1.yml'])

Then .fit_predict and .predict can be called as with the usual AutoML class.

__init__(automl_factory, task, timeout=3600, memory_limit=16, cpu_limit=4, gpu_ids=None, timing_params=None, configs_list=None, inner_blend=None, outer_blend=None, drop_last=True, return_all_predictions=False, max_runs_per_config=5, random_state_keys=None, random_state=42, **kwargs)[source]
Parameters
  • automl_factory (Type[AutoMLPreset]) – One of presets.

  • task (Task) – Task to solve.

  • timeout (int) – Timeout in seconds.

  • memory_limit (int) – Memory limit that is passed to each automl.

  • cpu_limit (int) – CPU limit that is passed to each automl.

  • gpu_ids (Optional[str]) – Gpu_ids that are passed to each automl.

  • verbose – Controls the verbosity: the higher, the more messages. <1: messages are not displayed; >=1: the computation process for layers is displayed; >=2: information about fold processing is also displayed; >=3: the hyperparameter optimization process is also displayed; >=4: the training process for every algorithm is displayed. Note that verbose is accepted by fit_predict, not by __init__ (see the signatures above and below).

  • timing_params (Optional[dict]) – Timing params that are passed to each automl.

  • configs_list (Optional[Sequence[str]]) – List of paths to config files.

  • inner_blend (Optional[Blender]) – Blender instance used to blend automls with the same config and different random states.

  • outer_blend (Optional[Blender]) – Blender instance used to blend the random-state-averaged automls with different configs.

  • drop_last (bool) – Usually the last automl will be stopped by the timeout. Flag that defines whether it should be dropped from the ensemble.

  • return_all_predictions (bool) – Skip blending and return all model predictions.

  • max_runs_per_config (int) – Maximum number of multistart loops.

  • random_state_keys (Optional[dict]) – Config params that are used as random states, with their initial values. If None, search for a random_state key in the preset's default config; if not found, assume that seeds are not fixed and each run is random by default. For example: {'reader_params': {'random_state': 42}, 'gbm_params': {'default_params': {'seed': 42}}}

  • random_state (int) – Initial random seed that will be set if the search in the config succeeds.

  • **kwargs – Additional params.
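The random_state_keys dict mirrors the nested structure of the preset config down to the seed values. A minimal sketch of how such nested keys could be overridden on each run (this is an illustration of the idea, not the library's internal code; set_random_states is a hypothetical helper):

```python
import copy

def set_random_states(config, random_state_keys, seed):
    """Sketch: walk the nested random_state_keys structure and replace
    every leaf value in a copy of `config` with this run's seed."""
    cfg = copy.deepcopy(config)

    def _update(dst, keys):
        for k, v in keys.items():
            if isinstance(v, dict):
                _update(dst.setdefault(k, {}), v)   # descend, creating dicts as needed
            else:
                dst[k] = seed                       # override the initial value
    _update(cfg, random_state_keys)
    return cfg

# Usage with the example from the parameter description above:
keys = {'reader_params': {'random_state': 42},
        'gbm_params': {'default_params': {'seed': 42}}}
cfg = set_random_states({'reader_params': {'random_state': 0}}, keys, 7)
```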

fit_predict(train_data, roles, train_features=None, cv_iter=None, valid_data=None, valid_features=None, verbose=0, log_file=None)[source]

Fit and get prediction on validation dataset.

Almost same as lightautoml.automl.base.AutoML.fit_predict.

An additional feature is working with different data formats. Currently supported:

  • Path to .csv, .parquet, .feather files.

  • ndarray, or dict of ndarray. For example, {'data': X...}. In this case, roles are optional, but train_features and valid_features are required.

  • pandas.DataFrame.
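How such a dispatch over input formats might look, as a hedged sketch (read_any is a hypothetical helper, not part of LightAutoML; the real reading logic lives inside the library's readers):

```python
from pathlib import Path

import numpy as np
import pandas as pd

def read_any(data, features=None):
    """Sketch of dispatching the supported input formats to a DataFrame."""
    if isinstance(data, (str, Path)):
        suffix = Path(data).suffix
        if suffix == '.csv':
            return pd.read_csv(data)
        if suffix == '.parquet':
            return pd.read_parquet(data)
        if suffix == '.feather':
            return pd.read_feather(data)
        raise ValueError(f'Unsupported file type: {suffix}')
    if isinstance(data, np.ndarray):
        # column names cannot be inferred: they must come from
        # train_features / valid_features
        return pd.DataFrame(data, columns=features)
    if isinstance(data, dict):
        return pd.DataFrame(data['data'], columns=features)
    return data  # already a pandas.DataFrame
```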

Parameters

Described by the signature above; they follow lightautoml.automl.base.AutoML.fit_predict.
Return type

LAMLDataset

Returns

Dataset with predictions. Call .data to get predictions array.

predict(data, features_names=None, return_all_predictions=None, **kwargs)[source]

Get dataset with predictions.

Almost same as lightautoml.automl.base.AutoML.predict on new dataset, with additional features.

An additional feature is working with different data formats. Currently supported:

  • Path to .csv, .parquet, .feather files.

  • ndarray, or dict of ndarray. For example, {'data': X...}. In this case, roles are optional, but train_features and valid_features are required.

  • pandas.DataFrame.

Parameters
  • data (Any) – Dataset to perform inference.

  • features_names (Optional[Sequence[str]]) – Optional features names, if cannot be inferred from train_data.

  • return_all_predictions (Optional[bool]) – Skip the blending phase and return all model predictions. If None, the value passed to __init__ is used.

Return type

LAMLDataset

Returns

Dataset with predictions.
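The effect of return_all_predictions can be sketched with plain NumPy. This is an illustration of the blend-or-stack choice only, with a simple mean standing in for the configured Blender (finalize is a hypothetical helper, not a library function):

```python
import numpy as np

def finalize(predictions, return_all_predictions):
    """Sketch: either return every model's predictions side by side,
    or collapse them with a stand-in blend (a plain average)."""
    preds = np.stack(predictions)   # shape: (n_models, n_samples)
    if return_all_predictions:
        return preds.T              # one column per model, no blending
    return preds.mean(axis=0)       # blended prediction per sample
```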