TimeUtilization

class lightautoml.addons.utilization.utilization.TimeUtilization(automl_factory, task, timeout=3600, memory_limit=16, cpu_limit=4, gpu_ids=None, timing_params=None, configs_list=None, inner_blend=None, outer_blend=None, drop_last=True, return_all_predictions=False, max_runs_per_config=5, random_state_keys=None, random_state=42, **kwargs)[source]

Bases: object

Class that helps an AutoMLPreset make full use of a given time budget.

Useful for running benchmarks and for competitions. It takes a list of config files as input and runs them until the time limit is exceeded. If time is left over, it can perform a multistart on the same configs with a new random state. In the best case, it blends different configurations of a single preset. In the worst case, it averages multiple AutoML runs with different random states.

Note

Basic usage.

>>> ensembled_automl = TimeUtilization(TabularAutoML, Task('binary'),
...     timeout=3600, configs_list=['cfg0.yml', 'cfg1.yml'])

After that, fit_predict and predict can be called as on a regular AutoML class.
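The scheduling behaviour described above can be pictured with a minimal sketch. It is not the library's implementation: utilize_time, run_one, and clock are hypothetical names, and the rule of adding one to the seed on every multistart round is an assumption made for illustration. The clock is injected so that the example runs without training anything.

```python
def utilize_time(configs, run_one, clock, timeout, max_runs_per_config=5,
                 random_state=42, drop_last=True):
    """Run every config once per round, with a new seed each round,
    until the time budget is spent (hypothetical helper)."""
    runs = []  # (config, seed) pairs in the order they were run
    for round_idx in range(max_runs_per_config):
        seed = random_state + round_idx  # assumption: one new seed per round
        for config in configs:
            run_one(config, seed)
            runs.append((config, seed))
            if clock() >= timeout:
                # The run that crossed the limit was probably cut short,
                # so it is optionally dropped from the ensemble.
                if drop_last and len(runs) > 1:
                    runs.pop()
                return runs
    return runs


# Fake clock: every run "costs" 10 seconds.
elapsed = {"seconds": 0}


def fake_run(config, seed):
    elapsed["seconds"] += 10


runs = utilize_time(["cfg0.yml", "cfg1.yml"], fake_run,
                    clock=lambda: elapsed["seconds"], timeout=35)
print(runs)
```

With a 35-second budget, both configs finish the first round, the first config finishes a second round with a new seed, and the run that crosses the limit is dropped.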

Parameters:
  • automl_factory (Type[AutoMLPreset]) – One of presets.

  • task (Task) – Task to solve.

  • timeout (int) – Timeout in seconds.

  • memory_limit (int) – Memory limit that is passed to each automl.

  • cpu_limit (int) – CPU limit that is passed to each automl.

  • gpu_ids (Optional[str]) – GPU ids that are passed to each automl.

  • verbose – Controls the verbosity: the higher, the more messages. <1: messages are not displayed; >=1: the computation process for layers is displayed; >=2: the information about folds processing is also displayed; >=3: the hyperparameters optimization process is also displayed; >=4: the training process for every algorithm is displayed.

  • timing_params (Optional[dict]) – Timing params that are passed to each automl.

  • configs_list (Optional[Sequence[str]]) – List of paths to config files.

  • inner_blend (Optional[Blender]) – Blender instance used to blend automl’s with the same config and different random states.

  • outer_blend (Optional[Blender]) – Blender instance used to blend automl’s with different configs, after they have been averaged over random states.

  • drop_last (bool) – Whether to drop the last automl from the ensemble. The last automl is usually stopped by the timeout.

  • return_all_predictions (bool) – Skip blending and return all model predictions.

  • max_runs_per_config (int) – Maximum number of multistart loops.

  • random_state_keys (Optional[dict]) – Config params that are used as random state, with their initial values. If None, the random_state key is searched for in the default config of the preset. If it is not found, the seeds are assumed not to be fixed and each run is random by default. For example: {'reader_params': {'random_state': 42}, 'gbm_params': {'default_params': {'seed': 42}}}

  • random_state (int) – Initial random seed that is set when the random state keys are found by searching the config.

  • **kwargs – Additional params.
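To illustrate the shape of random_state_keys, the sketch below derives the seeds for a later multistart run from the initial values. It is an illustration only: with_seed_offset is a hypothetical helper, and shifting every seed by the run index is an assumption, not necessarily how the library updates seeds.

```python
def with_seed_offset(random_state_keys, offset):
    """Return a new nested dict with every leaf seed shifted by `offset`.

    Hypothetical helper; the input dict is left unchanged.
    """
    return {
        key: with_seed_offset(value, offset) if isinstance(value, dict)
        else value + offset
        for key, value in random_state_keys.items()
    }


random_state_keys = {
    "reader_params": {"random_state": 42},
    "gbm_params": {"default_params": {"seed": 42}},
}

# Seeds for the second multistart run (assumed offset of 1).
second_run = with_seed_offset(random_state_keys, 1)
print(second_run)
```

The nesting mirrors the config structure, so each seed is addressed by the same path it has in the preset config.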

fit_predict(train_data, roles, train_features=None, cv_iter=None, valid_data=None, valid_features=None, verbose=0, log_file=None, path_to_save=None)[source]

Fit and get prediction on validation dataset.

Almost the same as lightautoml.automl.base.AutoML.fit_predict.

Additional feature: it works with different data formats. Currently supported:

  • Path to .csv, .parquet, .feather files.

  • ndarray, or dict of ndarray. For example, {'data': X...}. In this case, roles are optional, but train_features and valid_features are required.

  • pandas.DataFrame.

Parameters:
  • train_data (Any) – Dataset to train.

  • roles (dict) – Roles dict.

  • train_features (Optional[Sequence[str]]) – Optional feature names, if they cannot be inferred from train_data.

  • cv_iter (Optional[Iterable]) – Custom cv-iterator. For example, TimeSeriesIterator.

  • valid_data (Optional[Any]) – Optional validation dataset.

  • valid_features (Optional[Sequence[str]]) – Optional validation dataset feature names, if they cannot be inferred from valid_data.

  • verbose (int) – Verbosity level.

  • log_file (Optional[str]) – Log filename.

  • path_to_save (Optional[str]) – Path that joblib will use to save the model after the fit stage is completed. Use the *.joblib format.

Return type:

LAMLDataset

Returns:

Dataset with predictions. Access its .data attribute to get the predictions array.

predict(data, features_names=None, return_all_predictions=None, **kwargs)[source]

Get dataset with predictions.

Almost the same as lightautoml.automl.base.AutoML.predict on a new dataset, with additional features.

Additional feature: it works with different data formats. Currently supported:

  • Path to .csv, .parquet, .feather files.

  • ndarray, or dict of ndarray. For example, {'data': X...}. In this case, feature names should be supplied via features_names.

  • pandas.DataFrame.

Parameters:
  • data (Any) – Dataset to perform inference on.

  • features_names (Optional[Sequence[str]]) – Optional feature names, if they cannot be inferred from data.

  • return_all_predictions (Optional[bool]) – Whether to skip the blending phase and return all model predictions.

  • **kwargs – Other params.

Return type:

LAMLDataset

Returns:

Dataset with predictions.
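The effect of return_all_predictions can be pictured with a small sketch. It is not the library's blending code: combine and mean_rows are hypothetical helpers, and plain averaging stands in for whatever Blender instances are passed as inner_blend and outer_blend.

```python
def mean_rows(rows):
    """Element-wise mean of equally long prediction lists."""
    return [sum(column) / len(column) for column in zip(*rows)]


def combine(preds_by_config, return_all_predictions=False):
    """`preds_by_config` maps a config name to one prediction list per seed."""
    if return_all_predictions:
        # Skip blending: return every model's predictions unchanged.
        return [preds for runs in preds_by_config.values() for preds in runs]
    # Inner step: average runs that share a config but differ in seed.
    inner = [mean_rows(runs) for runs in preds_by_config.values()]
    # Outer step: average the per-config results.
    return mean_rows(inner)


preds_by_config = {
    "cfg0.yml": [[0.25, 0.75], [0.75, 0.25]],  # two seeds
    "cfg1.yml": [[1.0, 0.0]],                  # one seed
}

blended = combine(preds_by_config)
all_preds = combine(preds_by_config, return_all_predictions=True)
print(blended)
print(len(all_preds))
```

Averaging within a config first keeps a config that happened to get more multistart runs from dominating the final result.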