TimeUtilization

class lightautoml.addons.utilization.utilization.TimeUtilization(automl_factory, task, timeout=3600, memory_limit=16, cpu_limit=4, gpu_ids=None, timing_params=None, configs_list=None, inner_blend=None, outer_blend=None, drop_last=True, return_all_predictions=False, max_runs_per_config=5, random_state_keys=None, random_state=42, **kwargs)

Bases: object

Class that helps an AutoMLPreset make full use of a given time budget.

Useful for benchmarks and competitions. It takes a list of config files as input and runs them until the time limit is exceeded. If time remains, it can perform a multistart on the same configs with a new random state. In the best case, it blends different configurations of a single preset. In the worst case, it averages multiple AutoML runs with different random states.
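The behaviour described above can be illustrated with a minimal, library-free sketch. It is not LightAutoML's implementation: the names run_with_time_budget, blend_by_mean, mean_columns, FakeClock and fake_fit are hypothetical, plain averaging stands in for the blenders, and the seed schedule (initial seed plus the pass index) is an assumption.

```python
def run_with_time_budget(configs, fit_one, timeout, clock,
                         max_runs_per_config=5, random_state=42, drop_last=True):
    """Fit every config once per pass, with a new seed per pass, until time runs out."""
    start = clock()
    finished = []          # (config, seed, predictions) in completion order
    hit_timeout = False
    for run in range(max_runs_per_config):
        seed = random_state + run          # assumed seed schedule for multistart
        for cfg in configs:
            finished.append((cfg, seed, fit_one(cfg, seed)))
            if clock() - start >= timeout:
                hit_timeout = True         # this run ended past the budget
                break
        if hit_timeout:
            break
    # The run that crossed the time limit was probably cut short, so drop it.
    if hit_timeout and drop_last and len(finished) > 1:
        finished.pop()
    return finished


def mean_columns(rows):
    """Element-wise mean of equally long prediction lists."""
    return [sum(col) / len(col) for col in zip(*rows)]


def blend_by_mean(finished):
    """Average over seeds within each config, then average across configs."""
    by_config = {}
    for cfg, _seed, preds in finished:
        by_config.setdefault(cfg, []).append(preds)
    inner = [mean_columns(runs) for runs in by_config.values()]
    return mean_columns(inner)


class FakeClock:
    """Deterministic stand-in for a real clock, used only in this demo."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


clock = FakeClock()
BASE = {"cfg0.yml": 0.0, "cfg1.yml": 1.0}


def fake_fit(cfg, seed):
    """Pretend to train for 10 time units and return a one-element prediction."""
    clock.now += 10.0
    return [BASE[cfg] + (seed - 42)]


finished = run_with_time_budget(["cfg0.yml", "cfg1.yml"], fake_fit,
                                timeout=35, clock=clock)
blended = blend_by_mean(finished)
```

In this demo the fourth run crosses the time limit and is dropped, so three runs remain: two seeds for the first config and one for the second.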

Note

Basic usage.

>>> ensembled_automl = TimeUtilization(TabularAutoML, Task('binary'),
...     timeout=3600, configs_list=['cfg0.yml', 'cfg1.yml'])

Then fit_predict and predict can be called as with a regular AutoML class.

Parameters:

automl_factory (Type[AutoMLPreset]) – One of presets.
task (Task) – Task to solve.
timeout (int) – Timeout in seconds.
memory_limit (int) – Memory limit that is passed to each automl.
cpu_limit (int) – CPU limit that is passed to each automl.
gpu_ids (Optional[str]) – GPU ids that are passed to each automl.
verbose – Controls the verbosity: the higher the value, the more messages. <1: messages are not displayed; >=1: the computation process for layers is displayed; >=2: the information about folds processing is also displayed; >=3: the hyperparameters optimization process is also displayed; >=4: the training process for every algorithm is displayed.
timing_params (Optional[dict]) – Timing parameters that are passed to each automl.
configs_list (Optional[Sequence[str]]) – List of paths (str) to config files.
inner_blend (Optional[Blender]) – Blender instance used to blend automls that share the same config but differ in random state.
outer_blend (Optional[Blender]) – Blender instance used to blend automls with different configs, each already averaged over random states.
drop_last (bool) – Usually the last automl is stopped by the timeout. This flag defines whether it should be dropped from the ensemble.
return_all_predictions (bool) – Skip blending and return all model predictions.
max_runs_per_config (int) – Maximum number of multistart loops.
random_state_keys (Optional[dict]) – Config parameters that are used as random states, with their initial values. If None, the random_state key is searched for in the default config of the preset. If it is not found, seeds are assumed not to be fixed and each run is random by default. For example: {'reader_params': {'random_state': 42}, 'gbm_params': {'default_params': {'seed': 42}}}
random_state (int) – Initial random seed, which is set when the random state keys are found by searching the config.
**kwargs – Additional params.
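The nested layout of random_state_keys can be illustrated with a short sketch that writes a shifted seed into every listed position of a config dict. It is only an illustration: the helper names with_shifted_seeds and _apply_seeds, the sample config, and the rule "initial value plus run index" are assumptions, not the library's documented behaviour.

```python
import copy


def _apply_seeds(target, keys, shift):
    """Walk the keys tree and write initial value + shift at each leaf."""
    for name, value in keys.items():
        if isinstance(value, dict):
            # Descend into (or create) the matching nested section.
            _apply_seeds(target.setdefault(name, {}), value, shift)
        else:
            target[name] = value + shift


def with_shifted_seeds(config, random_state_keys, shift):
    """Return a copy of config with every seed entry moved by shift."""
    updated = copy.deepcopy(config)   # leave the caller's config untouched
    _apply_seeds(updated, random_state_keys, shift)
    return updated


random_state_keys = {
    "reader_params": {"random_state": 42},
    "gbm_params": {"default_params": {"seed": 42}},
}

# Hypothetical config; only the seed entries are expected to change.
config = {
    "reader_params": {"cv": 5, "random_state": 42},
    "gbm_params": {"default_params": {"seed": 42, "num_threads": 4}},
}

shifted = with_shifted_seeds(config, random_state_keys, shift=2)
```

Entries that are not listed in random_state_keys, such as cv and num_threads here, pass through unchanged, which keeps the runs comparable apart from their seeds.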

fit_predict(train_data, roles, train_features=None, cv_iter=None, valid_data=None, valid_features=None, verbose=0, log_file=None, path_to_save=None)

Fit and get predictions on the validation dataset.

Almost the same as lightautoml.automl.base.AutoML.fit_predict.

Additional feature: support for different data formats. Currently supported:

Path to .csv, .parquet, .feather files.

ndarray, or dict of ndarray. For example, {'data': X...}. In this case, roles are optional, but train_features and valid_features are required.

pandas.DataFrame.

Parameters:

train_data (Any) – Dataset to train on.
roles (dict) – Roles dict.
train_features (Optional[Sequence[str]]) – Optional feature names, if they cannot be inferred from train_data.
cv_iter (Optional[Iterable]) – Custom cv-iterator. For example, TimeSeriesIterator.
valid_data (Optional[Any]) – Optional validation dataset.
valid_features (Optional[Sequence[str]]) – Optional validation dataset feature names, if they cannot be inferred from valid_data.
verbose (int) – Verbosity level.
log_file (Optional[str]) – Log filename.
path_to_save (Optional[str]) – The path that joblib will use to save the model after the fit stage is completed. Use the *.joblib format.

Return type:

LAMLDataset

Returns:

Dataset with predictions. Call .data to get predictions array.

predict(data, features_names=None, return_all_predictions=None, **kwargs)

Get dataset with predictions.

Almost the same as lightautoml.automl.base.AutoML.predict on a new dataset, with additional features.

Additional feature: support for different data formats. Currently supported:

Path to .csv, .parquet, .feather files.

ndarray, or dict of ndarray. For example, {'data': X...}. In this case, roles are optional, but train_features and valid_features are required.

pandas.DataFrame.

Parameters:

data (Any) – Dataset to perform inference on.
features_names (Optional[Sequence[str]]) – Optional feature names, if they cannot be inferred from data.
return_all_predictions (Optional[bool]) – If True, skip the blending phase.
**kwargs – Other params.

Return type:

LAMLDataset

Returns:

Dataset with predictions.