TimeUtilization
- class lightautoml.addons.utilization.utilization.TimeUtilization(automl_factory, task, timeout=3600, memory_limit=16, cpu_limit=4, gpu_ids=None, timing_params=None, configs_list=None, inner_blend=None, outer_blend=None, drop_last=True, return_all_predictions=False, max_runs_per_config=5, random_state_keys=None, random_state=42, **kwargs)
Bases: object
Class that helps to use the whole time budget given to an AutoMLPreset. Useful for computing benchmarks and for competitions. It takes a list of config files as input and runs them until the time limit is exceeded. If time is left, it can perform a multistart on the same configs with a new random state. In the best case, it blends different configurations of a single preset. In the worst case, it averages multiple AutoML runs with different random states.
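The scheduling idea described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's implementation: `run_until_timeout`, `fit_one`, and the injectable `clock` are hypothetical names, and deriving each pass's seed as `random_state + run_idx` is an assumption.

```python
import itertools
import time


def run_until_timeout(configs, fit_one, timeout, max_runs_per_config=5,
                      random_state=42, drop_last=True, clock=time.monotonic):
    """Hypothetical sketch: cycle over configs until the time budget is spent.

    fit_one(config, seed, time_left) is a caller-supplied callable that
    trains one model and returns it.
    """
    start = clock()
    results = []  # one (config, seed, model) tuple per started run

    # Pass 0 runs every config once; later passes are multistarts with new seeds.
    for run_idx, config in itertools.product(range(max_runs_per_config), configs):
        time_left = timeout - (clock() - start)
        if time_left <= 0:
            break
        seed = random_state + run_idx  # assumed seed scheme
        results.append((config, seed, fit_one(config, seed, time_left)))

    # If the budget was exhausted, the last run was likely cut short,
    # so optionally leave it out of the ensemble.
    if drop_last and clock() - start >= timeout and len(results) > 1:
        results.pop()
    return results
```

Passing the clock as a parameter keeps the sketch easy to check with a fake timer instead of real training runs.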
Note
Basic usage.
>>> ensembled_automl = TimeUtilization(TabularAutoML, Task('binary'),
...     timeout=3600, configs_list=['cfg0.yml', 'cfg1.yml'])
Then .fit_predict and .predict can be called as with a usual AutoML class.
- Parameters:
automl_factory (Type[AutoMLPreset]) – One of the presets.
task (Task) – Task to solve.
timeout (int) – Timeout in seconds.
memory_limit (int) – Memory limit that is passed to each AutoML.
cpu_limit (int) – CPU limit that is passed to each AutoML.
gpu_ids (Optional[str]) – GPU ids that are passed to each AutoML.
verbose – Controls the verbosity: the higher, the more messages. <1: messages are not displayed; >=1: the computation process for layers is displayed; >=2: the information about fold processing is also displayed; >=3: the hyperparameter optimization process is also displayed; >=4: the training process for every algorithm is displayed.
timing_params (Optional[dict]) – Timing params that are passed to each AutoML.
configs_list (Optional[Sequence[str]]) – List of string paths to config files.
inner_blend (Optional[Blender]) – Blender instance used to blend AutoMLs with the same config and different random states.
outer_blend (Optional[Blender]) – Blender instance used to blend the AutoMLs, already averaged over random states, across different configs.
drop_last (bool) – Usually the last AutoML is stopped by the timeout. This flag defines whether it should be dropped from the ensemble.
return_all_predictions (bool) – Skip the blend and return all model predictions.
max_runs_per_config (int) – Maximum number of multistart loops.
random_state_keys (Optional[dict]) – Config params that are used as random state, with their initial values. If None, the random_state key is searched for in the default config of the preset. If it is not found, seeds are assumed not to be fixed and each run is random by default. For example: {'reader_params': {'random_state': 42}, 'gbm_params': {'default_params': {'seed': 42}}}
random_state (int) – Initial random seed that is set when the key is found by searching the config.
**kwargs – Additional params.
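To make the shape of random_state_keys concrete, here is a minimal sketch of how such a nested dict could be turned into a fresh set of seeds for a multistart pass. The helper name `shift_seeds` and the rule of adding an integer offset to every seed are assumptions for illustration; the documentation does not state how the library derives new seeds.

```python
def shift_seeds(random_state_keys, offset):
    """Return a new nested dict with every seed value increased by offset.

    Hypothetical helper: assumes all leaves of the dict are integer seeds.
    """
    return {
        key: shift_seeds(value, offset) if isinstance(value, dict) else value + offset
        for key, value in random_state_keys.items()
    }


base_keys = {
    'reader_params': {'random_state': 42},
    'gbm_params': {'default_params': {'seed': 42}},
}

# Seeds for a second pass over the same configs (offset chosen arbitrarily).
second_pass_keys = shift_seeds(base_keys, 1)
```

Building a new dict instead of mutating the input keeps the initial values available for later passes.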
- fit_predict(train_data, roles, train_features=None, cv_iter=None, valid_data=None, valid_features=None, verbose=0, log_file=None, path_to_save=None)
Fit and get prediction on validation dataset.
Almost the same as lightautoml.automl.base.AutoML.fit_predict. Additional feature: working with different data formats. Supported now:
- Parameters:
train_data (Any) – Dataset to train on.
roles (dict) – Roles dict.
train_features (Optional[Sequence[str]]) – Optional feature names, if they cannot be inferred from train_data.
cv_iter (Optional[Iterable]) – Custom cv-iterator. For example, TimeSeriesIterator.
valid_features (Optional[Sequence[str]]) – Optional validation dataset feature names, if they cannot be inferred from valid_data.
verbose (int) – Verbosity level.
path_to_save (Optional[str]) – The path that joblib will use to save the model after the fit stage is completed. Use the *.joblib format.
- Return type:
- Returns:
Dataset with predictions. Call .data to get the predictions array.
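The class parameters inner_blend and outer_blend describe two blending stages that produce the returned predictions: first across runs of the same config with different random states, then across configs. The sketch below illustrates that structure with a plain element-wise mean at both stages. The function names and the choice of a simple mean are assumptions; the actual Blender instances may weight models differently.

```python
from statistics import fmean


def mean_blend(predictions):
    """Element-wise mean of equally long prediction lists."""
    return [fmean(values) for values in zip(*predictions)]


def two_level_blend(runs_by_config):
    """Hypothetical two-stage blend.

    runs_by_config maps a config name to the prediction lists produced by
    its runs (one list per random state).
    """
    # Inner stage: average runs that share a config.
    per_config = [mean_blend(runs) for runs in runs_by_config.values()]
    # Outer stage: average the per-config results.
    return mean_blend(per_config)


runs = {
    'cfg0.yml': [[0.0, 1.0], [1.0, 1.0]],  # two random states
    'cfg1.yml': [[0.5, 0.0]],              # one random state
}
blended = two_level_blend(runs)  # [0.5, 0.5]
```

Averaging per config first means a config that managed more multistart runs does not dominate the final blend.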
- predict(data, features_names=None, return_all_predictions=None, **kwargs)
Get dataset with predictions.
Almost the same as lightautoml.automl.base.AutoML.predict on a new dataset, with additional features. Additional feature: working with different data formats. Supported now:
- Parameters:
- Return type:
- Returns:
Dataset with predictions.