TimeUtilization
- class lightautoml.addons.utilization.utilization.TimeUtilization(automl_factory, task, timeout=3600, memory_limit=16, cpu_limit=4, gpu_ids=None, timing_params=None, configs_list=None, inner_blend=None, outer_blend=None, drop_last=True, return_all_predictions=False, max_runs_per_config=5, random_state_keys=None, random_state=42, **kwargs)[source]
Bases: object
Class that helps to utilize a given time budget with an AutoMLPreset. Useful for running benchmarks and competing. It takes a list of config files as input and runs them until the time limit is exceeded. If time is left, it can perform a multistart on the same configs with a new random state. In the best case, it blends different configurations of a single preset; in the worst case, it averages multiple automls with different random states.
Note
Basic usage.
>>> ensembled_automl = TimeUtilization(TabularAutoML, Task('binary'),
...     timeout=3600, configs_list=['cfg0.yml', 'cfg1.yml'])
Then .fit_predict and .predict can be called as with the usual AutoML class.
- __init__(automl_factory, task, timeout=3600, memory_limit=16, cpu_limit=4, gpu_ids=None, timing_params=None, configs_list=None, inner_blend=None, outer_blend=None, drop_last=True, return_all_predictions=False, max_runs_per_config=5, random_state_keys=None, random_state=42, **kwargs)[source]
- Parameters
  - automl_factory (Type[AutoMLPreset]) – One of presets.
  - task (Task) – Task to solve.
  - timeout (int) – Timeout in seconds.
  - memory_limit (int) – Memory limit that is passed to each automl.
  - cpu_limit (int) – CPU limit that is passed to each automl.
  - gpu_ids (Optional[str]) – GPU ids that are passed to each automl.
  - verbose – Controls the verbosity: the higher, the more messages. <1: messages are not displayed; >=1: the computation process for layers is displayed; >=2: information about folds processing is also displayed; >=3: the hyperparameters optimization process is also displayed; >=4: the training process for every algorithm is displayed.
  - timing_params (Optional[dict]) – Timing params level that are passed to each automl.
  - configs_list (Optional[Sequence[str]]) – List of str paths to config files.
  - inner_blend (Optional[Blender]) – Blender instance to blend automls with the same config and different random states.
  - outer_blend (Optional[Blender]) – Blender instance to blend random_state-averaged automls with different configs.
  - drop_last (bool) – Usually the last automl will be stopped by timeout. Flag that defines whether it should be dropped from the ensemble.
  - return_all_predictions (bool) – Skip blending and return all model predictions.
  - max_runs_per_config (int) – Maximum number of multistart loops.
  - random_state_keys (Optional[dict]) – Params of the config that are used as random state, with their initial values. If None, search for a random_state key in the default config of the preset. If not found, assume that seeds are not fixed and each run is random by default. For example, {'reader_params': {'random_state': 42}, 'gbm_params': {'default_params': {'seed': 42}}}.
  - random_state (int) – Initial random seed that will be set in case of search in config.
  - **kwargs – Additional params.
- fit_predict(train_data, roles, train_features=None, cv_iter=None, valid_data=None, valid_features=None, verbose=0, log_file=None)[source]
Fit and get prediction on validation dataset.
Almost the same as lightautoml.automl.base.AutoML.fit_predict, with additional support for different input data formats.
- Parameters
  - train_data (Any) – Dataset to train on.
  - roles (dict) – Roles dict.
  - train_features (Optional[Sequence[str]]) – Optional feature names, if they cannot be inferred from train_data.
  - cv_iter (Optional[Iterable]) – Custom cv-iterator. For example, TimeSeriesIterator.
  - valid_features (Optional[Sequence[str]]) – Optional validation dataset features, if they cannot be inferred from valid_data.
- Returns
  Dataset with predictions. Call .data to get the predictions array.
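The run schedule that fit_predict works through (described in the class docstring: iterate the config list, then multistart with new random states until the timeout is reached) can be sketched in plain Python. This is a simplified, hypothetical model with a fixed per-run cost, not lightautoml's actual timing logic:

```python
def plan_runs(configs, timeout, avg_run_time, max_runs_per_config):
    """Hypothetical sketch of the utilization schedule: launch one automl
    per config, then restart the whole list with a new random state,
    until the time budget runs out or max_runs_per_config is hit."""
    runs = []
    elapsed = 0.0
    for state in range(max_runs_per_config):   # multistart loops
        for cfg in configs:                    # one automl per config
            if elapsed + avg_run_time > timeout:
                return runs                    # no time left for this run
            runs.append((cfg, state))
            elapsed += avg_run_time
    return runs
```

With two configs and time for five one-unit runs, this yields two full passes over the configs plus one extra run of the first config with a third random state.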
- predict(data, features_names=None, return_all_predictions=None, **kwargs)[source]
Get dataset with predictions.
Almost the same as lightautoml.automl.base.AutoML.predict on a new dataset, with additional support for different input data formats.
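The two-level ensembling behind predict (an inner blend over same-config runs, then an outer blend over configs, skipped when return_all_predictions is set) can be sketched with plain mean blenders. This is a hypothetical simplification; the actual inner_blend and outer_blend are Blender instances that may weight models non-uniformly:

```python
def blend_predictions(preds_per_config, return_all_predictions=False):
    """Hypothetical two-level mean blend: preds_per_config maps a config
    name to the list of prediction vectors from its multistart runs."""
    def mean(rows):
        return [sum(col) / len(col) for col in zip(*rows)]
    # inner blend: average runs of the same config across random states
    inner = {cfg: mean(runs) for cfg, runs in preds_per_config.items()}
    if return_all_predictions:
        return inner        # skip the outer blend, return per-config means
    # outer blend: average the per-config averages
    return mean(list(inner.values()))
```

This mirrors the class description: in the best case the outer blend combines different configurations of a single preset; in the worst case the result is just an average of automls with different random states.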