TabularAutoML
- class lightautoml.automl.presets.tabular_presets.TabularAutoML(task, timeout=3600, memory_limit=16, cpu_limit=4, gpu_ids='all', debug=False, timing_params=None, config_path=None, general_params=None, reader_params=None, read_csv_params=None, nested_cv_params=None, tuning_params=None, selection_params=None, lgb_params=None, cb_params=None, rf_params=None, linear_l2_params=None, nn_params=None, gbm_pipeline_params=None, linear_pipeline_params=None, nn_pipeline_params=None, time_series_pipeline_params=None, is_time_series=False)[source]
Bases: AutoMLPreset
Classic preset that works with tabular data.
Supported data roles: numbers, dates, categories. Limitations:
- No memory management
- No text support
GPU support is available for catboost/lightgbm training (if installed for GPU).
Commonly, _params kwargs (e.g. timing_params) are set via a config file (the config_path argument). If you need to change just a few params, you can pass them as a dict of dicts, like JSON. For the available params and their descriptions, see the default config template. To generate a config template, call
TabularAutoML.get_config('config_path.yml')
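The "dict of dicts" override can be pictured as a deep merge of the user-supplied params onto the config defaults. A minimal sketch, assuming illustrative default values; the deep_update helper is hypothetical (the preset performs this merging internally):

```python
def deep_update(defaults: dict, overrides: dict) -> dict:
    """Recursively overlay `overrides` onto `defaults` (hypothetical helper)."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_update(merged[key], value)
        else:
            merged[key] = value
    return merged


# Defaults as they might appear in the generated config template (illustrative values)
defaults = {"tuning_params": {"max_tuning_iter": 100, "max_tuning_time": 300}}

# Changing just one param, e.g. by passing tuning_params={"max_tuning_iter": 25},
# leaves every other default in place:
merged = deep_update(defaults, {"tuning_params": {"max_tuning_iter": 25}})
print(merged)  # {'tuning_params': {'max_tuning_iter': 25, 'max_tuning_time': 300}}
```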
- Parameters:
  - task (Task) – Task to solve.
  - timeout (int) – Timeout in seconds.
  - memory_limit (int) – Memory limit that is passed to each automl.
  - cpu_limit (int) – CPU limit that is passed to each automl.
  - gpu_ids (Optional[str]) – GPU IDs that are passed to each automl.
  - debug (bool) – Whether to catch running model exceptions or not.
  - timing_params (Optional[dict]) – Timing param dict. Optional.
  - read_csv_params (Optional[dict]) – Params to pass to pandas.read_csv (case of train/predict from file).
  - nested_cv_params (Optional[dict]) – Param dict for nested cross-validation.
  - selection_params (Optional[dict]) – Params of feature selection.
  - rf_params (Optional[dict]) – Params of Sklearn Random Forest model.
  - nn_params (Optional[dict]) – Params of neural network model.
  - gbm_pipeline_params (Optional[dict]) – Params of feature generation for boosting models.
  - linear_pipeline_params (Optional[dict]) – Params of feature generation for linear models.
  - nn_pipeline_params (Optional[dict]) – Params of feature generation for neural network models.
- get_feature_pipeline(model, **kwargs)[source]
Get the LGBSeqSimpleFeatures pipeline if the task is time series prediction.
- Parameters:
  - model – One of ["gbm", "linear_l2", "rf", "nn"].
  - kwargs – Arbitrary keyword arguments.
- Returns:
  Appropriate features pipeline.
- create_automl(**fit_args)[source]
Create basic automl instance.
- Parameters:
**fit_args – Contains all information needed for creating the automl instance.
- fit_predict(train_data, roles=None, train_features=None, cv_iter=None, valid_data=None, valid_features=None, log_file=None, verbose=0)[source]
Fit and get prediction on validation dataset.
Almost the same as lightautoml.automl.base.AutoML.fit_predict, with an additional feature: working with different data formats (see the train_data parameter for the supported types).
- Parameters:
  - train_data (Union[str, ndarray, DataFrame, Dict[str, ndarray], Batch]) – Dataset to train on.
  - train_features (Optional[Sequence[str]]) – Optional feature names, if they cannot be inferred from train_data.
  - cv_iter (Optional[Iterable]) – Custom cv-iterator. For example, TimeSeriesIterator.
  - valid_data (Union[str, ndarray, DataFrame, Dict[str, ndarray], Batch, None]) – Optional validation dataset.
  - valid_features (Optional[Sequence[str]]) – Optional validation dataset feature names, if they cannot be inferred from valid_data.
  - verbose (int) – Controls the verbosity: the higher, the more messages. <1: messages are not displayed; >=1: the computation process for layers is displayed; >=2: information about folds processing is also displayed; >=3: the hyperparameters optimization process is also displayed; >=4: the training process for every algorithm is displayed.
  - log_file (Optional[str]) – Filename for writing logging messages. If log_file is specified, the messages will be saved in the file. If the file exists, it will be overwritten.
- Returns:
  Dataset with predictions. Call .data to get the predictions array.
- predict(data, features_names=None, batch_size=None, n_jobs=1, return_all_predictions=None)[source]
Get dataset with predictions.
Almost the same as lightautoml.automl.base.AutoML.predict on a new dataset, with additional features: working with different data formats, parallel inference (pass n_jobs to speed up prediction, at the cost of more RAM), and batch inference (pass batch_size to decrease RAM usage, at the cost of longer runtime).
- Parameters:
  - data (Union[str, ndarray, DataFrame, Dict[str, ndarray], Batch]) – Dataset to perform inference on.
  - features_names (Optional[Sequence[str]]) – Optional feature names, if they cannot be inferred from train_data.
  - batch_size (Optional[int]) – Batch size for batch inference.
  - n_jobs (int) – Number of parallel jobs for inference.
  - return_all_predictions (Optional[bool]) – If True, returns all model predictions from the last level.
- Returns:
  Dataset with predictions.
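The batch_size mechanic can be illustrated with a hypothetical batched_predict helper. This is not the library's implementation (TabularAutoML handles batching internally when batch_size is passed); it only shows the slice-predict-concatenate pattern that trades runtime for lower peak RAM:

```python
import numpy as np
import pandas as pd


def batched_predict(predict_fn, data: pd.DataFrame, batch_size: int) -> np.ndarray:
    """Slice the input into batches, predict each one, and concatenate
    the results (illustrative sketch of batch inference)."""
    parts = [
        predict_fn(data.iloc[start:start + batch_size])
        for start in range(0, len(data), batch_size)
    ]
    return np.concatenate(parts, axis=0)


# Toy stand-in for a fitted model's predict function
toy_predict = lambda df: df[["f1"]].to_numpy()

data = pd.DataFrame({"f1": np.arange(10, dtype=float)})
out = batched_predict(toy_predict, data, batch_size=4)
print(out.shape)  # (10, 1): batches of 4, 4, and 2 rows, concatenated
```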