PandasToPandasReader
- class lightautoml.reader.base.PandasToPandasReader(task, samples=100000, max_nan_rate=0.999, max_constant_rate=0.999, cv=5, random_state=42, roles_params=None, n_jobs=4, advanced_roles=True, numeric_unique_rate=0.999, max_to_3rd_rate=1.1, binning_enc_rate=2, raw_decr_rate=1.1, max_score_rate=0.2, abs_score_val=0.04, drop_score_co=0.01, **kwargs)
Bases: Reader
Pandas Reader.
Reader to convert a DataFrame to AutoML's PandasDataset. Stages:
1. Drop obviously useless features.
2. Convert the roles dict from the user format to the AutoML format.
3. Simple role guess for features without an input role.
4. Create CV folds.
5. Create the initial PandasDataset.
6. Optional: advanced guessing of roles and handling types.
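The first stage, dropping obviously useless features, is controlled by the `max_nan_rate` and `max_constant_rate` parameters. The following is a minimal stdlib sketch of such a filter, not the reader's actual code. It assumes columns arrive as a dict of lists with `None` standing in for NaN, and that a feature is dropped once its NaN share or the share of its most frequent value reaches the threshold; `is_ok_feature` and `drop_useless_features` are hypothetical names.

```python
from typing import Any, Dict, List, Optional


def is_ok_feature(
    values: List[Optional[Any]],
    max_nan_rate: float = 0.999,
    max_constant_rate: float = 0.999,
) -> bool:
    """Return False for a column that is (almost) all NaN or (almost) constant.

    Hypothetical helper: None stands in for NaN, and a share that reaches
    its threshold (>=) marks the feature as useless.
    """
    n = len(values)
    if n == 0:
        return False
    nan_rate = sum(v is None for v in values) / n
    if nan_rate >= max_nan_rate:
        return False
    # Share of the most frequent value (None is counted as a value here).
    counts: Dict[Any, int] = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    constant_rate = max(counts.values()) / n
    return constant_rate < max_constant_rate


def drop_useless_features(
    columns: Dict[str, List[Optional[Any]]], **thresholds: float
) -> Dict[str, List[Optional[Any]]]:
    """Keep only the columns that pass is_ok_feature."""
    return {name: vals for name, vals in columns.items() if is_ok_feature(vals, **thresholds)}


data = {
    "id": [1, 2, 3, 4],
    "empty": [None, None, None, None],
    "const": [7, 7, 7, 7],
    "x": [0.5, None, 1.5, 2.0],
}
kept = drop_useless_features(data, max_nan_rate=0.9, max_constant_rate=0.9)
print(sorted(kept))  # ['id', 'x']
```

With the defaults of 0.999, only columns that are essentially entirely missing or entirely constant would be removed, which matches the "obviously useless" wording.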
- Parameters:
task (Task) – Task object.
samples (Optional[int]) – Number of elements used when checking the role type.
max_nan_rate (float) – Maximum allowed share of NaN values in a feature.
max_constant_rate (float) – Maximum allowed share of the most frequent value in a feature.
cv (int) – Number of CV folds.
random_state (int) – Random seed.
roles_params (Optional[dict]) – Dict of feature-role parameters, e.g. {'numeric': {'dtype': np.float32}, 'datetime': {'date_format': '%Y-%m-%d'}}. Optional; commonly comes from the config.
n_jobs (int) – Number of processes.
advanced_roles (bool) – Parameter of the role guess (experimental, do not change).
numeric_unique_rate (float) – Parameter of the role guess (experimental, do not change).
max_to_3rd_rate (float) – Parameter of the role guess (experimental, do not change).
binning_enc_rate (float) – Parameter of the role guess (experimental, do not change).
raw_decr_rate (float) – Parameter of the role guess (experimental, do not change).
max_score_rate (float) – Parameter of the role guess (experimental, do not change).
abs_score_val (float) – Parameter of the role guess (experimental, do not change).
drop_score_co (float) – Parameter of the role guess (experimental, do not change).
**kwargs (Any) – Not used for now.
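The `cv` and `random_state` parameters drive the "Create CV folds" stage. As an illustration only, here is a generic shuffled k-fold assignment in the standard library; it is an assumption for demonstration, not necessarily the split the reader performs (the reader may, for example, stratify by target). `assign_folds` is a hypothetical name.

```python
import random
from typing import List


def assign_folds(n_rows: int, cv: int = 5, random_state: int = 42) -> List[int]:
    """Assign each row a fold id in [0, cv) with near-equal fold sizes.

    Generic shuffled k-fold sketch; not LightAutoML's own fold logic.
    """
    if cv < 2:
        raise ValueError("cv must be at least 2")
    rng = random.Random(random_state)  # seeded, so folds are reproducible
    order = list(range(n_rows))
    rng.shuffle(order)
    folds = [0] * n_rows
    # Deal shuffled rows round-robin so fold sizes differ by at most one.
    for position, row in enumerate(order):
        folds[row] = position % cv
    return folds


folds = assign_folds(12, cv=5, random_state=42)
print([folds.count(k) for k in range(5)])  # [3, 3, 2, 2, 2]
```

Fixing the seed makes repeated runs produce the same folds, which is the usual reason for exposing `random_state` alongside `cv`.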
- fit_read(train_data, features_names=None, roles=None, **kwargs)
Get dataset with initial feature selection.
- Parameters:
train_data (DataFrame) – Input data.
features_names (Optional[Any]) – Ignored; kept only for signature compatibility.
roles (Optional[Dict[Union[str, TypeVar(RoleType, bound=ColumnRole), None], Sequence[str]]]) – Dict of feature roles in the format {RoleX: ['feat0', 'feat1', ...], RoleY: 'TARGET', ...}.
**kwargs (Any) – Can be used for target/group/weights.
- Return type:
PandasDataset
- Returns:
Dataset with selected features.
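The `roles` argument maps each role to one feature name or a list of them, and the reader's second stage converts this user format into its internal one. A natural internal form is the inverse, one role per feature; the sketch below shows that inversion under the assumption that roles are plain strings (the real reader also accepts `ColumnRole` objects as keys, per the type hint). `roles_to_feature_map` is a hypothetical name, and rejecting a feature listed under two roles is a design choice of this sketch.

```python
from typing import Dict, Sequence, Union

UserRoles = Dict[str, Union[str, Sequence[str]]]


def roles_to_feature_map(roles: UserRoles) -> Dict[str, str]:
    """Invert {role: feature or [features]} into {feature: role}.

    Role names are plain strings here for simplicity.
    """
    feature_roles: Dict[str, str] = {}
    for role, features in roles.items():
        # A single column may be given as a bare string, e.g. the target.
        if isinstance(features, str):
            features = [features]
        for feat in features:
            if feat in feature_roles:
                raise ValueError(f"feature {feat!r} has more than one role")
            feature_roles[feat] = role
    return feature_roles


print(roles_to_feature_map({"numeric": ["feat0", "feat1"], "target": "TARGET"}))
# {'feat0': 'numeric', 'feat1': 'numeric', 'TARGET': 'target'}
```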
- read(data, features_names=None, add_array_attrs=False)
Read dataset with fitted metadata.
- Parameters:
- Return type:
- Returns:
Dataset with new columns.
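`read` reuses metadata learned in `fit_read` instead of making new decisions on the incoming data. The fit-then-reuse pattern can be sketched with a toy class that remembers the selected columns at fit time and re-applies that selection later. `ColumnSelectingReader`, its `drop` argument and `used_features` attribute are hypothetical and only illustrate the idea; the real reader also carries roles and other fitted state.

```python
from typing import Any, Dict, List

Table = Dict[str, List[Any]]


class ColumnSelectingReader:
    """Hypothetical minimal reader: fit remembers columns, read reuses them."""

    def __init__(self) -> None:
        self.used_features: List[str] = []

    def fit_read(self, train: Table, drop: List[str]) -> Table:
        # Decide once, on training data, which columns to keep.
        self.used_features = [name for name in train if name not in drop]
        return {name: train[name] for name in self.used_features}

    def read(self, data: Table) -> Table:
        # Re-apply the fitted selection to new data; no new decisions are made.
        missing = [name for name in self.used_features if name not in data]
        if missing:
            raise KeyError(f"columns missing from new data: {missing}")
        return {name: data[name] for name in self.used_features}


reader = ColumnSelectingReader()
train = reader.fit_read({"a": [1, 2], "b": [0, 0], "c": [3, 4]}, drop=["b"])
new = reader.read({"c": [5], "a": [6], "extra": [7]})
print(new)  # {'a': [6], 'c': [5]}
```

Columns unseen at fit time (`extra`) are ignored, and the output column order follows the fitted selection rather than the new data.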
- advanced_roles_guess(dataset, manual_roles=None)
Advanced role guess on top of the user's definition and the reader's simple guessing.
Strategy: compute each feature's NormalizedGini for different encodings and calculate statistics over the results. The role is inferred by comparing these performance statistics with manual rules. The rule parameters are the role-guess parameters passed to the constructor; the defaults are fine in the general case.
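The strategy scores features with a normalized Gini. A common formulation (popularized by Kaggle competitions) is sketched below in pure Python; it is an assumption that the library's NormalizedGini matches it exactly. Rows are ranked by prediction, the cumulative share of the target is compared with a random ordering, and the result is divided by the score of a perfect ranking, so 1.0 is best and -1.0 is a fully reversed ranking.

```python
from typing import Sequence


def gini(actual: Sequence[float], pred: Sequence[float]) -> float:
    """Unnormalized Gini: order rows by prediction (descending) and compare
    the cumulative share of the target with a random ordering."""
    n = len(actual)
    total = float(sum(actual))
    if n == 0 or total == 0:
        raise ValueError("need a non-empty target with a non-zero sum")
    # Highest prediction first; ties keep the original row order.
    order = sorted(range(n), key=lambda i: (-pred[i], i))
    cumulative = 0.0
    area = 0.0
    for i in order:
        cumulative += actual[i]
        area += cumulative / total
    return (area - (n + 1) / 2.0) / n


def normalized_gini(actual: Sequence[float], pred: Sequence[float]) -> float:
    """Gini of pred divided by the Gini of a perfect prediction."""
    best = gini(actual, actual)
    if best == 0:
        raise ValueError("target is constant; normalized Gini is undefined")
    return gini(actual, pred) / best


target = [0, 0, 1, 1]
print(normalized_gini(target, [0.1, 0.2, 0.8, 0.9]))  # 1.0
print(normalized_gini(target, [0.9, 0.8, 0.2, 0.1]))  # -1.0
```

To score a single feature this way, its encoded values would be passed as `pred`; comparing such scores across encodings is the kind of statistic the rules above operate on.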
- Parameters:
dataset (PandasDataset) – Input PandasDataset.
manual_roles (Optional[Dict[str, TypeVar(RoleType, bound=ColumnRole)]]) – Dict of user-defined roles.
- Return type:
Dict[str, TypeVar(RoleType, bound=ColumnRole)]
- Returns:
Dict mapping feature names to their guessed roles.