AB test

0. Import libraries

[1]:
import numpy as np
import pandas as pd

from lightautoml.addons.hypex import ABTest
from lightautoml.addons.hypex.utils.tutorial_data_creation import create_test_data

pd.options.display.float_format = '{:,.2f}'.format

np.random.seed(42)  # needed to create example data

1. Create or upload your dataset

In this case we will create random dataset with known effect size
If you have your own dataset, go to the part 2
[2]:
data = create_test_data(num_users=10000, rs=52, na_step=10, nan_cols=['age', 'gender'])
data
[2]:
user_id signup_month treat pre_spends post_spends age gender industry
0 0 0 0 488.00 414.44 NaN M E-commerce
1 1 8 1 512.50 462.22 26.00 NaN E-commerce
2 2 7 1 483.00 479.44 25.00 M Logistics
3 3 0 0 501.50 424.33 39.00 M E-commerce
4 4 1 1 543.00 514.56 18.00 F E-commerce
... ... ... ... ... ... ... ... ...
9995 9995 10 1 538.50 450.44 42.00 M Logistics
9996 9996 0 0 500.50 430.89 26.00 F Logistics
9997 9997 3 1 473.00 534.11 22.00 F E-commerce
9998 9998 2 1 495.00 523.22 67.00 F E-commerce
9999 9999 7 1 508.00 475.89 38.00 F E-commerce

10000 rows × 8 columns

2. AB-test

2.0 Data

Let’s correct data to see how AB-test works

[3]:
data_ab = data.copy()

half_data = int(data.shape[0] / 2)
data_ab['group'] = ['test'] * half_data + ['control'] * half_data
data_ab.head(3)
[3]:
user_id signup_month treat pre_spends post_spends age gender industry group
0 0 0 0 488.00 414.44 NaN M E-commerce test
1 1 8 1 512.50 462.22 26.00 NaN E-commerce test
2 2 7 1 483.00 479.44 25.00 M Logistics test

3.1 Full AB-test

Full (basic) version of test includes calculation of all available metrics, which are: “diff in means”, “diff in diff” and “cuped” Pay attention, that for “cuped” and “diff in diff” metrics required target before pilot.

[4]:
model = ABTest()
results = model.execute(
    data=data_ab,
    target_field='post_spends',
    target_field_before='pre_spends',
    group_field='group'
)
results
[4]:
{'size': {'test': 5000, 'control': 5000},
 'difference': {'ate': 1.108044444444488,
  'medain_diff': 0.16666666666668561,
  'cuped': 0.897496915890514,
  'diff_in_diff': 0.610344444444479},
 'p-value': {'t-test': 0.15973563889393272,
  'mann_whitney': 0.11494755666097989}}

2.2 Simple AB-test

To estimate effect without target data before pilot calc_difference_method='ate' can be used - effect will be estimated with “diff in means” method

[5]:
model = ABTest(calc_difference_method='ate')
model.execute(data=data_ab, target_field='post_spends', group_field='group')
[5]:
{'size': {'test': 5000, 'control': 5000},
 'difference': {'ate': 1.108044444444488},
 'p-value': {'t-test': 0.15973563889393272,
  'mann_whitney': 0.11494755666097989}}