AA test

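An A/A test is a variation of an A/B test in which the original is compared with itself: both samples come from the same population and receive the same treatment. An A/B test, by contrast, compares a control sample with a sample exposed to a change. If the split and the statistical tests are sound, an A/A test should find no significant difference between the samples.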

0. Import Libraries

[1]:
import numpy as np
import pandas as pd

from lightautoml.addons.hypex import AATest
from lightautoml.addons.hypex.utils.tutorial_data_creation import create_test_data

pd.options.display.float_format = '{:,.2f}'.format

np.random.seed(42)  # needed to create example data
[2]:
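from IPython.display import display  # built into Jupyter; explicit import needed elsewhere

def show_result(result):
    # Print each key of the result dict, then render its value (DataFrame or Series)
    for k, v in result.items():
        print(k)
        display(v)
        print()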

1. Create or upload your dataset

In this example we create a random dataset with a known effect size.
If you have your own dataset, skip to part 2.
[3]:
data = create_test_data(rs=52, na_step=10, nan_cols=['age', 'gender'])
data
[3]:
user_id signup_month treat pre_spends post_spends age gender industry
0 0 0 0 488.00 414.44 NaN M E-commerce
1 1 8 1 512.50 462.22 26.00 NaN E-commerce
2 2 7 1 483.00 479.44 25.00 M Logistics
3 3 0 0 501.50 424.33 39.00 M E-commerce
4 4 1 1 543.00 514.56 18.00 F E-commerce
... ... ... ... ... ... ... ... ...
9995 9995 10 1 538.50 450.44 42.00 M Logistics
9996 9996 0 0 500.50 430.89 26.00 F Logistics
9997 9997 3 1 473.00 534.11 22.00 F E-commerce
9998 9998 2 1 495.00 523.22 67.00 F E-commerce
9999 9999 7 1 508.00 475.89 38.00 F E-commerce

10000 rows × 8 columns

2. AATest

2.0 Initialize parameters

info_cols defines informational columns that should NOT take part in testing, such as user_id and signup_month.

[4]:
info_cols = ['user_id', 'signup_month']
target = ['post_spends', 'pre_spends']

2.1 Simple AA-test

This is the easiest way to initialize an AA test and calculate its metrics (by default, over 2000 iterations). Use it when you understand each attribute and have no additional task conditions (such as grouping).
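The idea behind the iterations can be sketched with the standard library alone: split the same values at random many times and count how often a test at α = 0.05 reports a difference. Because both halves come from one population, that share should stay close to α. This is a minimal sketch under simplifying assumptions: it uses a normal-approximation z-test instead of the t-test and Kolmogorov-Smirnov test that AATest runs, synthetic data, and hypothetical helper names.

```python
import random
from statistics import NormalDist


def mean_and_var(xs):
    """Sample mean and unbiased sample variance."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def z_test_p_value(a, b):
    """Two-sided p-value for a difference in means (normal approximation,
    adequate for large samples; a stand-in for the library's t-test)."""
    mean_a, var_a = mean_and_var(a)
    mean_b, var_b = mean_and_var(b)
    se = (var_a / len(a) + var_b / len(b)) ** 0.5
    z = (mean_b - mean_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_aa(values, iterations=200, alpha=0.05, seed=0):
    """Split the SAME values into two random halves many times and return
    the share of splits in which the test reports p < alpha."""
    rng = random.Random(seed)
    flagged = 0
    for _ in range(iterations):
        shuffled = values[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        if z_test_p_value(shuffled[:half], shuffled[half:]) < alpha:
            flagged += 1
    return flagged / iterations


# Hypothetical synthetic spends, not the tutorial dataset.
data_rng = random.Random(42)
spends = [data_rng.gauss(450, 30) for _ in range(2000)]
false_positive_rate = simulate_aa(spends)
print(f"share of splits flagged as different: {false_positive_rate:.3f}")
```

With a sound split the printed share should land near 0.05; a share far above α would suggest that the splitting or the test is biased.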

You can also pass extra arguments to process():

  • plot_set - types of plots to show (“hist”, “cumulative”, “percentile”)

  • figsize - size of figure for plots

  • alpha - value to change the transparency of the histogram plot

  • bins - generic bin parameter that can be the name of a reference rule, the number of bins, or the breaks of the bins

  • title_size - size of title for plots

[5]:
experiment = AATest(info_cols=info_cols, target_fields=target)
[6]:
results = experiment.process(data, iterations=2000)
(Figures: AA-test result plots, 3 images)
[7]:
show_result(results)
experiments
random_state post_spends a mean post_spends b mean post_spends ab delta post_spends ab delta % post_spends t-test p-value post_spends ks-test p-value post_spends t-test passed post_spends ks-test passed pre_spends a mean ... pre_spends ks-test passed control % test % control size test size t-test mean p-value ks-test mean p-value t-test passed % ks-test passed % mean_tests_score
0 1 452.18 452.15 -0.03 -0.01 0.97 0.56 False False 487.33 ... False 50.00 50.00 5000 5000 0.59 0.49 0.00 0.00 0.52
1 2 452.82 451.50 -1.32 -0.29 0.09 0.18 False False 487.04 ... False 50.00 50.00 5000 5000 0.44 0.45 0.00 0.00 0.45
2 4 452.41 451.92 -0.50 -0.11 0.53 0.06 False False 487.20 ... False 50.00 50.00 5000 5000 0.56 0.36 0.00 0.00 0.43
3 5 452.64 451.69 -0.96 -0.21 0.23 0.41 False False 486.90 ... False 50.00 50.00 5000 5000 0.26 0.40 0.00 0.00 0.35
4 6 452.70 451.63 -1.07 -0.24 0.17 0.53 False False 487.31 ... False 50.00 50.00 5000 5000 0.21 0.35 0.00 0.00 0.30
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1755 1993 452.29 452.04 -0.24 -0.05 0.76 0.95 False False 486.96 ... False 50.00 50.00 5000 5000 0.61 0.71 0.00 0.00 0.68
1756 1994 452.56 451.77 -0.78 -0.17 0.32 0.11 False False 487.12 ... False 50.00 50.00 5000 5000 0.61 0.21 0.00 0.00 0.34
1757 1995 452.30 452.03 -0.26 -0.06 0.74 0.91 False False 486.94 ... False 50.00 50.00 5000 5000 0.57 0.89 0.00 0.00 0.79
1758 1996 451.89 452.44 0.55 0.12 0.48 0.78 False False 487.30 ... False 50.00 50.00 5000 5000 0.38 0.86 0.00 0.00 0.70
1759 1997 452.52 451.81 -0.70 -0.16 0.37 0.10 False False 487.10 ... False 50.00 50.00 5000 5000 0.66 0.40 0.00 0.00 0.49

1760 rows × 26 columns


aa_score
             t-test passed score  ks-test passed score  t-test aa passed  ks-test aa passed
post_spends                 0.00                  0.00              0.00               0.00
pre_spends                  0.00                  0.00              0.00               0.00
mean                        0.00                  0.00              0.00               0.00

split
user_id signup_month treat pre_spends post_spends age gender industry group
0 0 0 0 488.00 414.44 NaN M E-commerce test
1 1 8 1 512.50 462.22 26.00 NaN E-commerce test
2 4 1 1 543.00 514.56 18.00 F E-commerce test
3 5 6 1 486.50 486.56 44.00 M E-commerce test
4 8 4 1 465.50 506.00 66.00 M Logistics test
... ... ... ... ... ... ... ... ... ...
9995 9990 0 0 490.00 426.00 NaN M Logistics control
9996 9992 0 0 491.50 424.00 29.00 M E-commerce control
9997 9996 0 0 500.50 430.89 26.00 F Logistics control
9998 9997 3 1 473.00 534.11 22.00 F E-commerce control
9999 9998 2 1 495.00 523.22 67.00 F E-commerce control

10000 rows × 9 columns


best_experiment_stat
             a mean  b mean  ab delta  ab delta %  t-test p-value  ks-test p-value  t-test passed  ks-test passed
post_spends  452.22  452.11     -0.11       -0.02            0.89             1.00          False           False
pre_spends   487.09  487.10      0.00        0.00            0.99             1.00          False           False

split_stat
control %              50.00
test %                 50.00
control size            5000
test size               5000
t-test mean p-value     0.94
ks-test mean p-value    1.00
t-test passed %         0.00
ks-test passed %        0.00
mean_tests_score        0.98
Name: 60, dtype: object

resume
             aa test passed  split is uniform
post_spends          not OK                OK
pre_spends           not OK                OK

[8]:
results.keys()
[8]:
dict_keys(['experiments', 'aa_score', 'split', 'best_experiment_stat', 'split_stat', 'resume'])

results is a dictionary whose values are pandas DataFrames (split_stat is a Series).

  • ‘split’ - the resulting split; the column ‘group’ contains the values ‘test’ and ‘control’

  • ‘resume’ - summary of all results

  • ‘aa_score’ - scores of the t-test and the Kolmogorov-Smirnov test

  • ‘experiments’ - a table with the results of every experiment, including

    • the means of all targets in samples a and b,

    • the p-values of Student's t-test and the Kolmogorov-Smirnov test,

    • and the test outcomes for each random_state (True where the p-value fell below the significance level, i.e. the test detected a difference between the samples)

  • ‘best_experiment_stat’ - the same statistics as in ‘experiments’, but only for the best experiment

  • ‘split_stat’ - metrics and statistical-test results for the chosen split

[9]:
results['aa_score']
[9]:
             t-test passed score  ks-test passed score  t-test aa passed  ks-test aa passed
post_spends                 0.00                  0.00              0.00               0.00
pre_spends                  0.00                  0.00              0.00               0.00
mean                        0.00                  0.00              0.00               0.00
[10]:
results['resume']
[10]:
             aa test passed  split is uniform
post_spends          not OK                OK
pre_spends           not OK                OK

2.2 Single experiment

To get reproducible results, let's fix random_state:

[11]:
random_state = 11
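Fixing random_state makes the split deterministic. The hypothetical helper below (a standard-library sketch, not how AATest assigns groups internally) shows the principle: the same seed always yields the same shuffle and therefore the same test and control groups.

```python
import random


def seeded_split(ids, random_state, test_size=0.5):
    """Hypothetical helper: shuffle ids with a seeded RNG, then cut off
    the first test_size share as the test group."""
    shuffled = list(ids)
    random.Random(random_state).shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    return {"test": sorted(shuffled[:n_test]), "control": sorted(shuffled[n_test:])}


first = seeded_split(range(10), random_state=11)
second = seeded_split(range(10), random_state=11)
print(first == second)  # True: the same seed reproduces the same split
```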

To perform a single experiment, use sampling_metrics():

[12]:
experiment = AATest(info_cols=info_cols, target_fields=target)
metrics, dict_of_datas = experiment.sampling_metrics(data=data, random_state=random_state).values()

The result contains the same information as the multi-iteration run, but for a single experiment:

[13]:
metrics
[13]:
{'random_state': 11,
 'post_spends a mean': 451.8546,
 'post_spends b mean': 452.4745111111112,
 'post_spends ab delta': 0.6199111111112074,
 'post_spends ab delta %': 0.13700464797208323,
 'post_spends t-test p-value': 0.43154056610193947,
 'post_spends ks-test p-value': 0.95721723072851,
 'post_spends t-test passed': False,
 'post_spends ks-test passed': False,
 'pre_spends a mean': 487.2131,
 'pre_spends b mean': 486.9744,
 'pre_spends ab delta': -0.23869999999999436,
 'pre_spends ab delta %': -0.04901695037766718,
 'pre_spends t-test p-value': 0.5271083329122467,
 'pre_spends ks-test p-value': 0.14861030130677552,
 'pre_spends t-test passed': False,
 'pre_spends ks-test passed': False,
 'control %': 50.0,
 'test %': 50.0,
 'control size': 5000,
 'test size': 5000,
 't-test mean p-value': 0.4793244495070931,
 'ks-test mean p-value': 0.5529137660176427,
 't-test passed %': 0.0,
 'ks-test passed %': 0.0,
 'mean_tests_score': 0.5283839938474595}
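Several of these fields can be reproduced from the per-target values. Judging from the numbers above (an inference, not taken from the library source): ab delta is b mean minus a mean, ab delta % is that delta relative to the b mean, the two mean p-values average the per-target p-values, and mean_tests_score weights the Kolmogorov-Smirnov mean twice as heavily as the t-test mean. This sketch checks that reading against the printed values.

```python
# Values copied from the `metrics` output above.
means = {  # target -> (a mean, b mean)
    "post_spends": (451.8546, 452.4745111111112),
    "pre_spends": (487.2131, 486.9744),
}
t_test_p = {"post_spends": 0.43154056610193947, "pre_spends": 0.5271083329122467}
ks_test_p = {"post_spends": 0.95721723072851, "pre_spends": 0.14861030130677552}

derived = {}
for target, (a_mean, b_mean) in means.items():
    ab_delta = b_mean - a_mean              # "ab delta": b mean minus a mean
    ab_delta_pct = ab_delta / b_mean * 100  # "ab delta %": relative to the b mean
    derived[target] = (ab_delta, ab_delta_pct)

# Averages of the per-target p-values.
t_test_mean_p = sum(t_test_p.values()) / len(t_test_p)
ks_test_mean_p = sum(ks_test_p.values()) / len(ks_test_p)

# Inferred weighting: the KS mean counts twice as much as the t-test mean.
mean_tests_score = (t_test_mean_p + 2 * ks_test_mean_p) / 3

print(derived)
print(t_test_mean_p, ks_test_mean_p, mean_tests_score)
```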
[14]:
dict_of_datas[random_state]
[14]:
user_id signup_month treat pre_spends post_spends age gender industry group
0 1 8 1 512.50 462.22 26.00 NaN E-commerce test
1 2 7 1 483.00 479.44 25.00 M Logistics test
2 5 6 1 486.50 486.56 44.00 M E-commerce test
3 6 11 1 483.50 433.89 28.00 F Logistics test
4 11 4 1 498.50 516.89 58.00 NaN E-commerce test
... ... ... ... ... ... ... ... ... ...
9995 9986 0 0 494.00 432.11 39.00 M Logistics control
9996 9989 6 1 466.50 487.44 19.00 F E-commerce control
9997 9991 0 0 482.50 421.89 43.00 NaN Logistics control
9998 9995 10 1 538.50 450.44 42.00 M Logistics control
9999 9998 2 1 495.00 523.22 67.00 F E-commerce control

10000 rows × 9 columns

[15]:
results = experiment.experiment_result_transform(pd.Series(metrics))
[16]:
results.keys()
[16]:
dict_keys(['best_experiment_stat', 'best_split_stat'])
[17]:
results['best_experiment_stat']
[17]:
             a mean  b mean  ab delta  ab delta %  t-test p-value  ks-test p-value  t-test passed  ks-test passed
post_spends  451.85  452.47      0.62        0.14            0.43             0.96          False           False
pre_spends   487.21  486.97     -0.24       -0.05            0.53             0.15          False           False
[18]:
results['best_split_stat']
[18]:
control %              50.00
test %                 50.00
control size            5000
test size               5000
t-test mean p-value     0.48
ks-test mean p-value    0.55
t-test passed %         0.00
ks-test passed %        0.00
mean_tests_score        0.53
dtype: object

2.3 AA-test with grouping

To split samples within groups, pass the grouping column(s) as group_cols:

[19]:
info_cols = ['user_id', 'signup_month']
target = ['post_spends', 'pre_spends']

group_cols = 'industry'
[20]:
experiment = AATest(info_cols=info_cols, target_fields=target, group_cols=group_cols)
[21]:
results = experiment.process(data=data, iterations=2000)
(Figures: AA-test result plots, 4 images)

The result has the same format as without groups.

In this mode, each group is divided equally between the two samples (test and control):
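A stratified split can be sketched by halving each group independently. This is a hypothetical standard-library helper, not AATest's implementation; with the industry sizes reported below (4985 E-commerce and 5015 Logistics users) and the extra user of an odd-sized group going to control, it produces the same per-group counts as the grouped table below.

```python
import random
from collections import Counter, defaultdict


def stratified_split(rows, group_key, random_state=None):
    """Hypothetical sketch: halve each group (stratum) separately so that
    every group is split almost equally between test and control."""
    rng = random.Random(random_state)
    strata = defaultdict(list)
    for row in rows:
        strata[row[group_key]].append(row["user_id"])
    assignment = {}
    for user_ids in strata.values():
        rng.shuffle(user_ids)
        half = len(user_ids) // 2  # odd-sized groups give the extra user to control
        for position, user_id in enumerate(user_ids):
            assignment[user_id] = "test" if position < half else "control"
    return assignment


# Hypothetical rows with the industry sizes of this dataset.
rows = [
    {"user_id": i, "industry": "E-commerce" if i < 4985 else "Logistics"}
    for i in range(10_000)
]
assignment = stratified_split(rows, "industry", random_state=0)
counts = Counter((row["industry"], assignment[row["user_id"]]) for row in rows)
print(sorted(counts.items()))
```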

[22]:
results['split']['industry'].value_counts(normalize=True) * 100
[22]:
industry
Logistics    50.15
E-commerce   49.85
Name: proportion, dtype: float64
[23]:
results['split'].groupby(['industry', 'group'])[['user_id']].count()
[23]:
                    user_id
industry   group
E-commerce control     2493
           test        2492
Logistics  control     2508
           test        2507
[24]:
show_result(results)
experiments
random_state post_spends a mean post_spends b mean post_spends ab delta post_spends ab delta % post_spends t-test p-value post_spends ks-test p-value post_spends t-test passed post_spends ks-test passed pre_spends a mean ... pre_spends ks-test passed control % test % control size test size t-test mean p-value ks-test mean p-value t-test passed % ks-test passed % mean_tests_score
0 0 451.53 452.80 1.27 0.28 0.11 0.19 False False 487.00 ... False 50.01 49.99 5001 4999 0.36 0.52 0.00 0.00 0.47
1 2 452.53 451.80 -0.73 -0.16 0.35 0.83 False False 487.19 ... False 50.01 49.99 5001 4999 0.47 0.92 0.00 0.00 0.77
2 3 452.10 452.23 0.13 0.03 0.87 0.85 False False 487.11 ... False 50.01 49.99 5001 4999 0.90 0.93 0.00 0.00 0.92
3 4 452.18 452.15 -0.03 -0.01 0.97 0.30 False False 487.20 ... False 50.01 49.99 5001 4999 0.77 0.47 0.00 0.00 0.57
4 7 452.38 451.95 -0.42 -0.09 0.59 0.40 False False 487.20 ... False 50.01 49.99 5001 4999 0.58 0.50 0.00 0.00 0.53
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1723 1995 452.70 451.63 -1.07 -0.24 0.18 0.14 False False 487.40 ... False 50.01 49.99 5001 4999 0.14 0.16 0.00 0.00 0.16
1724 1996 452.36 451.96 -0.40 -0.09 0.61 0.81 False False 487.08 ... False 50.01 49.99 5001 4999 0.78 0.69 0.00 0.00 0.72
1725 1997 452.08 452.25 0.18 0.04 0.82 0.59 False False 487.04 ... False 50.01 49.99 5001 4999 0.79 0.66 0.00 0.00 0.70
1726 1998 451.96 452.36 0.40 0.09 0.61 0.44 False False 486.85 ... False 50.01 49.99 5001 4999 0.41 0.54 0.00 0.00 0.50
1727 1999 452.38 451.95 -0.42 -0.09 0.59 0.54 False False 487.21 ... False 50.01 49.99 5001 4999 0.57 0.62 0.00 0.00 0.60

1728 rows × 26 columns


aa_score
             t-test passed score  ks-test passed score  t-test aa passed  ks-test aa passed
post_spends                 0.00                  0.00              0.00               0.00
pre_spends                  0.00                  0.00              0.00               0.00
mean                        0.00                  0.00              0.00               0.00

split
user_id signup_month treat pre_spends post_spends age gender industry group
0 0 0 0 488.00 414.44 NaN M E-commerce test
1 2 7 1 483.00 479.44 25.00 M Logistics test
2 4 1 1 543.00 514.56 18.00 F E-commerce test
3 5 6 1 486.50 486.56 44.00 M E-commerce test
4 7 11 1 496.00 432.89 57.00 M E-commerce test
... ... ... ... ... ... ... ... ... ...
9995 9983 0 0 494.50 428.33 31.00 F Logistics control
9996 9984 0 0 460.00 417.11 56.00 M Logistics control
9997 9985 0 0 484.00 411.33 52.00 M E-commerce control
9998 9991 0 0 482.50 421.89 43.00 NaN Logistics control
9999 9994 0 0 486.00 423.78 69.00 F Logistics control

10000 rows × 9 columns


best_experiment_stat
             a mean  b mean  ab delta  ab delta %  t-test p-value  ks-test p-value  t-test passed  ks-test passed
post_spends  452.18  452.15     -0.03       -0.01            0.97             0.99          False           False
pre_spends   487.08  487.11      0.03        0.01            0.93             1.00          False           False

split_stat
control %              50.01
test %                 49.99
control size            5001
test size               4999
t-test mean p-value     0.95
ks-test mean p-value    0.99
t-test passed %         0.00
ks-test passed %        0.00
mean_tests_score        0.98
Name: 1395, dtype: object

resume
             aa test passed  split is uniform
post_spends          not OK                OK
pre_spends           not OK                OK

2.4 AA test with group optimization

If you have many candidate grouping columns and don’t know which column or columns give the best result, use the parameter ``optimize_groups=True``. AATest will then choose the optimal number and set of grouping columns.

You can use columns_labeling() to automatically label columns as targets or grouping columns.
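The labeling rule is not documented here. One plausible heuristic, offered only as an assumption, is to treat numeric columns as targets and non-numeric columns as grouping candidates, skipping info_cols. The hypothetical label_columns function below applies it to the first rows of the tutorial data and yields the same division of columns as the output below; columns_labeling may use different rules.

```python
def label_columns(columns, info_cols=()):
    """Hypothetical heuristic: numeric columns become target candidates,
    everything else becomes a grouping candidate; info columns are skipped."""
    labels = {"target_field": [], "group_col": []}
    for name, values in columns.items():
        if name in info_cols:
            continue
        present = [v for v in values if v is not None]  # ignore missing values
        numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
        )
        labels["target_field" if numeric else "group_col"].append(name)
    return labels


# First rows of the tutorial data, with NaN written as None.
sample = {
    "user_id": [0, 1, 2],
    "signup_month": [0, 8, 7],
    "treat": [0, 1, 1],
    "pre_spends": [488.0, 512.5, 483.0],
    "post_spends": [414.44, 462.22, 479.44],
    "age": [None, 26.0, 25.0],
    "gender": ["M", None, "M"],
    "industry": ["E-commerce", "E-commerce", "Logistics"],
}
print(label_columns(sample, info_cols=("user_id", "signup_month")))
```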

[25]:
experiment.columns_labeling(data)
[25]:
{'target_field': ['treat', 'pre_spends', 'post_spends', 'age'],
 'group_col': ['gender', 'industry']}
[26]:
results = experiment.process(data=data, optimize_groups=True, iterations=2000)
(Figures: AA-test result plots, 4 images)
[27]:
experiment.group_cols
[27]:
['industry']
[28]:
show_result(results)
experiments
random_state post_spends a mean post_spends b mean post_spends ab delta post_spends ab delta % post_spends t-test p-value post_spends ks-test p-value post_spends t-test passed post_spends ks-test passed pre_spends a mean ... pre_spends ks-test passed control % test % control size test size t-test mean p-value ks-test mean p-value t-test passed % ks-test passed % mean_tests_score
0 0 452.60 451.72 -0.88 -0.19 0.26 0.48 False False 487.38 ... True 50.00 50.00 5000 5000 0.20 0.25 0.00 50.00 0.23
1 1 452.18 452.15 -0.03 -0.01 0.97 0.56 False False 487.33 ... False 50.00 50.00 5000 5000 0.59 0.49 0.00 0.00 0.52
2 2 452.82 451.50 -1.32 -0.29 0.09 0.18 False False 487.04 ... False 50.00 50.00 5000 5000 0.44 0.45 0.00 0.00 0.45
3 3 451.25 453.08 1.83 0.40 0.02 0.08 True False 486.67 ... False 50.00 50.00 5000 5000 0.02 0.13 100.00 0.00 0.09
4 4 452.41 451.92 -0.50 -0.11 0.53 0.06 False False 487.20 ... False 50.00 50.00 5000 5000 0.56 0.36 0.00 0.00 0.43
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1995 1995 452.30 452.03 -0.26 -0.06 0.74 0.91 False False 486.94 ... False 50.00 50.00 5000 5000 0.57 0.89 0.00 0.00 0.79
1996 1996 451.89 452.44 0.55 0.12 0.48 0.78 False False 487.30 ... False 50.00 50.00 5000 5000 0.38 0.86 0.00 0.00 0.70
1997 1997 452.52 451.81 -0.70 -0.16 0.37 0.10 False False 487.10 ... False 50.00 50.00 5000 5000 0.66 0.40 0.00 0.00 0.49
1998 1998 452.27 452.06 -0.21 -0.05 0.79 0.86 False False 486.73 ... True 50.00 50.00 5000 5000 0.42 0.45 0.00 50.00 0.44
1999 1999 451.47 452.86 1.38 0.31 0.08 0.02 False True 486.75 ... False 50.00 50.00 5000 5000 0.07 0.36 0.00 50.00 0.26

2000 rows × 26 columns


aa_score
t-test passed score ks-test passed score t-test aa passed ks-test aa passed
post_spends 0.04 0.04 1.00 1.00
pre_spends 0.05 0.03 1.00 0.00
mean 0.04 0.04 1.00 0.50

split
user_id signup_month treat pre_spends post_spends age gender industry group
0 0 0 0 488.00 414.44 NaN M E-commerce test
1 1 8 1 512.50 462.22 26.00 NaN E-commerce test
2 4 1 1 543.00 514.56 18.00 F E-commerce test
3 5 6 1 486.50 486.56 44.00 M E-commerce test
4 8 4 1 465.50 506.00 66.00 M Logistics test
... ... ... ... ... ... ... ... ... ...
9995 9990 0 0 490.00 426.00 NaN M Logistics control
9996 9992 0 0 491.50 424.00 29.00 M E-commerce control
9997 9996 0 0 500.50 430.89 26.00 F Logistics control
9998 9997 3 1 473.00 534.11 22.00 F E-commerce control
9999 9998 2 1 495.00 523.22 67.00 F E-commerce control

10000 rows × 9 columns


best_experiment_stat
             a mean  b mean  ab delta  ab delta %  t-test p-value  ks-test p-value  t-test passed  ks-test passed
post_spends  452.22  452.11     -0.11       -0.02            0.89             1.00          False           False
pre_spends   487.09  487.10      0.00        0.00            0.99             1.00          False           False

split_stat
control %              50.00
test %                 50.00
control size            5000
test size               5000
t-test mean p-value     0.94
ks-test mean p-value    1.00
t-test passed %         0.00
ks-test passed %        0.00
mean_tests_score        0.98
Name: 69, dtype: object

resume
             aa test passed  split is uniform
post_spends              OK                OK
pre_spends               OK                OK

2.5 AA test with quantization

If you want to use one column as a quantization parameter, pass it as ``quant_field``.

[29]:
info_cols = ['user_id', 'signup_month']
target = ['post_spends', 'pre_spends']

group_cols = 'industry'
quant_field = 'gender'
[30]:
experiment = AATest(info_cols=info_cols, target_fields=target, group_cols=group_cols, quant_field=quant_field)
[31]:
result = experiment.process(data=data, iterations=2000)
(Figures: AA-test result plots, 4 images)
[32]:
result['split'].groupby(['gender', 'industry', 'group'])['user_id'].count()

[32]:
gender  industry    group
F       E-commerce  test       2261
        Logistics   control    2305
M       E-commerce  control    2240
        Logistics   control    2194
Name: user_id, dtype: int64
[33]:
show_result(result)
experiments
random_state post_spends a mean post_spends b mean post_spends ab delta post_spends ab delta % post_spends t-test p-value post_spends ks-test p-value post_spends t-test passed post_spends ks-test passed pre_spends a mean ... pre_spends ks-test passed control % test % control size test size t-test mean p-value ks-test mean p-value t-test passed % ks-test passed % mean_tests_score
0 0 452.05 452.46 0.41 0.09 0.64 0.91 False False 486.90 ... False 72.23 27.77 7223 2777 0.37 0.62 0.00 0.00 0.53
1 2 452.05 452.46 0.41 0.09 0.64 0.91 False False 486.90 ... False 72.23 27.77 7223 2777 0.37 0.62 0.00 0.00 0.53
2 7 452.05 452.46 0.41 0.09 0.64 0.91 False False 486.90 ... False 72.23 27.77 7223 2777 0.37 0.62 0.00 0.00 0.53
3 8 452.05 452.46 0.41 0.09 0.64 0.91 False False 486.90 ... False 72.23 27.77 7223 2777 0.37 0.62 0.00 0.00 0.53
4 15 452.05 452.46 0.41 0.09 0.64 0.91 False False 486.90 ... False 72.23 27.77 7223 2777 0.37 0.62 0.00 0.00 0.53
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
643 1981 452.05 452.46 0.41 0.09 0.64 0.91 False False 486.90 ... False 72.23 27.77 7223 2777 0.37 0.62 0.00 0.00 0.53
644 1982 452.05 452.46 0.41 0.09 0.64 0.91 False False 486.90 ... False 72.23 27.77 7223 2777 0.37 0.62 0.00 0.00 0.53
645 1984 452.05 452.46 0.41 0.09 0.64 0.91 False False 486.90 ... False 72.23 27.77 7223 2777 0.37 0.62 0.00 0.00 0.53
646 1988 452.05 452.46 0.41 0.09 0.64 0.91 False False 486.90 ... False 72.23 27.77 7223 2777 0.37 0.62 0.00 0.00 0.53
647 1998 452.05 452.46 0.41 0.09 0.64 0.91 False False 486.90 ... False 72.23 27.77 7223 2777 0.37 0.62 0.00 0.00 0.53

648 rows × 26 columns


aa_score
             t-test passed score  ks-test passed score  t-test aa passed  ks-test aa passed
post_spends                 0.00                  0.00              0.00               0.00
pre_spends                  0.00                  0.00              0.00               0.00
mean                        0.00                  0.00              0.00               0.00

split
user_id signup_month treat pre_spends post_spends age gender industry group
0 8193 0 0 494.50 427.11 40.00 F E-commerce test
1 8195 0 0 494.00 416.22 41.00 F E-commerce test
2 4 1 1 543.00 514.56 18.00 F E-commerce test
3 8203 0 0 472.50 412.67 52.00 F E-commerce test
4 8205 0 0 460.00 408.22 66.00 F E-commerce test
... ... ... ... ... ... ... ... ... ...
9995 9990 0 0 490.00 426.00 NaN M Logistics control
9996 9992 0 0 491.50 424.00 29.00 M E-commerce control
9997 9994 0 0 486.00 423.78 69.00 F Logistics control
9998 9995 10 1 538.50 450.44 42.00 M Logistics control
9999 9996 0 0 500.50 430.89 26.00 F Logistics control

10000 rows × 9 columns


best_experiment_stat
             a mean  b mean  ab delta  ab delta %  t-test p-value  ks-test p-value  t-test passed  ks-test passed
post_spends  452.05  452.46      0.41        0.09            0.64             0.91          False           False
pre_spends   486.90  487.60      0.71        0.15            0.09             0.32          False           False

split_stat
control %              72.23
test %                 27.77
control size            7223
test size               2777
t-test mean p-value     0.37
ks-test mean p-value    0.62
t-test passed %         0.00
ks-test passed %        0.00
mean_tests_score        0.53
Name: 0, dtype: object

resume
             aa test passed  split is uniform
post_spends          not OK                OK
pre_spends           not OK                OK

2.6 Unbalanced AA test

To run an AA test with unbalanced groups, use the parameter ``test_size`` to set the share of the test group; the control group gets the rest.
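Unbalanced groups cost some sensitivity. Assuming both groups share the same standard deviation σ, the standard error of the difference in means is σ·√(1/n_test + 1/n_control). This minimal sketch compares a 70/30 split with a 50/50 split of the same 10,000 rows; the helper name is hypothetical.

```python
from math import sqrt


def se_factor(n_total, test_share):
    """Standard error of a difference in means, in units of sigma,
    for a split that puts test_share of n_total rows into the test group."""
    n_test = n_total * test_share
    n_control = n_total - n_test
    return sqrt(1 / n_test + 1 / n_control)


balanced = se_factor(10_000, 0.5)
unbalanced = se_factor(10_000, 0.3)
print(f"a 70/30 split inflates the standard error by {unbalanced / balanced:.3f}x")
```

The ratio is about 1.09, so with a 70/30 split the true effect must be roughly 9% larger to be detected with the same power.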

[34]:
info_cols = ['user_id', 'signup_month']
target = ['post_spends', 'pre_spends']

group_cols = 'industry'
[35]:
experiment = AATest(info_cols=info_cols, target_fields=target, group_cols=group_cols)
[36]:
result = experiment.process(data=data, test_size=0.3, iterations=2000)
(Figures: AA-test result plots, 4 images)
[37]:
result['split']['group'].value_counts(normalize=True)
[37]:
group
control   0.70
test      0.30
Name: proportion, dtype: float64
[38]:
show_result(result)
experiments
random_state post_spends a mean post_spends b mean post_spends ab delta post_spends ab delta % post_spends t-test p-value post_spends ks-test p-value post_spends t-test passed post_spends ks-test passed pre_spends a mean ... pre_spends ks-test passed control % test % control size test size t-test mean p-value ks-test mean p-value t-test passed % ks-test passed % mean_tests_score
0 1 451.99 452.57 0.57 0.13 0.50 0.76 False False 487.11 ... False 70.01 29.99 7001 2999 0.69 0.66 0.00 0.00 0.67
1 2 452.16 452.17 0.01 0.00 0.99 0.64 False False 487.00 ... False 70.01 29.99 7001 2999 0.73 0.72 0.00 0.00 0.72
2 3 452.22 452.03 -0.20 -0.04 0.82 0.59 False False 487.18 ... False 70.01 29.99 7001 2999 0.66 0.48 0.00 0.00 0.54
3 4 451.90 452.78 0.88 0.19 0.31 0.24 False False 487.13 ... False 70.01 29.99 7001 2999 0.54 0.60 0.00 0.00 0.58
4 5 452.64 451.05 -1.59 -0.35 0.06 0.14 False False 487.20 ... False 70.01 29.99 7001 2999 0.23 0.14 0.00 0.00 0.17
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1705 1995 452.12 452.28 0.16 0.04 0.85 0.63 False False 487.10 ... False 70.01 29.99 7001 2999 0.89 0.64 0.00 0.00 0.73
1706 1996 452.00 452.54 0.54 0.12 0.53 0.85 False False 486.94 ... False 70.01 29.99 7001 2999 0.37 0.72 0.00 0.00 0.60
1707 1997 452.29 451.88 -0.40 -0.09 0.64 0.60 False False 487.04 ... False 70.01 29.99 7001 2999 0.65 0.44 0.00 0.00 0.51
1708 1998 451.78 453.07 1.29 0.28 0.13 0.33 False False 487.12 ... False 70.01 29.99 7001 2999 0.47 0.66 0.00 0.00 0.60
1709 1999 452.25 451.96 -0.29 -0.06 0.74 0.77 False False 487.17 ... False 70.01 29.99 7001 2999 0.64 0.71 0.00 0.00 0.69

1710 rows × 26 columns


aa_score
             t-test passed score  ks-test passed score  t-test aa passed  ks-test aa passed
post_spends                 0.00                  0.00              0.00               0.00
pre_spends                  0.00                  0.00              0.00               0.00
mean                        0.00                  0.00              0.00               0.00

split
user_id signup_month treat pre_spends post_spends age gender industry group
0 0 0 0 488.00 414.44 NaN M E-commerce test
1 8193 0 0 494.50 427.11 40.00 F E-commerce test
2 8192 0 0 487.50 436.78 20.00 M E-commerce test
3 2 7 1 483.00 479.44 25.00 M Logistics test
4 8200 5 1 486.00 495.00 NaN M Logistics test
... ... ... ... ... ... ... ... ... ...
9995 9994 0 0 486.00 423.78 69.00 F Logistics control
9996 9995 10 1 538.50 450.44 42.00 M Logistics control
9997 9997 3 1 473.00 534.11 22.00 F E-commerce control
9998 9998 2 1 495.00 523.22 67.00 F E-commerce control
9999 9999 7 1 508.00 475.89 38.00 F E-commerce control

10000 rows × 9 columns


best_experiment_stat
             a mean  b mean  ab delta  ab delta %  t-test p-value  ks-test p-value  t-test passed  ks-test passed
post_spends  452.15  452.19      0.04        0.01            0.96             0.99          False           False
pre_spends   487.09  487.09     -0.00       -0.00            1.00             0.99          False           False

split_stat
control %              70.01
test %                 29.99
control size            7001
test size               2999
t-test mean p-value     0.98
ks-test mean p-value    0.99
t-test passed %         0.00
ks-test passed %        0.00
mean_tests_score        0.99
Name: 1472, dtype: object

resume
             aa test passed  split is uniform
post_spends          not OK                OK
pre_spends           not OK                OK

MDE

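The minimum detectable effect (MDE) is the boundary value of the effect: only changes whose true effect is at least this large can be reliably detected, so it indicates which changes are worth introducing and testing.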

[39]:
info_cols = ['user_id', 'signup_month']
target = ['post_spends', 'pre_spends']

group_cols = 'industry'
mde_target = 'post_spends'
[40]:
experiment = AATest(info_cols=info_cols, target_fields=target, group_cols=group_cols)

Run a single data split to use for the MDE calculation.

Note: [None] is the random_state key; none was passed here. If you pass one, e.g. sampling_metrics(data, random_state=42), index the result with [42] instead of [None].

[41]:
splitted_data = experiment.sampling_metrics(data)['data_from_experiment'][None]
splitted_data
[41]:
user_id signup_month treat pre_spends post_spends age gender industry group
0 0 0 0 488.00 414.44 NaN M E-commerce test
1 2 7 1 483.00 479.44 25.00 M Logistics test
2 5 6 1 486.50 486.56 44.00 M E-commerce test
3 7 11 1 496.00 432.89 57.00 M E-commerce test
4 9 4 1 470.00 512.11 54.00 M Logistics test
... ... ... ... ... ... ... ... ... ...
9995 9994 0 0 486.00 423.78 69.00 F Logistics control
9996 9995 10 1 538.50 450.44 42.00 M Logistics control
9997 9996 0 0 500.50 430.89 26.00 F Logistics control
9998 9997 3 1 473.00 534.11 22.00 F E-commerce control
9999 9999 7 1 508.00 475.89 38.00 F E-commerce control

10000 rows × 9 columns

[42]:
splitted_data[mde_target].hist()
[42]:
<Axes: >
(Figure: histogram of post_spends)

You can estimate the minimum detectable effect for your data: the smallest true effect of a change that the statistical test can detect with confidence.
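For intuition, the textbook MDE for a two-sided test comparing two means, assuming a common standard deviation σ, significance level α and power 1 - β, is (z_(1-α/2) + z_(1-β))·σ·√(1/n_a + 1/n_b). The standard-library sketch below implements that formula with hypothetical inputs; calc_mde may use a different formula and returns a pair of values, so its output is not directly comparable.

```python
from math import sqrt
from statistics import NormalDist


def mde_two_sample(sigma, n_a, n_b, alpha=0.05, power=0.8):
    """Textbook MDE for a two-sided test on a difference in means,
    assuming a common standard deviation sigma in both groups."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value of the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return (z_alpha + z_power) * sigma * sqrt(1 / n_a + 1 / n_b)


# Hypothetical numbers: sigma = 1 and 63 users per group.
print(round(mde_two_sample(sigma=1.0, n_a=63, n_b=63), 3))
```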

[43]:
mde = experiment.calc_mde(data=splitted_data, group_field="group", target_field=mde_target)
mde
[43]:
(0.88, 0.02)

You can also calculate the sample size required to detect a given minimum effect (here mde=5).
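The textbook counterpart inverts the MDE formula: with two equal groups of n users each, n = 2·(z_(1-α/2) + z_(1-β))²·σ² / MDE². A standard-library sketch follows; it is an assumption that calc_sample_size follows this formula, and the section does not say whether the value it returns is per group or in total.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(sigma, mde, alpha=0.05, power=0.8):
    """Textbook per-group sample size to detect a difference in means of
    size `mde` with a two-sided test and two equal-sized groups."""
    z = NormalDist()
    z_total = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return ceil(2 * (z_total * sigma / mde) ** 2)


# Hypothetical numbers: sigma = 1, effect of half a standard deviation.
print(sample_size_per_group(sigma=1.0, mde=0.5))  # 63
```

Halving the MDE roughly quadruples the required sample size, since n grows with 1/MDE².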

[44]:
experiment.calc_sample_size(data=splitted_data, target_field=mde_target, mde=5)
[44]:
1949.4372012485414

Chi2 Test

[45]:
target = ['post_spends', 'pre_spends']
treated_field = 'treat'
[46]:
experiment = AATest(target_fields=target)
[47]:
experiment.calc_chi2(data, treated_field)
[47]:
{'post_spends': 4.2708618195357307e-129, 'pre_spends': 0.3904626181767134}
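The output maps each target to a p-value, presumably from comparing users with treat = 1 against treat = 0: tiny for post_spends and large for pre_spends, which fits a dataset where the treatment affects only post-period spend. This section does not show how calc_chi2 builds its test. A common way to apply a chi-squared test to a numeric target is to bin the values and test the group × bin contingency table; the sketch below assumes that approach (it is not the library's implementation) and computes only the Pearson statistic. A p-value then comes from the chi-squared survival function with (2 - 1)(k - 1) degrees of freedom for k bins, for example scipy.stats.chi2.sf(stat, df).

```python
import random


def chi2_statistic(sample_a, sample_b, edges):
    """Pearson chi-squared statistic for the 2 x k table obtained by binning
    two samples with the same sorted bin edges (k = len(edges) + 1)."""
    def bin_counts(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[sum(v > e for e in edges)] += 1  # index = number of edges below v
        return counts

    table = [bin_counts(sample_a), bin_counts(sample_b)]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            if expected > 0:  # skip bins that are empty in both samples
                stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical synthetic data: one pair from the same distribution, one shifted.
rng = random.Random(0)
untreated = [rng.gauss(450, 30) for _ in range(2000)]
same = [rng.gauss(450, 30) for _ in range(2000)]
shifted = [rng.gauss(470, 30) for _ in range(2000)]
edges = [420, 440, 460, 480]  # 5 bins -> (2 - 1) * (5 - 1) = 4 degrees of freedom

print(chi2_statistic(untreated, same, edges), chi2_statistic(untreated, shifted, edges))
```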