86 changes: 84 additions & 2 deletions examples/tutorials/AATestTutorial.ipynb
@@ -428,6 +428,27 @@
"res.resume"
]
},
{
"cell_type": "markdown",
"id": "55c32466",
"metadata": {},
"source": [
"**Interpretation of AA test results**\n",
"\n",
"Each row in the table corresponds to a target feature being tested for equality between the control and test groups. Two statistical tests are used:\n",
"\n",
"- **TTest**: tests whether the group means differ.\n",
"- **KSTest**: tests whether the distributions differ.\n",
"\n",
"The `OK` / `NOT OK` labels show whether the difference is statistically significant. A `NOT OK` result indicates a possible imbalance.\n",
"\n",
"Typical threshold:\n",
"- If p-value < 0.05 → `NOT OK` (statistically significant difference)\n",
"- If p-value ≥ 0.05 → `OK` (no significant difference)\n",
"\n",
"If any metric has a `NOT OK` status in the `AA test` column, at least one iteration showed a significant difference.\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
@@ -506,6 +527,21 @@
"res.aa_score"
]
},
{
"cell_type": "markdown",
"id": "eb0ce07b",
"metadata": {},
"source": [
"**Interpreting `aa_score`**\n",
"\n",
"This output shows p-values and the overall pass/fail status for each test type and feature. A p-value at or above the significance threshold (typically 0.05) means the test passed: there is no statistically significant evidence that the groups differ.\n",
"\n",
"- `score`: p-value of the statistical test.\n",
"- `pass`: `True` if no iteration showed a significant difference.\n",
"\n",
"Note: Even if the average p-value is high, `pass` may still be `False` if at least one iteration had a p-value < 0.05.\n"
]
},
{
"cell_type": "code",
"execution_count": 6,
@@ -726,6 +762,18 @@
"res.best_split"
]
},
{
"cell_type": "markdown",
"id": "a225e982",
"metadata": {},
"source": [
"**About `best_split`**\n",
"\n",
"This shows the best split found for the dataset: the one in which the control and test groups are most similar in terms of the target metrics.\n",
"\n",
"You can use this split for future modeling or as a validation check before proceeding to actual experiments.\n"
]
},
{
"cell_type": "code",
"execution_count": 7,
@@ -824,6 +872,22 @@
"res.best_split_statistic"
]
},
{
"cell_type": "markdown",
"id": "ef1986ae",
"metadata": {},
"source": [
"**Understanding `best_split_statistic`**\n",
"\n",
"This table contains detailed statistics for the best (most balanced) split found across all iterations. You can compare:\n",
"\n",
"- Mean values in control vs test group.\n",
"- Absolute and relative differences.\n",
"- p-values for both tests.\n",
"\n",
"Ideally, every row should show `OK` in both the TTest and KSTest columns, with small differences between the groups (<1%)."
]
},
{
"cell_type": "code",
"execution_count": 8,
@@ -2085,12 +2149,16 @@
"source": [
"# AA Test with stratification\n",
"\n",
"Depending on your requirements, you can stratify the data. To run the test with stratification, assign `StratificationRole` to the relevant columns in `Dataset` and set `stratification=True`.\n",
"\n",
"Stratified AA tests ensure that both groups (control/test) have the same proportions of categories (e.g. same % of genders or regions). This prevents imbalances in categorical features that can distort results.\n",
"\n",
"Make sure to assign `StratificationRole` to relevant columns in your dataset before enabling stratification."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "da9ab2f374ce1273",
"metadata": {
"ExecuteTime": {
@@ -5337,6 +5405,20 @@
"source": [
"res.best_split_statistic"
]
},
{
"cell_type": "markdown",
"id": "d3dd84bc",
"metadata": {},
"source": [
"## Common issues and tips\n",
"\n",
"- **Missing roles**: Make sure every target variable is assigned `TargetRole`. Columns without roles may cause silent failures.\n",
"- **Stratification**: If your dataset contains categorical features (e.g. `gender`, `region`) that may affect the outcome, use `StratificationRole` and enable `stratification=True` in `AATest(...)`.\n",
"- **Imbalanced categories**: If some categories have too few samples, stratified splits may become unstable. Consider filtering or merging rare categories.\n",
"- **Random fluctuations**: On small datasets, it's normal to see occasional `NOT OK` results. Use more iterations (e.g. `n_iterations=50`) for stability.\n",
"- **Missing values**: NaNs in stratification columns may be treated as separate categories. Clean or fill missing values before stratified AA tests."
]
}
],
"metadata": {