Skip to content

Commit 2911596

Browse files
committed
revert(ci): undo unnecessary deepcopy in configure_split_db
The shallow copy was not the cause of dashboard test failures. The TEST dict doesn't exist at copy time, so shallow vs deep is identical. The real cause was missing order= on widget_1 and widget_2 fixtures.
1 parent 282f8ae commit 2911596

File tree

4 files changed

+37
-17
lines changed

4 files changed

+37
-17
lines changed

docs/investigations.md

Lines changed: 34 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,34 @@
1+
# Investigations
2+
3+
## Dashboard Widget Ordering Failures Under xdist
4+
5+
### The symptom
6+
7+
5 out of 22 shards failed, all in `OrganizationDashboardDetailsPutTest`. Widget ID assertions mismatched:
8+
9+
```
10+
FAILED test_remove_widget_and_add_new - assert 41 == 44
11+
FAILED test_reorder_widgets_has_no_effect - assert 27 == 25
12+
```
13+
14+
### The wrong hypothesis
15+
16+
Duplicate key errors in postgres logs (`sentry_email_email_key`, `sentry_dashboard_organization_title_uniq`) led to investigating whether xdist workers shared the same Postgres database. Investigated `configure_split_db()` shallow copy, pytest-django's `_set_suffix_to_test_databases`, and `TEST.NAME` propagation. All turned out to be irrelevant — the duplicate key errors were from `initialize_app()` startup, not from test execution.
17+
18+
### The misleading grep
19+
20+
Searching for `order=` in the test file found lines 70 and 80 with `order=0` and `order=1`. These were assumed to be on `DashboardWidget` objects. They were actually on `DashboardWidgetQuery` objects (queries *within* a widget) — a different model created in the same `setUp` method with visually identical indentation.
21+
22+
### The actual root cause
23+
24+
`widget_1` and `widget_2` in `OrganizationDashboardDetailsTestCase.setUp()` were created without `order=`. `DashboardWidget.order` is `BoundedPositiveIntegerField(null=True)`, so both had `order=NULL`. `get_widgets()` uses `ORDER BY (order, id)`. PostgreSQL's ordering of NULL values is nondeterministic (heap scan order), so widget ordering varied depending on prior test activity within the worker session.
25+
26+
Under xdist with `--reuse-db` and `PYTHONHASHSEED=0`, the test distribution across shards is deterministic. The same 5 shards always got this test file with enough prior DB activity to scramble the NULL ordering. Without xdist, the test happened to pass due to a lucky heap layout.
27+
28+
### The fix
29+
30+
Added `order=0` to `widget_1` and `order=1` to `widget_2` in the parent `setUp`. Combined with the earlier fix of `order=2` and `order=3` on `widget_3` and `widget_4` in the child `setUp`, all widgets now have deterministic ordering.
31+
32+
### Lesson
33+
34+
Add diagnostic output early. The debug print (`for w in widgets: print(f"id={w.id} order={w.order}")`) immediately showed `Widget 1 order=None` and resolved the issue in one iteration.

docs/notes.md

Lines changed: 0 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -33,12 +33,6 @@ Verified empirically: `socket.setdefaulttimeout(5)` → listening socket `.getti
3333

3434
This is a subtle interaction: `pytest-rerunfailures` assumes sockets block indefinitely, but Sentry's global timeout silently changes that contract on accepted connections only.
3535

36-
## configure_split_db shallow copy bug
37-
38-
`configure_split_db()` used `.copy()` (shallow) to create `control` and `secondary` from `default`. The `TEST` dict inside each was the **same Python object**. When pytest-django's `_set_suffix_to_test_databases` iterated databases and set `TEST.NAME` for each, the last write overwrote all three. All workers ended up using the same Postgres database instead of per-worker isolation (`test_region_gw0`, `test_control_gw0`, etc.). Fixed with `copy.deepcopy()`.
39-
40-
This bug was latent on master (no xdist, no TEST.NAME collision) and only manifested under xdist where per-worker DB names are required.
41-
4236
## Why hash-based sharding beats algorithmic LPT
4337

4438
With 17+ shards and ~32K tests, the law of large numbers gives hash-based (`sha256(nodeid) % N`) sharding good-enough balance (~90-130s spread). LPT algorithms failed because:

docs/tiered-xdist-changes.md

Lines changed: 1 addition & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -93,13 +93,7 @@ Normal `--reruns` is unaffected — each worker retries failed tests locally via
9393

9494
The `calculate-shards` job now has a fast path: when selective testing isn't active (push to master/branch), it outputs static defaults (22 shards) without checkout, setup-sentry, or `pytest --collect-only`. Saves ~3 min on the critical path. When selective testing IS active (PR), the full collection pipeline still runs to compute the right shard count.
9595

96-
### 2g. Fix shared TEST dict in configure_split_db
97-
98-
**Modified:** `src/sentry/testutils/pytest/sentry.py` (`configure_split_db`)
99-
100-
Changed `.copy()` to `copy.deepcopy()` when creating `control` and `secondary` database configs. The shallow copy caused all three databases to share the same `TEST` dict object. When pytest-django's xdist suffix fixture set `TEST.NAME` for each database, the last write won — all three databases got the same test DB name. Workers shared a single Postgres database instead of getting isolated `test_region_gw0`, `test_control_gw0`, etc.
101-
102-
### 2h. Snowflake test fix
96+
### 2g. Snowflake test fix
10397

10498
**Modified:** `tests/sentry/utils/test_snowflake.py`
10599

src/sentry/testutils/pytest/sentry.py

Lines changed: 2 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -42,14 +42,12 @@ def configure_split_db() -> None:
4242
if already_configured or _use_monolith_dbs():
4343
return
4444

45-
import copy
46-
47-
settings.DATABASES["control"] = copy.deepcopy(settings.DATABASES["default"])
45+
settings.DATABASES["control"] = settings.DATABASES["default"].copy()
4846
settings.DATABASES["control"]["NAME"] = "control"
4947

5048
settings.DATABASES["default"]["NAME"] = "region"
5149

52-
settings.DATABASES["secondary"] = copy.deepcopy(settings.DATABASES["default"])
50+
settings.DATABASES["secondary"] = settings.DATABASES["default"].copy()
5351
settings.DATABASES["secondary"]["NAME"] = "secondary"
5452

5553
settings.DATABASE_ROUTERS = ("sentry.db.router.TestSiloMultiDatabaseRouter",)

0 commit comments

Comments
 (0)