Flaky E2E: test_change_password_success_invalidates_other_sessions intermittently 401s client A #109

@fabriziosalmi

Description

Summary

tests/test_e2e_live.py::TestChangePassword::test_change_password_success_invalidates_other_sessions fails intermittently (~30% of runs). The assertion that trips is at line 1204:

me_a = a.get("/api/v1/auth/me")
assert me_a.status_code == 200, f"client A should keep working; got {me_a.status_code}"

Client A — the session that performed the password change and whose cookies were just rotated — occasionally gets 401 on its next protected call, when it should keep working. Client B (the other session) is correctly invalidated; that half of the test is stable.

Evidence (same commit, different outcomes)

Run                              Result
main 2786564 (CI -- Full Stack)  FAIL
PR #100 (next 16), rebased       FAIL
PR #99 (aiodns), rebased         PASS
PR #96 (python)                  PASS

The bump PRs are unrelated to auth — the test fails/passes regardless of the PR content, which is the signature of a flaky test, not a regression. It is not blocking merges (main is not branch-protected; E2E is not a required check), but it makes CI noisy and erodes trust in the suite.

Static analysis — what it is not

The obvious suspect is the same-second iat / password_changed_at watermark race, but that path is sound:

  • change_password sets password_changed_at = next_second = floor(now) + 1 (auth.py:965-968) and mints the rotated tokens with iat_override=next_second (auth.py:1002).
  • get_current_user compares in epoch seconds: if iat_seconds < pwc_seconds: 401 (dependencies.py:126-130).
  • For client A's rotated token, iat == pwc == floor(now)+1, so iat < pwc is False → it passes. By construction the watermark cannot 401 client A's rotated token.

So the 401 means client A's me_a request is being sent with the stale (pre-rotation) access-token cookie. That token's iat is the original login second, which is earlier than pwc, so the request is correctly rejected.
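The watermark arithmetic above can be checked in isolation. This is a minimal sketch, not the code in auth.py or dependencies.py: the function names and the timestamps are hypothetical, and it assumes the login token's iat is the floor of the login time.

```python
import math


def next_second(now: float) -> int:
    """Watermark as described in the issue: floor(now) + 1."""
    return math.floor(now) + 1


def is_rejected(iat_seconds: int, pwc_seconds: int) -> bool:
    """Mirror of the described check: reject tokens issued before the watermark."""
    return iat_seconds < pwc_seconds


# Hypothetical timeline: login at t=1000.4, password change at t=1000.9,
# i.e. both events fall inside the same wall-clock second.
login_iat = math.floor(1000.4)   # iat of the pre-rotation token (assumed floor)
pwc = next_second(1000.9)        # password_changed_at
rotated_iat = pwc                # rotated token minted with iat_override=next_second

print(is_rejected(rotated_iat, pwc))  # False: rotated token passes
print(is_rejected(login_iat, pwc))    # True: stale token is bounced
```

Even in the same-second case the rotated token passes, so a 401 for client A points at which token was sent rather than at the comparison.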

Likely root cause

Cookie rotation on the 204 No Content response is occasionally not picked up by the client before me_a fires:

  • change_password returns 204 and relies on the Set-Cookie headers attached to the injected Response (_build_token_response_set_auth_cookies). The endpoint docstring already warns that Set-Cookie on 204 is fragile across Starlette versions ("Constructing a fresh Response(...) with headers=response.headers drops them on the floor — never do that here").
  • If the new access_token cookie isn't applied to the httpx.Client jar in time, me_a reuses the old token and gets a (correct) 401.

This is consistent with the intermittency: it depends on Set-Cookie parsing/timing on the 204, not on wall-clock second boundaries.
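One way to observe this hypothesis from the test is to snapshot the cookie jar around the change-password call and compare the two snapshots. The helper below is a sketch: the name `rotated` is hypothetical, and it takes plain mappings so it does not depend on any particular client.

```python
from typing import Mapping, Optional


def rotated(before: Mapping[str, str], after: Mapping[str, str],
            name: str = "access_token") -> bool:
    """True if cookie `name` is present after the call and differs from before."""
    new: Optional[str] = after.get(name)
    return new is not None and new != before.get(name)


# Plain-dict illustration of the two outcomes.
old_jar = {"access_token": "old"}
print(rotated(old_jar, {"access_token": "new"}))  # True: jar picked up the rotation
print(rotated(old_jar, {"access_token": "old"}))  # False: jar still holds the stale token

# In the E2E test, with the httpx.Client `a`, this would look roughly like:
#   before = dict(a.cookies)
#   resp = a.post(...)  # the existing change-password call, unchanged
#   assert rotated(before, dict(a.cookies)), resp.headers.get_list("set-cookie")
```

Including the raw Set-Cookie headers in the assertion message helps separate "the server did not send the cookie" from "the client did not store it". Note that `dict(a.cookies)` can raise if the jar holds the same cookie name under two domains or paths, which would itself be a useful finding.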

Suggested investigation / fix directions

  1. Confirm the cookie hypothesis: in the test, after the change-password 204, assert that a.cookies.get("access_token") actually changed before calling me_a. If it didn't, the bug is the 204 Set-Cookie handling (server or client side).
  2. Server-side option: return 200 with an explicit body from change_password instead of 204, so Set-Cookie is carried on a response shape that every Starlette/httpx version handles uniformly. (Behaviour change — weigh against the current 204 contract.)
  3. Test-side option: if the production 204 behaviour is correct for real browsers and only the test client is racing, make the test deterministic — re-read the rotated cookie explicitly, or add a single retry/refresh step that mirrors what the real SPA api-client does on a 401.
  4. Either way, lock the outcome with a deterministic assertion so this can't silently flake again.
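For the test-side option, a single refresh-and-retry step could be sketched as below. The helper name and the refresh endpoint path are assumptions for illustration; the issue does not say how the SPA api-client refreshes, so this only shows the "retry exactly once on 401" shape.

```python
from typing import Any, Callable


def get_with_refresh(get: Callable[[str], Any],
                     refresh: Callable[[], Any],
                     path: str) -> Any:
    """GET `path`; on a 401, call `refresh` once and retry exactly once."""
    resp = get(path)
    if resp.status_code == 401:
        refresh()
        resp = get(path)
    return resp


# Hypothetical usage in the test; "/api/v1/auth/refresh" is an assumed path:
#   me_a = get_with_refresh(
#       a.get,
#       lambda: a.post("/api/v1/auth/refresh"),
#       "/api/v1/auth/me",
#   )
#   assert me_a.status_code == 200
```

A retry like this hides a stale cookie rather than explaining it, so it fits best alongside an explicit assertion on the cookie jar, not as a replacement for one.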

Scope

Separate from the dependabot PRs (#99, #100) and from the v2.5.x release work — surfaced during PR triage. Filing so it isn't lost.

Metadata

Labels: bug
