
Conversation

@fajpunk (Member) commented Nov 21, 2025

The app metrics resiliency tests intentionally break `aiokafka` clients
in an unclean way. When the aiokafka objects get garbage collected,
Python's asyncio logs warnings that future exceptions were never
retrieved, like this:

```
asyncio:base_events.py:1879 Future exception was never retrieved
future: <Future finished exception=NodeNotReadyError('Attempt to send a request to node which is not ready (node id 1).')>
aiokafka.errors.NodeNotReadyError: NodeNotReadyError: Attempt to send a request to node which is not ready (node id 1).
```
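
For illustration, the same warning can be reproduced with a plain asyncio future. This is a minimal sketch of the mechanism, not code from the test suite:

```python
import asyncio


async def main() -> None:
    # Create a future, set an exception on it, and drop the last
    # reference without ever calling .result() or .exception().
    fut = asyncio.get_running_loop().create_future()
    fut.set_exception(RuntimeError("simulated broken aiokafka client"))
    del fut
    # On CPython the future is collected as soon as its reference count
    # hits zero, and Future.__del__ reports the unretrieved exception
    # through the event loop's exception handler, which logs
    # "Future exception was never retrieved".


asyncio.run(main())
```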

The timing of this logging is not deterministic, and it can break other
tests. For example, the tests of the Sentry functionality fail when the
messages are logged after the Sentry fixtures have initialized, because
the mock Sentry transport picks them up:

https://github.com/lsst-sqre/safir/actions/runs/19583537252/job/56087067160

There may be a way to deterministically and cleanly cancel these failed
tasks, or it may be a bug in the aiokafka library, but until we figure
it out, run these last to make sure they don't infect other tests.
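
One way to get that ordering in pytest is a collection hook in conftest.py. This is only a sketch with a hypothetical module name; the actual change may order the tests differently:

```python
# conftest.py -- hypothetical sketch; the resiliency test module name
# below is an assumption, not taken from this repository.


def pytest_collection_modifyitems(session, config, items):
    def is_resiliency_test(item) -> bool:
        return "test_metrics_resiliency" in item.nodeid

    # list.sort is stable, so all other tests keep their original order
    # while the resiliency tests move to the end (False sorts before True).
    items.sort(key=is_resiliency_test)
```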

@fajpunk fajpunk force-pushed the tickets/DM-53251/more-flakiness branch 9 times, most recently from 68ee6d3 to 0bb258e Compare November 21, 2025 22:58
@fajpunk fajpunk requested review from athornton and rra November 21, 2025 22:59
@rra (Member) left a comment
Looks good as a workaround. I guess ideally we'd explicitly close the clients and catch the exception, but I don't know how to do that.
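
For reference, the shape of that cleanup might look something like the sketch below. Whether `stop()` actually retrieves the failed internal futures is exactly the open question here, so this is an assumption, not a known fix:

```python
import contextlib

from aiokafka.errors import KafkaError


async def stop_quietly(client) -> None:
    # Drive the deliberately broken client through its normal shutdown
    # path and swallow the expected Kafka errors here, so the exceptions
    # are retrieved now instead of at garbage collection time.
    with contextlib.suppress(KafkaError):
        await client.stop()
```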

@fajpunk fajpunk force-pushed the tickets/DM-53251/more-flakiness branch from 0bb258e to b899551 Compare December 1, 2025 22:51
@fajpunk fajpunk enabled auto-merge December 1, 2025 22:51
@fajpunk fajpunk merged commit 19707c3 into main Dec 1, 2025
6 checks passed
@fajpunk fajpunk deleted the tickets/DM-53251/more-flakiness branch December 1, 2025 22:58