
Conversation

@fajpunk (Member) commented Nov 21, 2025

The app metrics resiliency tests intentionally break `aiokafka` clients
in an unclean way. When the aiokafka objects get garbage collected,
Python's asyncio logs warnings that future exceptions were never
retrieved, like this:

```
asyncio:base_events.py:1879 Future exception was never retrieved
future: <Future finished exception=NodeNotReadyError('Attempt to send a request to node which is not ready (node id 1).')>
aiokafka.errors.NodeNotReadyError: NodeNotReadyError: Attempt to send a request to node which is not ready (node id 1).
```
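
For illustration, the same warning can be reproduced with a plain asyncio future. This is a minimal sketch of the mechanism, not code from the test suite:

```python
import asyncio


async def main() -> None:
    # Create a future, set an exception on it, and drop the last
    # reference without ever calling .result() or .exception().
    fut = asyncio.get_running_loop().create_future()
    fut.set_exception(RuntimeError("simulated broken aiokafka client"))
    del fut
    # On CPython the future is collected as soon as its reference count
    # hits zero, and Future.__del__ reports the unretrieved exception
    # through the event loop's exception handler, which logs
    # "Future exception was never retrieved".


asyncio.run(main())
```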

The timing of this logging is not deterministic, and it can break other
tests. For example, the tests of the Sentry functionality fail when the
messages are logged after the Sentry fixtures have initialized, because
the mock Sentry transport picks them up:

https://github.com/lsst-sqre/safir/actions/runs/19583537252/job/56087067160

There may be a way to deterministically and cleanly cancel these failed
tasks, or it may be a bug in the aiokafka library, but until we figure
it out, run these last to make sure they don't infect other tests.
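
One way to get that ordering in pytest is a collection hook in conftest.py. This is only a sketch with a hypothetical module name; the actual change may order the tests differently:

```python
# conftest.py -- hypothetical sketch; the resiliency test module name
# below is an assumption, not taken from this repository.


def pytest_collection_modifyitems(session, config, items):
    def is_resiliency_test(item) -> bool:
        return "test_metrics_resiliency" in item.nodeid

    # list.sort is stable, so all other tests keep their original order
    # while the resiliency tests move to the end (False sorts before True).
    items.sort(key=is_resiliency_test)
```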

@fajpunk fajpunk force-pushed the tickets/DM-53251/more-flakiness branch 9 times, most recently from 68ee6d3 to 0bb258e Compare November 21, 2025 22:58
@fajpunk fajpunk requested review from athornton and rra November 21, 2025 22:59
@rra (Member) left a comment
Looks good as a workaround. I guess ideally we'd explicitly close the clients and catch the exception, but I don't know how to do that.
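
For reference, the shape of that cleanup might look something like the sketch below. Whether `stop()` actually retrieves the failed internal futures is exactly the open question here, so this is an assumption, not a known fix:

```python
import contextlib

from aiokafka.errors import KafkaError


async def stop_quietly(client) -> None:
    # Drive the deliberately broken client through its normal shutdown
    # path and swallow the expected Kafka errors here, so the exceptions
    # are retrieved now instead of at garbage collection time.
    with contextlib.suppress(KafkaError):
        await client.stop()
```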

@fajpunk fajpunk force-pushed the tickets/DM-53251/more-flakiness branch from 0bb258e to b899551 Compare December 1, 2025 22:51
@fajpunk fajpunk enabled auto-merge December 1, 2025 22:51
@fajpunk fajpunk merged commit 19707c3 into main Dec 1, 2025
6 checks passed
@fajpunk fajpunk deleted the tickets/DM-53251/more-flakiness branch December 1, 2025 22:58