Skip to content

Add null check for connection to prevent AttributeError on drain_events#457

Merged
auvipy merged 5 commits into
celery:mainfrom
Cyanty:null-pointer-handling
Mar 9, 2026
Merged

Add null check for connection to prevent AttributeError on drain_events#457
auvipy merged 5 commits into
celery:mainfrom
Cyanty:null-pointer-handling

Conversation

@Cyanty
Copy link
Copy Markdown
Contributor

@Cyanty Cyanty commented Mar 6, 2026

While null checks exist elsewhere, this specific call was overlooked. This aligns with existing error handling patterns in the codebase.

@auvipy
Copy link
Copy Markdown
Member

auvipy commented Mar 7, 2026

@ChickenBenny please review this

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses a shutdown-time crash where AbstractChannel.wait() can attempt to call drain_events() on a None connection, leading to an AttributeError during worker shutdown / channel revive logic.

Changes:

  • Add a local conn = self.connection reference inside wait()’s loop.
  • Raise RecoverableConnectionError when self.connection is None instead of calling drain_events().

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread amqp/abstract_channel.py
Comment thread amqp/abstract_channel.py Outdated
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Member

@auvipy auvipy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please add proper unit tests and possibly Integration tests for this proposed change?

@ChickenBenny
Copy link
Copy Markdown
Contributor

ChickenBenny commented Mar 8, 2026

Hi @Cyanty, thanks for looking into this and providing a quick fix!

The local variable assignment conn = self.connection before the null check is a solid defensive practice here, and throwing a RecoverableConnectionError will definitely stop the ungraceful crash during shutdown.

However, I have a couple of thoughts on this approach:

1. Is this treating the symptom rather than the root cause?

Looking at the traceback in #456, here is a step-by-step breakdown:

  1. Initiate Shutdown (Celery): The Celery worker triggers shutdown(warm=True) to exit gracefully.
  2. Close Connection (Kombu): Kombu executes connection.close() to terminate the connection to RabbitMQ.
  3. Close Channel (py-amqp): channel.close() sets self.is_closing = True, then sends a Channel.Close frame and waits for Channel.CloseOk.
  4. Misinterpretation (py-amqp): While waiting, the _on_close callback is triggered. The guard inside _on_close only checks self.connection.is_closing (connection-level flag), but not self.is_closing (channel-level flag). Since self.is_closing is already True, this is an intentional shutdown being misidentified as an unexpected close.
  5. Attempted Revival (py-amqp): Because the channel-level closing state is not checked, _do_revive() is called, which invokes self.open() and enters a wait() loop.
  6. Missing Connection & Crash: Inside the wait() loop, self.connection.drain_events() is called. However, the close() method's finally block has already set self.connection = None, causing the crash.
  7. Exception Thrown: None.drain_events() results in AttributeError: 'NoneType' object has no attribute 'drain_events'.

The root cause appears to be in _on_close — it already has self.is_closing available on the channel instance (set in close()), but never consults it. A more targeted fix might be:

 def _on_close(self, reply_code, reply_text, class_id, method_id):
     self.send_method(spec.Channel.CloseOk)
-    if not self.connection.is_closing:
+    if not self.connection.is_closing and not self.is_closing:
         self._do_revive()
         raise error_for_code(
             reply_code, reply_text, (class_id, method_id), ChannelError,
         )

That said, the null check in this PR is still a worthwhile defensive guard regardless of whether the root cause is also addressed — a belt-and-suspenders approach. Would it be worth opening a separate issue or follow-up PR for the _on_close guard?

2. Missing test coverage

We absolutely need a unit test for this new exception path. Please add a test case in t/unit/test_abstract_channel.py simulating wait() when self.connection is None to ensure it raises RecoverableConnectionError.

@ChickenBenny
Copy link
Copy Markdown
Contributor

I have submitted a new PR #458 to fix the root cause of this issue. It adds the not self.is_closing guard in _on_close() to prevent the channel from attempting a revival during a graceful shutdown.

I think this PR and #457 complement each other nicely—one fixes the root cause, and the other acts as a defensive fallback.

@auvipy
Copy link
Copy Markdown
Member

auvipy commented Mar 8, 2026

I think we can accept both?

@Cyanty
Copy link
Copy Markdown
Contributor Author

Cyanty commented Mar 8, 2026

@auvipy @ChickenBenny

Thank you for the prompt response from the community. ChickenBenny's suggestion is even better and makes the code more robust. I really appreciate your reply and explanation.
I believe this issue is now resolved. These two PRs prevent the 'NoneType' object has no attribute 'drain_events' error from occurring.
In addition, I have just added and passed the related unit test.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@ChickenBenny
Copy link
Copy Markdown
Contributor

LGTM~

@auvipy auvipy merged commit 261860c into celery:main Mar 9, 2026
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants