Make GunicornWebWorker exit with APP_LOAD_ERROR in case of a startup error #6968

Open: HallBregg wants to merge 7 commits into master from HallBregg:worker-boot-error


HallBregg

Make GunicornWebWorker exit with WORKER_BOOT_ERROR (3) in case of a boot error.

I've found that GunicornWebWorker exits with exit code 0 even when an error occurs during startup. Because 0 does not signal a failure, Gunicorn keeps spawning new workers endlessly.

With this pull request I would like to show the problem and ask if there is any other solution. I am aware of the comment # ignore all finalization problems and would like to ask about the context of this comment.

What do these changes do?

These changes set GunicornWebWorker.exit_code to the value Gunicorn expects for a failed boot, which is 3, and set GunicornWebWorker.booted to False.

class Arbiter(object):
    """
    Arbiter maintain the workers processes alive. It launches or
    kills them if needed. It also manages application reloading
    via SIGHUP/USR2.
    """

    # A flag indicating if a worker failed to
    # to boot. If a worker process exist with
    # this error code, the arbiter will terminate.
    WORKER_BOOT_ERROR = 3
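
To illustrate why the exit code matters, here is a minimal, self-contained sketch of a supervisor loop. It is not Gunicorn's code: `HaltServer`, `handle_worker_exit` and `spawn_count` are hypothetical names for this example, the value 4 for APP_LOAD_ERROR is an assumption, and the cap of 5 spawns only exists so the simulation terminates.

```python
# Simplified model of a supervisor reacting to worker exit codes.
# WORKER_BOOT_ERROR mirrors the Arbiter constant quoted above;
# the APP_LOAD_ERROR value is an assumption made for this sketch.
WORKER_BOOT_ERROR = 3
APP_LOAD_ERROR = 4


class HaltServer(Exception):
    """Raised when the supervisor should stop instead of respawning."""


def handle_worker_exit(exit_code: int) -> None:
    # Only the two dedicated codes stop the supervisor.
    if exit_code == WORKER_BOOT_ERROR:
        raise HaltServer("Worker failed to boot.")
    if exit_code == APP_LOAD_ERROR:
        raise HaltServer("App failed to load.")
    # Any other code (including 0) means "replace the worker".


def spawn_count(worker_exit_code: int, cap: int = 5) -> int:
    """Count how many workers get spawned if each one exits with the same code."""
    spawned = 0
    while spawned < cap:
        spawned += 1  # spawn a worker, which immediately exits
        try:
            handle_worker_exit(worker_exit_code)
        except HaltServer:
            break  # supervisor terminates, no further respawns
    return spawned


if __name__ == "__main__":
    print(spawn_count(0))                  # respawns until the artificial cap
    print(spawn_count(WORKER_BOOT_ERROR))  # stops after the first worker
```

Under these assumptions, a worker that fails at startup but exits with 0 is respawned until the cap is reached, while one that exits with 3 stops the supervisor after the first attempt.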

Checklist

  • I think the code is well written
  • Unit tests for the changes exist
  • Documentation reflects the changes
  • If you provide code modification, please add yourself to CONTRIBUTORS.txt
    • The format is <Name> <Surname>.
    • Please keep alphabetical order, the file is sorted by names.
  • Add a new news fragment into the CHANGES folder
    • name it <issue_id>.<type> for example (588.bugfix)
    • if you don't have an issue_id change it to the pr id after creating the pr
    • ensure type is one of the following:
      • .feature: Signifying a new feature.
      • .bugfix: Signifying a bug fix.
      • .doc: Signifying a documentation improvement.
      • .removal: Signifying a deprecation or removal of public API.
      • .misc: A ticket has been closed, but it is not of interest to users.
    • Make sure to use full sentences with correct case and punctuation, for example: "Fix issue with non-ascii contents in doctest text files."

@HallBregg HallBregg requested a review from asvetlov as a code owner September 25, 2022 11:05
@@ -55,6 +56,9 @@ def run(self) -> None:
             self.loop.run_until_complete(self._task)
         except Exception:
             self.log.exception("Exception in gunicorn worker")
+            self.booted = False
+            self.exit_code = Arbiter.WORKER_BOOT_ERROR
Member

@Dreamsorcerer Dreamsorcerer Sep 25, 2022

Hmm, the gunicorn code seems to set self.booted = True before it calls .run(). This suggests to me that it is about recording whether the process was created, rather than whether the application startup succeeded. This seems to me like it should probably exit 1 instead.

If this was how gunicorn was supposed to work, then I'd expect booted to still be False and then we would set it to True in _run() after the setup is complete (line 95).

Looking at some of the other workers, I'm also noticing gtornado using alive and server_alive, not sure if something like that is more appropriate:
https://github.com/benoitc/gunicorn/blob/master/gunicorn/workers/gtornado.py#L91-L92

Also wondering if we should actually be using the AsyncWorker as our base class:
https://github.com/benoitc/gunicorn/blob/master/gunicorn/workers/base_async.py

Member

In fact there is an APP_LOAD_ERROR, which sounds a lot more accurate for this situation:
https://github.com/benoitc/gunicorn/blob/master/gunicorn/arbiter.py#L34

Author

I had considered using APP_LOAD_ERROR before, but decided to stay with WORKER_BOOT_ERROR (I didn't know when we can say the app is loaded, but the same problem I have got with a worker). Nevertheless, you are probably right that booted should not be set to False after it has already been set to True, so APP_LOAD_ERROR seems to be correct.

I will take a look at the other workers to have a bit more knowledge.

Member

@Dreamsorcerer Dreamsorcerer Sep 25, 2022

(I didn't know when we can say the app is loaded, but the same problem I have got with a worker).

When the setup has completed on line 95, the application has been initialised. If we go this way, then I'd suggest adding a self.started = False and setting it to True after site.start() on line 105. Then only set this exit status if not self.started, otherwise a 1 is probably a reasonable exit code (likely indicating that an exception happened in the app's cleanup).
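
A rough sketch of that suggestion, written as a standalone class so it runs without Gunicorn or aiohttp. `SketchWorker` and its `fail_during` parameter are hypothetical stand-ins for illustration only, and the value 4 for APP_LOAD_ERROR is an assumption; the real change would live in GunicornWebWorker.run() and _run().

```python
import asyncio

APP_LOAD_ERROR = 4  # assumed to match Arbiter.APP_LOAD_ERROR


class SketchWorker:
    """Hypothetical stand-in for GunicornWebWorker, not the aiohttp class."""

    def __init__(self, fail_during=None):
        # fail_during is a test knob: "setup", "serve", or None.
        self.fail_during = fail_during
        self.started = False
        self.exit_code = 0

    async def _run(self):
        if self.fail_during == "setup":
            raise RuntimeError("application startup failed")
        # Corresponds to the point right after site.start() in the discussion.
        self.started = True
        if self.fail_during == "serve":
            raise RuntimeError("failure after startup, e.g. in cleanup")

    def run(self):
        try:
            asyncio.run(self._run())
        except Exception:
            # Startup failures get the dedicated code; later failures exit 1.
            self.exit_code = 1 if self.started else APP_LOAD_ERROR
        return self.exit_code


if __name__ == "__main__":
    for phase in ("setup", "serve", None):
        print(phase, SketchWorker(phase).run())
```

The point of the flag is that the exit code distinguishes "the app never came up" from "the app came up and failed later", so only the first case would stop the arbiter.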

Member

@Dreamsorcerer Dreamsorcerer left a comment

I'm hoping one of the gunicorn maintainers might provide some feedback, but if we don't hear anything back in a few days, then this looks reasonable to me.

@codecov

codecov bot commented Sep 26, 2022

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.73%. Comparing base (50d23ae) to head (595da9e).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #6968   +/-   ##
=======================================
  Coverage   98.73%   98.73%           
=======================================
  Files         121      121           
  Lines       36727    36731    +4     
  Branches     4384     4384           
=======================================
+ Hits        36261    36265    +4     
  Misses        314      314           
  Partials      152      152           
Flag Coverage Δ
CI-GHA 98.61% <100.00%> (+<0.01%) ⬆️
OS-Linux 98.30% <100.00%> (+<0.01%) ⬆️
OS-Windows 96.12% <25.00%> (-0.02%) ⬇️
OS-macOS 97.40% <100.00%> (+<0.01%) ⬆️
Py-3.10.11 97.25% <100.00%> (+<0.01%) ⬆️
Py-3.10.15 97.84% <100.00%> (-0.01%) ⬇️
Py-3.11.10 97.84% <100.00%> (-0.05%) ⬇️
Py-3.11.9 97.30% <100.00%> (+<0.01%) ⬆️
Py-3.12.7 98.37% <100.00%> (+<0.01%) ⬆️
Py-3.13.0 98.31% <100.00%> (-0.05%) ⬇️
Py-3.9.13 97.17% <100.00%> (+<0.01%) ⬆️
Py-3.9.20 97.75% <100.00%> (-0.01%) ⬇️
Py-pypy7.3.16 97.32% <100.00%> (-0.01%) ⬇️
VM-macos 97.40% <100.00%> (+<0.01%) ⬆️
VM-ubuntu 98.30% <100.00%> (+<0.01%) ⬆️
VM-windows 96.12% <25.00%> (-0.02%) ⬇️

@HallBregg HallBregg requested a review from webknjaz as a code owner September 27, 2022 18:26
@psf-chronographer psf-chronographer bot added the bot:chronographer:provided label May 7, 2023
@Dreamsorcerer Dreamsorcerer changed the title from "Make GunicornWebWorker exit with WORKER_BOOT_ERROR (3) in case of an boot error." to "Make GunicornWebWorker exit with APP_LOAD_ERROR in case of a startup error" May 7, 2023
@Dreamsorcerer
Member

OK, they have provided feedback now. Sounds like this is not the desired approach:
benoitc/gunicorn#2867 (comment)

@asvetlov asvetlov added the backport-3.11 and backport-3.12 labels and removed the backport-3.10 label Nov 21, 2024

codspeed-hq bot commented Nov 21, 2024

CodSpeed Performance Report

Merging #6968 will not alter performance

Comparing HallBregg:worker-boot-error (595da9e) with master (50d23ae)

Summary

✅ 42 untouched benchmarks

Labels
  • backport-3.11: Trigger automatic backporting to the 3.11 release branch by Patchback robot
  • backport-3.12: Trigger automatic backporting to the 3.12 release branch by Patchback robot
  • bot:chronographer:provided: There is a change note present in this PR
5 participants