Manage Porter threads and backpressure better #106

Merged
derekpierre merged 5 commits into nucypher:development from derekpierre:backpressure
Apr 20, 2026
@derekpierre derekpierre commented Feb 26, 2026

Type of PR:

  • Bugfix
  • Feature
  • Documentation
  • Other

Required reviews:

  • 1
  • 2
  • 3

What this does:

  • Depends on the changes in nucypher#3722 ("Additional safety valves when using NetworkRequestClient for making threshold requests").

  • Use a global concurrent-requests semaphore to absorb some backpressure from too many requests; if the semaphore can't be acquired within a timeout, return 429. Reduce the Hendrix max threads, since too many threads can overwhelm the server.

    • Add in-flight caps for server requests that require communication with multiple nodes in a cohort; /get_ursulas and /bucket_sampling are not limited, since they can be used for health checks.
  • Configure the max worker threads used per request, and allow overriding the value via an environment variable.
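
As an illustration of the last bullet, here is a minimal sketch of a per-request worker cap that can be overridden through the environment. The variable name `EXAMPLE_MAX_WORKER_THREADS_PER_REQUEST`, the default of 8, and the helpers `max_workers_from_env` and `fan_out` are all hypothetical; Porter's actual names and defaults may differ.

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Hypothetical variable name and default, for illustration only.
ENV_VAR = "EXAMPLE_MAX_WORKER_THREADS_PER_REQUEST"
DEFAULT_MAX_WORKERS = 8


def max_workers_from_env(environ=os.environ):
    """Return the per-request worker cap, honouring an env override."""
    raw = environ.get(ENV_VAR)
    if raw is None:
        return DEFAULT_MAX_WORKERS
    value = int(raw)
    if value < 1:
        raise ValueError(f"{ENV_VAR} must be >= 1, got {value}")
    return value


def fan_out(node_ids, contact_node, environ=os.environ):
    """Call contact_node for each node using at most the configured threads."""
    # Never ask for more threads than there are nodes (and at least one).
    workers = min(max_workers_from_env(environ), max(1, len(node_ids)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of node_ids in its results.
        return list(pool.map(contact_node, node_ids))
```

Bounding the pool per request keeps a single request that fans out to many nodes from monopolising the process's threads, which complements the global in-flight cap.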

Issues fixed/closed:

  • Fixes #...

Why it's needed:

Explain how this PR fits in the greater context of the NuCypher Network.
E.g., if this PR addresses a nucypher/productdev issue, let reviewers know!

Notes for reviewers:

What should reviewers focus on?
Is there a particular commit/function/section of your PR that requires more attention from reviewers?

@derekpierre derekpierre requested a review from Copilot February 26, 2026 18:27
@derekpierre derekpierre self-assigned this Feb 26, 2026
@derekpierre derekpierre changed the title [WIP] Manage Porter threads and back pressure better [WIP] Manage Porter threads and backpressure better Feb 26, 2026

Copilot AI left a comment


Pull request overview

Adds request-level backpressure to Porter’s Flask/Hendrix web controller by limiting concurrent in-flight requests, while exempting health-check style endpoints.

Changes:

  • Introduces a global in-flight request semaphore with an acquire timeout and 429 on overload.
  • Exempts /get_ursulas and /bucket_sampling from the in-flight cap.
  • Lowers Hendrix deployer default max threadpool size.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| porter/main.py | Defines the non-capped health-check paths and passes them into the WebController. |
| porter/controllers.py | Implements the in-flight semaphore gating in Flask request hooks and adjusts deployer defaults. |
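
For readers unfamiliar with the pattern, the following is a rough sketch of in-flight gating with exempt paths. The PR implements this in Flask request hooks; to stay dependency-free, this sketch swaps in a plain WSGI middleware instead. The class name `InFlightLimitMiddleware` and its parameters are hypothetical and not taken from Porter's code; the JSON body and the `Retry-After: 3` header mirror the snippet quoted later in the review.

```python
import json
import threading


class InFlightLimitMiddleware:
    """WSGI middleware: cap concurrent requests, reply 429 when saturated."""

    def __init__(self, app, limit, timeout_s, exempt_paths=()):
        self.app = app
        self.timeout_s = timeout_s
        self.exempt_paths = frozenset(exempt_paths)
        self.slots = threading.BoundedSemaphore(limit)

    def __call__(self, environ, start_response):
        # Health-check style endpoints bypass the cap entirely.
        if environ.get("PATH_INFO", "") in self.exempt_paths:
            return self.app(environ, start_response)

        # Wait briefly for a slot; give up with 429 if none frees in time.
        if not self.slots.acquire(timeout=self.timeout_s):
            body = json.dumps({
                "error": "too_many_requests",
                "message": "Too many in-flight requests.",
            }).encode("utf-8")
            start_response("429 Too Many Requests", [
                ("Content-Type", "application/json"),
                ("Content-Length", str(len(body))),
                ("Retry-After", "3"),
            ])
            return [body]

        try:
            result = self.app(environ, start_response)
            try:
                # Materialise the body so the slot is held until work is done.
                return list(result)
            finally:
                close = getattr(result, "close", None)
                if close is not None:
                    close()
        finally:
            self.slots.release()
```

Waiting with a timeout rather than rejecting immediately lets short bursts be absorbed, while sustained overload is pushed back to clients as 429 instead of piling up on server threads.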

Comment thread porter/controllers.py Outdated
Comment thread porter/controllers.py Outdated
Comment thread porter/controllers.py
Comment thread porter/controllers.py
@codecov-commenter

codecov-commenter commented Feb 26, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 91.19%. Comparing base (bc1aee7) to head (6b9ee68).

Additional details and impacted files
@@               Coverage Diff               @@
##           development     #106      +/-   ##
===============================================
+ Coverage        90.91%   91.19%   +0.27%     
===============================================
  Files               18       18              
  Lines              969      999      +30     
===============================================
+ Hits               881      911      +30     
  Misses              88       88              

@derekpierre derekpierre marked this pull request as ready for review April 10, 2026 15:40
@derekpierre derekpierre changed the title [WIP] Manage Porter threads and backpressure better Manage Porter threads and backpressure better Apr 10, 2026

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 6 changed files in this pull request and generated 3 comments.


Comment thread porter/controllers.py
Comment on lines +184 to +191
    self.log.warn("Too many in-flight requests.")
    response_data = {
        "error": "too_many_requests",
        "message": "Too many in-flight requests.",
    }
    response = make_response(
        jsonify(response_data), 429, {"Retry-After": "3"}
    )

Copilot AI Apr 10, 2026

Retry-After is hardcoded to "3" even though the backpressure wait time is configurable via PORTER_DEPLOYER_IN_FLIGHT_ACQUIRE_TIMEOUT_S. If operators change the timeout, clients will receive an inaccurate retry hint. Consider deriving Retry-After from _DEPLOYER_IN_FLIGHT_ACQUIRE_TIMEOUT_S (e.g., ceil to seconds) or making it separately configurable, and keep the header and timeout consistent.
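
One way the suggestion above could look, as a minimal sketch: round the configured timeout up to whole seconds, since `Retry-After` expressed as a delay must be an integer number of seconds. The helper name `retry_after_header` and the one-second floor are assumptions for illustration and are not part of the PR.

```python
import math


def retry_after_header(acquire_timeout_s, minimum_s=1):
    """Build a Retry-After header from the acquire timeout, in whole seconds."""
    # Round up so the hint is never shorter than the configured timeout,
    # and never drops below the (assumed) one-second floor.
    seconds = max(minimum_s, math.ceil(acquire_timeout_s))
    return {"Retry-After": str(seconds)}
```

For example, a timeout of 2.2 seconds yields `{"Retry-After": "3"}`.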

@derekpierre derekpierre Apr 10, 2026

This is an inexact science. There is no guarantee about what the best value is, because it depends on load. It isn't necessarily related to the acquire timeout: the requester has already waited for that timeout without success, so how much longer to wait before retrying is uncertain.

I think 3s is a good balance between not too short and not too long, so I'll keep the constant for now for simplicity.

Perhaps reviewers have thoughts here?

Comment thread tests/test_in_flight_request_limiting.py
Comment thread dev-requirements.txt
…h multiple nodes in a cohort; we don't limit /get_ursulas and /bucket_sampling since those can be used for health checks.

Use a global concurrent requests semaphore for absorbing some back-pressure from too many requests; if the semaphore can't be acquired after a timeout then return 429.
Reduce Hendrix max threads since it can overwhelm the server.
@derekpierre derekpierre merged commit 500167b into nucypher:development Apr 20, 2026
4 checks passed