Description
Motivations
- Lambda has a model of sending no more than 1 request to an exec env every 100 ms
- When allocating, say, 1/8th of a CPU, you are really saying you want 12.5 ms of CPU in a 100 ms time slice - if you go over this you get throttled (just like with containers); the arithmetic is sketched after this list
- Note: it appears, through observation, that the throttling interval is actually 20 ms
- The dispatch model of only 1 request per 100 ms was thought to prevent problems with the 1st request causing a 2nd request on the same exec env to be throttled - this theory did not hold up under observation
- Using more than the allotted CPU time in the first 20 ms results in a pause of execution until the end of the 20 ms slice
- Using slightly less than the allotted CPU time in the first 20 ms results in no pause of execution
- With lambda-dispatch, this model is broken because we allow concurrent requests, which makes throttling more likely, and the consequences are devastating to throughput and predictable response times
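
To make the budget arithmetic above concrete, here is a minimal Rust sketch. The ~1,769 MB-per-vCPU ratio is the figure AWS documents for how memory maps to CPU, the 20 ms slice length is the observation noted above, and the memory setting in `main` is hypothetical.

```rust
// Sketch: estimate the CPU-time budget per throttle timeslice from the
// function's memory setting. Assumes the documented ~1,769 MB-per-vCPU
// ratio and the 20 ms interval observed above.
fn cpu_budget_ms(memory_mb: u32, slice_ms: f64) -> f64 {
    let vcpu_share = memory_mb as f64 / 1769.0; // fraction of one vCPU
    vcpu_share * slice_ms
}

fn main() {
    // Hypothetical setting of ~1/8 vCPU (~221 MB): roughly 12.5 ms of CPU
    // per 100 ms slice, or ~2.5 ms per 20 ms slice, before throttling.
    let memory_mb = 1769 / 8;
    println!("per 100 ms slice: {:.1} ms of CPU", cpu_budget_ms(memory_mb, 100.0));
    println!("per  20 ms slice: {:.1} ms of CPU", cpu_budget_ms(memory_mb, 20.0));
}
```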
Solution Pattern - Ideas from December, 2024
- This problem can be mitigated by monitoring CPU usage vs the CPU request/limit and telling the router we need to go away when we approach a threshold (e.g. 70% or 80%) of the CPU amount allowed per 20 ms timeslice, starting with 0 ms as the invocation time (the threshold check is sketched after this list)
- When the lambda exec env returns, it won't be invoked again until the 100 ms timeslice completes, so requests will get routed to an exec env that is not likely to be throttled
- Note: because the theory on the first 100 ms did not hold, this idea will not work
- Caveat: this cannot prevent all throttling, but it can avoid repeatedly encountering it in a single instance, and it avoids sending many additional requests to an already throttled instance, which would exacerbate the problem
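
Both this idea and the March 2025 variant below hinge on a per-timeslice threshold check; a minimal Rust sketch follows. How CPU time is actually sampled (cgroup accounting, getrusage, etc.) is left abstract, and the budget and threshold numbers are illustrative assumptions, not lambda-dispatch constants.

```rust
// Sketch of the per-timeslice threshold check. Sampling of CPU time
// (cgroup accounting, getrusage, ...) is left to the caller; the numbers
// below are illustrative, not lambda-dispatch constants.
struct SliceMonitor {
    budget_ms: f64,          // CPU-ms allowed per slice (derived from memory size)
    threshold: f64,          // e.g. 0.7 or 0.8 of the budget
    used_this_slice_ms: f64,
}

impl SliceMonitor {
    /// Add the CPU time consumed since the last sample; returns true when
    /// usage is close enough to the budget that the router should stop
    /// sending this instance work (or lower its concurrency).
    fn record(&mut self, cpu_delta_ms: f64) -> bool {
        self.used_this_slice_ms += cpu_delta_ms;
        self.used_this_slice_ms >= self.budget_ms * self.threshold
    }

    /// Call at the start of each new timeslice (every ~20 ms).
    fn reset(&mut self) {
        self.used_this_slice_ms = 0.0;
    }
}

fn main() {
    // ~221 MB function => ~2.5 ms of CPU per 20 ms slice (see sketch above).
    let mut mon = SliceMonitor { budget_ms: 2.5, threshold: 0.8, used_this_slice_ms: 0.0 };
    assert!(!mon.record(1.0)); // 1.0 ms used: under 80% of 2.5 ms
    assert!(mon.record(1.2));  // 2.2 ms used: over the threshold, signal back-off
    mon.reset();               // a new slice begins
}
```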
Solution Pattern - Ideas from March, 2025
- Extension
- Monitor CPU usage per 20 ms
- If the extension observes CPU usage running close to (within 30%?) the limit
- Send the router a message to decrease max parallel requests by 1 each time this happens
- Router
- Allow each lambda to have a variable number of max requests outstanding
- The lambdas are all supposed to be the same - so if one lambda decreases its count, should we decrease it for all?
- Would there be any need to allow the concurrent count to grow back?
- Lambda config could be checked on startup - if the Lambda has < 512 MB of RAM, concurrent requests should be limited to no more than 2, and so on (see the sketch after this list)
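
A minimal sketch of the router-side bookkeeping from this list. The startup thresholds beyond the "< 512 MB means at most 2" rule, and the shape of the pressure-message handling, are assumptions for illustration rather than lambda-dispatch's actual protocol.

```rust
// Sketch of the router-side bookkeeping: a per-instance cap on outstanding
// requests, seeded from the Lambda's memory size at startup and reduced by
// one each time the extension reports CPU pressure. Thresholds (other than
// the "< 512 MB => at most 2" note) and message plumbing are hypothetical.
struct InstanceState {
    max_outstanding: u32,
}

// Hypothetical startup rule: less memory means less CPU share, so fewer
// concurrent requests.
fn initial_max_outstanding(memory_mb: u32) -> u32 {
    match memory_mb {
        0..=511 => 2,    // the rule stated in the notes above
        512..=1768 => 4, // assumption: under a full vCPU, stay modest
        _ => 8,          // assumption: >= 1 vCPU (~1,769 MB and up)
    }
}

impl InstanceState {
    /// Handle a "CPU pressure" message from the extension: shrink the cap,
    /// but never below 1 so the instance can still drain work.
    fn on_cpu_pressure(&mut self) {
        self.max_outstanding = self.max_outstanding.saturating_sub(1).max(1);
    }
}

fn main() {
    let mut inst = InstanceState { max_outstanding: initial_max_outstanding(256) };
    assert_eq!(inst.max_outstanding, 2);
    inst.on_cpu_pressure();
    assert_eq!(inst.max_outstanding, 1); // clamped at 1 after the decrease
}
```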
Acceptance Criteria
Metadata
Assignees
Labels: No labels
Type
Projects
Status: Backlog