Description
Is your feature request related to a problem? Please describe.
We have found that periodically (for reasons we still need to root cause) our workers run into a series of errors like:

`Failed running eviction job for run ID 0196d798-a08b-7a00-9082-353865f449b4, continually retrying eviction. Since eviction could not be processed, this worker may not complete and the slot may remain forever used unless it eventually completes.`

Then, hours later, when the pod containing the worker is terminated, we see this log:

`Shutting down workflow worker, but 46 workflow(s) could not be evicted previously, so the shutdown may hang`

This particular worker runs 50 concurrent workflows, which, if I interpret things correctly, means that for several hours the worker was stuck in an infinite loop trying to evict 46 workflows while only able to process 4 workflow tasks at a time.
We would like to be able to detect and alert on these situations more proactively. Usually we end up finding out about them because the worker set scales up to the maximum number of replicas for an extended period of time.
Describe the solution you'd like
Since the code already keeps track of when it is stuck in this eviction-retry loop, I think it would be useful to expose that information as a metric, so that alerting tools can fire when pods have been in that state for whatever the team monitoring the metric determines to be "too long".
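To make the ask concrete, here is a minimal sketch of the kind of setup where such a metric would land, assuming it were exported through the Prometheus endpoint the SDK runtime already supports. The metric name in the comment is purely hypothetical (it's the thing being requested); everything else uses existing `temporalio` APIs as I understand them.

```python
import asyncio

from temporalio import workflow
from temporalio.client import Client
from temporalio.runtime import PrometheusConfig, Runtime, TelemetryConfig
from temporalio.worker import Worker


@workflow.defn
class ExampleWorkflow:
    @workflow.run
    async def run(self) -> str:
        return "done"


async def main() -> None:
    # The SDK already exposes its metrics over Prometheus via the runtime. If
    # the eviction-retry state were published as a gauge, e.g.
    # "temporal_worker_evictions_pending" (hypothetical name), it would show
    # up on this endpoint alongside the existing worker metrics, and teams
    # could alert on something like "value > 0 for 15 minutes".
    runtime = Runtime(
        telemetry=TelemetryConfig(metrics=PrometheusConfig(bind_address="0.0.0.0:9464"))
    )
    client = await Client.connect("localhost:7233", runtime=runtime)
    worker = Worker(
        client,
        task_queue="example-task-queue",
        workflows=[ExampleWorkflow],
        max_concurrent_workflow_tasks=50,
    )
    await worker.run()


if __name__ == "__main__":
    asyncio.run(main())
```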
Additional context
If the team is bold enough, it could also be nice to do one or more of the following:
- Provide a setting that forces the worker to shut down if it has been in an eviction loop for too long (a rough sketch of what this might look like follows this list).
- Provide more threads than `max_concurrent_workflow_tasks` so that the ability to process workflows isn't as likely to be impeded by the infinite eviction loop.
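As a sketch of what these could look like from the caller's side: the `eviction_loop_shutdown_timeout` parameter below is invented here purely for illustration and does not exist today, while `workflow_task_executor` and `max_concurrent_workflow_tasks` are existing `Worker` parameters, and passing an oversized executor is one possible way to realize the second item.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

from temporalio import workflow
from temporalio.client import Client
from temporalio.worker import Worker


@workflow.defn
class ExampleWorkflow:
    @workflow.run
    async def run(self) -> str:
        return "done"


async def main() -> None:
    client = await Client.connect("localhost:7233")
    worker = Worker(
        client,
        task_queue="example-task-queue",
        workflows=[ExampleWorkflow],
        max_concurrent_workflow_tasks=50,
        # Existing parameter: an executor sized above
        # max_concurrent_workflow_tasks would leave headroom for workflow
        # tasks even while stuck eviction jobs occupy threads.
        workflow_task_executor=ThreadPoolExecutor(max_workers=64),
        # Hypothetical parameter (not in the current API): force the worker
        # to shut down if it has been retrying evictions for this long, e.g.
        # eviction_loop_shutdown_timeout=timedelta(hours=1),
    )
    await worker.run()


if __name__ == "__main__":
    asyncio.run(main())
```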