Improve DURABILITY logic for circuit breaker, make it actionable  #58404

Open
@Bukhtawar

Problem Description

While the real memory circuit breaker helps with parent memory accounting based on real heap usage, the durability is still derived from the child circuit breakers: it is taken from whichever group of child breakers, grouped by their respective durability, contributed the most usage. Child circuit breakers are known to be inaccurate, so it becomes hard to determine the nature of the issue that is causing frequent GC or request trips.

Keeping a node around that trips requests frequently isn't optimal and eats into throughput. In some cases it may be better to bounce such nodes to clear the garbage.

Sharing a sample instance showing how far apart real memory (parent) and child usage can be:

curl localhost:9200/_cat/thread_pool?v

{"error":{"root_cause":[{"type":"circuit_breaking_exception","reason":"[parent] Data too large, data for [<http_request>] would be [8337839048/7.7gb], which is larger than the limit of [8127315968/7.5gb], real usage: [8337839048/7.7gb], new bytes reserved: [0/0b], usages [request=0/0b, fielddata=0/0b, in_flight_requests=0/0b, accounting=6388018/6mb]","bytes_wanted":8337839048,"bytes_limit":8127315968,"durability":"PERMANENT"}],"type":"circuit_breaking_exception","reason":"[parent] Data too large, data for [<http_request>] would be [8337839048/7.7gb], which is larger than the limit of [8127315968/7.5gb], real usage: [8337839048/7.7gb], new bytes reserved: [0/0b], usages [request=0/0b, fielddata=0/0b, in_flight_requests=0/0b, accounting=6388018/6mb]","bytes_wanted":8337839048,"bytes_limit":8127315968 .....
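To put a number on the gap in the sample above, here is a small sketch that compares the real usage against the sum of the child breaker usages. The byte values are copied from the error message; the variable names are only illustrative.

```python
# Byte values copied from the circuit_breaking_exception above.
real_usage = 8337839048      # "real usage" reported by the parent breaker
parent_limit = 8127315968    # parent breaker limit
child_usages = {
    "request": 0,
    "fielddata": 0,
    "in_flight_requests": 0,
    "accounting": 6388018,
}

# Memory the child breakers know about vs. what the heap actually holds.
tracked = sum(child_usages.values())
untracked = real_usage - tracked
tracked_share = tracked / real_usage
over_limit = real_usage - parent_limit

print(f"tracked by child breakers: {tracked} bytes ({tracked_share:.4%} of real usage)")
print(f"not attributed to any child breaker: {untracked} bytes")
print(f"real usage exceeds the parent limit by: {over_limit} bytes")
```

The child breakers account for well under 1% of the heap in use, yet the durability reported in the response is still derived from them.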


Proposal

Derive durability based on some function of

  1. Request trips over a period of time (avoiding flip-flops)
  2. Heap after GC (ensuring GC throughput is still reasonable)
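One possible shape for such a function, as a rough sketch rather than a proposed implementation: track trips in a sliding time window and report PERMANENT only when trips are sustained and the heap stays high after GC. The class name `DurabilityEstimator`, its methods, and all thresholds are hypothetical and chosen only for illustration; the caller is assumed to supply the heap-after-GC fraction.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class DurabilityEstimator:
    """Hypothetical sketch: derive durability from recent trips and heap after GC."""
    window_seconds: float = 60.0           # assumed look-back window
    trip_ratio_threshold: float = 0.5      # assumed: "majority of requests" tripping
    heap_after_gc_threshold: float = 0.85  # assumed: GC is not reclaiming enough
    min_requests: int = 20                 # avoid flip-flops on tiny samples
    _events: deque = field(default_factory=deque)

    def record(self, now: float, tripped: bool) -> None:
        """Record one request outcome at timestamp `now` (seconds)."""
        self._events.append((now, tripped))
        self._evict(now)

    def _evict(self, now: float) -> None:
        # Drop events that have fallen out of the sliding window.
        while self._events and self._events[0][0] <= now - self.window_seconds:
            self._events.popleft()

    def trip_ratio(self, now: float) -> float:
        """Fraction of requests in the window that tripped (0.0 if too few samples)."""
        self._evict(now)
        if len(self._events) < self.min_requests:
            return 0.0
        trips = sum(1 for _, tripped in self._events if tripped)
        return trips / len(self._events)

    def durability(self, now: float, heap_after_gc_fraction: float) -> str:
        """PERMANENT only if trips are sustained AND heap stays high after GC."""
        sustained = self.trip_ratio(now) >= self.trip_ratio_threshold
        heap_stuck = heap_after_gc_fraction >= self.heap_after_gc_threshold
        return "PERMANENT" if sustained and heap_stuck else "TRANSIENT"
```

Requiring both signals is a deliberate choice in this sketch: a burst of trips with a healthy heap after GC is treated as transient, while the window and minimum sample count keep the result from flapping on a handful of requests.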

Should nodes still be considered healthy (as part of the cluster) if they continue to trip the majority of requests over a prolonged period, or even when the circuit breaker keeps reporting the present PERMANENT durability? Thoughts?
