
e3/w3/a2 invalid on region aware placement if min regions for durability 2 #4553

Open

Description

@benjumanji

I have the following config (shortened for brevity) on Pulsar 4.0.1:

bookkeeperClientRegionawarePolicyEnabled=true
reppRegionsToWrite=euw1-az3;euw1-az1;euw1-az2
reppMinimumRegionsForDurability=2
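
For clarity, e3/w3/a2 means ensemble size 3, write quorum 3, ack quorum 2. On the Pulsar side that would typically come from the managed-ledger defaults; the snippet below is a sketch of those settings rather than a verbatim copy of my broker.conf:

```properties
# Assumed managed-ledger quorum settings corresponding to e3/w3/a2
managedLedgerDefaultEnsembleSize=3
managedLedgerDefaultWriteQuorum=3
managedLedgerDefaultAckQuorum=2
```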

I have at least three bookies. If I try the aforementioned quorum configuration (e3/w3/a2), then the exception here:

+ "violates the requirement to satisfy durability constraints when running in degraded mode");
is thrown.

(Screenshot of the thrown exception, 2025-01-30 21:01:17, not reproduced here.)

This makes little sense to me: I can see that 2 <= 3 - 3/2 evaluates to true with integer division (3 - 1 = 2), which is why the exception is thrown, but I am failing to see why this is a bad configuration.

            // We must survive the failure of numRegions - effectiveMinRegionsForDurability. When these
            // regions have failed we would spread the replicas over the remaining
            // effectiveMinRegionsForDurability regions; we have to make sure that the ack quorum is large
            // enough such that there is a configuration for spreading the replicas across
            // effectiveMinRegionsForDurability - 1 regions

OK, so I have 3 regions and I want 2 for durability, so I can only tolerate 1 region failing. If that region fails I have two regions left and I require two acks; I have two bookies left and they can both ack, so what's the problem? Why is 4/4/3 good and 3/3/2 bad? If the argument is that the initial placement might put 2 replicas in one region and 1 in another, why doesn't the same apply to 4/4/3 (3 in one region and 1 in another)? If we plug 3/3/2 into the comment, then we need to survive 3 - 2 = 1 region failure and make sure the acks cover 2 - 1 = 1 region. Why do 3 acks and 4 writers satisfy that while 2 acks and 3 writers do not?
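
To make the comparison concrete, here is a minimal sketch of the check as I read it from the code around that exception. The real method in RegionAwareEnsemblePlacementPolicy is more involved and the names here are mine, but the condition matches the 2 <= 3 - 3/2 arithmetic above:

```java
public class DurabilityCheckSketch {
    // Inferred condition: the exception fires when
    //   ackQuorum <= writeQuorum - (writeQuorum / minRegionsForDurability)
    // using Java integer (floor) division.
    static boolean rejected(int writeQuorum, int ackQuorum, int minRegionsForDurability) {
        return ackQuorum <= writeQuorum - (writeQuorum / minRegionsForDurability);
    }

    public static void main(String[] args) {
        // e3/w3/a2, minRegionsForDurability=2: 2 <= 3 - 3/2 = 2 -> rejected
        System.out.println("3/3/2 rejected: " + rejected(3, 2, 2));
        // e4/w4/a3, minRegionsForDurability=2: 3 <= 4 - 4/2 = 2 is false -> accepted
        System.out.println("4/4/3 rejected: " + rejected(4, 3, 2));
    }
}
```

Under that condition 3/3/2 is rejected and 4/4/3 is accepted, which is exactly the asymmetry I don't understand.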

I guess what's eating at me is that I don't want the extra tail latency or to pay for the extra disks. I just want 3 replicas and to survive a region outage, and there doesn't seem to be a configuration that allows this. The only value of reppMinimumRegionsForDurability under which the expression evaluates to false for 3/3/2 is 1, which is a config that accepts data loss.

Originally posted by @benjumanji in apache/pulsar#23913
