Skip to content

Conversation

@ninsmiracle
Copy link
Contributor

What problem does this PR solve?

#2211

What is changed and how does it work?

add lock when remove_all_duplication.
We can see src/replica/duplication/replica_duplicator_manager.h, there are code comments like this

    // avoid thread conflict between replica::on_checkpoint_timer and
    // duplication_sync_timer.
    mutable zlock _lock;

So we need to perfect this lock's usage.

Checklist

Tests
  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
How this PR work?

Possible conflict locations with on_checkpoint_timer:

void replica::on_checkpoint_timer()
{
    if (_private_log) {
    mutation_log_ptr plog = _private_log;

    decree last_durable_decree = _app->last_durable_decree();
    // RACE PROBLEM!!!
    decree min_confirmed_decree = _duplication_mgr->min_confirmed_decree();
    decree cleanable_decree = last_durable_decree;
    int64_t valid_start_offset = _app->init_info().init_offset_in_private_log;
    /// ...

    }
/// ...
}


decree replica_duplicator_manager::min_confirmed_decree() const
{
    zauto_lock l(_lock);

    decree min_decree = invalid_decree;
    if (_replica->status() == partition_status::PS_PRIMARY) {
        // HERE
        for (auto &kv : _duplications) {
            const duplication_progress &p = kv.second->progress();
            if (min_decree == invalid_decree) {
                min_decree = p.confirmed_decree;
            } else {
                min_decree = std::min(min_decree, p.confirmed_decree);
            }
        }
    } else if (_primary_confirmed_decree > 0) {
        // if the replica is not primary, use the latest known (from primary)
        // confirmed_decree instead.
        min_decree = _primary_confirmed_decree;
    }
    return min_decree;
}

@github-actions github-actions bot added the cpp label Mar 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant