How to increase retention_lease period

First of all, thank you for this project. It's a nice piece of technology.

I note the following in the [docs](https://docs.opensearch.org/docs/latest/tuning-your-cluster/replication-plugin/api/) for CCR:
> You can’t resume replication after it’s been paused for more than 12 hours. You must [stop replication](https://docs.opensearch.org/docs/latest/replication-plugin/api/#stop-replication), delete the follower index, and restart replication of the leader.

FWIU, this limit is related to `index.soft_deletes.retention_lease.period`, which ensures that details of deleted docs are retained so that the follower cluster can replay them from the translog ([ref](https://www.elastic.co/guide/en/elasticsearch/reference/7.10/index-modules-history-retention.html)). However, I also note that there is a `retention_lease_max_failure_duration` setting, which is `1h` by default and has a maximum value of `12h`. I'm wondering which of these two settings is the reason replication can't be paused for more than 12 hours, and whether this limit can be increased. My team's indices are 100 TB, so restarting CCR from scratch is really not ideal.
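For concreteness, here is a minimal sketch of how the leader-side setting could be raised through the index settings API. It only builds the request and does not send it. The endpoint URL, the index name, and the `24h` value are placeholders I made up, and whether raising this setting actually extends the pause window is exactly what I'm unsure about.

```python
import json
import urllib.request

# Hypothetical placeholders: substitute the real leader endpoint and index name.
LEADER_URL = "http://localhost:9200"
LEADER_INDEX = "my-leader-index"


def build_retention_lease_request(base_url: str, index: str, period: str) -> urllib.request.Request:
    """Build (but do not send) a PUT <index>/_settings request that
    sets index.soft_deletes.retention_lease.period on the given index."""
    body = {"index.soft_deletes.retention_lease.period": period}
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    # "24h" is an arbitrary example value, not a recommendation.
    req = build_retention_lease_request(LEADER_URL, LEADER_INDEX, "24h")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually apply the change (assumption: no auth/TLS needed):
    # urllib.request.urlopen(req)
```

Authentication and TLS are left out on purpose, since they depend on the cluster's security setup.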

Separately, we have also encountered cases where, after we reconfigured and restarted nodes, replication failed without explanation and could not be resumed, even though the time between the failure and our attempt to resume was within the `12h` window:
```bash
_plugins/_replication/<index>/_status?pretty
{
  "status" : "FAILED",
  "reason" : "",
  "leader_alias" : "repl_conn",
  "leader_index" : "<index>",
  "follower_index" : "<index>"
}
```
We suspect this may be because some `_state` directories were lost, but we're not sure whether CCR relies on them. Could I get any indication of what this issue might be related to? Some of the logs we found mention:
```
[2025-05-13T20:09:24,763][WARN ][o.o.p.PersistentTasksClusterService] [master-us2-1] trying to update state on task replication:[<index>][540] with unexpected allocation id 11421
[2025-05-13T20:09:25,386][WARN ][o.o.p.PersistentTasksClusterService] [master-us2-1] trying to update state on task replication:[<index>][913] with unexpected allocation id 11426
```
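
In case it helps with reproducing this, below is a small sketch of how the status responses could be checked across many follower indices to find the ones that are `FAILED` with an empty `reason`. The helper names are hypothetical, and the sample payload is simply the response shown above with a placeholder index name.

```python
import json


def status_path(index: str) -> str:
    """Relative path of the replication status endpoint used above."""
    return f"_plugins/_replication/{index}/_status"


def failed_without_reason(status: dict) -> bool:
    """True when replication is FAILED and the plugin reported no reason,
    i.e. the case where only the node logs can explain the failure."""
    return status.get("status") == "FAILED" and not status.get("reason")


if __name__ == "__main__":
    # Same shape as the response above; "my-index" is a placeholder name.
    sample = json.loads("""
    {
      "status": "FAILED",
      "reason": "",
      "leader_alias": "repl_conn",
      "leader_index": "my-index",
      "follower_index": "my-index"
    }
    """)
    print(status_path(sample["follower_index"]))
    print(failed_without_reason(sample))
```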
