Prevent tracking when rollback pending #435
This pull request adds a guard to the ACL and metadata trackers so that they skip all processing while a rollback is pending. This prevents both superfluous work and the potential skipping of transactions / change sets. While the benefit of avoiding superfluous work should be obvious, the second benefit requires a rather detailed explanation.
How SOLR re-indexation can skip transactions / change sets
In a project where I was asked to help analyse an indexing issue, we found that temporary errors (network connectivity / DB issues) caused SOLR re-indexation to skip over a significant period of time and transition directly into live-indexing mode for current changes. The following is a short summary of our analysis:
1. A temporary error causes a call to the `/transactions` or `/nextTransaction` web script to fail.
2. The error is handled in the `AbstractTracker.track()` method, which calls `setRollback(true, t);` to schedule a rollback for the next time the `CommitTracker` runs.
3. `@DisallowConcurrentExecution` on the `TrackerJob` blocks further triggers while a run is in progress, but once the tracker run completes after the error, Quartz immediately triggers the job again to compensate for all previous blockages.
4. `AbstractTracker` calls `SolrInformationServer.continueState()`, which sets `lastGoodTxCommitTimeInIndex` to the larger of `lastIndexedTxCommitTime` and `lastStartTime` (minus the "hole retention" period to account for yet-to-be-committed transactions).
5. The metadata tracker uses `lastGoodTxCommitTimeInIndex` as the `fromCommitTime` when fetching transactions to be indexed.

In a re-index scenario, this can cause a sudden jump in the transactions being indexed, straight to current transactions. If even one current transaction is indexed, the newer transaction(s) indexed as a result of this jump do not appear to be removed from the index, even though the `CommitTracker` performs a rollback later. After the rollback, SOLR remains in live-tracking mode, and potentially years' worth of transactions and data end up being skipped. The only mitigation in this state seems to be to clear the entire index and start re-indexation from scratch.

The following archive contains a Docker-based setup and instructions for reproducing the described issue:
SOLR Txn Skip Reproducer.zip
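The arithmetic behind steps 4 and 5 can be sketched as follows. This is a simplified model written only from the description above, not the actual `SolrInformationServer.continueState()` code: the class `FromCommitTimeSketch`, its method and all timestamps are hypothetical, and the 60-minute hole retention is an assumption taken from the time window in the note. Subtracting the hole retention from the maximum is equivalent to subtracting it from each operand first.

```java
import java.time.Duration;
import java.time.Instant;

/**
 * Simplified, hypothetical model of steps 4 and 5; not the actual
 * SolrInformationServer code. Only the variable names mirror the PR text.
 */
final class FromCommitTimeSketch {

    // Assumption: a hole retention of 60 minutes, matching the window in the note.
    static final long HOLE_RETENTION_MS = Duration.ofMinutes(60).toMillis();

    // Step 4: the larger of both timestamps, minus the hole retention.
    static long lastGoodTxCommitTimeInIndex(long lastIndexedTxCommitTime, long lastStartTime, long holeRetentionMs) {
        return Math.max(lastIndexedTxCommitTime, lastStartTime) - holeRetentionMs;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2024-01-01T12:00:00Z");
        // Re-indexation has so far only reached transactions committed years ago.
        long lastIndexedTxCommitTime = Instant.parse("2019-06-01T00:00:00Z").toEpochMilli();
        // Start time of the tracker run that failed moments ago (steps 1 and 2).
        long lastStartTime = now.minusSeconds(30).toEpochMilli();

        // Step 5: this value becomes the fromCommitTime of the next transaction fetch.
        long fromCommitTime = lastGoodTxCommitTimeInIndex(lastIndexedTxCommitTime, lastStartTime, HOLE_RETENTION_MS);

        System.out.println("fromCommitTime: " + Instant.ofEpochMilli(fromCommitTime));
        System.out.println("days skipped by the jump: "
                + Duration.ofMillis(fromCommitTime - lastIndexedTxCommitTime).toDays());
    }
}
```

With a re-index position years in the past and a `lastStartTime` from the run that failed seconds earlier, `Math.max` picks the start time, so the next fetch starts roughly an hour before "now" and everything in between is never requested.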
Note: Transactions can only be skipped if at least one transaction with an indexable change/deletion has a timestamp at most 60 minutes before the start of the initial re-index metadata tracker job run (i.e. within the time window that the `lastGoodTxCommitTimeInIndex`, calculated from `lastStartTime` and `holeRetention`, allows).

This PR prevents transactions from being skipped by stepping in at step 3: the re-triggered tracker job performs no index changes before the `CommitTracker` has been able to act on the scheduled rollback.
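A minimal sketch of the guard idea, assuming a single shared "rollback pending" flag. The class `RollbackGuardSketch` and its methods are hypothetical stand-ins for the ACL/metadata trackers and the `CommitTracker`; this is not the code in this PR and not the Alfresco `AbstractTracker` API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical, heavily simplified model of the guard; not the Alfresco
 * AbstractTracker API. One instance stands in for the shared tracker state.
 */
final class RollbackGuardSketch {

    // Atomic because, in the real setup, the flag is shared between tracker jobs
    // running on different Quartz threads; the demo below is single-threaded.
    private final AtomicBoolean rollbackPending = new AtomicBoolean(false);
    // Records what was "indexed" so the effect of the guard is visible.
    private final List<Long> indexedTxIds = new ArrayList<>();

    // Stand-in for the error handling that schedules a rollback (step 2).
    void onTrackingError() {
        rollbackPending.set(true);
    }

    // Stand-in for an ACL/metadata tracker run; returns whether it did any work.
    boolean trackMetadata(long txId) {
        // The guard: skip all processing while a rollback is pending, so a run
        // re-triggered by Quartz (step 3) cannot index anything based on the
        // state left behind by the failed run.
        if (rollbackPending.get()) {
            return false;
        }
        indexedTxIds.add(txId);
        return true;
    }

    // Stand-in for the CommitTracker acting on the scheduled rollback.
    void commitTrackerRun() {
        // A real commit tracker would perform the actual rollback here;
        // this sketch only clears the flag.
        rollbackPending.set(false);
    }

    List<Long> indexedTxIds() {
        return List.copyOf(indexedTxIds);
    }

    public static void main(String[] args) {
        RollbackGuardSketch tracker = new RollbackGuardSketch();
        tracker.onTrackingError();                          // steps 1 and 2: fetch failed, rollback scheduled
        boolean reTriggered = tracker.trackMetadata(42L);   // step 3: immediate re-trigger is a no-op
        tracker.commitTrackerRun();                         // rollback handled by the commit tracker
        boolean afterRollback = tracker.trackMetadata(43L); // normal tracking resumes
        System.out.println(reTriggered + " " + afterRollback + " " + tracker.indexedTxIds());
    }
}
```

In the demo, the run re-triggered immediately after the error returns without indexing transaction 42; only once the commit tracker has cleared the flag is transaction 43 indexed. Keeping the check at the very start of the tracker run means no state left over from the failed run is ever used to compute a new `fromCommitTime`.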