Prevent tracking when rollback pending #435

Open
wants to merge 1 commit into
base: master

@AFaust AFaust commented Mar 7, 2025

This pull request adds a guard to the ACL and metadata trackers so that they skip all processing while a rollback is pending. This prevents both superfluous work and the potential skipping of transactions / change sets. While the benefit of avoiding superfluous work should be obvious, the second benefit requires a rather detailed explanation.

How SOLR re-indexation can skip transactions / change sets

While assisting a project with the analysis of an indexing issue, we found that temporary errors (network connectivity / DB issues) caused SOLR re-indexation to skip over a significant period of time and transition directly into live indexing of current changes. The following is a short description of what our analysis came up with:

  • one invocation of the metadata tracker runs continuously, polling transactions and nodes to index from ACS, until a temporary error causes a call to the /transactions or /nextTransaction web script to fail
  • the exception is propagated up to the AbstractTracker.track() method, which calls setRollback(true, t); to schedule a rollback for the next run of the CommitTracker
  • while the metadata tracker was running, subsequent triggers for the job were blocked by Quartz due to the @DisallowConcurrentExecution annotation on the TrackerJob, but once the tracker run completes after the error, Quartz immediately triggers the job again to compensate for the blocked triggers
  • since the scheduled rollback has not been processed yet, the AbstractTracker calls SolrInformationServer.continueState(), which sets the lastGoodTxCommitTimeInIndex to the larger of lastIndexedTxCommitTime and lastStartTime (minus the "hole retention" period to account for yet-to-be-committed transactions)
  • the metadata tracker uses this lastGoodTxCommitTimeInIndex as the fromCommitTime when fetching transactions to be indexed
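
To make the jump in the last two steps concrete, here is a minimal sketch of the start-time calculation as described above. It is a simplified model built only from this description, not the actual SolrInformationServer.continueState() code: the class and method names are hypothetical, and it assumes the hole retention is subtracted after taking the larger of the two timestamps.

```java
// Simplified, hypothetical model of the start-time calculation described above.
// Names and exact arithmetic are assumptions, not the project's actual code.
final class StartTimeSketch {

    /** Returns the commit time from which the next tracker run would fetch transactions. */
    static long fromCommitTime(long lastIndexedTxCommitTime, long lastStartTime, long holeRetentionMs) {
        // The larger of the two timestamps wins, then the hole retention is subtracted.
        long candidate = Math.max(lastIndexedTxCommitTime, lastStartTime) - holeRetentionMs;
        return Math.max(candidate, 0L); // never go below the epoch
    }

    public static void main(String[] args) {
        long hour = 60L * 60L * 1000L;
        long lastStartTime = 50_000L * hour; // hypothetical start of the failed tracker run
        long lastIndexed = 10_000L * hour;   // re-index has only reached old transactions
        long from = fromCommitTime(lastIndexed, lastStartTime, hour);
        System.out.println("hours skipped: " + (from - lastIndexed) / hour);
    }
}
```

During a re-index, lastStartTime is far ahead of lastIndexedTxCommitTime, so in this model the start time moves to roughly "now minus hole retention" and everything in between is never fetched.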

In a re-index scenario, this can cause the tracker to jump suddenly from the historical transactions it was processing to current transactions. If even one current transaction is indexed, the newer transaction(s) indexed as a result of this jump do not appear to be removed from the index, even though the CommitTracker performs a rollback later. After the rollback, SOLR remains in live-tracking mode, and potentially years' worth of transactions and data end up being skipped. The only mitigation in this state seems to be to clear the entire index and start re-indexation from scratch.

The following archive contains a Docker-based setup and instructions on reproducing the described issue:
SOLR Txn Skip Reproducer.zip
Note: Transaction skipping can only happen if at least one transaction with an indexable change/deletion has a timestamp at most 60 minutes before the start of the initial re-index metadata tracker job run, i.e. within the time window allowed by the lastGoodTxCommitTimeInIndex calculated from lastStartTime and holeRetention.
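
The precondition in this note can be expressed as a small check. The sketch below is illustrative only: the class and method are hypothetical, timestamps are plain epoch milliseconds, and the window is modelled as the closed interval from lastStartTime minus holeRetention up to lastStartTime, which is an assumption drawn from the wording of the note.

```java
// Hypothetical helper illustrating the precondition from the note above.
final class SkipWindowSketch {

    /** True if any commit time falls inside [lastStartTime - holeRetentionMs, lastStartTime]. */
    static boolean skipPossible(long[] txCommitTimes, long lastStartTime, long holeRetentionMs) {
        long windowStart = lastStartTime - holeRetentionMs;
        for (long commitTime : txCommitTimes) {
            if (commitTime >= windowStart && commitTime <= lastStartTime) {
                return true; // at least one recent transaction can be picked up after the jump
            }
        }
        return false;
    }
}
```

With a hole retention of 60 minutes, a repository whose latest indexable transaction is older than one hour at the start of the re-index would not satisfy this check, which matches the condition stated for the reproducer.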

This PR prevents transactions from being skipped by stepping in at step 3 (the immediate re-trigger of the tracker job by Quartz) and preventing the re-triggered job from making any index changes before the CommitTracker has been able to act on the scheduled rollback.
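
The idea of the guard can be sketched as follows. This is not the code from this PR: the class, its methods and the flag are hypothetical stand-ins that only show a tracker run returning early while a rollback is pending.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of a "rollback pending" guard; not the code from this PR.
final class GuardedTrackerSketch {
    private final AtomicBoolean rollbackPending = new AtomicBoolean(false);
    private int completedRuns = 0;

    /** Mirrors the idea of scheduling a rollback after a failed tracker run. */
    void scheduleRollback() {
        rollbackPending.set(true);
    }

    /** Stands in for the commit tracker performing the rollback and clearing the flag. */
    void rollbackDone() {
        rollbackPending.set(false);
    }

    /** Returns true if the run did tracking work, false if it was skipped by the guard. */
    boolean track() {
        if (rollbackPending.get()) {
            return false; // leave the index untouched until the rollback has happened
        }
        completedRuns++; // placeholder for the real tracking work
        return true;
    }

    int completedRuns() {
        return completedRuns;
    }
}
```

In this model, a job re-triggered right after a failure does nothing, and tracking only resumes once the rollback has been processed.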


2 participants