Is there an existing issue for this?
Task description
There are some gnarly-sounding bugs fixed in Lucene 4.9.0 through 4.10.4 (the last release in the 4.x line), all of which came after the 4.8.1 release that we target. We should review these to determine whether they're worth backporting.
Because these were all in the 4.x line, they should mostly (all?) be backport-able, but I haven't yet looked at the details. This was just a realization during the investigation for #1284, where a significant bug was fixed in 4.10.0.
Below are some sampled bug fixes in the change log that sound... potentially important... IMO. Others might be bad but just don't sound as bad.
Also note that there could be backport-able, critical fixes in 5.0+, but it might be harder to determine whether those can be backported. If someone wants to take that up, please file a new issue.
4.10.4:
LUCENE-6279: Don't let an abusive leftover _N_upgraded.si in the index directory cause index corruption on upgrade
LUCENE-6287: Fix concurrency bug in IndexWriter that could cause index corruption (missing _N.si files) the first time 4.x kisses a 3.x index if merges are also running.
LUCENE-6214: Fixed IndexWriter deadlock when one thread is committing while another opens a near-real-time reader and an unrecoverable (tragic) exception is hit.
4.10.3:
LUCENE-6046: Add maxDeterminizedStates safety to determinize (which has an exponential worst case) so that if it would create too many states, it now throws an exception instead of exhausting CPU/RAM.
4.10.2:
LUCENE-5977: Fix tokenstream safety checks in IndexWriter to properly work across multi-valued fields. Previously some cases across multi-valued fields would happily create a corrupt index.
4.10.1:
LUCENE-5958: Don't let exceptions during checkpoint corrupt the index. Refactor existing OOM handling too, so you don't need to handle OOM special for every IndexWriter method: instead such disasters will cause IW to close itself defensively.
LUCENE-5904: Fixed a corruption case that can happen when 1) IndexWriter is uncleanly shut-down (OS crash, power loss, etc.), 2) on startup, when a new IndexWriter is created, a virus checker is holding some of the previously written but unused files open and preventing deletion, 3) IndexWriter writes these files again during the course of indexing, then the files can later be deleted, causing corruption. This case was detected by adding evilness to MockDirectoryWrapper to have it simulate a virus checker holding a file open and preventing deletion
LUCENE-5975: Fix reading of 3.0-3.3 indexes, where bugs in these old index formats would result in CorruptIndexException "did not read all bytes from file" when reading the deleted docs file.
4.10.0:
LUCENE-5790: Fix compareTo in MutableValueDouble and MutableValueBool, this caused incorrect results when grouping on fields with missing values.
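For reviewers weighing LUCENE-5790, here is a minimal sketch of the contract a comparison has to satisfy when values can be missing. `MissingAwareDouble` is a hypothetical stand-in with a `value` and an `exists` flag; it is not Lucene's `MutableValueDouble`, and the ordering rule (missing sorts before present on a tie) is an assumption for illustration.

```java
// Hypothetical value holder: `exists == false` means the field had no value.
final class MissingAwareDouble implements Comparable<MissingAwareDouble> {
    final double value;
    final boolean exists;

    MissingAwareDouble(double value, boolean exists) {
        this.value = value;
        this.exists = exists;
    }

    @Override
    public int compareTo(MissingAwareDouble other) {
        // Compare the numeric payload first.
        int c = Double.compare(value, other.value);
        if (c != 0) {
            return c;
        }
        // Same payload: two missing (or two present) values are equal.
        if (exists == other.exists) {
            return 0;
        }
        // Assumption: a missing value sorts before a present one.
        return exists ? 1 : -1;
    }

    public static void main(String[] args) {
        MissingAwareDouble missing = new MissingAwareDouble(0.0, false);
        MissingAwareDouble zero = new MissingAwareDouble(0.0, true);
        System.out.println(missing.compareTo(new MissingAwareDouble(0.0, false)));
        System.out.println(missing.compareTo(zero));
        System.out.println(zero.compareTo(missing));
    }
}
```

The point of the sketch is that the result must be symmetric and must return 0 for two missing values; grouping relies on that to put all missing-value documents in one group.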
4.9.1:
LUCENE-5907: Fix corruption case when opening a pre-4.x index with IndexWriter, then opening an NRT reader from that writer, then calling commit from the writer, then closing the NRT reader. This case would remove the wrong files from the index leading to a corrupt index.
LUCENE-5919: Fix exception handling inside IndexWriter when deleteFile throws an exception, to not over-decRef index files, possibly deleting a file that's still in use in the index, leading to corruption.
LUCENE-5843: Added IndexWriter.MAX_DOCS which is the maximum number of documents allowed in a single index, and any operations that add documents will now throw IllegalStateException if the max count would be exceeded, instead of silently creating an unusable index.
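The behavior described in LUCENE-5843 can be sketched roughly as a guard that reserves document slots up front and rejects the operation if the limit would be exceeded. `DocLimitSketch`, `reserve`, and `reserved` are hypothetical names for illustration only; this is not how `IndexWriter` is actually implemented.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical guard: tracks how many documents have been reserved so far.
final class DocLimitSketch {
    private final long maxDocs;
    private final AtomicLong pending = new AtomicLong();

    DocLimitSketch(long maxDocs) {
        this.maxDocs = maxDocs;
    }

    // Reserve room for `count` documents, or fail without changing the total.
    void reserve(long count) {
        if (pending.addAndGet(count) > maxDocs) {
            pending.addAndGet(-count); // roll back the failed reservation
            throw new IllegalStateException(
                "number of documents in the index cannot exceed " + maxDocs);
        }
    }

    long reserved() {
        return pending.get();
    }

    public static void main(String[] args) {
        DocLimitSketch limit = new DocLimitSketch(2);
        limit.reserve(1);
        limit.reserve(1);
        try {
            limit.reserve(1);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing fast at reservation time is what turns "silently creating an unusable index" into an exception the caller can handle.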
4.9.0:
LUCENE-5738: Ensure NativeFSLock prevents opening the file channel for the lock if the lock is already obtained by the JVM. Trying to obtain an already obtained lock in the same JVM can unlock the file, which might allow other processes to lock the file even without explicitly unlocking the FileLock. This behavior is operating system dependent.
LUCENE-5691: DocTermOrds lookupTerm(BytesRef) would return incorrect results if the underlying TermsEnum supports ord() and the insertion point would be at the end.
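To make the LUCENE-5691 edge case concrete: a `lookupTerm`-style method returns the ordinal when the term exists and `-insertionPoint-1` when it does not. The sketch below uses a plain sorted `String[]` as a hypothetical stand-in for the term dictionary (it is not `DocTermOrds`), and shows the case where the insertion point falls at the end.

```java
// Hypothetical stand-in for an ord-based term dictionary.
final class LookupTermSketch {
    // Returns the ord of `key` if present, otherwise -(insertionPoint) - 1.
    static int lookupTerm(String[] sortedTerms, String key) {
        int low = 0;
        int high = sortedTerms.length - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1;
            int cmp = sortedTerms[mid].compareTo(key);
            if (cmp < 0) {
                low = mid + 1;
            } else if (cmp > 0) {
                high = mid - 1;
            } else {
                return mid; // exact match: return its ordinal
            }
        }
        // Not found: `low` is where the key would be inserted.
        return -(low + 1);
    }

    public static void main(String[] args) {
        String[] terms = {"apple", "banana", "cherry"};
        System.out.println(lookupTerm(terms, "banana")); // 1
        System.out.println(lookupTerm(terms, "zebra"));  // -4 (insertion point 3, at the end)
    }
}
```

A test for a backport could assert exactly this past-the-end case, since that is the one the changelog entry calls out.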