Is there an existing issue for this?
Describe the bug
Split out from #1295. This is confirmed to be a platform- and target-agnostic race bug, not anything specific to net10.0, although it may be surfacing more often on net10.0 for framework reasons that cannot be determined.
Original test failure report:
Expected: True Actual: False
(Test: Lucene.Net.Index.TestTransactions.TestTransactions_Mem)
To reproduce this test result:
Option 1:
Apply the following assembly-level attributes:
```csharp
[assembly: Lucene.Net.Util.RandomSeed("0x05988a4671cb8d53:0x175a1893dc7e9151")]
[assembly: NUnit.Framework.SetCulture("ff-Latn-SN")]
```
Option 2:
Use the following .runsettings file:
```xml
<RunSettings>
  <TestRunParameters>
    <Parameter name="tests:seed" value="0x05988a4671cb8d53:0x175a1893dc7e9151" />
    <Parameter name="tests:culture" value="ff-Latn-SN" />
  </TestRunParameters>
</RunSettings>
```
Option 3:
Create the following lucene.testsettings.json file somewhere between the test assembly and the root of your drive:
```json
{
  "tests": {
    "seed": "0x05988a4671cb8d53:0x175a1893dc7e9151",
    "culture": "ff-Latn-SN"
  }
}
```
Fixture Test Values
Random Seed: 0x05988a4671cb8d53:0x175a1893dc7e9151
Culture: ff-Latn-SN
Time Zone: (UTC-05:00) Eastern Time (Port-au-Prince)
Default Codec: Lucene46 (RandomCodec)
Default Similarity: DefaultSimilarity
System Properties
Nightly: False
Weekly: False
Slow: True
Awaits Fix: False
Directory: random
Verbose: False
Random Multiplier: 1
In TestTransactions.IndexerThread.DoWork, any exception thrown by PrepareCommit is caught; the writers are then rolled back and the method returns. The test does not care about the details of the exception, only that the transactional protocol works correctly in the presence of random I/O failures.
However, it does not catch exceptions thrown by Commit. When Commit is forced to throw, this test failure can be reproduced. Wrapping the Commit call in a try/catch, as is already done for PrepareCommit, makes the test pass under that artificially forced failure.
This appears to be simply a bug (or, to put it more mildly, a limitation) in the test code, and the same limitation exists in the Java code. The Java project has likely run into this failure occasionally too.
Is this .NET 10 related? It does not appear to be. By forcing a failure in Commit, the test failure can be reliably reproduced on .NET 8 through 10 (.NET Framework has not been tried yet). Without the forced failure, several hours of repeated, focused runs of this test did not show any failures, so it is not easily reproducible as-is. It is always possible that performance differences in new framework versions cause races to appear more or less frequently.
Why is it rare? For this scenario to happen, the following things have to be true:
- The first PrepareCommit call has to succeed. Given the many calls it makes where it can randomly fail, the chance of this is very low. A rough estimate from tracing the logic is that this happens about 0.01% of the time, based on purposefully thrown exceptions alone.
- The second PrepareCommit call has to succeed, so the chance of getting past both calls is roughly the square of the probability in item 1.
- One of the two Commit calls has to throw. This is also not guaranteed, but more likely than not if you get to this point.
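As a rough sketch of the arithmetic behind items 1 and 2, take the 0.01% estimate as p and assume the two PrepareCommit calls succeed independently with the same probability (an assumption for illustration, not something measured):

```latex
p \approx 10^{-4}
\qquad\Longrightarrow\qquad
P(\text{both PrepareCommit calls succeed}) \approx p^{2} = 10^{-8}
```

Under that assumption, on the order of one attempt in a hundred million would reach the Commit calls at all.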
In 500 repeated runs of the test on .NET 10 (macOS, arm64), with instrumentation added to count how often each call threw, the results are striking:
- PrepareCommit call 1 threw 1531 times (100% of the time)
- PrepareCommit call 2 threw 0 times (did not get there)
- Commit threw 0 times (did not get there)
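For anyone wanting to gather similar counts, one possible shape for the instrumentation is sketched below. This is not the code that produced the numbers above; `ThrowCounter` and its members are hypothetical names, and the only assumption is that each call site can be wrapped in a delegate.

```csharp
using System;
using System.Threading;

// Hypothetical helper; the names are illustrative and not part of Lucene.NET.
public sealed class ThrowCounter
{
    private int attempts;
    private int throwsSeen;

    // Thread-safe reads, since the indexer threads run concurrently.
    public int Attempts => Volatile.Read(ref attempts);
    public int Throws => Volatile.Read(ref throwsSeen);

    // Runs the action, counts the attempt, counts any exception,
    // then rethrows so the caller's existing handling is unchanged.
    public void Run(Action action)
    {
        Interlocked.Increment(ref attempts);
        try
        {
            action();
        }
        catch
        {
            Interlocked.Increment(ref throwsSeen);
            throw;
        }
    }
}
```

One counter per call site (first PrepareCommit, second PrepareCommit, Commit) would give the three figures listed above, for example `prepare1.Run(() => writer1.PrepareCommit())`.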
Solution: We should catch and swallow exceptions in Commit for this test and roll back, since exception-free commits are not the functionality under test. In fact, the test precisely expects exceptions to happen. A failure in Commit should be handled the same way as a failure in PrepareCommit.
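A minimal sketch of the proposed handling follows. It is not the actual test code: `ITwoPhaseWriter`, `FakeWriter`, and `TransactionSketch` are hypothetical stand-ins introduced only to show the pattern of treating a Commit failure the same way as a PrepareCommit failure.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for the writers used by the test; not the Lucene.NET API.
public interface ITwoPhaseWriter
{
    void PrepareCommit();
    void Commit();
    void Rollback();
}

// Hypothetical fake that can be told to fail in Commit.
public sealed class FakeWriter : ITwoPhaseWriter
{
    private readonly bool failOnCommit;

    public FakeWriter(bool failOnCommit) => this.failOnCommit = failOnCommit;

    public bool Committed { get; private set; }
    public bool RolledBack { get; private set; }

    public void PrepareCommit() { }

    public void Commit()
    {
        if (failOnCommit) throw new IOException("simulated random I/O failure");
        Committed = true;
    }

    public void Rollback() => RolledBack = true;
}

public static class TransactionSketch
{
    // Returns true if both writers committed, false if the attempt was rolled back.
    public static bool DoWork(ITwoPhaseWriter writer1, ITwoPhaseWriter writer2)
    {
        try
        {
            writer1.PrepareCommit();
            writer2.PrepareCommit();
        }
        catch (Exception)
        {
            // Existing behavior described in the issue: roll back and return.
            writer1.Rollback();
            writer2.Rollback();
            return false;
        }

        try
        {
            writer1.Commit();
            writer2.Commit();
        }
        catch (Exception)
        {
            // Proposed addition: a Commit failure gets the same handling.
            writer1.Rollback();
            writer2.Rollback();
            return false;
        }

        return true;
    }
}
```

With `new FakeWriter(failOnCommit: true)` as the second writer, `DoWork` returns false and both fakes report `RolledBack` instead of letting the exception escape the worker thread.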
Aside: It is arguably a poorly designed test if PrepareCommit throws roughly 100% of the time on the first call. If that is the case, it might as well be set to throw every time and not even attempt a second PrepareCommit or the Commit step. A better solution might be to configure this test to throw random exceptions less often. That would let it exercise the transactional behavior more thoroughly across scenarios of both failure and success, and there might then be different doc counts to assert if it gets through to Commit successfully from time to time. Currently, in the very rare scenario where it gets past all four calls and succeeds, nothing tells us that it happened. Regardless, we would still need the catch around Commit, since Commit is expected to fail when it is reached.
Expected Behavior
No response
Steps To Reproduce
No response
Exceptions (if any)
No response
Lucene.NET Version
No response
.NET Version
No response
Operating System
No response
Anything else?
No response