Description
Elasticsearch Version
7.10.1
Java Version
jdk11
OS Version
4.14.81.bm.29-amd64 #1 SMP Debian 4.14.81.bm.29
Problem Description
In our product, we have seen two cases where the logs point to file-system corruption, yet the shard stays green, which seems abnormal.
Case1
Server log: the following exception was thrown repeatedly for several days:
[2023-12-30T10:00:09,017][WARN ][o.e.i.w.ShardIndexWarmerService] [data0] [index][114] failed to load bitset for [DocValuesFieldExistsQuery [field=_primary_term]]
java.util.concurrent.ExecutionException: java.io.EOFException: read past EOF: MMapIndexInput(path="/data/nodes/0/indices/PFK_L3OGRBicJkxAcdDw1g/114/index/_x8jv.cfs") [slice=_x8jv_Lucene80_0.dvd] [slice=docs]
at org.elasticsearch.common.cache.Cache.computeIfAbsent(Cache.java:436) ~[elasticsearch-7.10.2.jar:7.10.2]
at org.elasticsearch.index.cache.bitset.BitsetFilterCache.getAndLoadIfNotPresent(BitsetFilterCache.java:148) ~[elasticsearch-7.10.2.jar:7.10.2]
at org.elasticsearch.index.cache.bitset.BitsetFilterCache.access$000(BitsetFilterCache.java:74) ~[elasticsearch-7.10.2.jar:7.10.2]
at org.elasticsearch.index.cache.bitset.BitsetFilterCache$BitSetProducerWarmer.lambda$warmReader$1(BitsetFilterCache.java:265) [elasticsearch-7.10.2.jar:7.10.2]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:684) [elasticsearch-7.10.2.jar:7.10.2]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
at java.lang.Thread.run(Thread.java:832) [?:?]
Caused by: java.io.EOFException: read past EOF: MMapIndexInput(path="/data/nodes/0/indices/PFK_L3OGRBicJkxAcdDw1g/114/index/_x8jv.cfs") [slice=_x8jv_Lucene80_0.dvd] [slice=docs]
at org.apache.lucene.store.ByteBufferIndexInput.readByte(ByteBufferIndexInput.java:85) ~[lucene-core-8.7.0.jar:8.7.0-SNAPSHOT b324c8f6ecb8a04a66d7b52d52d18a664cbf1ab4 - root - 2022-03-29 20:56:53]
at org.apache.lucene.store.DataInput.readShort(DataInput.java:95) ~[lucene-core-8.7.0.jar:8.7.0-SNAPSHOT b324c8f6ecb8a04a66d7b52d52d18a664cbf1ab4 - root - 2022-03-29 20:56:53]
at org.apache.lucene.store.ByteBufferIndexInput.readShort(ByteBufferIndexInput.java:163) ~[lucene-core-8.7.0.jar:8.7.0-SNAPSHOT b324c8f6ecb8a04a66d7b52d52d18a664cbf1ab4 - root - 2022-03-29 20:56:53]
at org.apache.lucene.codecs.lucene80.IndexedDISI$Method$1.advanceWithinBlock(IndexedDISI.java:478) ~[lucene-core-8.7.0.jar:8.7.0-SNAPSHOT b324c8f6ecb8a04a66d7b52d52d18a664cbf1ab4 - root - 2022-03-29 20:56:53]
at org.apache.lucene.codecs.lucene80.IndexedDISI.advance(IndexedDISI.java:389) ~[lucene-core-8.7.0.jar:8.7.0-SNAPSHOT b324c8f6ecb8a04a66d7b52d52d18a664cbf1ab4 - root - 2022-03-29 20:56:53]
at org.apache.lucene.codecs.lucene80.IndexedDISI.nextDoc(IndexedDISI.java:459) ~[lucene-core-8.7.0.jar:8.7.0-SNAPSHOT b324c8f6ecb8a04a66d7b52d52d18a664cbf1ab4 - root - 2022-03-29 20:56:53]
at org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$SparseNumericDocValues.nextDoc(Lucene80DocValuesProducer.java:444) ~[lucene-core-8.7.0.jar:8.7.0-SNAPSHOT b324c8f6ecb8a04a66d7b52d52d18a664cbf1ab4 - root - 2022-03-29 20:56:53]
at org.apache.lucene.util.BitSet.or(BitSet.java:95) ~[lucene-core-8.7.0.jar:8.7.0-SNAPSHOT b324c8f6ecb8a04a66d7b52d52d18a664cbf1ab4 - root - 2022-03-29 20:56:53]
at org.apache.lucene.util.FixedBitSet.or(FixedBitSet.java:271) ~[lucene-core-8.7.0.jar:8.7.0-SNAPSHOT b324c8f6ecb8a04a66d7b52d52d18a664cbf1ab4 - root - 2022-03-29 20:56:53]
at org.apache.lucene.util.BitSet.of(BitSet.java:41) ~[lucene-core-8.7.0.jar:8.7.0-SNAPSHOT b324c8f6ecb8a04a66d7b52d52d18a664cbf1ab4 - root - 2022-03-29 20:56:53]
at org.elasticsearch.index.cache.bitset.BitsetFilterCache.bitsetFromQuery(BitsetFilterCache.java:103) ~[elasticsearch-7.10.2.jar:7.10.2]
at org.elasticsearch.index.cache.bitset.BitsetFilterCache.lambda$getAndLoadIfNotPresent$1(BitsetFilterCache.java:149) ~[elasticsearch-7.10.2.jar:7.10.2]
at org.elasticsearch.common.cache.Cache.computeIfAbsent(Cache.java:433) ~[elasticsearch-7.10.2.jar:7.10.2]
... 7 more
Whatever the exception is, the code only logs a warning:
https://github.com/elastic/elasticsearch/blob/1c34507e66d7db1211f66f3513706fdf548736aa/server/src/main/java/org/elasticsearch/index/cache/bitset/BitsetFilterCache.java#L270C5-L270C5
Should we distinguish between exceptions here? If the cause is an IOException, the shard should be failed.
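To make the proposal concrete, here is a minimal, stdlib-only sketch of the distinction I have in mind. It is not the actual BitsetFilterCache code: `WarmerFailureSketch`, `ShardFailer`, `findIOException` and `handleWarmerFailure` are hypothetical names, and which Elasticsearch hook should really be used to fail the shard is left open.

```java
import java.io.IOException;

public class WarmerFailureSketch {

    /** Hypothetical callback standing in for whatever would mark the shard as failed. */
    public interface ShardFailer {
        void fail(String reason, Throwable cause);
    }

    /** Walks the cause chain and returns the first IOException, or null if there is none. */
    public static IOException findIOException(Throwable t) {
        int depth = 0;
        // The depth bound guards against a cyclic cause chain.
        while (t != null && depth++ < 32) {
            if (t instanceof IOException) {
                return (IOException) t;
            }
            t = t.getCause();
        }
        return null;
    }

    /**
     * Fails the shard when an IOException is found anywhere in the cause chain;
     * otherwise only logs a warning, as the current code does for every exception.
     * Returns true if the shard was failed.
     */
    public static boolean handleWarmerFailure(Exception e, ShardFailer failer) {
        IOException io = findIOException(e);
        if (io != null) {
            failer.fail("failed to load bitset", io);
            return true;
        }
        System.err.println("WARN failed to load bitset: " + e);
        return false;
    }
}
```

With this shape, the ExecutionException wrapping an EOFException from the log above would fail the shard, while an unrelated runtime exception would still only produce the warning.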
Case2
Client log:
Caused by: [index0/data0][[index0][91]] ElasticsearchException[Elasticsearch exception [type=engine_exception, reason=Couldn't resolve version]]; nested: ElasticsearchException[Elasticsearch exception [type=illegal_state_exception, reason=document [0] does not have docValues for [_primary_term]]];
at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:496)
at org.elasticsearch.ElasticsearchException.fromXContent(ElasticsearchException.java:407)
at org.elasticsearch.action.bulk.BulkItemResponse.fromXContent(BulkItemResponse.java:139)
at org.elasticsearch.action.bulk.BulkResponse.fromXContent(BulkResponse.java:188)
at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1911)
at org.elasticsearch.client.RestHighLevelClient.lambda$performRequestAsyncAndParseEntity$10(RestHighLevelClient.java:1699)
at org.elasticsearch.client.RestHighLevelClient$1.onSuccess(RestHighLevelClient.java:1781)
at org.elasticsearch.client.RestClient$FailureTrackingResponseListener.onSuccess(RestClient.java:636)
at org.elasticsearch.client.RestClient$1.completed(RestClient.java:376)
at org.elasticsearch.client.RestClient$1.completed(RestClient.java:370)
Internally it throws an EngineException, but nothing outside handles that exception:
Should we fail the shard here when the EngineException is thrown?
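A rough sketch of what "fail the shard here" could look like, again stdlib-only and under assumptions: `EngineFailureSketch`, `EngineFailure`, `ShardFailer` and `runOrFailShard` are hypothetical stand-ins, not Elasticsearch classes. The idea is just that the caller notifies a shard-failure hook before rethrowing, instead of only returning the error to the client.

```java
import java.util.function.Supplier;

public class EngineFailureSketch {

    /** Hypothetical callback standing in for whatever would mark the shard as failed. */
    public interface ShardFailer {
        void fail(String reason, Throwable cause);
    }

    /** Stand-in for the EngineException in the report; not the real Elasticsearch class. */
    public static class EngineFailure extends RuntimeException {
        public EngineFailure(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /**
     * Runs the operation. If it throws EngineFailure, the shard is reported as
     * failed first and the exception is then rethrown to the caller.
     */
    public static <T> T runOrFailShard(Supplier<T> operation, ShardFailer failer) {
        try {
            return operation.get();
        } catch (EngineFailure e) {
            failer.fail(e.getMessage(), e);
            throw e;
        }
    }
}
```

The client would still see the same error for that bulk item, but the shard would no longer keep reporting green while the failure repeats.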