"failed to load bitset" exception thrown repeatedly for several days while the shard stays green  #103840

Open
@kkewwei

Elasticsearch Version

7.10.1

Java Version

jdk11

OS Version

4.14.81.bm.29-amd64 #1 SMP Debian 4.14.81.bm.29

Problem Description

In our product, we have seen two cases that point to file-system corruption, yet the shard stays green, which seems abnormal.

Case1
Server log: the following exception was thrown repeatedly for several days:

[2023-12-30T10:00:09,017][WARN ][o.e.i.w.ShardIndexWarmerService] [data0] [index][114] failed to load bitset for [DocValuesFieldExistsQuery [field=_primary_term]]
java.util.concurrent.ExecutionException: java.io.EOFException: read past EOF: MMapIndexInput(path="/data/nodes/0/indices/PFK_L3OGRBicJkxAcdDw1g/114/index/_x8jv.cfs") [slice=_x8jv_Lucene80_0.dvd] [slice=docs]
        at org.elasticsearch.common.cache.Cache.computeIfAbsent(Cache.java:436) ~[elasticsearch-7.10.2.jar:7.10.2]
        at org.elasticsearch.index.cache.bitset.BitsetFilterCache.getAndLoadIfNotPresent(BitsetFilterCache.java:148) ~[elasticsearch-7.10.2.jar:7.10.2]
        at org.elasticsearch.index.cache.bitset.BitsetFilterCache.access$000(BitsetFilterCache.java:74) ~[elasticsearch-7.10.2.jar:7.10.2]
        at org.elasticsearch.index.cache.bitset.BitsetFilterCache$BitSetProducerWarmer.lambda$warmReader$1(BitsetFilterCache.java:265) [elasticsearch-7.10.2.jar:7.10.2]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:684) [elasticsearch-7.10.2.jar:7.10.2]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
        at java.lang.Thread.run(Thread.java:832) [?:?]
Caused by: java.io.EOFException: read past EOF: MMapIndexInput(path="/data/nodes/0/indices/PFK_L3OGRBicJkxAcdDw1g/114/index/_x8jv.cfs") [slice=_x8jv_Lucene80_0.dvd] [slice=docs]
        at org.apache.lucene.store.ByteBufferIndexInput.readByte(ByteBufferIndexInput.java:85) ~[lucene-core-8.7.0.jar:8.7.0-SNAPSHOT b324c8f6ecb8a04a66d7b52d52d18a664cbf1ab4 - root - 2022-03-29 20:56:53]
        at org.apache.lucene.store.DataInput.readShort(DataInput.java:95) ~[lucene-core-8.7.0.jar:8.7.0-SNAPSHOT b324c8f6ecb8a04a66d7b52d52d18a664cbf1ab4 - root - 2022-03-29 20:56:53]
        at org.apache.lucene.store.ByteBufferIndexInput.readShort(ByteBufferIndexInput.java:163) ~[lucene-core-8.7.0.jar:8.7.0-SNAPSHOT b324c8f6ecb8a04a66d7b52d52d18a664cbf1ab4 - root - 2022-03-29 20:56:53]
        at org.apache.lucene.codecs.lucene80.IndexedDISI$Method$1.advanceWithinBlock(IndexedDISI.java:478) ~[lucene-core-8.7.0.jar:8.7.0-SNAPSHOT b324c8f6ecb8a04a66d7b52d52d18a664cbf1ab4 - root - 2022-03-29 20:56:53]
        at org.apache.lucene.codecs.lucene80.IndexedDISI.advance(IndexedDISI.java:389) ~[lucene-core-8.7.0.jar:8.7.0-SNAPSHOT b324c8f6ecb8a04a66d7b52d52d18a664cbf1ab4 - root - 2022-03-29 20:56:53]
        at org.apache.lucene.codecs.lucene80.IndexedDISI.nextDoc(IndexedDISI.java:459) ~[lucene-core-8.7.0.jar:8.7.0-SNAPSHOT b324c8f6ecb8a04a66d7b52d52d18a664cbf1ab4 - root - 2022-03-29 20:56:53]
        at org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$SparseNumericDocValues.nextDoc(Lucene80DocValuesProducer.java:444) ~[lucene-core-8.7.0.jar:8.7.0-SNAPSHOT b324c8f6ecb8a04a66d7b52d52d18a664cbf1ab4 - root - 2022-03-29 20:56:53]
        at org.apache.lucene.util.BitSet.or(BitSet.java:95) ~[lucene-core-8.7.0.jar:8.7.0-SNAPSHOT b324c8f6ecb8a04a66d7b52d52d18a664cbf1ab4 - root - 2022-03-29 20:56:53]
        at org.apache.lucene.util.FixedBitSet.or(FixedBitSet.java:271) ~[lucene-core-8.7.0.jar:8.7.0-SNAPSHOT b324c8f6ecb8a04a66d7b52d52d18a664cbf1ab4 - root - 2022-03-29 20:56:53]
        at org.apache.lucene.util.BitSet.of(BitSet.java:41) ~[lucene-core-8.7.0.jar:8.7.0-SNAPSHOT b324c8f6ecb8a04a66d7b52d52d18a664cbf1ab4 - root - 2022-03-29 20:56:53]
        at org.elasticsearch.index.cache.bitset.BitsetFilterCache.bitsetFromQuery(BitsetFilterCache.java:103) ~[elasticsearch-7.10.2.jar:7.10.2]
        at org.elasticsearch.index.cache.bitset.BitsetFilterCache.lambda$getAndLoadIfNotPresent$1(BitsetFilterCache.java:149) ~[elasticsearch-7.10.2.jar:7.10.2]
        at org.elasticsearch.common.cache.Cache.computeIfAbsent(Cache.java:433) ~[elasticsearch-7.10.2.jar:7.10.2]
        ... 7 more

The warmer only logs a warning, no matter what the exception is:
https://github.com/elastic/elasticsearch/blob/1c34507e66d7db1211f66f3513706fdf548736aa/server/src/main/java/org/elasticsearch/index/cache/bitset/BitsetFilterCache.java#L270C5-L270C5

Should we distinguish between exceptions here? If the cause is an IOException, the shard should be failed.
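To make the proposal concrete, here is a minimal sketch of how the warmer's catch block could tell I/O failures apart from other errors. It is not Elasticsearch code: `WarmerFailurePolicy` and `causedByIOException` are hypothetical names, and the only assumption taken from the log above is that the `EOFException` arrives wrapped in an `ExecutionException`.

```java
import java.io.EOFException;
import java.io.IOException;
import java.util.concurrent.ExecutionException;

// Hypothetical helper for illustration only; not part of Elasticsearch.
final class WarmerFailurePolicy {

    /**
     * Walks the cause chain and reports whether any link is an IOException.
     * The depth bound guards against cyclic cause chains.
     */
    static boolean causedByIOException(Throwable t) {
        int depth = 0;
        for (Throwable cur = t; cur != null && depth < 32; cur = cur.getCause(), depth++) {
            if (cur instanceof IOException) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Same shape as the server log: ExecutionException wrapping an EOFException.
        Throwable corrupt = new ExecutionException(new EOFException("read past EOF"));
        Throwable other = new ExecutionException(new IllegalStateException("unrelated"));
        System.out.println(causedByIOException(corrupt)); // true: candidate for failing the shard
        System.out.println(causedByIOException(other));   // false: keep the warning-only behavior
    }
}
```

With a check like this, the warmer could keep logging a warning for non-I/O errors and hand I/O errors to whatever mechanism fails the shard; which mechanism that should be is left open here.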

Case2
Client log:

Caused by: [index0/data0][[index0][91]] ElasticsearchException[Elasticsearch exception [type=engine_exception, reason=Couldn't resolve version]]; nested: ElasticsearchException[Elasticsearch exception [type=illegal_state_exception, reason=document [0] does not have docValues for [_primary_term]]];
        at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:496)
        at org.elasticsearch.ElasticsearchException.fromXContent(ElasticsearchException.java:407)
        at org.elasticsearch.action.bulk.BulkItemResponse.fromXContent(BulkItemResponse.java:139)
        at org.elasticsearch.action.bulk.BulkResponse.fromXContent(BulkResponse.java:188)
        at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1911)
        at org.elasticsearch.client.RestHighLevelClient.lambda$performRequestAsyncAndParseEntity$10(RestHighLevelClient.java:1699)
        at org.elasticsearch.client.RestHighLevelClient$1.onSuccess(RestHighLevelClient.java:1781)
        at org.elasticsearch.client.RestClient$FailureTrackingResponseListener.onSuccess(RestClient.java:636)
        at org.elasticsearch.client.RestClient$1.completed(RestClient.java:376)
        at org.elasticsearch.client.RestClient$1.completed(RestClient.java:370)

The EngineException is thrown from the inner call, but the caller does not handle it:

return getFromSearcher(get, searcherFactory, scope);

Should we fail the shard here when an EngineException is thrown?
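The idea can be sketched as a wrapper around the read. This is only an illustration under stated assumptions: `EngineFailure` stands in for Elasticsearch's `EngineException`, and `ShardFailer` and `getOrFailShard` are hypothetical names, not existing APIs.

```java
import java.util.function.Supplier;

// Hypothetical sketch; none of these types exist in Elasticsearch.
final class GetFailureHandling {

    /** Stand-in for Elasticsearch's EngineException. */
    static final class EngineFailure extends RuntimeException {
        EngineFailure(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Hypothetical callback that marks the shard as failed. */
    interface ShardFailer {
        void failShard(String reason, Exception cause);
    }

    /** Runs the read; on an engine failure, fails the shard and rethrows. */
    static <T> T getOrFailShard(Supplier<T> read, ShardFailer failer) {
        try {
            return read.get();
        } catch (EngineFailure e) {
            failer.failShard("get failed: " + e.getMessage(), e);
            throw e; // the caller still sees the original exception
        }
    }

    public static void main(String[] args) {
        ShardFailer failer = (reason, cause) -> System.out.println("failing shard: " + reason);
        try {
            getOrFailShard(() -> {
                throw new EngineFailure("Couldn't resolve version",
                    new IllegalStateException("document [0] does not have docValues for [_primary_term]"));
            }, failer);
        } catch (EngineFailure e) {
            System.out.println("rethrown: " + e.getMessage());
        }
    }
}
```

Whether every EngineException should fail the shard, or only those whose cause suggests corruption, is exactly the open question of this issue; the sketch takes the broadest reading.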

Labels: :Distributed Indexing/Engine, >bug, Team:Distributed (Obsolete)
