
[BUG] OpenSearch crashes due to max map count being exceeded #18226

Open
@justinborromeo-glean

Description

Describe the bug

After upgrading our OpenSearch clusters from 2.16.0 to 2.19.1, nodes in our largest clusters started crashing with the following error:

There is insufficient memory for the Java Runtime Environment to continue.
Native memory allocation (malloc) failed to allocate 2097152 bytes. Error detail: AllocateHeap

Heap memory usage is normal, and Kubernetes pod memory usage is well within limits.

We narrowed the issue down to the process reaching vm.max_map_count (set to 262144). Before each crash, the number of memory mappings (measured with cat /proc/{pid}/maps | wc -l) approaches that 262144 limit. In one /proc/{pid}/maps dump we inspected, 246K of the 252K mappings were for deleted doc values (.dvd) files.
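To track how close a node is to the limit, the number of lines in /proc/{pid}/maps (one line per mapping) can be compared with the per-process limit in /proc/sys/vm/max_map_count. Below is a minimal Python sketch of that check. The helper names (count_mappings, map_count_headroom, read_live) are hypothetical and not part of OpenSearch, and read_live only works on Linux, where procfs is available.

```python
from __future__ import annotations


def count_mappings(maps_text: str) -> int:
    """Count memory mappings; /proc/<pid>/maps has one line per mapping."""
    return sum(1 for line in maps_text.splitlines() if line.strip())


def map_count_headroom(maps_text: str, max_map_count: int) -> tuple[int, float]:
    """Return (mapping count, fraction of vm.max_map_count in use)."""
    count = count_mappings(maps_text)
    return count, count / max_map_count


def read_live(pid: int) -> tuple[int, float]:
    """Linux only: read a live process's mappings and the per-process limit."""
    with open(f"/proc/{pid}/maps") as maps_file:
        maps_text = maps_file.read()
    # vm.max_map_count is a sysctl; it caps the mapping count of each process.
    with open("/proc/sys/vm/max_map_count") as limit_file:
        limit = int(limit_file.read().strip())
    return map_count_headroom(maps_text, limit)


# Illustration with the figures from this report: 252K mappings, limit 262144.
count, used = map_count_headroom("mapping\n" * 252_000, 262_144)
print(f"{count} mappings, {used:.1%} of vm.max_map_count")
# prints "252000 mappings, 96.1% of vm.max_map_count"
```

On a node, calling read_live with the OpenSearch JVM's pid gives the same pair of numbers for the running process.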

The dump contained many lines like these:

7ed8fcb88000-7ed8fcb8a000 r--s 00000000 08:10 80635920                   /usr/share/opensearch/data/nodes/0/indices/Ci3MyIbNTceUmC67d1IlwQ/98/index/_9cu3_38_Lucene90_0.dvd (deleted)
7ed8fcb8a000-7ed8fcb8e000 r--s 00000000 08:10 78916713                   /usr/share/opensearch/data/nodes/0/indices/Ci3MyIbNTceUmC67d1IlwQ/203/index/_99uu_g3_Lucene90_0.dvd (deleted)
7ed8fcb8e000-7ed8fcb90000 r--s 00000000 08:10 80626366                   /usr/share/opensearch/data/nodes/0/indices/Ci3MyIbNTceUmC67d1IlwQ/161/index/_9ewb_4q_Lucene90_0.dvd (deleted)
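A mapping marked (deleted) refers to a file that has been unlinked from the directory while the process still has it mapped. The kernel keeps such a mapping alive until it is unmapped, which suggests that something in the JVM (for example, an unclosed Lucene reader over an already-merged segment) still references these files. Grouping the deleted .dvd mappings by index UUID and shard number can show whether a few shards or all of them are holding on to deleted files. The sketch below assumes the standard /proc/{pid}/maps layout (address, perms, offset, device, inode, pathname) and the .../indices/<index UUID>/<shard>/index/ path layout seen in the lines above. The function names are hypothetical.

```python
import re
from collections import Counter

# Matches .../indices/<index UUID>/<shard number>/index/... in a segment path.
SHARD_RE = re.compile(r"/indices/([^/]+)/(\d+)/index/")
DELETED_SUFFIX = " (deleted)"


def parse_mapping(line: str):
    """Return (pathname, deleted) for a file-backed mapping, else None."""
    # Fields: address perms offset dev inode [pathname]; the pathname may
    # contain spaces, so split at most five times.
    parts = line.split(maxsplit=5)
    if len(parts) < 6:
        return None  # anonymous mapping with no pathname
    path = parts[5]
    deleted = path.endswith(DELETED_SUFFIX)
    if deleted:
        path = path[: -len(DELETED_SUFFIX)]
    return path, deleted


def deleted_dvd_by_shard(maps_text: str) -> Counter:
    """Count mappings of deleted .dvd files per (index UUID, shard number)."""
    counts: Counter = Counter()
    for line in maps_text.splitlines():
        parsed = parse_mapping(line)
        if parsed is None:
            continue
        path, deleted = parsed
        if not (deleted and path.endswith(".dvd")):
            continue
        match = SHARD_RE.search(path)
        key = (match.group(1), int(match.group(2))) if match else ("unknown", -1)
        counts[key] += 1
    return counts


# The three example lines from this report.
sample = """\
7ed8fcb88000-7ed8fcb8a000 r--s 00000000 08:10 80635920   /usr/share/opensearch/data/nodes/0/indices/Ci3MyIbNTceUmC67d1IlwQ/98/index/_9cu3_38_Lucene90_0.dvd (deleted)
7ed8fcb8a000-7ed8fcb8e000 r--s 00000000 08:10 78916713   /usr/share/opensearch/data/nodes/0/indices/Ci3MyIbNTceUmC67d1IlwQ/203/index/_99uu_g3_Lucene90_0.dvd (deleted)
7ed8fcb8e000-7ed8fcb90000 r--s 00000000 08:10 80626366   /usr/share/opensearch/data/nodes/0/indices/Ci3MyIbNTceUmC67d1IlwQ/161/index/_9ewb_4q_Lucene90_0.dvd (deleted)
"""
for (index_uuid, shard), n in deleted_dvd_by_shard(sample).most_common():
    print(index_uuid, shard, n)
```

Run against a full dump, a heavily skewed distribution would point at specific shards rather than a node-wide leak.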

Is this expected? If so, were there any changes in the Lucene or OpenSearch codebases between those two versions that could have caused this? Any suggestions for debugging?

Related component

Storage

To Reproduce

We're unable to reproduce this on demand. It occurs on large clusters with high indexing activity after they have been running for a couple of days.
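Because the failure only appears after days of uptime, periodically sampling the mapping count (for example with the cat /proc/{pid}/maps | wc -l measurement described earlier) and extrapolating its growth could give an early warning before the limit is reached. A minimal sketch, assuming samples are (hours since start, mapping count) pairs and that growth is roughly linear over the sampled window. estimate_hours_to_limit is a hypothetical helper, and the least-squares slope is only a simple trend estimate, not a model of how Lucene releases mappings.

```python
from __future__ import annotations


def estimate_hours_to_limit(
    samples: list[tuple[float, int]], limit: int
) -> float | None:
    """Hours from the last sample until the mapping count reaches `limit`.

    Fits a least-squares line to (hours, map_count) samples and assumes the
    growth stays linear. Returns None if the count is flat or shrinking.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_c = sum(c for _, c in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples need distinct timestamps")
    # Slope of the fitted line, in mappings per hour.
    slope = sum((t - mean_t) * (c - mean_c) for t, c in samples) / var_t
    if slope <= 0:
        return None
    _, last_count = samples[-1]
    # Already at or past the limit counts as zero hours remaining.
    return max(0.0, (limit - last_count) / slope)


# Hypothetical samples taken every 24 hours on one node.
samples = [(0, 100_000), (24, 150_000), (48, 200_000)]
print(estimate_hours_to_limit(samples, 262_144))  # roughly 29.8 hours
```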

Expected behavior

Nodes do not crash, and the memory map count stays within bounds.

Additional Details

Plugins
gcs-repository, custom scoring plugin, custom codecs, cross-cluster-replication

Host/Environment:

  • OS: Linux
  • Version: 2.19.1
  • Java:
    openjdk version "21.0.6" 2025-01-21 LTS
    OpenJDK Runtime Environment Temurin-21.0.6+7 (build 21.0.6+7-LTS)
    OpenJDK 64-Bit Server VM Temurin-21.0.6+7 (build 21.0.6+7-LTS, mixed mode, sharing)

Metadata

Assignees: none
Labels: Indexing, Search, Storage, bug, untriaged
Type: none
Project status: New
Milestone: none
Relationships: none yet
Development: no branches or pull requests