Description
Describe the bug
After upgrading our OpenSearch clusters from 2.16.0 to 2.19.1, nodes on our largest clusters started crashing with the following error:
There is insufficient memory for the Java Runtime Environment to continue.
Native memory allocation (malloc) failed to allocate 2097152 bytes. Error detail: AllocateHeap
Heap memory usage is normal and Kubernetes pod memory usage is well within limits.
We narrowed the issue down to the vm.max_map_count limit (262144) being reached. Before a node crashes, we see the map count (measured with cat /proc/{pid}/maps | wc -l) approach the 262144 limit we set. Looking at one output of cat /proc/{pid}/maps, we observed that 246K of the 252K maps are for deleted doc values (.dvd) files.
A whole lot of lines like this:
7ed8fcb88000-7ed8fcb8a000 r--s 00000000 08:10 80635920 /usr/share/opensearch/data/nodes/0/indices/Ci3MyIbNTceUmC67d1IlwQ/98/index/_9cu3_38_Lucene90_0.dvd (deleted)
7ed8fcb8a000-7ed8fcb8e000 r--s 00000000 08:10 78916713 /usr/share/opensearch/data/nodes/0/indices/Ci3MyIbNTceUmC67d1IlwQ/203/index/_99uu_g3_Lucene90_0.dvd (deleted)
7ed8fcb8e000-7ed8fcb90000 r--s 00000000 08:10 80626366 /usr/share/opensearch/data/nodes/0/indices/Ci3MyIbNTceUmC67d1IlwQ/161/index/_9ewb_4q_Lucene90_0.dvd (deleted)
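For anyone who wants to check their own nodes, a breakdown like the one above can be approximated with a small script. This is only a rough sketch, not the exact method we used: the function names summarize_maps and summarize_pid are made up for illustration, and it assumes the standard Linux /proc/{pid}/maps line layout (address range, permissions, offset, device, inode, optional path).

```python
import re
from collections import Counter

# One line of /proc/<pid>/maps:
#   address-range perms offset dev inode [path]
MAPS_LINE = re.compile(
    r"^[0-9a-f]+-[0-9a-f]+\s+\S+\s+[0-9a-f]+\s+\S+\s+\d+\s*(?P<path>.*)$"
)
DELETED_SUFFIX = " (deleted)"


def summarize_maps(lines):
    """Count mappings per (file extension, is_deleted) bucket.

    Hypothetical helper: takes an iterable of maps lines and returns a Counter.
    """
    counts = Counter()
    for line in lines:
        match = MAPS_LINE.match(line.strip())
        if not match:
            continue  # skip anything that does not look like a maps entry
        path = match.group("path")
        deleted = path.endswith(DELETED_SUFFIX)
        if deleted:
            path = path[: -len(DELETED_SUFFIX)]
        if not path:
            ext = "(anonymous)"  # mapping with no backing file
        else:
            name = path.rsplit("/", 1)[-1]
            ext = name.rsplit(".", 1)[-1] if "." in name else "(none)"
        counts[(ext, deleted)] += 1
    return counts


def summarize_pid(pid):
    """Read /proc/<pid>/maps for a running process and summarize it."""
    with open(f"/proc/{pid}/maps") as maps_file:
        return summarize_maps(maps_file)
```

Calling summarize_pid(pid).most_common(5) for the OpenSearch process ID lists the largest buckets; a dominant ("dvd", True) entry would correspond to the deleted doc values mappings described above.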
Is this expected? If so, were there any changes in the Lucene or OpenSearch codebase between those two versions that could have caused this? Any suggestions on debugging?
Related component
Storage
To Reproduce
We're unable to reproduce this on demand. We see it occur after large clusters with high indexing activity have been running for a couple of days.
Expected behavior
No server crash, mmap count stays within bounds.
Additional Details
Plugins
gcs-repository, custom scoring plugin, custom codecs, cross-cluster-replication
Host/Environment (please complete the following information):
- OS: Linux
- Version: 2.19.1
- Java:
openjdk version "21.0.6" 2025-01-21 LTS
OpenJDK Runtime Environment Temurin-21.0.6+7 (build 21.0.6+7-LTS)
OpenJDK 64-Bit Server VM Temurin-21.0.6+7 (build 21.0.6+7-LTS, mixed mode, sharing)