[BUG] Multiple Cluster state objects found in data node's heap snapshot for bulk request #13524

Open
@anshu1106

Describe the bug

While analyzing a heap dump taken on a domain with a large number of nodes and 200k shards, we found that out of 16.1 GB, ~14.7 GB of the retained heap is due to TransportResponseHandlers. The dump is from a data node, and _bulk requests were running in the domain when the heap dump was captured.
[Screenshot 1]

Expanding TransportResponseHandler:
[Screenshot 2]

Expanding one of the ConcurrentHashMap objects shows that TransportBulkAction$ConcreteIndices takes ~63 MB, most of which is taken by ClusterState.

The histogram below shows 215 ClusterState objects taking ~11 GB of heap, which is roughly 50 MB per object on average.
[Screenshot 3]

The incoming reference for most of the ClusterState objects is TransportBulkAction$ConcreteIndices.
[Screenshot 4]

There seems to be a bug in the TransportBulkAction path that creates new ClusterState objects rather than referencing a single shared one.
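To make the reference pattern above concrete, here is a minimal, self-contained sketch of one way many distinct state instances can stay reachable from per-request objects held in a ConcurrentHashMap. It is not OpenSearch code: `State`, `RequestContext`, and `simulate` are hypothetical stand-ins, and the assumption that each in-flight request captures whichever state was current when it started is an illustration, not something confirmed from the heap dump.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ClusterStateRetentionSketch {

    /** Hypothetical stand-in for an immutable cluster state snapshot. */
    static final class State {
        final long version;
        State(long version) { this.version = version; }
    }

    /** Hypothetical per-request helper that captures the state it was created with. */
    static final class RequestContext {
        final State state;
        RequestContext(State state) { this.state = state; }
    }

    /** Counts distinct State instances (by identity) reachable from in-flight requests. */
    static int distinctRetainedStates(Map<Long, RequestContext> inFlight) {
        Set<State> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (RequestContext ctx : inFlight.values()) {
            seen.add(ctx.state);
        }
        return seen.size();
    }

    /**
     * Assumption for illustration: one request starts per published state
     * version and none of them completes, so every handler stays in the map.
     */
    public static int simulate(int updates) {
        Map<Long, RequestContext> inFlight = new ConcurrentHashMap<>();
        State current = new State(0);
        for (long requestId = 0; requestId < updates; requestId++) {
            // The request pins the snapshot that is current right now.
            inFlight.put(requestId, new RequestContext(current));
            // A new snapshot is published; the old one is still pinned above.
            current = new State(current.version + 1);
        }
        return distinctRetainedStates(inFlight);
    }

    public static void main(String[] args) {
        System.out.println("Distinct states retained: " + simulate(215));
    }
}
```

Under these assumptions, every request that has not completed pins one snapshot, so the number of retained instances grows with the number of state versions seen by pending requests. Whether the real instances are newly constructed or are older versions kept alive this way could be checked by comparing the ClusterState version fields in the dump.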

Related component

Indexing:Performance

Expected behavior

There should be at most 2 ClusterState objects in the domain while updates are in progress. TransportBulkAction should not create new ClusterState objects.
