Description
Investigating a master-node OOME in 8.17, I discovered over 1.2GiB of heap (out of an available 2GiB) consumed by just under 600 stored thread contexts:

(NB: the list is sorted by retained heap, so the 580 instances off the top of this screenshot each retain at least 2,241,696 B. That alone is 580 × 2,241,696 B ≈ 1.21 GiB.)
It's basically all in the _authz_info transient header:
Class Name | Shallow Heap | Retained Heap
---------------------------------------------------------------------------------------------------------------------------------------
org.elasticsearch.common.util.concurrent.ThreadContext$ThreadContextStruct @ 0x9bace400 | 40 | 2,491,944
|- transientHeaders java.util.HashMap @ 0x9bace4e8 | 48 | 2,491,712
| |- table java.util.HashMap$Node[8] @ 0x9bace518 | 48 | 2,491,648
| | |- [6] java.util.HashMap$Node @ 0x9bace568 | 32 | 2,422,632
| | | |- value org.elasticsearch.xpack.security.authz.RBACEngine$RBACAuthorizationInfo @ 0x9bace588 | 24 | 2,422,600
| | | |- key java.lang.String @ 0x8e1d16b0 _authz_info | 24 | 56
| | | |- <class> class java.util.HashMap$Node @ 0x89dc8d98 System Class | 8 | 32
| | | '- Total: 3 entries | |
| | |- [1] java.util.HashMap$Node @ 0x94ea3f98 | 32 | 272
| | |- [2] java.util.HashMap$Node @ 0x94ea3f78 | 32 | 32
| | |- [7] java.util.HashMap$Node @ 0x9bace548 | 32 | 32
| | |- class java.util.HashMap$Node[] @ 0x89dc8e20 | 0 | 0
| | '- Total: 5 entries | |
| |- <class> class java.util.HashMap @ 0x8690be10 System Class | 40 | 80
| |- entrySet java.util.HashMap$EntrySet @ 0x94ea40a8 | 16 | 16
| '- Total: 3 entries | |
|- <class> class org.elasticsearch.common.util.concurrent.ThreadContext$ThreadContextStruct @ 0x8a5f3d50| 16 | 1,376
|- requestHeaders java.util.HashMap @ 0x9bace428 | 48 | 192
|- responseHeaders java.util.Collections$EmptyMap @ 0x89d2e040 | 24 | 24
'- Total: 4 entries | |
---------------------------------------------------------------------------------------------------------------------------------------
The bulk of this looks to relate to the kibana-.kibana application privilege, with over 3k patterns and many hundreds of kiB burned on the transitions in the associated automaton.
Class Name | Shallow Heap | Retained Heap
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
org.elasticsearch.xpack.security.authz.RBACEngine$RBACAuthorizationInfo @ 0x9bace588 | 24 | 2,422,600
|- <class> class org.elasticsearch.xpack.security.authz.RBACEngine$RBACAuthorizationInfo @ 0x8fc46db8 | 8 | 32
|- info java.util.Collections$SingletonMap @ 0x94ea3f10 | 40 | 40
|- authenticatedUserAuthorizationInfo org.elasticsearch.xpack.security.authz.RBACEngine$RBACAuthorizationInfo @ 0x94ea3f38 | 24 | 64
|- role org.elasticsearch.xpack.core.security.authz.permission.SimpleRole @ 0x9bace5a0 | 48 | 2,421,984
| |- <class> class org.elasticsearch.xpack.core.security.authz.permission.SimpleRole @ 0x8b10fa10 | 16 | 104
| |- workflowsRestriction org.elasticsearch.xpack.core.security.authz.restriction.WorkflowsRestriction @ 0x8c395e30 | 24 | 24
| |- runAs org.elasticsearch.xpack.core.security.authz.permission.RunAsPermission @ 0x94ea0df8 | 24 | 224
| |- remoteIndicesPermission org.elasticsearch.xpack.core.security.authz.permission.RemoteIndicesPermission @ 0x94ea0ed8 | 16 | 12,232
| |- remoteClusterPermissions org.elasticsearch.xpack.core.security.authz.permission.RemoteClusterPermissions @ 0x94ea3ea0 | 16 | 96
| |- hasPrivilegesCacheReference java.util.concurrent.atomic.AtomicReference @ 0x94ea3f00 | 16 | 16
| |- names java.lang.String[9] @ 0x9bace5d0 | 56 | 488
| |- cluster org.elasticsearch.xpack.core.security.authz.permission.ClusterPermission @ 0x9bace608 | 24 | 21,112
| |- indices org.elasticsearch.xpack.core.security.authz.permission.IndicesPermission @ 0x9bad3880 | 32 | 68,632
| |- application org.elasticsearch.xpack.core.security.authz.permission.ApplicationPermission @ 0x9bae2800 | 16 | 2,388,256
| | |- <class> class org.elasticsearch.xpack.core.security.authz.permission.ApplicationPermission @ 0x8b10db38 | 16 | 792
| | |- permissions java.util.ImmutableCollections$ListN @ 0x9bae2810 | 24 | 2,388,240
| | | |- <class> class java.util.ImmutableCollections$ListN @ 0x89dca118 System Class | 8 | 32
| | | |- elements java.lang.Object[9] @ 0x9bae2828 | 56 | 2,388,216
| | | | |- class java.lang.Object[] @ 0x910517b0 | 0 | 0
| | | | |- [8] org.elasticsearch.xpack.core.security.authz.permission.ApplicationPermission$PermissionEntry @ 0x9bae2860 | 32 | 234,944
| | | | |- [7] org.elasticsearch.xpack.core.security.authz.permission.ApplicationPermission$PermissionEntry @ 0x9bb1bd58 | 32 | 1,744
| | | | |- [6] org.elasticsearch.xpack.core.security.authz.permission.ApplicationPermission$PermissionEntry @ 0x9bb1c5a8 | 32 | 88
| | | | |- [5] org.elasticsearch.xpack.core.security.authz.permission.ApplicationPermission$PermissionEntry @ 0x9bb1c780 | 32 | 109,280
| | | | |- [4] org.elasticsearch.xpack.core.security.authz.permission.ApplicationPermission$PermissionEntry @ 0x9bb44ae8 | 32 | 916,608
| | | | |- [3] org.elasticsearch.xpack.core.security.authz.permission.ApplicationPermission$PermissionEntry @ 0x9bb89b78 | 32 | 2,856
| | | | |- [2] org.elasticsearch.xpack.core.security.authz.permission.ApplicationPermission$PermissionEntry @ 0x9bb8a6e0 | 32 | 2,168
| | | | |- [1] org.elasticsearch.xpack.core.security.authz.permission.ApplicationPermission$PermissionEntry @ 0x9bb8af58 | 32 | 219,672
| | | | |- [0] org.elasticsearch.xpack.core.security.authz.permission.ApplicationPermission$PermissionEntry @ 0x9bbc7118 | 32 | 899,976
| | | | | |- <class> class org.elasticsearch.xpack.core.security.authz.permission.ApplicationPermission$PermissionEntry @ 0x8b00fe80| 8 | 40
| | | | | |- application org.elasticsearch.xpack.core.security.support.Automatons$1 @ 0x94ea02e8 | 24 | 2,080
| | | | | | |- <class> class org.elasticsearch.xpack.core.security.support.Automatons$1 @ 0x8aeb4948 | 0 | 0
| | | | | | |- val$runAutomaton org.apache.lucene.util.automaton.CharacterRunAutomaton @ 0x94ea0300 | 40 | 2,000
| | | | | | |- val$toString java.lang.String @ 0x94ea0ad0 kibana-.kibana | 24 | 56
| | | | | | '- Total: 3 entries | |
| | | | | |- resourceNames java.util.HashSet @ 0x94ea0b08 | 16 | 320
| | | | | |- resourceAutomaton org.apache.lucene.util.automaton.Automaton @ 0x94ea0bd8 | 48 | 544
| | | | | |- privilege org.elasticsearch.xpack.core.security.authz.privilege.ApplicationPrivilege @ 0x9bbc7138 | 32 | 897,000
| | | | | | |- <class> class org.elasticsearch.xpack.core.security.authz.privilege.ApplicationPrivilege @ 0x8b01aa60 | 24 | 4,312
| | | | | | |- application java.lang.String @ 0x96f28dc0 kibana-.kibana | 24 | 56
| | | | | | |- automaton org.apache.lucene.util.automaton.Automaton @ 0x98343500 | 48 | 84,792
| | | | | | |- name java.util.HashSet @ 0x9bbc7158 | 16 | 248
| | | | | | |- predicate org.elasticsearch.xpack.core.security.support.Automatons$1 @ 0x9bbc7218 | 24 | 884,616
| | | | | | |- patterns java.lang.String[3008] @ 0x9bbc79c8 | 12,048 | 12,048
| | | | | | '- Total: 6 entries | |
| | | | | '- Total: 5 entries | |
| | | | '- Total: 10 entries | |
| | | '- Total: 2 entries | |
| | '- Total: 2 entries | |
| '- Total: 10 entries | |
'- Total: 4 entries | |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
This sure seems like something with room for improvement. Could we, for instance, share these automata across contexts better? Does it really make sense to build an automaton for matching such a large pattern list (where none of the patterns includes any wildcards) rather than, say, using HashSet#contains? Do we really need such a large pattern list anyway?
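To make the HashSet#contains and sharing ideas concrete, here is a minimal sketch in plain JDK Java. It is not the existing Elasticsearch code: `LiteralPatternMatcher`, its cache, and the wildcard check (treating `*`, `?`, and a leading `/` as non-literal) are all hypothetical names and assumptions for illustration.

```java
import java.util.Collection;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: exact-match lookup for wildcard-free pattern lists. */
final class LiteralPatternMatcher {
    // One shared matcher per distinct pattern set, so equal sets are not rebuilt
    // and retained once per context. Unbounded here; a real cache would need eviction.
    private static final Map<Set<String>, LiteralPatternMatcher> CACHE = new ConcurrentHashMap<>();

    private final Set<String> literals;

    private LiteralPatternMatcher(Set<String> literals) {
        this.literals = literals;
    }

    /** Assumption: '*', '?' and a leading '/' are the only non-literal markers. */
    static boolean isLiteral(String pattern) {
        return pattern.indexOf('*') < 0 && pattern.indexOf('?') < 0 && !pattern.startsWith("/");
    }

    /** Returns a shared matcher; rejects pattern lists that would still need an automaton. */
    static LiteralPatternMatcher of(Collection<String> patterns) {
        for (String p : patterns) {
            if (!isLiteral(p)) {
                throw new IllegalArgumentException("pattern is not a literal: " + p);
            }
        }
        Set<String> key = Set.copyOf(patterns); // immutable, order-independent key
        return CACHE.computeIfAbsent(key, LiteralPatternMatcher::new);
    }

    /** Expected O(1) hash lookup instead of running an automaton over the input. */
    boolean matches(String action) {
        return literals.contains(action);
    }
}
```

With something like this, two roles carrying the same 3008 literal patterns would reference a single set instead of each holding its own automaton. Pattern lists that mix literals and wildcards would still need an automaton for the wildcard part.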
Even if we can't obviously optimize this right away, a bigger problem IMO is that we are not tracking this memory usage anywhere so it's invisible and there's no backpressure or load-shedding or autoscaling if it gets too much.
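As a rough illustration of what tracking could look like, the sketch below reserves an estimated byte count before a context is stored and refuses once a limit is reached. `StoredContextAccounting` and its methods are hypothetical names, not an existing Elasticsearch API, and the sketch assumes the caller can supply a size estimate, which is the hard part in practice.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: byte accounting for stored thread contexts. */
final class StoredContextAccounting {
    private final long limitBytes;
    private final AtomicLong usedBytes = new AtomicLong();

    StoredContextAccounting(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    /** Reserves the estimate, or returns false so the caller can shed load instead. */
    boolean tryReserve(long estimatedBytes) {
        while (true) {
            long current = usedBytes.get();
            long next = current + estimatedBytes;
            if (next > limitBytes) {
                return false; // over budget: reject rather than risk an OOME
            }
            if (usedBytes.compareAndSet(current, next)) {
                return true;
            }
            // Lost a race with another thread; retry with the fresh value.
        }
    }

    /** Must be called with the same estimate once the stored context is dropped. */
    void release(long estimatedBytes) {
        usedBytes.addAndGet(-estimatedBytes);
    }

    /** Current total, suitable for exposing as a metric. */
    long usedBytes() {
        return usedBytes.get();
    }
}
```

Even a coarse estimate would make this usage visible in stats and give something to base backpressure on. If automata end up shared across contexts, a per-context estimate would over-count them, so the accounting would need to follow whatever sharing scheme is adopted.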