135 changes: 120 additions & 15 deletions docs/modules/ROOT/pages/list-of-metrics.adoc
= List of Hazelcast Metrics
[[appendix]]

The tables below list Hazelcast metrics and descriptions, grouped by subject.
This page includes both member-side and client-side metrics.

Unless noted otherwise, metrics are member-side and local to the instance that reports them.
For members, distributed data structure metrics reflect the local statistics
for the partition data held on that member.
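Because these values are member-local, a cluster-wide figure for an additive metric (such as an owned-entry count) can only be obtained by sampling every member and summing. A minimal sketch, with hypothetical member names and sample values (this helper is illustrative, not part of any Hazelcast API):

```python
def cluster_total(samples_by_member: dict) -> int:
    """Sum one member-local, additive metric across all members.

    Only valid for additive counts over primary (owned) data;
    latencies, ratios, and gauges must not be summed this way.
    """
    return sum(samples_by_member.values())


# Hypothetical per-member samples of one additive metric
samples = {"member-1": 1200, "member-2": 950, "member-3": 1100}
print(cluster_total(samples))  # 3250
```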

Some metrics represent cluster-wide agreed values obtained by communicating
with other members in the cluster. In those cases, each value still reflects
the local member's current view of the cluster (for example, in split-brain
scenarios). The `clusterStartTime` metric is one example; on a local member,
its value is obtained from the master member.

Metric names on this page are canonical names. Some metrics are exposed with
different prefixes depending on where they are collected (member, client, or
subsystem). Sections that have prefix variants call them out explicitly.

NOTE: If you use Management Center to export cluster-wide metrics to Prometheus, Management Center reformats the metrics to align with Prometheus best practice recommendations. See xref:{page-latest-supported-mc}@management-center:integrate:prometheus-monitoring.adoc[Prometheus Monitoring].


|blockingWorkerCount
|Number of non-cooperative workers employed.
.7+|_none_

Each Hazelcast member will have one instance of each of these metrics.

|jobs.submitted
|Number of computational jobs submitted.

|jobs.executionStarted
|Number of computational job executions started. Each job can
execute multiple times, for example when it's restarted or
suspended and then resumed.

|jobs.executionTerminated
|Number of computational job executions finished. Each job can
execute multiple times, for example when it's restarted or
suspended and then resumed.

|jobs.executionCompleted
|Number of computational job executions completed (successfully or otherwise).

|iterationCount
|The total number of iterations the driver of tasklets in
cooperative thread N made. It should increase by at least 250
if there are many tasklets assigned to the processor. Lower
value affects the latency.
.2+|_cooperativeWorker_

Each Hazelcast member will have one instance of each of these metrics for each of its
cooperative worker threads.


|taskletCount
|The number of assigned tasklets to cooperative thread N.

The Reset column shows the reset behavior of the metrics. There are two types of
|Number of dirty (updated but not persisted yet) entries that the member owns
| N

|`map.entrySetCount`
|count
|Number of entry set operations on this member
| N

|`map.evictionCount`
|count
|Number of evictions that occurred on locally owned entries; backups are not included
|Number of queries executed on the map (may be imprecise for queries involving partition predicates (`PartitionPredicate`) on off-heap storage)
| N

|`map.queryLimiterHitCount`
|count
|Number of times the query result size limiter was hit on this member
| N

|`map.removeCount`
|count
|Number of local remove operations on the map
|Total latency of local set operations on the map
| N

|`map.valuesCount`
|count
|Number of values operations on this member
| N

|`map.store.offloaded.operations.waitingToBeProcessedCount`
|count
|Number of offloaded map store operations waiting to be processed
| N

4+a|
The above `*latency` metrics are measured per member only and do not represent the overall performance of the cluster.
Hazelcast recommends monitoring the average latency for each operation, for example, `map.totalGetLatency` / `map.getCount` and `map.totalSetLatency` / `map.setCount`.
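The recommended average-latency calculation can be sketched as follows (the helper is illustrative, not a Hazelcast API):

```python
def average_latency_ms(total_latency_ms: float, operation_count: int) -> float:
    """Average per-operation latency, e.g.
    map.totalGetLatency / map.getCount.

    Returns 0.0 when no operations have run yet, to avoid
    dividing by zero on a freshly started member.
    """
    if operation_count == 0:
        return 0.0
    return total_latency_ms / operation_count


print(average_latency_ms(500, 100))  # 5.0
```

Both counters are cumulative since member start, so for a live dashboard it is usually more informative to apply the same formula to the deltas between two consecutive scrapes rather than to the raw totals.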
This is because the cluster has to communicate with more members, which can add
|count
|Number of dirty (updated but not persisted yet) entries that the member owns

|`multiMap.entrySetCount`
|count
|Number of entry set operations on this member

|`multiMap.evictionCount`
|count
|Number of evictions completed on locally owned entries; backups are not included

|`multiMap.expirationCount`
|count
|Number of expirations completed on locally owned entries; backups are not included

|`multiMap.getCount`
|count
|Number of local get operations on the multimap
|count
|Total number of indexed local queries performed on the multimap

|`multiMap.indexesSkippedQueryCount`
|count
|Total number of local queries performed on the multimap which cannot use indexes

|`multiMap.noMatchingIndexQueryCount`
|count
|Total number of local queries performed on the multimap which had no matching index

|`multiMap.lastAccessTime`
|ms
|Last access (read) time of the locally owned entries
|count
|Number of local queries executed on the multimap (may be imprecise for queries involving partition predicates (`PartitionPredicate`) on off-heap storage)

|`multiMap.queryLimiterHitCount`
|count
|Number of times the query result size limiter was hit on this member

|`multiMap.removeCount`
|count
|Number of local remove operations on the multimap
|`multiMap.totalSetLatency`
|ms
|Total latency of local set operations

|`multiMap.valuesCount`
|count
|Number of values operations on this member
|===
====

|ms
|Creation time of this replicated map on this member

|`replicatedMap.entrySetCount`
|count
|Number of entry set operations on this member

|`replicatedMap.getCount`
|count
|Number of get operations on this member
|`replicatedMap.total`
|count
|Total number of operations on this member

|`replicatedMap.valuesCount`
|count
|Number of values operations on this member
|===
====

.Clients
[%collapsible]
====
Scope: member-side only. These metrics describe clients connected to the local member.

[cols="4,1,6a"]
|===
| Name
.Client Invocations
[%collapsible]
====
Scope: client-side only.

[cols="4,1,6a"]
|===
| Name
We also have a `summary` section per object type which provides live and destroy
.Listeners
[%collapsible]
====
Scope: client-side only.

[cols="4,1,6a"]
|===
| Name
.Memory
[%collapsible]
====
The names in this table use canonical `memory.` metric names.

Prefix and scope variants:

* Open Source member and client instance memory metrics: `memory.`
* Enterprise member aggregate HD memory metrics: `memory.` or `memorymanager.`
* Enterprise member per-memory-manager metrics: `memorymanager.stats.`

The same metric values can be exposed under multiple prefixes.
Depending on the configured memory allocation type, `memorymanager.stats.`
values can overlap with `memorymanager.` values.
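The prefix variants above can be made concrete with a small illustrative helper (the helper and the `usedNative` suffix are our examples, not a Hazelcast API):

```python
# Prefixes under which a canonical `memory.` metric suffix may appear,
# per the variants described above (illustrative, not a Hazelcast API).
PREFIX_VARIANTS = (
    "memory.",               # Open Source member/client instance metrics
    "memorymanager.",        # Enterprise member aggregate HD metrics
    "memorymanager.stats.",  # Enterprise per-memory-manager metrics
)


def candidate_names(suffix: str) -> set:
    """All full names one canonical suffix may be exposed under."""
    return {prefix + suffix for prefix in PREFIX_VARIANTS}


print(sorted(candidate_names("usedNative")))
```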

[cols="4,1,6a"]
|===
| Name
.High-Density (HD) Memory Store
[%collapsible]
====
Scope: member-side only. {enterprise-product-name} only.

These metrics are provided for data structures using HD memory. They are
prefixed with the relevant data structure's prefix (e.g. `map.`).

[cols="4,1,6a"]
|===
| Name
.Near Cache
[%collapsible]
====
Scope: member-side and client-side.

The names in this table use the client prefix form (`nearcache.`). For
member-side Near Cache metrics, use `map.nearcache.` with the same suffix.
The same metric values and semantics apply on both sides.
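The client-to-member name mapping can be sketched as follows (an illustrative helper; `nearcache.hits` is one example metric name):

```python
def member_side_name(client_metric: str) -> str:
    """Translate a client-side Near Cache metric name into its
    member-side form, per the convention above: client `nearcache.`
    maps to member `map.nearcache.` with the same suffix.
    """
    if not client_metric.startswith("nearcache."):
        raise ValueError("not a Near Cache metric: " + client_metric)
    return "map." + client_metric


print(member_side_name("nearcache.hits"))  # map.nearcache.hits
```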

[cols="4,1,6a"]
|===
| Name
The **normal** operations are the ones that manipulate the data, for example `ma
|count
|Number of current executing async operations on the operation service of the member

|`operation.callTimeoutCount`
|count
|Number of operation call timeouts on the member

|`operation.completedCount`
|count
|Number of completed operations
Based on your latency tolerance in your business use case, you can define a thre
|count
|Number of times that I/O exceptions are thrown during selection

|`tcp.inputThread/outputThread.selectorRebuildCount`
|count
|Number of times the selector was recreated on this NioThread

|`tcp.inputThread/outputThread.taskQueueSize`
|count
|Number of pending tasks on the queue of NioThread
|count
|Total number of WAN events currently placed in the WAN queues of primary partitions on this member

|`wan.queueFillPercent`
|percent
|Percentage of the WAN replication queue that is filled

|`wan.removeCount`
|count
|Number of entry remove events