hashicorp · aimeeu · Mar 24, 2025 · Mar 25, 2025 · Apr 14, 2025
@@ -2,7 +2,7 @@
 layout: docs
 page_title: Federate access to AWS with Nomad Workload Identity
 description: |-
-  Integrate Nomad as an OpenID Connect (OIDC) provider with AWS IAM identity and federate access to AWS resources.
+  Integrate Nomad as an OpenID Connect (OIDC) provider with AWS IAM identity and use workload identity to federate access to AWS resources and services.
 ---
 
 # Federate access to AWS with Nomad Workload Identity

@@ -1,11 +1,11 @@
 ---
 layout: docs
-page_title: Benchmarking Nomad
+page_title: Benchmark and load test Nomad
 description: |-
-  Load testing Nomad by utilizing the Nomad Bench project.
+  Use the Nomad Bench project to benchmark and load test Nomad servers.
 ---
 
-# Nomad Bench
+# Benchmark and load test Nomad
 
 The Nomad Bench project provides reusable infrastructure automation to run test scenarios in order
 to collect metrics and data from Nomad clusters running at scale. The core goal of the project is

@@ -1,10 +1,11 @@
 ---
 layout: docs
 page_title: Federated cluster failure scenarios
-description: Failure scenarios in multi-region federated cluster deployments.
+description: |-
+  Review failure scenarios in multi-region federated cluster deployments. Learn which Nomad features continue to work under federated and authoritative region failures.
 ---
 
-# Failure scenarios
+# Federated cluster failure scenarios
 
 When running Nomad in federated mode, failure situations and impacts are different depending on
 whether the authoritative region is the impacted region or not, and what the failure mode is. In

@@ -2,7 +2,7 @@
 layout: docs
 page_title: Federated cluster operations
 description: |-
-  Operational considerations for running Nomad multi-region federated clusters as well as instructions for migrating the authoritative region to a federated region.
+  Review operational considerations for running Nomad multi-region federated clusters as well as instructions for migrating the authoritative region to a federated region.
 ---
 
 # Federated cluster operations

@@ -2,7 +2,7 @@
 layout: docs
 page_title: Operations
 description: |-
-  Learn about operating Nomad.
+  This section contains guides, explanatory content, and reference information for running Nomad in a production environment. Topics include stateful workloads, monitoring, benchmarking, key management, IPv6 support, federation, cluster management, access control, and transport security.
 ---
 
 # Operations

@@ -1,11 +1,11 @@
 ---
 layout: docs
-page_title: Support for IPv6
+page_title: IPv6 Support in Nomad
 description: |-
-  Nomad support for IPv6
+  Learn how Nomad supports IPv6. Configure Nomad to advertise IPv6 addresses. Link Nomad servers and clients that have specific IPv6 addresses. Set up Consul and Vault to use Nomad's IPv6 address. Learn how workload tasks and task drivers can use IPv6 addresses.
 ---
 
-# IPv6 support in Nomad
+# IPv6 Support in Nomad
 
 Nomad supports IPv6 as long as the underlying networks, host machines,
 and operating systems running it support IPv6.

@@ -1,10 +1,11 @@
 ---
 layout: docs
-page_title: Key Management
-description: Learn about the key management in Nomad.
+page_title: Key management
+description: |-
+  Learn how Nomad manages its keyring, which Nomad uses to encrypt variables, sign task workload identities, and sign OpenID Connect (OIDC) client assertions. Review key rotation, key decryption, and key redaction in Raft snapshots. Learn how Nomad v1.9+ can replicate keys from older Nomad versions.
 ---
 
-# Key Management
+# Key management
 
 Nomad servers maintain an encryption keyring used to encrypt [Variables][],
 sign task [workload identities][], and sign OIDC [client assertion JWTs][].
@@ -27,7 +28,7 @@ Under normal operations the keyring is entirely managed by Nomad, but this
 section provides administrators additional context around key replication and
 recovery.
 
-## Key Rotation
+## Key rotation
 
 Only one key in the keyring is "active" at any given time, and all encryption
 and signing operations happen on the leader. Nomad automatically rotates the
@@ -42,15 +43,15 @@ operator root keyring rotate -full`][]. A new "active" key will be created and
 re-encrypt all variables with the new key. As each key's variables are encrypted
 with the new key, the old key will be marked as "deprecated".
 
-## Key Decryption
+## Key decryption
 
 When a leader is elected, the leader creates the keyring if it does not already
 exist. When a key is added, the new wrapped key material is replicated via
 Raft. As each server replicates the new key, the server starts a task to decrypt
 the key material. Until this task completes, the server is not able to serve
 requests that require this key.
 
-## Key Redaction in Raft Snapshots
+## Key redaction in Raft snapshots
 
 The default AEAD `keyring` configuration stores the KEK in Raft. Raft snapshots
 contain the cleartext KEK. The `nomad operator snapshot save` command has a
@@ -60,7 +61,7 @@ existing snapshot.
 
 Redacting key material is not required when using an external KMS.
 
-## Legacy Keystore
+## Legacy keystore
 
 Versions of Nomad prior to 1.9.0 stored only key metadata in Raft, but the
 encryption key material was stored in a separate file in the `keystore`

@@ -1,10 +1,11 @@
 ---
 layout: docs
-page_title: Metrics Reference
-description: Learn about the different metrics available in Nomad.
+page_title: Metrics reference
+description: |-
+  This page contains reference information on the gauge, counter, and timer runtime metrics data that Nomad collects. Use the metrics endpoint to access the metrics data. Learn about the key metrics for monitoring your cluster. Review client, host, allocation, job summary, job status, server, Raft BoltDB, and agent metrics fields.
 ---
 
-# Metrics Reference
+# Metrics reference
 
 The Nomad agent collects various runtime metrics about the performance of
 different libraries and subsystems. These metrics are aggregated on a ten
@@ -67,22 +68,22 @@ Below is sample output of a telemetry dump:
 [2015-09-17 16:59:40 -0700 PDT][S] 'nomad.memberlist.gossip': Count: 12 Min: 0.009 Mean: 0.017 Max: 0.025 Stddev: 0.005 Sum: 0.204
 ```
 
-### Metric Types
+### Metric types
 
 | Type    | Description                                                                                                         | Quantiles |
 | ------- | ------------------------------------------------------------------------------------------------------------------- | --------- |
 | Gauge   | Gauge types report an absolute number at the end of the aggregation interval                                        | false     |
 | Counter | Counts are incremented and flushed at the end of the aggregation interval and then are reset to zero                | true      |
 | Timer   | Timers measure the time to complete a task and will include quantiles, means, standard deviation, etc per interval. | true      |
 
-### Tagged Metrics
+### Tagged metrics
 
 Nomad emits metrics in a tagged format. Each metric can support more than one
 tag, meaning that it is possible to do a match over metrics for datapoints
 such as a particular datacenter, and return all metrics with this tag. Nomad
 supports labels for namespaces as well.
 
-## Key Metrics
+## Key metrics
 
 The metrics in the table below are the most important metrics for monitoring
 the overall health of a Nomad cluster.
@@ -121,7 +122,7 @@ signals.
 | `nomad.raft.replication.appendEntries`       | Raft transaction commit time                                                                                                                                                                                      | ms / Raft Log Append           | Timer   |
 | `nomad.license.expiration_time_epoch`        | Time as epoch (seconds since Jan 1 1970) at which license will expire                                                                                                                                             | Seconds                        | Gauge   |
 
-## Client Metrics
+## Client metrics
 
 The Nomad client emits metrics related to the resource usage of the allocations
 and tasks running on it and the node itself. Operators have to explicitly turn
@@ -149,7 +150,7 @@ parameterized or periodic job respectively. For example, a dispatch job with the
 | parent_id   | `myjob`                        |
 | dispatch_id | `1312323423423`                |
 
-## Host Metrics
+## Host metrics
 
 Nomad will emit [tagged metrics][tagged-metrics], in the below format:
 
@@ -188,7 +189,7 @@ Nomad will emit [tagged metrics][tagged-metrics], in the below format:
 | `nomad.client.unallocated.memory`         | Total amount of memory free for the scheduler to allocate to tasks                   | Megabytes  | Gauge   | datacenter, host, node_class, node_id, node_pool, node_scheduling_eligibility, node_status       |
 | `nomad.client.uptime`                     | Uptime of the host running the Nomad client                                          | Seconds    | Gauge   | datacenter, host, node_class, node_id, node_pool, node_scheduling_eligibility, node_status       |
 
-### Client Hook Metrics
+### Client hook metrics
 
 Nomad will emit metrics allowing you to monitor and alert on allocation and task hook performance.
 If you do not need these, they can be disabled via the [`disable_allocation_hook_metrics`][]
@@ -203,7 +204,7 @@ configuration parameter.
 | `nomad.client.task_hook.prestart.success` | Number of hook executions that completed successfully | Integer      | Counter | datacenter, host, node_class, node_id, node_pool, hook_name |
 | `nomad.client.task_hook.prestart.elapsed` | The time it took the hook to run                      | Milliseconds | Timer   | datacenter, host, node_class, node_id, node_pool, hook_name |
 
-## Allocation Metrics
+## Allocation metrics
 
 The following metrics are emitted for each allocation if allocation metrics
 are enabled. Note that allocation metrics available may be dependent on factors
@@ -234,7 +235,7 @@ such as the task driver and control group (cgroup) version in use.
 | `nomad.client.allocs.restart`                 | Number of task restarts                                           | Integer     | Counter | alloc_id, host, job, namespace, task, task_group |
 | `nomad.client.allocs.running`                 | Number of running allocations                                     | Integer     | Counter | alloc_id, host, job, namespace, task, task_group |
 
-## Job Summary Metrics
+## Job summary metrics
 
 Job summary metrics are emitted by the Nomad leader server.
 
@@ -248,7 +249,7 @@ Job summary metrics are emitted by the Nomad leader server.
 | `nomad.nomad.job_summary.running`  | Number of running allocations for a job  | Integer | Gauge | host, job, namespace, task_group |
 | `nomad.nomad.job_summary.starting` | Number of starting allocations for a job | Integer | Gauge | host, job, namespace, task_group |
 
-## Job Status Metrics
+## Job status metrics
 
 Job status metrics are emitted by the Nomad leader server.
 
@@ -258,7 +259,7 @@ Job status metrics are emitted by the Nomad leader server.
 | `nomad.nomad.job_status.pending` | Number of pending jobs | Integer | Gauge | host   |
 | `nomad.nomad.job_status.running` | Number of running jobs | Integer | Gauge | host   |
 
-## Server Metrics
+## Server metrics
 
 The following table includes metrics for overall cluster health in addition to
 those listed in [Key Metrics](#key-metrics) above.
@@ -501,7 +502,7 @@ those listed in [Key Metrics](#key-metrics) above.
 | `nomad.scheduler.allocs.rescheduled.wait_until`         | Time that a rescheduled allocation will be delayed                                                                                                     | Float                    | Gauge   | alloc_id, job, namespace, task_group, follow_up_eval_id |
 | `nomad.state.snapshotIndex`                             | Current snapshot index                                                                                                                                 | Integer                  | Gauge   | host                                                    |
 
-## Raft BoltDB Metrics
+## Raft BoltDB metrics
 
 Raft database metrics are emitted by the `raft-boltdb` library.
 
@@ -526,7 +527,7 @@ Raft database metrics are emitted by the `raft-boltdb` library.
 | `nomad.raft.boltdb.txstats.write`         | Count of total write operations           | Integer     | Counter |
 | `nomad.raft.boltdb.txstats.writeTime`     | Sample of write operation times           | Nanoseconds | Summary |
 
-## Agent Metrics
+## Agent metrics
 
 Agent metrics are emitted by all Nomad agents running in either client or server mode.
 

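The tagged-metrics matching described in the metrics reference hunks above ("do a match over metrics for datapoints such as a particular datacenter") can be sketched in a few lines. This is a minimal sketch, assuming a payload shaped like the JSON returned by the agent's `/v1/metrics` endpoint, where each `Gauges` entry carries `Name`, `Value`, and `Labels`. The sample values and the `match_gauges` helper are hypothetical and are not part of Nomad.

```python
# Illustrative payload only: shaped like a /v1/metrics JSON response,
# but the values and label sets below are made up for this example.
SAMPLE_PAYLOAD = {
    "Gauges": [
        {"Name": "nomad.client.uptime", "Value": 1200.0,
         "Labels": {"datacenter": "dc1", "node_class": "none"}},
        {"Name": "nomad.client.unallocated.memory", "Value": 2048.0,
         "Labels": {"datacenter": "dc1", "node_class": "none"}},
        {"Name": "nomad.client.uptime", "Value": 300.0,
         "Labels": {"datacenter": "dc2", "node_class": "none"}},
    ],
}


def match_gauges(payload, **wanted_labels):
    """Return (name, value) pairs for gauges that carry every wanted label.

    Hypothetical helper: a gauge matches only if each keyword argument
    equals the corresponding entry in the gauge's "Labels" mapping.
    """
    matches = []
    for gauge in payload.get("Gauges", []):
        labels = gauge.get("Labels") or {}
        if all(labels.get(key) == value for key, value in wanted_labels.items()):
            matches.append((gauge["Name"], gauge["Value"]))
    return matches


if __name__ == "__main__":
    # Select every gauge tagged with datacenter "dc1".
    for name, value in match_gauges(SAMPLE_PAYLOAD, datacenter="dc1"):
        print(f"{name} = {value}")
```

In practice the payload would come from a running agent rather than an inline dictionary, and the same label match could be extended to the `Counters` and `Samples` lists if the response includes them.
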
@@ -1,12 +1,11 @@
 ---
 layout: docs
-page_title: Monitoring Nomad
+page_title: Monitor Nomad
 description: |-
-  Overview of runtime metrics available in Nomad along with monitoring and
-  alerting.
+  Learn how to monitor the health and performance of Nomad clusters. Export data to Prometheus or Datadog. Review metrics for the Raft consensus protocol, scheduling, performance, capacity, task resource consumption, job and task status, runtime, and federated deployments.
 ---
 
-# Monitoring Nomad
+# Monitor Nomad
 
 The Nomad client and server agents collect a wide range of runtime metrics.
 These metrics are useful for monitoring the health and performance of Nomad
@@ -70,16 +69,14 @@ patterns.
   system as appropriate. In many cases, it may be ok if a given batch job fails
   occasionally, as long as it goes back to passing.
 
-# Key Performance Indicators
+## Key performance indicators
 
 Nomad servers' memory, CPU, disk, and network usage all scale linearly with
 cluster size and scheduling throughput. The most important aspect of ensuring
 Nomad operates normally is monitoring these system resources to ensure the
 servers are not encountering resource constraints.
 
-The sections below cover a number of other important metrics.
-
-## Consensus Protocol (Raft)
+## Raft consensus protocol
 
 Nomad uses the Raft consensus protocol for leader election and state
 replication. Spurious leader elections can be caused by networking
@@ -261,18 +258,18 @@ a per client basis.
 - **nomad.client.allocated.memory**
 - **nomad.client.unallocated.memory**
 
-## Task Resource Consumption
+## Task resource consumption
 
 The metrics listed [here][allocation-metrics] can be used to track resource
 consumption on a per task basis. For user facing services, it is common to alert
 when the CPU is at or above the reserved resources for the task.
 
-## Job and Task Status
+## Job and task status
 
 See [Job Summary Metrics] for monitoring the health and status of workloads
 running on Nomad.
 
-## Runtime Metrics
+## Runtime metrics
 
 Runtime metrics apply to all clients and servers. The following metrics are
 general indicators of load and memory pressure.
@@ -284,7 +281,7 @@ general indicators of load and memory pressure.
 It is recommended to alert on upticks in any of the above, server memory usage
 in particular.
 
-## Federated Deployments (Serf)
+## Serf federated deployments
 
 Nomad uses the membership and failure detection capabilities of the Serf library
 to maintain a single, global gossip pool for all servers in a federated