Rolling restart Admin API #1026

Merged
7 changes: 7 additions & 0 deletions modules/get-started/pages/whats-new.adoc
@@ -7,6 +7,13 @@ This topic includes new content added in version {page-component-version} Beta.
* xref:redpanda-cloud:get-started:whats-new-cloud.adoc[]
* xref:redpanda-cloud:get-started:cloud-overview.adoc#redpanda-cloud-vs-self-managed-feature-compatibility[Redpanda Cloud vs Self-Managed feature compatibility]
== New health probes for broker restarts and upgrades

The Redpanda Admin API now includes health probes that help you restart and upgrade brokers safely. The xref:api:ROOT:admin-api.adoc#get-/v1/broker/pre_restart_probe[`pre_restart_probe`] endpoint identifies potential risks of restarting a broker, and the xref:api:ROOT:admin-api.adoc#get-/v1/broker/post_restart_probe[`post_restart_probe`] endpoint reports how much of its workload a broker has reclaimed after a restart. See also:
@kbatuigas please check these links. Is the API just not updated yet?

* xref:manage:cluster-maintenance/rolling-restart.adoc[]
* xref:upgrade:rolling-upgrade.adoc[]

== Redpanda Console v3.0.0 (beta)

The Redpanda Console v3.0.0 beta release includes the following updates:
@@ -10,7 +10,7 @@ rpk cluster health
.Example output:
[%collapsible]
====
[.no-copy]
[,bash,role=no-copy]
----
CLUSTER HEALTH OVERVIEW
=======================
@@ -19,12 +19,40 @@ Controller ID: 0
All nodes: [0 1 2] <2>
Nodes down: [] <3>
Leaderless partitions: [] <3>
Under-replicated partitions: [] <3>
Under-replicated partitions: [1] <3>
----
<1> The cluster is either healthy (`true`) or unhealthy (`false`).
<2> The node IDs of all brokers in the cluster.
<3> If the cluster is unhealthy, these fields will contain data.
====
====

. Optional: Use the Admin API (default port: 9644) to check for potential risks of restarting a specific broker.
+
[,bash]
----
curl -X GET "http://<broker-address>:<admin-api-port>/v1/broker/pre_restart_probe" | jq .
----
+
.Example output:
[,json,role=no-copy]
----
// Returns the partitions (in the format {namespace}/{topic_name}/{partition_id}) that would be affected by restarting the broker.

{
"risks": {
"rf1_offline": [
"kafka/topic_a/0"
],
"full_acks_produce_unavailable": [],
"unavailable": [],
"acks1_data_loss": []
}
}
----
+
In this example, the restart probe reports one under-replicated partition, `kafka/topic_a/0` (replication factor of 1), that is at risk of going offline if the broker is restarted.
+
See the xref:api:ROOT:admin-api.adoc#get-/v1/broker/pre_restart_probe[Admin API reference] for more details on the restart probe endpoint.
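+
As a minimal sketch of how an automation script might act on this response, the following Python snippet filters the probe output down to the risk categories that list at least one partition. The function name `restart_risks` is hypothetical, and the sample payload is copied from the example output above. The snippet assumes the response always has the shape shown there, which the Admin API reference should confirm.
+
```python
import json


def restart_risks(probe_response: str) -> dict[str, list[str]]:
    """Return only the risk categories that list at least one partition.

    Assumes the response body looks like the example output above:
    a top-level "risks" object mapping category names to partition lists.
    """
    risks = json.loads(probe_response).get("risks", {})
    return {name: partitions for name, partitions in risks.items() if partitions}


# Sample payload copied from the example output (not fetched from a live broker).
sample = """
{
  "risks": {
    "rf1_offline": ["kafka/topic_a/0"],
    "full_acks_produce_unavailable": [],
    "unavailable": [],
    "acks1_data_loss": []
  }
}
"""

flagged = restart_risks(sample)
if flagged:
    for name, partitions in flagged.items():
        print(f"{name}: {', '.join(partitions)}")
else:
    print("No restart risks reported")
```
+
In a real script, you would pass in the body returned by the `curl` command shown earlier instead of the hard-coded sample, and hold off on the restart while the returned dictionary is non-empty.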

ifdef::rolling-upgrade[. Select a broker that has not been upgraded yet and place it into maintenance mode:]
ifdef::rolling-restart[. Select a broker and place it into maintenance mode:]
@@ -11,4 +11,19 @@ To view additional information about your brokers, run:

```bash
rpk redpanda admin brokers list
```
```

You can also use the xref:api:ROOT:admin-api.adoc#get-/v1/broker/post_restart_probe[Admin API] to check how far each broker has progressed in reclaiming its workloads:

```bash
curl -X GET "http://<broker-address>:<admin-api-port>/v1/broker/post_restart_probe"
```

.Example output:
[,json,role=no-copy]
----
// Returns the load already reclaimed by the broker, as a percentage of in-sync replicas
{
"load_reclaimed_pc": 66
}
----

This is really nice, @bashtanov! It seems this will help a lot in building smoother restart/upgrade automation. @chrisseto @mmaslankaprv
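
To illustrate how a restart script might use this value, here is a minimal Python sketch that decides whether a broker has reclaimed enough of its load to move on to the next broker. The function name `is_recovered` and the `threshold_pc` parameter are hypothetical and not part of the Admin API. The only field read is `load_reclaimed_pc`, as shown in the example output.

```python
import json


def is_recovered(probe_response: str, threshold_pc: int = 100) -> bool:
    """Return True once the broker has reclaimed at least threshold_pc percent.

    threshold_pc is an illustrative knob for this sketch; choose a value
    that suits your own rollout policy.
    """
    body = json.loads(probe_response)
    return body["load_reclaimed_pc"] >= threshold_pc


# Sample payload copied from the example output (not fetched from a live broker).
sample = '{"load_reclaimed_pc": 66}'

print(is_recovered(sample))                   # 66 is below the default of 100
print(is_recovered(sample, threshold_pc=50))  # 66 meets a threshold of 50
```

A script could call the probe repeatedly and continue with the next broker only once this check passes.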