How to use rpk to analyze partitions and size clusters #1034

Merged
merged 7 commits into from
Mar 28, 2025
Changes from 6 commits
@@ -153,6 +153,89 @@ The https://github.com/redpanda-data/openmessaging-benchmark[Open Messaging Benc

See also: https://github.com/redpanda-data/openmessaging-benchmark/blob/main/driver-redpanda/README.md[Redpanda Benchmarks^]

== Assess throughput

This section describes how to use the xref:reference:rpk/rpk-topic-analyze.adoc[`rpk topic analyze`] command to measure the load on your Redpanda cluster. The command reports throughput in bytes per second, the number of batches handled per second, and the size of those batches. Use this information to decide whether to add brokers or change your configuration.

Run the following command to view the throughput of your cluster:

[source,bash]
----
rpk topic analyze --regex '*' --print-all --time-range -1m:end
----

The arguments are:

* `--regex '*'`: Analyzes all topics.
* `--print-all`: Prints all the metrics.
* `--time-range -1m:end`: Analyzes the last minute of data.

Example output:

[,bash,role="no-copy no-wrap"]
----
SUMMARY
=======
TOPICS                        6
PARTITIONS                    17
TOTAL THROUGHPUT (BYTES/S)    1361.9166666666667
TOTAL BATCH RATE (BATCHES/S)  2.9833333333333334
AVERAGE BATCH SIZE (BYTES)    456.50837988826817

TOPIC SUMMARY
=============
TOPIC                     PARTITIONS  BYTES-PER-SECOND    BATCHES-PER-SECOND  AVERAGE-BYTES-PER-BATCH
_redpanda.audit_log       12          61                  0.1                 610
_redpanda.transform_logs  1           890.2666666666667   0.7833333333333333  1136.5106382978724
_schemas                  1           0                   0                   0
edu-filtered-domains      1           14.283333333333333  0.1                 142.83333333333334
logins                    1           144.61666666666667  1                   144.61666666666667
transactions              1           251.75              1                   251.75

PARTITION BATCH RATE (BATCHES/S)
================================
TOPIC                     P25                   P50                   P75                   P99
_redpanda.audit_log       0.016666666666666666  0.016666666666666666  0.03333333333333333   0.03333333333333333
_redpanda.transform_logs  0.7833333333333333    0.7833333333333333    0.7833333333333333    0.7833333333333333
_schemas                  0                     0                     0                     0
edu-filtered-domains      0.1                   0.1                   0.1                   0.1
logins                    1                     1                     1                     1
transactions              1                     1                     1                     1

PARTITION BATCH SIZE (BYTES)
============================
TOPIC                     P25  P50  P75  P99
_redpanda.audit_log       608  610  610  611
_redpanda.transform_logs  895  895  895  895
_schemas                  0    0    0    0
edu-filtered-domains      141  141  141  141
logins                    144  144  144  144
transactions              255  255  255  255
----

* **Total throughput:**
The total amount of data the cluster processes each second, in bytes, summed across all analyzed topics.

* **Total batch rate:**
The number of message batches processed per second. A higher rate means more activity, which may require more CPU or I/O resources.

* **Average batch size:**
The average size of a message batch, in bytes. Large or inconsistent batch sizes may indicate a need to adjust producer settings or verify storage capacity.

* **Topic and partition summaries:**
The resource usage of each individual topic. For example, if a single topic (such as `_redpanda.transform_logs` in the example output) is responsible for most of the throughput, it may need optimization or additional resources.

* **Percentiles (P25, P50, P75, P99):**
The distribution of batch rate and batch size across the partitions of each topic. Similar values across percentiles suggest a balanced workload, while large differences may point to partitions that need rebalancing or capacity adjustments.
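
To see which topics dominate throughput, you can post-process the command output. The following is a minimal sketch, not part of `rpk`: the function name `topic_throughput_share` is hypothetical, and the script assumes the output keeps the whitespace-separated `TOPIC SUMMARY` layout shown in the example above.

```shell
# Hypothetical helper (not an rpk feature): reads `rpk topic analyze` output
# on stdin and prints each topic with its percentage of total bytes per second.
# Assumes the TOPIC SUMMARY section is laid out as in the example output.
topic_throughput_share() {
  awk '
    /^TOPIC SUMMARY/           { in_summary = 1; next }   # section start
    in_summary && /^=+$/       { next }                   # underline row
    in_summary && $1 == "TOPIC" { next }                  # column header row
    in_summary && NF == 0      { in_summary = 0; next }   # blank line ends section
    in_summary && NF >= 3 {
      bytes[$1] = $3            # column 3 is BYTES-PER-SECOND
      total += $3
      order[++n] = $1           # remember the original row order
    }
    END {
      for (i = 1; i <= n; i++) {
        t = order[i]
        share = (total > 0) ? 100 * bytes[t] / total : 0
        printf "%s %.1f\n", t, share
      }
    }
  '
}

# Example usage (requires a running cluster):
#   rpk topic analyze --regex '*' --print-all --time-range -1m:end | topic_throughput_share
```

Applied to the example output, `_redpanda.transform_logs` accounts for roughly 65% of total throughput.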

=== Plan for capacity

Compare the current throughput and batch rate with your cluster's hardware limits, such as network bandwidth, disk IOPS, or CPU capacity. If usage is nearing these limits, consider scaling up (upgrading hardware) or scaling out (adding brokers). Monitor trends over time to anticipate when expansion is necessary.
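
The comparison is simple arithmetic: divide the measured throughput by the limit you are checking against. The sketch below assumes you supply the limit yourself, for example the usable network bandwidth of a broker in bytes per second. The function name `utilization_pct` is hypothetical, and the numbers in the usage comment are illustrative values, not measurements.

```shell
# Hypothetical helper: prints measured throughput as a percentage of a limit.
#   $1 = measured throughput in bytes per second (for example, TOTAL THROUGHPUT)
#   $2 = limit in bytes per second (an assumed value that you provide)
utilization_pct() {
  awk -v used="$1" -v limit="$2" 'BEGIN {
    if (limit <= 0) {
      print "limit must be positive" > "/dev/stderr"
      exit 1
    }
    printf "%.2f\n", 100 * used / limit
  }'
}

# Illustrative usage: 50 MB/s measured against a 125 MB/s (1 Gbps) link.
#   utilization_pct 50000000 125000000
```

Tracking this percentage over time, rather than reading a single sample, makes it easier to see when usage is approaching a limit.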

=== Address bottlenecks

If specific topics or partitions consistently show higher loads, it may indicate uneven workload distribution. Redistribute partitions or adjust replication factors to balance the load more effectively.
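
One way to spot uneven distribution is to compare the P99 and P50 batch rates for each topic. The following sketch is an illustration under stated assumptions: the function name `skewed_topics` and the default factor of 2 are arbitrary choices, and the script assumes the `PARTITION BATCH RATE` section keeps the layout shown in the example output. Topics with a single partition always report identical percentiles, so they never appear in the result.

```shell
# Hypothetical helper (not an rpk feature): reads `rpk topic analyze` output
# on stdin and prints topics whose P99 partition batch rate is at least
# `factor` times the P50 rate, followed by the P99/P50 ratio.
#   $1 = skew factor (defaults to 2, an arbitrary illustrative threshold)
skewed_topics() {
  awk -v factor="${1:-2}" '
    /^PARTITION BATCH RATE/   { in_rate = 1; next }   # section start
    in_rate && /^=+$/         { next }                # underline row
    in_rate && $1 == "TOPIC"  { next }                # column header row
    in_rate && NF == 0        { in_rate = 0; next }   # blank line ends section
    # Columns: TOPIC P25 P50 P75 P99. Skip idle topics where P50 is zero.
    in_rate && NF == 5 && $3 > 0 && $5 >= factor * $3 {
      printf "%s %.2f\n", $1, $5 / $3
    }
  '
}

# Example usage (requires a running cluster):
#   rpk topic analyze --regex '*' --print-all --time-range -1m:end | skewed_topics 3
```

A topic that appears consistently across several runs is a stronger signal than one that appears in a single one-minute sample.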

include::shared:partial$suggested-reading.adoc[]

* https://redpanda.com/blog/sizing-redpanda-cluster-best-practices[Four sizing principles for Redpanda production clusters^]