Improve MirrorMaker example grafana dashboards and yaml #12777

Open
tinaselenge wants to merge 3 commits into strimzi:main from tinaselenge:mirror-examples

@tinaselenge
Contributor

@tinaselenge tinaselenge commented May 29, 2026

Type of change

Select the type of your PR

  • Refactoring

Description

Current MirrorMaker dashboard (excluding the Workers and JVM sections below it, which are not changed in this PR):

Screenshot 2026-05-29 at 10 14 59

Issues fixed:

  • Some of the names in the dashboard were unclear, such as the total number of connectors/tasks.
  • When monitoring mirroring, the most critical metrics are the lags of the source connector. These should be more prominent.
  • Several queries did not filter on the correct labels, so they showed values from internal topics. For consumer lag, for example, we only care about the source connector's consumer, which consumes from the source topics.
  • The consumer lag panel that lists consumers should show a limited number of consumers sorted by lag, not all of them. It is more useful to see the top 10 or 20 consumers that are lagging the most.
  • Incoming/outgoing bytes should be shown per worker rather than for the whole cluster, to show the load on each worker.
  • The replication-latency and checkpoint-latency panels should be moved lower, as they don't show accurate end-to-end latency.
  • The available buffer, commit time, and outstanding message queue panels should be moved lower, as they are lower-level metrics that are useful for specific problems. The available buffer query should also show values per client rather than filtering on a hard-coded topic name, because the internal topics can be given any name.
  • A few panels did not work because exemplar was set to true.
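
To illustrate the shape of two of the query changes listed above (a top-N consumer lag list and per-worker byte rates), here is a minimal Python sketch that only builds PromQL strings. It is not the set of queries used in the dashboards in this PR. The metric names (`example_consumer_records_lag`, `example_incoming_bytes_total`), the `clientid` filter, and the `pod` worker label are all hypothetical placeholders; the sketch also assumes the byte metric is a counter, so a gauge that already reports a rate would not need `rate()`.

```python
def top_lagging_consumers(metric: str, label_filter: str, k: int = 10) -> str:
    """Build a PromQL expression that keeps only the k series with the highest lag.

    `metric` and `label_filter` are placeholders supplied by the caller;
    nothing here is specific to the dashboards in this PR.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # topk() is a standard PromQL aggregation operator.
    return f"topk({k}, {metric}{{{label_filter}}})"


def bytes_per_worker(counter_metric: str, worker_label: str = "pod",
                     window: str = "5m") -> str:
    """Build a PromQL expression that sums a counter's rate per worker.

    Assumes `counter_metric` is a monotonically increasing counter and that
    each worker is identified by `worker_label` (hypothetical default: "pod").
    """
    return f"sum by ({worker_label}) (rate({counter_metric}[{window}]))"


if __name__ == "__main__":
    # Hypothetical metric and label names, for illustration only.
    print(top_lagging_consumers("example_consumer_records_lag",
                                'clientid=~"source-.*"', k=10))
    print(bytes_per_worker("example_incoming_bytes_total"))
```

Filtering on a client label rather than on a topic name is what keeps internal-topic consumers out of the result, whatever those topics happen to be called.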

After the update:
(JMX dashboard)
Screenshot 2026-05-28 at 13 44 58

(Strimzi Metrics Reporter dashboard)
Screenshot 2026-05-29 at 09 29 33

Several queries needed to be updated for the Strimzi Metrics Reporter dashboard, as some metric names and labels are different. One remaining issue for the Strimzi Metrics Reporter dashboard is the Consumer Lag panels, because the queries now filter specifically on the source cluster's consumer (this works for JMX). For some reason, the only data being emitted is for the internal topic consumers. I will investigate further and raise an issue to fix this in another PR.

This PR also updates the example yaml for the KafkaMirrorMaker2 CR to demonstrate some best practices, with comments explaining the reasons.

These changes bring a more opinionated way to deploy MirrorMaker and monitor its metrics.

Checklist

Please go through this checklist and make sure all applicable tasks have been done

  • Write tests
  • Make sure all tests pass
  • Update documentation
  • Check RBAC rights for Kubernetes / OpenShift roles
  • Try your changes from Pod inside your Kubernetes and OpenShift cluster, not just locally
  • Reference relevant issue(s) and close them after merging
  • Update CHANGELOG.md
  • Supply screenshots for visual changes, such as Grafana dashboards

Signed-off-by: Gantigmaa Selenge <tina.selenge@gmail.com>
@snyk-io

snyk-io Bot commented May 29, 2026

Snyk checks have passed. No issues have been found so far.

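| Status | Scan Engine | Critical | High | Medium | Low | Total (0) |
|--------|-------------|----------|------|--------|-----|-----------|
|        | Open Source Security | 0 | 0 | 0 | 0 | 0 issues |
|        | Licenses | 0 | 0 | 0 | 0 | 0 issues |
|        | Code Security | 0 | 0 | 0 | 0 | 0 issues |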


Signed-off-by: Gantigmaa Selenge <tina.selenge@gmail.com>
Signed-off-by: Gantigmaa Selenge <tina.selenge@gmail.com>
@codecov

codecov Bot commented May 29, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 75.19%. Comparing base (fa1d995) to head (53ea2bc).
⚠️ Report is 18 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #12777      +/-   ##
============================================
+ Coverage     75.02%   75.19%   +0.16%     
- Complexity     6386     6459      +73     
============================================
  Files           345      346       +1     
  Lines         24137    24329     +192     
  Branches       3091     3120      +29     
============================================
+ Hits          18108    18293     +185     
- Misses         4800     4802       +2     
- Partials       1229     1234       +5     

see 19 files with indirect coverage changes


