
Add perf metrics collection, TCO estimator, and JMeter load-test wrapper #142

Open: sidd190 wants to merge 3 commits into openMF:dev from sidd190:perf-tco-tools


sidd190 commented on May 12, 2026

Summary

Adds a self-contained performance/resource metrics and total cost of ownership (TCO) estimation toolkit for Mifos Gazelle, aligned with the current codebase and deployment model.

Addresses the spirit of GAZ-5: give operators and contributors a repeatable way to snapshot what the stack consumes on a real cluster and produce directional cloud cost estimates from those snapshots.

Evidence artifacts from the test run on a GCP cloud VM, including the generated metrics:
metrics-after.json
metrics-before.json
tco-result.json
summary.txt
live-metrics.json

Screenshots (4) from the GCP run, captured 2026-05-12 between 22:28 and 22:41.

What was added

| Path | Purpose |
|---|---|
| src/utils/perf/collect-metrics.sh | Collects per-pod CPU/memory via kubectl top pods for Gazelle namespaces (infra, mifosx, paymenthub, vnext). Optional --storage flag sums requested PVC capacity per namespace. Writes a single JSON report. Supports --mock for local testing without a cluster. |
| src/utils/perf/tco-estimate.py | Reads the metrics JSON and estimates monthly/annual cost using an embedded instance catalog (with optional --pricing-file), configurable --egress-gib, --topology (single-node or ha-3node), and --headroom. Supports --json-out and --all-providers comparison. |
| src/utils/perf/run-load-test.sh | Headless wrapper around performance-testing/paymentHubEE.jmx. Runs JMeter with parameterized threads/duration/host, and invokes collect-metrics.sh --storage before/after when snapshots are enabled. Supports --mock dry-run. |
| docs/PERF-TCO.md | Usage guide, prerequisites (including JDK for JMeter), data provenance, instructions for exporting evidence from a test VM, and interpretation notes. |

Implementation notes

  • gawk / Ubuntu compatibility: collect-metrics.sh avoids using namespace as an awk variable name (reserved in gawk), ensuring kubectl top output parses correctly on Ubuntu 22.04+.
  • Load-test snapshots call collect-metrics.sh --storage so TCO inputs can use measured PVC requests when the cluster is reachable.
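For readers who want to see the kind of per-namespace rollup collect-metrics.sh produces, here is a minimal Python sketch. It is an illustrative re-implementation, not the script in this PR: it assumes input in the format printed by kubectl top pods --all-namespaces (columns NAMESPACE, NAME, CPU(cores), MEMORY(bytes)), and the function names and output keys (cpu_millicores, memory_mib, pods) are hypothetical. Keying totals on a dictionary rather than an awk variable also sidesteps the gawk naming issue noted above.

```python
from collections import defaultdict

# Namespaces the PR lists as Gazelle's footprint.
GAZELLE_NAMESPACES = ("infra", "mifosx", "paymenthub", "vnext")

# Binary memory suffixes as printed by kubectl, converted to MiB.
_MEM_TO_MIB = {"Ki": 1 / 1024, "Mi": 1.0, "Gi": 1024.0, "Ti": 1024.0 ** 2}


def parse_cpu_millicores(value: str) -> float:
    """Convert a CPU quantity such as '15m' (millicores) or '2' (cores)."""
    if value.endswith("m"):
        return float(value[:-1])
    return float(value) * 1000.0


def parse_memory_mib(value: str) -> float:
    """Convert a memory quantity such as '512Mi' or '1Gi' to MiB."""
    for suffix, factor in _MEM_TO_MIB.items():
        if value.endswith(suffix):
            return float(value[: -len(suffix)]) * factor
    return float(value) / (1024 * 1024)  # no suffix: plain bytes


def rollup_top_output(text: str, namespaces=GAZELLE_NAMESPACES) -> dict:
    """Sum CPU, memory, and pod count per namespace from the output of
    kubectl top pods --all-namespaces (first line is the header)."""
    totals = defaultdict(lambda: {"cpu_millicores": 0.0, "memory_mib": 0.0, "pods": 0})
    for line in text.strip().splitlines()[1:]:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed rows
        ns, _pod, cpu, mem = fields[:4]
        if ns not in namespaces:
            continue  # ignore non-Gazelle namespaces such as kube-system
        totals[ns]["cpu_millicores"] += parse_cpu_millicores(cpu)
        totals[ns]["memory_mib"] += parse_memory_mib(mem)
        totals[ns]["pods"] += 1
    return dict(totals)
```

Parsing is kept separate from invoking kubectl so the logic can be exercised on canned text without a cluster, in the same spirit as the script's --mock flag.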

Quick start

See docs/PERF-TCO.md for full details. Typical flow on a machine with kubectl pointed at a Gazelle cluster:

```bash
# Collect metrics
bash src/utils/perf/collect-metrics.sh --storage --out /tmp/gazelle-metrics.json

# Estimate TCO
python3 src/utils/perf/tco-estimate.py \
  --metrics /tmp/gazelle-metrics.json \
  --topology ha-3node \
  --egress-gib 25

# Run load test with before/after snapshots
bash src/utils/perf/run-load-test.sh --threads 10 --duration 120 --out /tmp/lt

# Estimate TCO from post-load snapshot
python3 src/utils/perf/tco-estimate.py \
  --metrics /tmp/lt/metrics-after.json \
  --topology ha-3node \
  --egress-gib 25
```
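Once metrics-before.json and metrics-after.json exist, a quick way to see what the load test changed is a per-namespace diff. This sketch assumes a hypothetical report shape, {"namespaces": {<ns>: {"cpu_millicores": ..., "memory_mib": ...}}}; the actual keys written by collect-metrics.sh may differ, so the field names would need adjusting to match the real JSON.

```python
import json


def load(path: str) -> dict:
    """Read one metrics snapshot from disk."""
    with open(path) as fh:
        return json.load(fh)


def namespace_deltas(before: dict, after: dict,
                     fields=("cpu_millicores", "memory_mib")) -> dict:
    """Per-namespace change (after minus before) for each numeric field.
    The 'namespaces' key and field names are assumed, not confirmed.
    A namespace missing from one snapshot counts as 0 in that snapshot."""
    b = before.get("namespaces", {})
    a = after.get("namespaces", {})
    return {
        ns: {f: a.get(ns, {}).get(f, 0) - b.get(ns, {}).get(f, 0) for f in fields}
        for ns in sorted(set(a) | set(b))
    }
```

With the quick-start paths this would be called as namespace_deltas(load("/tmp/lt/metrics-before.json"), load("/tmp/lt/metrics-after.json")), provided the snapshot schema matches the assumption above.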

Measured inputs vs. modeled costs

Note: Dollar amounts are not read from a cloud bill. They are modeled from cluster snapshots plus assumptions, which makes them suitable for planning and comparison but not for procurement or finance sign-off unless pricing is refreshed.

Grounded in real cluster data (when run live):

  • CPU and memory totals and per-namespace rollups from kubectl top (point-in-time usage, not guaranteed limits)
  • With --storage: per-namespace storage_gib from summed PVC requested sizes (not bytes used on disk)
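The --storage rollup can be sketched the same way. This assumes, rather than confirms, that PVC sizes are read from spec.resources.requests.storage in the output of kubectl get pvc --all-namespaces -o json; the helper names are hypothetical. Because it sums requested capacity, the result is provisioned size, not bytes used on disk.

```python
import json
from collections import defaultdict

# Kubernetes quantity suffixes (binary and decimal) in bytes.
_SUFFIX_BYTES = {
    "Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3, "Ti": 1024 ** 4,
    "k": 1000, "M": 1000 ** 2, "G": 1000 ** 3, "T": 1000 ** 4,
}
_GIB = 1024 ** 3


def quantity_to_gib(quantity: str) -> float:
    """Convert a storage quantity such as '10Gi' or '500M' to GiB."""
    for suffix, factor in _SUFFIX_BYTES.items():
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * factor / _GIB
    return float(quantity) / _GIB  # no suffix: plain bytes


def storage_gib_by_namespace(pvc_list_json: str) -> dict:
    """Sum requested PVC capacity per namespace from the JSON printed by
    kubectl get pvc --all-namespaces -o json."""
    totals = defaultdict(float)
    for item in json.loads(pvc_list_json).get("items", []):
        ns = item["metadata"]["namespace"]
        requested = (
            item.get("spec", {}).get("resources", {}).get("requests", {}).get("storage")
        )
        if requested:  # PVCs without a storage request contribute nothing
            totals[ns] += quantity_to_gib(requested)
    return dict(totals)
```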

Modeled or user-supplied:

  • Instance type and hourly rate from the built-in catalog or --pricing-file (indicative on-demand Linux prices; pricing_as_of / source are surfaced in output)
  • Compute: hourly_rate × 730 h/month × topology multiplier (ha-3node uses a 3× planning multiplier)
  • Storage: requested GiB × per-GiB/month rate × topology multiplier
  • Network: --egress-gib × per-GiB egress rate (defaults are conservative placeholders)
  • Per-DPG cost lines: heuristic allocation by each component's share of measured memory across namespaces
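The cost model above reduces to a few lines of arithmetic. The sketch below mirrors the stated formulas: 730 h/month, a 3× multiplier for ha-3node applied to compute and storage but not to egress, and a memory-share split per namespace. The Rates fields and any example prices are placeholders, not values from the built-in catalog. How tco-estimate.py applies --headroom is not modeled because the PR does not describe it, and allocating the full monthly total by memory share (rather than only some cost lines) is also an assumption.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730
# Planning multipliers per topology; the PR states ha-3node uses 3x.
TOPOLOGY_MULTIPLIER = {"single-node": 1, "ha-3node": 3}


@dataclass
class Rates:
    """Placeholder prices; the real tool reads its catalog or --pricing-file."""
    hourly_rate: float            # USD per instance-hour
    storage_per_gib_month: float  # USD per GiB-month of requested storage
    egress_per_gib: float         # USD per GiB of egress


def monthly_cost(rates: Rates, topology: str, storage_gib: float, egress_gib: float) -> dict:
    """Apply the compute, storage, and network formulas described in the PR."""
    m = TOPOLOGY_MULTIPLIER[topology]
    compute = rates.hourly_rate * HOURS_PER_MONTH * m
    storage = storage_gib * rates.storage_per_gib_month * m
    network = egress_gib * rates.egress_per_gib  # no topology multiplier
    total = compute + storage + network
    # Annual figure assumed to be 12 x the monthly total.
    return {"compute": compute, "storage": storage, "network": network,
            "total": total, "annual": total * 12}


def allocate_by_memory(total: float, memory_mib_by_ns: dict) -> dict:
    """Split a cost across namespaces in proportion to measured memory."""
    overall = sum(memory_mib_by_ns.values())
    if overall == 0:
        return {ns: 0.0 for ns in memory_mib_by_ns}
    return {ns: total * mem / overall for ns, mem in memory_mib_by_ns.items()}
```

As a worked example with placeholder prices, Rates(hourly_rate=0.25, storage_per_gib_month=0.10, egress_per_gib=0.12) on ha-3node with 50 GiB requested and --egress-gib 25 gives compute 0.25 × 730 × 3 = 547.50, storage 50 × 0.10 × 3 = 15.00, and network 25 × 0.12 = 3.00, for a monthly total of 565.50.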

Caveats and known limitations

  • kubectl top reflects current usage; short spikes may not be captured, and limits/requests differ from usage
  • PVC "storage" is requested capacity from the API, not actual disk consumption
  • TCO omits real-world line items: support, backups, cross-AZ traffic, load balancers, logging sinks, discounts, committed use, etc.
  • paymentHubEE.jmx is unchanged in this PR. On a real Gazelle deployment, requests may fail at the HTTP layer (auth, paths, tenant config) until the plan is aligned with current PHEE ingress and APIs. The wrapper, snapshots, and JTL/HTML reporting still validate the pipeline end-to-end.

Testing

  • collect-metrics.sh --mock
  • Live run on a GCP Ubuntu VM (e2-standard-4) with a full Gazelle deployment; the JSON output includes mode: "live", namespaces, and storage_gib when --storage is used
  • tco-estimate.py run on real metrics JSON; --all-providers and --json-out exercised
  • run-load-test.sh end-to-end with JMeter and a JDK; JMeter samples executed, and HTTP failures are documented as a JMX/config gap rather than a script failure
  • run-load-test.sh --mock dry-run without JMeter

Follow-up (separate PR)

JMeter plan maintenance: Update or supplement performance-testing/paymentHubEE.jmx (and/or document required JMeter properties) so a default run against a standard Gazelle install achieves a meaningful success rate without manual credential configuration. Requires input from PHEE maintainers on supported public test endpoints and safe demo credentials.


Related

  • Jira: GAZ-5 — Perf and TCO facilities for Mifos Gazelle

Reviewers: Please treat cost figures as illustrative unless --pricing-file and egress inputs are refreshed for your target environment.

sidd190 requested a review from a team on May 12, 2026 18:01
tdaly-mifos (Contributor) commented:

👋 Hi @sidd190 — thank you for your pull request.

This PR is currently blocked because we do not have a Contributor License Agreement (CLA) on file for your GitHub account.

To get unblocked:

  1. Complete the form at https://mifos.org/about-us/financial-legal/mifos-contributor-agreement
  2. Complete the CLA signing process
  3. Once verified, you will be added to the approved contributors list and this PR check will be cleared.

tdaly-mifos added the cla-required label (CLA signature required before this PR can be merged) on May 12, 2026
sidd190 closed this on May 13, 2026
sidd190 reopened this on May 13, 2026
tdaly-mifos removed the cla-required label on May 13, 2026
tdaly61 (Contributor) commented on Jun 1, 2026

Hi @sidd190, this PR is intriguing, but can you please tell me how you tested it and connected to the JVM before we go any further?
