Summary
Azure Monitor alert alert-latency-sre-three-rivers fired at 2026-03-25T17:33:54Z (Sev3) on ca-banking-demo-backend, indicating elevated average response time. The root cause is a recurring chaos injection, Thread.sleep((long)(Math.random() * 9000)) at line 36 of CreditCardService.java, which adds 0–9 seconds of random latency to every GET /api/cards request.
This is the third occurrence of this pattern (previously tracked in #71 and #73).
Impact
- Affected endpoint: GET /api/cards, the primary card catalog API
- Latency range: 0–9,000 ms random delay per request (average ~4.5 s)
- User experience: card listing and comparison pages load slowly or appear to hang
- Severity: Sev3 (service degraded but functional; all requests return 200)
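The ~4.5 s average follows from the delay being uniformly distributed: Math.random() returns a double in [0.0, 1.0), so the cast yields 0 to 8,999 ms with a mean just under 4,500 ms. The sketch below samples the same expression in isolation to check those figures; it substitutes a seeded java.util.Random for Math.random() so runs are repeatable, and the class and method names are hypothetical.

```java
import java.util.Random;

// Re-creates the injected delay expression from CreditCardService.java in isolation
// and samples it to check the range and mean quoted in the Impact section.
class InjectedDelayStats {

    // Same arithmetic as the chaos line: (long)(random * 9000).
    // The cast truncates toward zero, so the result lies in [0, 8999] ms.
    static long injectedDelayMs(Random rng) {
        return (long) (rng.nextDouble() * 9000);
    }

    // Returns {min, max, mean} over n samples; the seed makes runs repeatable.
    static double[] sample(int n, long seed) {
        Random rng = new Random(seed);
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        long sum = 0;
        for (int i = 0; i < n; i++) {
            long d = injectedDelayMs(rng);
            min = Math.min(min, d);
            max = Math.max(max, d);
            sum += d;
        }
        return new double[] {min, max, (double) sum / n};
    }

    public static void main(String[] args) {
        double[] s = sample(1_000_000, 42L);
        System.out.printf("min=%.0f ms, max=%.0f ms, mean=%.1f ms%n", s[0], s[1], s[2]);
    }
}
```

With one million samples the output should span roughly 0 to 8,999 ms with a mean close to 4,500 ms, matching the ~4.5 s figure above.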
Timeline (UTC)
| Time | Event |
|---|---|
| 2026-03-25 12:59:19 | Latest revision ca-banking-demo-backend--azd-1774443552 deployed |
| 2026-03-25 17:01:01 | GET /api/cards request logged (with Thread.sleep delay) |
| 2026-03-25 17:30:46 | Two more GET /api/cards requests logged |
| 2026-03-25 17:33:54 | Alert fired: average response time elevated |
| 2026-03-25 17:39:00 | SRE Agent investigation started |
Evidence
Metrics (17:00–17:40 UTC)
| Metric | Values | Assessment |
|---|---|---|
| Requests | 1 (17:00), 2 (17:30), 0 elsewhere | Sparse; each slow request dominates the average |
| CPU | ~1M nanocores avg (0.2% of 0.5 vCPU) | Normal; no resource starvation |
| Memory | ~263 MB avg (24.5% of 1 GiB) | Normal; no memory pressure |
| Restarts | 0 | No OOM or crash events |
| Replicas | 2 active | Scaling healthy |
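The utilization percentages in the table can be re-derived from the raw values. A minimal sketch, assuming the CPU metric is reported in nanocores (1 core = 10^9 nanocores) and that "MB" means decimal megabytes measured against a 1 GiB (2^30 byte) limit; the class and method names are hypothetical.

```java
// Re-derives the CPU and memory percentages in the metrics table from the raw values.
class UtilizationMath {

    // 1 core = 1,000,000,000 nanocores.
    static double cpuPercent(double nanocores, double allocatedVcpu) {
        double cores = nanocores / 1_000_000_000.0;
        return cores / allocatedVcpu * 100.0;
    }

    // Assumes decimal megabytes (10^6 bytes) against a limit expressed in GiB (2^30 bytes).
    static double memoryPercent(double megabytes, double gibLimit) {
        double bytes = megabytes * 1_000_000.0;
        double limitBytes = gibLimit * 1024 * 1024 * 1024;
        return bytes / limitBytes * 100.0;
    }

    public static void main(String[] args) {
        System.out.printf("CPU: %.1f%%%n", cpuPercent(1_000_000, 0.5));  // prints "CPU: 0.2%"
        System.out.printf("Memory: %.1f%%%n", memoryPercent(263, 1.0));  // prints "Memory: 24.5%"
    }
}
```

The unit assumption matters for memory: reading "MB" as MiB would give about 25.7% instead of 24.5%, so the table's figure is consistent with decimal megabytes.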
Console Logs
Two GET /api/cards invocations observed, zero errors:
2026-03-25T17:01:01.875Z INFO CreditCardService : Fetching all credit cards from H2 database
2026-03-25T17:30:46.587Z INFO CreditCardService : Fetching all credit cards from H2 database
No exceptions and no 5xx responses, which is consistent with an artificial delay rather than an application error.
Root Cause
File: backend/src/main/java/com/threeriversbank/service/CreditCardService.java#L36
@Transactional(readOnly = true)
public List<CreditCardDto> getAllCreditCards() {
    log.info("Fetching all credit cards from H2 database");
    try { Thread.sleep((long)(Math.random() * 9000)); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    return creditCardRepository.findAll().stream()
            .map(this::convertToDto)
            .collect(Collectors.toList());
}
Thread.sleep((long)(Math.random() * 9000)) injects a random delay of 0–9 seconds (0–8,999 ms once the cast truncates) on every call to getAllCreditCards(). This is a chaos-engineering artifact that was not reverted after testing.
Why CPU/Memory look normal
Thread.sleep parks the request thread without consuming CPU cycles. The thread simply waits while the rest of the JVM carries on, so CPU and memory metrics stay flat, which rules out resource starvation as the cause.
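The flat CPU reading can be reproduced with the JDK's per-thread CPU accounting: a sleeping thread accumulates wall-clock time but almost no CPU time. A minimal sketch using java.lang.management.ThreadMXBean; the class and method names are hypothetical, and it assumes the JVM supports per-thread CPU timing.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Shows that a sleeping thread accumulates wall-clock time but almost no CPU time,
// which is why the injected Thread.sleep leaves CPU metrics flat.
class SleepCpuCheck {

    // Returns {wallMillis, cpuMillis} for the current thread across one Thread.sleep call.
    static long[] measureSleep(long sleepMillis) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isCurrentThreadCpuTimeSupported()) {
            throw new UnsupportedOperationException("Per-thread CPU time is not supported on this JVM");
        }
        mx.setThreadCpuTimeEnabled(true);
        long cpuStart = mx.getCurrentThreadCpuTime();  // CPU nanoseconds used by this thread so far
        long wallStart = System.nanoTime();
        Thread.sleep(sleepMillis);
        long wallNanos = System.nanoTime() - wallStart;
        long cpuNanos = mx.getCurrentThreadCpuTime() - cpuStart;
        return new long[] {wallNanos / 1_000_000, cpuNanos / 1_000_000};
    }

    public static void main(String[] args) throws InterruptedException {
        long[] r = measureSleep(500);
        System.out.println("wall=" + r[0] + " ms, cpu=" + r[1] + " ms");
    }
}
```

On a typical HotSpot JVM a 500 ms sleep should report about 500 ms of wall time and close to 0 ms of CPU time, mirroring the flat CPU metric during each injected delay.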
Why sparse traffic amplifies the alert
With only 1–2 requests per 5-minute window, a single 8-second sleep dominates the average response time, easily exceeding the alert threshold.
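To make the dilution effect concrete, the sketch below compares a 2-request window with a 100-request window, each containing one 8,000 ms request. The 50 ms baseline for an unaffected request is a hypothetical figure, not a measured one, and the class name is illustrative.

```java
import java.util.Arrays;

// Illustrates how one slow request dominates a window average when traffic is sparse.
class SparseWindowAverage {

    static double average(long... latenciesMs) {
        return Arrays.stream(latenciesMs).average().orElse(0.0);
    }

    // Builds a window (count >= 1) with one slow request plus (count - 1) baseline requests.
    static long[] window(int count, long slowMs, long baselineMs) {
        long[] w = new long[count];
        Arrays.fill(w, baselineMs);
        w[0] = slowMs;
        return w;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: one 8,000 ms sleep and a 50 ms baseline for a normal request.
        System.out.println(average(window(2, 8000, 50)));    // prints 4025.0
        System.out.println(average(window(100, 8000, 50)));  // prints 129.5
    }
}
```

In this incident every request sleeps, so heavier traffic would not dilute the delay away; it would only make each window's average settle near the ~4.5 s mean instead of swinging with individual samples.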
Remediation
Immediate Fix
Remove the Thread.sleep line from CreditCardService.java:
@Transactional(readOnly = true)
public List<CreditCardDto> getAllCreditCards() {
    log.info("Fetching all credit cards from H2 database");
    return creditCardRepository.findAll().stream()
            .map(this::convertToDto)
            .collect(Collectors.toList());
}
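To confirm the fix and catch a reintroduced delay, a regression check can call the method several times and assert a worst-case latency budget. A minimal plain-Java sketch; the 500 ms budget, the iteration count, and the stub standing in for getAllCreditCards() are assumptions, and in the real project the supplier would wrap the CreditCardService bean inside a test. With the chaos line present, 20 calls all staying under 500 ms has a probability of roughly (500/9000)^20, so the check would almost certainly fail.

```java
import java.time.Duration;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical post-fix check: call a supplier repeatedly and report the slowest call.
class LatencyBudgetCheck {

    static <T> Duration worstCase(Supplier<T> call, int iterations) {
        long worst = 0;
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            call.get();
            worst = Math.max(worst, System.nanoTime() - start);
        }
        return Duration.ofNanos(worst);
    }

    public static void main(String[] args) {
        // Stand-in for getAllCreditCards(); a real test would call the service instead.
        Supplier<List<String>> getAllCreditCards = () -> List.of("card-a", "card-b");
        Duration worst = worstCase(getAllCreditCards, 20);
        if (worst.compareTo(Duration.ofMillis(500)) > 0) {
            throw new AssertionError("getAllCreditCards exceeded budget: " + worst.toMillis() + " ms");
        }
        System.out.println("worst-case latency: " + worst.toMillis() + " ms");
    }
}
```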
Preventive Measures
- Add a CI/CD check: lint or grep for Thread.sleep in non-test source files to prevent chaos code from reaching production
- Feature-flag chaos injections: use a runtime-configurable flag (e.g., environment variable CHAOS_ENABLED=false) instead of hardcoded delays
- Pre-merge code review gate: flag any Thread.sleep in service-layer code during PR review
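The CI/CD check in the first bullet could be as small as a source scan that fails the build when Thread.sleep( appears under the production source root. A minimal sketch, assuming the src/main/java layout implied by the file path in Root Cause and a plain text match, which will also flag mentions in comments; SleepGuard is a hypothetical name, and a dedicated lint rule would be more precise.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical CI guard: exits non-zero if "Thread.sleep(" appears in production Java sources.
class SleepGuard {

    // Returns "path:line" for every line under root containing Thread.sleep( (plain text match).
    static List<String> findSleeps(Path root) throws IOException {
        List<Path> javaFiles;
        try (Stream<Path> files = Files.walk(root)) {
            javaFiles = files.filter(p -> p.toString().endsWith(".java"))
                             .sorted()
                             .collect(Collectors.toList());
        }
        List<String> hits = new ArrayList<>();
        for (Path file : javaFiles) {
            List<String> lines = Files.readAllLines(file);
            for (int i = 0; i < lines.size(); i++) {
                if (lines.get(i).contains("Thread.sleep(")) {
                    hits.add(file + ":" + (i + 1));  // 1-based line numbers
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Scan production sources only; test sources under src/test/java are not included.
        Path root = Paths.get(args.length > 0 ? args[0] : "backend/src/main/java");
        List<String> hits = findSleeps(root);
        hits.forEach(h -> System.err.println("Thread.sleep found: " + h));
        System.exit(hits.isEmpty() ? 0 : 1);
    }
}
```

A CI step could run it with the single-file launcher (java SleepGuard.java backend/src/main/java); the non-zero exit code fails the step.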
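For the feature-flag bullet, one possible shape is a small gate that injects latency only when CHAOS_ENABLED is explicitly true, so an unset variable means no chaos. A minimal sketch; CHAOS_ENABLED comes from the bullet above, while ChaosGate and its methods are hypothetical, and the 9,000 ms ceiling mirrors the original injection.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Function;

// Hypothetical chaos gate: latency is injected only when CHAOS_ENABLED is explicitly "true".
class ChaosGate {

    private final boolean enabled;
    private final long maxDelayMs;

    ChaosGate(boolean enabled, long maxDelayMs) {
        this.enabled = enabled;
        this.maxDelayMs = maxDelayMs;
    }

    // Anything other than "true" (case-insensitive), including an unset variable, disables chaos.
    static ChaosGate fromEnvironment(Function<String, String> env) {
        boolean on = Boolean.parseBoolean(env.apply("CHAOS_ENABLED"));
        return new ChaosGate(on, 9000);
    }

    boolean isEnabled() {
        return enabled;
    }

    // Call at the former injection point; a no-op unless the flag is on.
    void maybeDelay() {
        if (!enabled || maxDelayMs <= 0) {
            return;
        }
        try {
            Thread.sleep(ThreadLocalRandom.current().nextLong(maxDelayMs));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // preserve interrupt status, as the original code did
        }
    }

    public static void main(String[] args) {
        ChaosGate chaos = fromEnvironment(System::getenv);
        System.out.println("chaos enabled: " + chaos.isEnabled());
        chaos.maybeDelay();
    }
}
```

In CreditCardService the sleep line would become a call such as chaos.maybeDelay(), with the gate built once via ChaosGate.fromEnvironment(System::getenv). Taking the lookup as a Function keeps the gate testable without mutating real environment variables; in a Spring application a configuration property could play the same role.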
Action Items
- Remove Thread.sleep from CreditCardService.java:36
- … Thread.sleep in production code paths
Related Issues
This issue was created by sre-sre-three-rivers-bswqe--b2b14894
Tracked by the SRE agent here