[ca-banking-demo-backend] High P95 Response Time — Chaos-Injected Thread.sleep in CreditCardService | Sev3 | 2026-03-23 #65

@yortch

Summary

Azure Monitor alert backend-high-response-time (Sev3) fired at 2026-03-23T14:32:01 UTC for Container App ca-banking-demo-backend in resource group rg-banking-demo. Backend P95 response time is elevated due to a chaos-engineering-injected Thread.sleep() in the primary card catalog endpoint.

Impact: All requests to GET /api/cards experience random delays of 0–10 seconds. Frontend appears to hang or time out. User-facing credit card listing is severely degraded.

Current Status: Active — chaos injection still present in deployed code.

Timeline (UTC)

| Time | Event |
| --- | --- |
| 2026-03-22 ~17:00 | Chaos scenario backend-slow-response introduced (commit 90ea585) with Thread.sleep(Math.random() * 30000), up to 30s delay |
| 2026-03-22 ~17:15 | Delay reduced to 10s via automated instrumentation commit (ef613a9): Thread.sleep(Math.random() * 10000) |
| 2026-03-22 ~17:30 | Documentation updated to reflect 9s max delay (commit 5eac0e7) |
| 2026-03-23 14:32 | Azure Monitor alert backend-high-response-time fires (Sev3) |
| 2026-03-23 14:35 | SRE Agent begins automated investigation |

Evidence

Root Cause: Chaos-Injected Thread.sleep

File: backend/src/main/java/com/threeriversbank/service/CreditCardService.java

Line 36 (inside getAllCreditCards() method):

try { Thread.sleep((long)(Math.random() * 10000)); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }

This injects a random delay of 0–10,000 milliseconds on every invocation of the card listing endpoint. This is part of chaos engineering scenario #13 (backend-slow-response) documented in the repository.
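
Because Math.random() is uniform on [0, 1), the injected sleep is approximately uniform between 0 and the configured ceiling, so its quantiles follow directly from the ceiling. The sketch below works through that arithmetic for the two ceilings mentioned in the timeline. The class and method names are hypothetical, and it models only the injected delay, not the H2 query or network time on top of it.

```java
// Closed-form properties of a delay drawn uniformly from [0, maxMs).
// Hypothetical helper for illustration; it is not part of the repository.
final class ChaosDelayMath {

    /** q-th quantile (0 <= q <= 1) of a uniform delay on [0, maxMs). */
    static double quantileMs(double q, double maxMs) {
        if (q < 0.0 || q > 1.0) {
            throw new IllegalArgumentException("q must be in [0, 1]");
        }
        return q * maxMs;
    }

    /** Fraction of requests whose injected delay alone exceeds thresholdMs. */
    static double fractionExceeding(double thresholdMs, double maxMs) {
        if (thresholdMs <= 0) return 1.0;
        if (thresholdMs >= maxMs) return 0.0;
        return 1.0 - thresholdMs / maxMs;
    }

    public static void main(String[] args) {
        // Initial (30s) and current (10s) injection ceilings from the timeline.
        double[] ceilingsMs = {30_000, 10_000};
        for (double maxMs : ceilingsMs) {
            System.out.printf(
                "ceiling=%.0f ms: mean=%.0f ms, P95=%.0f ms, share over 200 ms=%.1f%%%n",
                maxMs,
                maxMs / 2.0,
                quantileMs(0.95, maxMs),
                100.0 * fractionExceeding(200, maxMs));
        }
    }
}
```

Under this model, a 10,000 ms ceiling puts the P95 of the injected delay alone near 9,500 ms, and about 98% of requests would exceed a 200 ms budget, which is consistent with the alert firing.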

Commit History

| Commit | Description | Author |
| --- | --- | --- |
| 90ea585 | feat(chaos): add backend-slow-response scenario (initial injection with 30s max delay) | Jorge Balderas |
| ef613a9 | perf: add timing instrumentation to card catalog fetch (reduced to 10s max delay) | github-actions[bot] |
| eb482cd | fix(chaos): reduce backend-slow-response delay to 10s | Jorge Balderas |
| 5eac0e7 | fix(chaos): reduce backend-slow-response delay to 9s (documentation update) | Jorge Balderas |

Affected Endpoint

  • Route: GET /api/cards → CreditCardController.getAllCards() → CreditCardService.getAllCreditCards()
  • Behavior: Every request incurs Thread.sleep((long)(Math.random() * 10000)) before querying the H2 database
  • Downstream impact: Frontend HomePage.jsx and CardComparisonPage.jsx depend on this endpoint for card data

Correlated Source Code

| File | Method | Line | Issue |
| --- | --- | --- | --- |
| CreditCardService.java | getAllCreditCards() | L36 | Thread.sleep chaos injection (ROOT CAUSE) |
| CreditCardController.java | getAllCards() | - | Calls the affected service method |
| HomePage.jsx | - | - | Fetches /api/cards; will experience delays/timeouts |
| CardComparisonPage.jsx | - | - | Fetches /api/cards; will experience delays/timeouts |

Remediation

Fix: Remove the Thread.sleep line from CreditCardService.java line 36.

The getAllCreditCards() method should be:

@Transactional(readOnly = true)
public List<CreditCardDto> getAllCreditCards() {
    log.info("Fetching all credit cards from H2 database");
    return creditCardRepository.findAll().stream()
            .map(this::convertToDto)
            .collect(Collectors.toList());
}
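
One way to guard against the delay being reintroduced is a timing check around the call. The following is a minimal sketch under stated assumptions: LatencyBudgetCheck, timeMillis, and withinBudget are hypothetical names, and the Supplier in main is a stand-in for getAllCreditCards(), not the real service.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical timing guard; not part of the repository.
final class LatencyBudgetCheck {

    /** Runs the call once and returns the elapsed wall-clock time in milliseconds. */
    static <T> long timeMillis(Supplier<T> call) {
        long start = System.nanoTime();
        call.get();
        return (System.nanoTime() - start) / 1_000_000;
    }

    /** True when a single invocation completes within budgetMs. */
    static <T> boolean withinBudget(Supplier<T> call, long budgetMs) {
        return timeMillis(call) <= budgetMs;
    }

    public static void main(String[] args) {
        // Stand-in for the card listing call; the real method queries H2.
        Supplier<List<String>> listing = () -> List.of("card-a", "card-b");
        long elapsedMs = timeMillis(listing);
        System.out.println("elapsed ms: " + elapsedMs
                + ", within 200 ms budget: " + (elapsedMs <= 200));
    }
}
```

A single timed call is noisy, so a check like this is best treated as a coarse tripwire for multi-second regressions rather than a precise latency measurement.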

Steps

  1. Remove Thread.sleep line from CreditCardService.java:36
  2. Build and test: cd backend && mvn clean test
  3. Deploy updated image to ca-banking-demo-backend
  4. Verify P95 response time returns to baseline (<200ms)
  5. Close Azure Monitor alert
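
Step 4 can be sketched as a percentile check over collected response times. This example assumes the nearest-rank definition of a percentile, which may differ from how Azure Monitor aggregates the metric. P95BaselineCheck and its methods are hypothetical names, and the samples in main are made up for illustration.

```java
import java.util.Arrays;

// Hypothetical helper for checking a P95 against a baseline; not part of the repository.
final class P95BaselineCheck {

    /** Nearest-rank percentile: the smallest sample with at least percent% of samples at or below it. */
    static long percentile(long[] samplesMs, int percent) {
        if (samplesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (percent < 1 || percent > 100) {
            throw new IllegalArgumentException("percent must be in [1, 100]");
        }
        long[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        // Integer ceiling of percent * n / 100 gives the 1-based rank.
        int rank = (int) (((long) percent * sorted.length + 99) / 100);
        return sorted[rank - 1];
    }

    /** True when the P95 of the samples is strictly below baselineMs. */
    static boolean meetsBaseline(long[] samplesMs, long baselineMs) {
        return percentile(samplesMs, 95) < baselineMs;
    }

    public static void main(String[] args) {
        // Illustrative samples only, not measurements from the incident.
        long[] afterFixMs = {35, 42, 38, 51, 47, 40, 39, 44, 60, 37};
        System.out.println("P95 ms: " + percentile(afterFixMs, 95));
        System.out.println("meets <200 ms baseline: " + meetsBaseline(afterFixMs, 200));
    }
}
```

Checking the P95 rather than the mean matters here because a few slow requests can leave the mean looking healthy while the tail still breaches the alert threshold.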

Related Items

  • Alert: backend-high-response-time (Azure Monitor, Sev3)
  • Alert ID: /subscriptions/529eddcc-17c4-4834-842d-73670845229b/resourcegroups/rg-banking-demo/providers/microsoft.app/containerapps/ca-banking-demo-backend/providers/Microsoft.AlertsManagement/alerts/a6091ee0-c198-473b-8fbf-3f2519ecf000
  • Resource: ca-banking-demo-backend (Azure Container App)
  • Chaos Scenario: #13 backend-slow-response (documented in .github/workflows/chaos-engineering.md)

Labels

copilot:fix-this, sre-agent-detected


This issue was created by three-rivers-bank-sre--35969ff3
Tracked by the SRE agent here
