
🔴 Elevated Backend Latency: Thread.sleep Chaos Injection in CreditCardService (Sev3) #73

@yortch

Description


Summary

Azure Monitor alert alert-latency-sre-three-rivers fired at 2026-03-25T16:34:48Z for ca-banking-demo-backend (Sev3). The root cause is a Thread.sleep((long)(Math.random() * 9000)) call on line 36 of CreditCardService.java, which injects 0–9 seconds of random delay into every GET /api/cards request. This is a recurring chaos injection: the same pattern was found in a previous incident.

Impact

Dimension            Detail
Affected endpoint    GET /api/cards (primary card listing API)
Impact scope         All users browsing credit cards; frontend card comparison page
Avg response time    1048–7644ms (expected: <200ms)
P99 estimated        Up to 9000ms (max Thread.sleep value)
Duration             Since deployment at 12:59:19 UTC (3.5+ hours)
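As a rough check on the P99 row: assuming Math.random() is uniform on [0, 1), the injected sleep alone is uniform on [0, 9000) ms, so its p-th percentile is p * 9000 ms (about 8910 ms at p = 0.99), and the share of requests whose sleep alone exceeds a threshold T is 1 - T/9000. The sketch below models only the sleep and ignores real query and serialization time; the class name InjectedDelayModel is hypothetical.

```java
/**
 * Back-of-envelope model of the injected delay only: assumes Math.random() is uniform
 * on [0, 1) and ignores the real query, mapping and serialization time.
 */
final class InjectedDelayModel {
    static final double MAX_SLEEP_MS = 9000.0; // upper bound from Math.random() * 9000

    /** p-th percentile (0 <= p < 1) of a delay uniform on [0, MAX_SLEEP_MS). */
    static double percentileMs(double p) {
        return p * MAX_SLEEP_MS;
    }

    /** Share of requests whose injected delay alone exceeds thresholdMs, clamped to [0, 1]. */
    static double fractionOver(double thresholdMs) {
        double share = 1.0 - thresholdMs / MAX_SLEEP_MS;
        return Math.max(0.0, Math.min(1.0, share));
    }

    public static void main(String[] args) {
        System.out.printf("p99 sleep: %.0f ms%n", percentileMs(0.99)); // 8910 ms
        System.out.printf("over 200 ms: %.1f%%%n", 100 * fractionOver(200)); // 97.8%
        System.out.printf("over 500 ms: %.1f%%%n", 100 * fractionOver(500)); // 94.4%
    }
}
```

Under this model nearly every request misses the 200ms expectation, so the alert firing was a question of when rather than whether.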

Timeline (UTC)

Time       Event
12:59:19   New revision ca-banking-demo-backend--azd-1774443552 deployed (image azd-deploy-1774443429)
13:00:02   Replica 1 cold-started in 35.1s
13:00:08   Replica 2 cold-started in 39.3s
13:05      Load test burst: 43 requests served
15:00      Avg response time 1148ms (4 requests)
15:30      Avg response time 1416ms (4 requests)
16:00      Avg response time 1048ms (4 requests)
16:30      Avg response time 7644ms (1 request, hit near-max sleep)
16:34:48   Alert fired: alert-latency-sre-three-rivers (Sev3)

Evidence

Response Time Metrics

Time(UTC)    Requests   Avg ResponseTime(ms)
15:00        4          1148
15:30        4          1416
16:00        4          1048
16:30        1          7644   ← Triggered alert

Resource Utilization (rules out resource starvation)

  • CPU: Steady ~0.2% of 0.5 core (avg ~1.1M nanocores)
  • Memory: Steady ~25% of 1Gi (avg ~263MB)
  • Replicas: 2 (healthy, no restarts)
  • No errors/exceptions in console logs
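The utilization percentages can be reproduced from the raw readings. This sketch assumes the nanocore reading counts billionths of a core and that "263MB" means MiB (with decimal megabytes the memory share would be closer to 24.5%, still "~25%"); UtilizationMath is a hypothetical helper name.

```java
/** Converts the raw Azure Monitor readings into percent of the replica's allocation. */
final class UtilizationMath {
    /** CPU usage in nanocores as a percentage of an allocation given in cores. */
    static double cpuPercent(double nanocores, double allocatedCores) {
        return nanocores / 1e9 / allocatedCores * 100.0; // 1 core = 1e9 nanocores
    }

    /** Memory usage in bytes as a percentage of an allocation given in GiB. */
    static double memoryPercent(double usedBytes, double allocatedGiB) {
        return usedBytes / (allocatedGiB * 1024 * 1024 * 1024) * 100.0;
    }

    public static void main(String[] args) {
        System.out.printf("CPU: %.2f%%%n", cpuPercent(1.1e6, 0.5)); // 0.22%
        System.out.printf("Memory: %.1f%%%n", memoryPercent(263 * 1024.0 * 1024.0, 1.0)); // 25.7%
    }
}
```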

Console Log Activity

All requests logged as normal GET /api/cards with HTTP 200, no stack traces or errors.

Root Cause

File: backend/src/main/java/com/threeriversbank/service/CreditCardService.java#L36

// Line 36: CHAOS INJECTION (must be removed)
try { Thread.sleep((long)(Math.random() * 9000)); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }

Call path:

HTTP GET /api/cards
  → CreditCardController.getAllCards() [line 36]
    → CreditCardService.getAllCreditCards() [lines 34-39]
      → Thread.sleep(0–9000ms) ← INJECTED DELAY
      → creditCardRepository.findAll()
      → stream().map(convertToDto)

This Thread.sleep injects a random delay of 0–9000ms before every database query in the primary card listing endpoint. It:

  • Directly causes elevated average response times
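  • Has high variance (0ms to 9s), which makes the latency intermittent: sometimes within the threshold, sometimes far above it
  • Blocks the Tomcat request-processing thread (the NIO connector's worker) for the whole sleep, reducing how many requests each replica can serve concurrently
  • Has been present in the deployed image since revision ca-banking-demo-backend--azd-1774443552 (deployed 12:59:19 UTC)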
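The concurrency point can be illustrated with a toy model: a fixed pool of worker threads standing in for the servlet container's request threads, where each simulated request blocks its thread with Thread.sleep. BlockingPoolDemo is a hypothetical class and not Tomcat's actual executor; it only shows that a sleeping thread cannot serve other work, so total time grows with requests divided by threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Toy model of a fixed request-thread pool (a stand-in, not Tomcat's real executor).
 * Each simulated request parks its worker thread with Thread.sleep, like the injected delay.
 */
final class BlockingPoolDemo {
    /** Runs `requests` sleeping tasks on `threads` workers; returns wall-clock time in ms. */
    static long runBatch(int threads, int requests, long sleepMs) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            long start = System.nanoTime();
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                futures.add(pool.submit(() -> {
                    try {
                        Thread.sleep(sleepMs); // thread is blocked; it cannot pick up other work
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // wait until every simulated request has completed
            }
            return (System.nanoTime() - start) / 1_000_000;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // 8 requests of 200 ms each: roughly 200 ms with 8 threads, roughly 800 ms with 2.
        System.out.println("8 threads: " + runBatch(8, 8, 200) + " ms");
        System.out.println("2 threads: " + runBatch(2, 8, 200) + " ms");
    }
}
```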

This is the same chaos injection pattern found in a previous incident (Issue #71).

Remediation

Immediate Fix

Remove line 36 from CreditCardService.java:

  @Transactional(readOnly = true)
  public List<CreditCardDto> getAllCreditCards() {
      log.info("Fetching all credit cards from H2 database");
-     try { Thread.sleep((long)(Math.random() * 9000)); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
      return creditCardRepository.findAll().stream()
              .map(this::convertToDto)
              .collect(Collectors.toList());
  }
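A framework-free sketch of the patched method can double as a unit-level regression check. Here the Spring Data repository is replaced by a Supplier and the entity and DTO by records; FixedCardServiceSketch, CreditCard, CreditCardDto and their fields are hypothetical simplifications, not the project's real types.

```java
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collectors;

/**
 * Framework-free sketch of the patched getAllCreditCards(). The Supplier stands in for
 * creditCardRepository.findAll() and the records for the real entity/DTO (hypothetical shapes).
 */
final class FixedCardServiceSketch {
    record CreditCard(long id, String name) {}
    record CreditCardDto(long id, String name) {}

    private final Supplier<List<CreditCard>> findAll;

    FixedCardServiceSketch(Supplier<List<CreditCard>> findAll) {
        this.findAll = findAll;
    }

    /** Same shape as the fixed method: load and map, with no injected sleep. */
    List<CreditCardDto> getAllCreditCards() {
        return findAll.get().stream()
                .map(c -> new CreditCardDto(c.id(), c.name()))
                .collect(Collectors.toList());
    }

    /** Wall-clock duration of one call in milliseconds. */
    static long timeMs(Runnable call) {
        long start = System.nanoTime();
        call.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        FixedCardServiceSketch service =
                new FixedCardServiceSketch(() -> List.of(new CreditCard(1, "Example Card")));
        long ms = timeMs(service::getAllCreditCards);
        // With the sleep removed, an in-memory call should finish far below 200 ms.
        System.out.println(ms < 200 ? "PASS (" + ms + " ms)" : "FAIL (" + ms + " ms)");
    }
}
```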

Deployment

After merging the fix, redeploy the backend container app:

azd deploy backend
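After redeploying, latency can be spot-checked from outside the container. This sketch uses the JDK's java.net.http.HttpClient (Java 11+) and treats the 500ms budget from the action items as the pass threshold. The backend base URL is passed as an argument because the ingress hostname is not given in this issue, and LatencyCheck is a hypothetical class name; the first call after a cold start may be slower for reasons unrelated to this bug.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Post-deploy smoke check: GET a URL repeatedly and compare the slowest call to a budget. */
final class LatencyCheck {
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .version(HttpClient.Version.HTTP_1_1)
            .connectTimeout(Duration.ofSeconds(5))
            .build();

    /** Elapsed milliseconds for one GET; throws IllegalStateException on a non-200 status. */
    static long timeGetMs(URI uri) {
        HttpRequest request = HttpRequest.newBuilder(uri)
                .timeout(Duration.ofSeconds(15)) // longer than the 9 s worst-case injected sleep
                .GET()
                .build();
        long start = System.nanoTime();
        HttpResponse<String> response;
        try {
            response = CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        if (response.statusCode() != 200) {
            throw new IllegalStateException("HTTP " + response.statusCode() + " from " + uri);
        }
        return elapsedMs;
    }

    /** True if the slowest of `samples` sequential GETs finishes in under budgetMs. */
    static boolean withinBudget(URI uri, int samples, long budgetMs) {
        long worst = 0;
        for (int i = 0; i < samples; i++) {
            worst = Math.max(worst, timeGetMs(uri));
        }
        return worst < budgetMs;
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.err.println("usage: LatencyCheck <backend-base-url>");
            System.exit(2);
        }
        URI cards = URI.create(args[0] + "/api/cards");
        boolean ok = withinBudget(cards, 10, 500); // 500 ms budget from the P1 action item
        System.out.println(ok ? "PASS" : "FAIL");
        System.exit(ok ? 0 : 1);
    }
}
```

Taking the worst of several samples matters here: with the injected sleep, a single lucky request can come back fast.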

Action Items

  • P0: Remove Thread.sleep chaos injection from CreditCardService.java:36
  • P1: Add CI check / code review gate to prevent Thread.sleep in production code paths
  • P1: Add automated latency regression test in CI pipeline (assert GET /api/cards < 500ms)
  • P2: Implement chaos engineering via feature flags (e.g., Spring profiles) instead of inline code changes
  • P2: Close Azure Monitor alert after fix verified
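For the P1 CI gate, a naive text scan is one option: walk a production source root and fail the build on any Thread.sleep( call. The default root backend/src/main/java comes from the file path above; SleepGate is a hypothetical name. Being regex-based, it also flags matches in comments and strings, and it misses statically imported sleep( calls and TimeUnit sleeps, so a static-analysis rule (for example a custom Checkstyle or ArchUnit check) would likely be more robust.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Naive CI gate: fails if any Thread.sleep( call appears under a production source root. */
final class SleepGate {
    // Matches "Thread.sleep(" with optional whitespace; text-based, so comments/strings also match.
    private static final Pattern SLEEP = Pattern.compile("\\bThread\\s*\\.\\s*sleep\\s*\\(");

    /** 1-based line numbers in `lines` that contain a Thread.sleep( call. */
    static List<Integer> sleepLines(List<String> lines) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (SLEEP.matcher(lines.get(i)).find()) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    /** "path:line" for every match in *.java files under root. */
    static List<String> scan(Path root) {
        List<String> report = new ArrayList<>();
        try (Stream<Path> walk = Files.walk(root)) {
            List<Path> javaFiles = walk.filter(p -> p.toString().endsWith(".java"))
                    .collect(Collectors.toList());
            for (Path file : javaFiles) {
                for (int line : sleepLines(Files.readAllLines(file))) {
                    report.add(file + ":" + line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return report;
    }

    public static void main(String[] args) {
        Path root = Paths.get(args.length > 0 ? args[0] : "backend/src/main/java");
        List<String> report = scan(root);
        report.forEach(hit -> System.err.println("Thread.sleep found: " + hit));
        System.exit(report.isEmpty() ? 0 : 1); // non-zero exit code fails the CI step
    }
}
```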

Configuration Snapshot

Property            Value
Resource            ca-banking-demo-backend
Resource Group      rg-banking-demo
Subscription        529eddcc-17c4-4834-842d-73670845229b
Image               crbankingdemooqaqx.azurecr.io/three-rivers-bank/backend-banking-demo:azd-deploy-1774443429
Revision            ca-banking-demo-backend--azd-1774443552
CPU/Memory          0.5 vCPU / 1Gi
Replicas            2 (min 2, max 3)
Target Port         8080
BIAN_API_BASE_URL   https://virtserver.swaggerhub.com/B154/BIAN/CreditCard/13.0.0 ✅

This issue was created by sre-sre-three-rivers-bswqe--b2b14894
Tracked by the SRE agent here
