[SaaS] Increase Health Check Timeout to Prevent Cascade Failures

### Problem
During the November 5th, 2025 outage (12:01 PM to 12:06 PM IST), all tasks were marked unhealthy because the `/health` endpoint started timing out. The current 5-second timeout is too aggressive: it triggered a cascade failure that turned a 5-minute spike into a complete outage.

### Current Configuration
```
Timeout: 5 seconds
Interval: 15 seconds
Unhealthy threshold: 2 consecutive failures
Healthy threshold: 2 consecutive successes
```
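These settings determine how quickly a slow task gets ejected. The following is a minimal Python sketch under two assumptions: checks start every `Interval` seconds regardless of outcome, and a check that reaches `Timeout` counts as a failure. The names `HealthCheckConfig` and `time_to_unhealthy` are hypothetical and are not part of any load balancer API.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class HealthCheckConfig:
    timeout_s: int
    interval_s: int
    unhealthy_threshold: int
    healthy_threshold: int


def time_to_unhealthy(cfg: HealthCheckConfig) -> Tuple[int, int]:
    """Return (best, worst) seconds from the moment /health starts hanging
    until the task is marked unhealthy (assumed model, see lead-in).

    Best case: a check starts exactly when /health starts hanging, so the
    failures are recorded at timeout, interval + timeout, ...
    Worst case: a check completed successfully just before, so the first
    failing check starts up to one interval later.
    """
    best = (cfg.unhealthy_threshold - 1) * cfg.interval_s + cfg.timeout_s
    worst = best + cfg.interval_s
    return best, worst


if __name__ == "__main__":
    # Current timeout (5s) plus the Option 1 candidates (10s, 15s).
    for timeout in (5, 10, 15):
        cfg = HealthCheckConfig(timeout_s=timeout, interval_s=15,
                                unhealthy_threshold=2, healthy_threshold=2)
        print(timeout, time_to_unhealthy(cfg))
    # 5 (20, 35)
    # 10 (25, 40)
    # 15 (30, 45)
```

Under these assumptions, even a 15-second timeout ejects a persistently slow task within about 45 seconds, which is well inside a 5-minute spike. A larger timeout therefore helps only if `/health` actually responds within it under load. A timeout equal to the 15-second interval also means a hung check expires only as the next one starts.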

### Issue
When the API experienced a request spike, `/health` responses took longer than 5 seconds. That started a cascade: health checks failed → tasks marked unhealthy → load balancer removed tasks → remaining tasks overloaded → more health checks failed.
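The feedback loop can be illustrated with a toy model. This is a hypothetical sketch and not a model of the real service. It assumes traffic is split evenly over healthy tasks, assumes `/health` latency grows linearly with per-task load, and uses made-up capacities. For simplicity, one loop iteration stands in for reaching the unhealthy threshold.

```python
from typing import List


def simulate_cascade(capacities_rps: List[float], total_rps: float,
                     timeout_s: float, seconds_per_unit_load: float = 2.0) -> List[int]:
    """Return the healthy-task count after each round of health checks.

    Hypothetical model: traffic is split evenly over healthy tasks and a
    task's /health latency is seconds_per_unit_load * (load / capacity).
    Any task whose latency exceeds timeout_s is removed in that round.
    """
    healthy = list(capacities_rps)
    history = [len(healthy)]
    while healthy:
        load_per_task = total_rps / len(healthy)
        survivors = [c for c in healthy
                     if seconds_per_unit_load * load_per_task / c <= timeout_s]
        if len(survivors) == len(healthy):
            break  # stable: no task failed its health check this round
        healthy = survivors  # removed tasks' traffic shifts to the survivors
        history.append(len(healthy))
    return history


if __name__ == "__main__":
    capacities = [75, 95, 125, 155]  # made-up per-task capacities (rps)
    print(simulate_cascade(capacities, total_rps=1000, timeout_s=5))   # [4, 2, 0]
    print(simulate_cascade(capacities, total_rps=1000, timeout_s=10))  # [4]
```

In this model, removing two tasks doubles the load on the other two, which then fail as well. The 10-second run stays stable only because this particular spike keeps latency below 10 seconds. A larger spike collapses the pool the same way.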

### Proposed Change
**Option 1:** Increase timeout to **[10? 15?]** seconds to tolerate brief slowdowns without marking tasks unhealthy.

**Option 2:** Rethink health checks entirely:
- Use passive health checks based on actual request success rates
- Implement graceful degradation instead of binary healthy/unhealthy
- Add circuit breaker logic to prevent cascade failures
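The passive-check and graceful-degradation ideas can be combined as follows. This is a minimal Python sketch in which `PassiveHealthTracker` and its parameters are hypothetical. It judges each task by the success rate of real requests over a sliding window and caps how much of the fleet can be ejected at once. The approach is similar in spirit to proxy outlier detection with an ejection cap, such as Envoy's `max_ejection_percent`. Circuit breaking is a separate, client-side concern and is not shown here.

```python
from collections import deque
from typing import Deque, Dict, Iterable, List


class PassiveHealthTracker:
    """Hypothetical passive health tracker.

    Ejects tasks whose recent request success rate drops below
    min_success_rate, but never ejects more than max_eject_fraction of the
    fleet, so a fleet-wide slowdown degrades service instead of emptying
    the pool.
    """

    def __init__(self, task_ids: Iterable[str], window: int = 100,
                 min_success_rate: float = 0.5, max_eject_fraction: float = 0.5):
        self.min_success_rate = min_success_rate
        self.max_eject_fraction = max_eject_fraction
        # Sliding window of the last `window` request outcomes per task.
        self.results: Dict[str, Deque[bool]] = {
            t: deque(maxlen=window) for t in task_ids}

    def record(self, task_id: str, success: bool) -> None:
        self.results[task_id].append(success)

    def success_rate(self, task_id: str) -> float:
        outcomes = self.results[task_id]
        return sum(outcomes) / len(outcomes) if outcomes else 1.0  # no data: assume healthy

    def routable_tasks(self) -> List[str]:
        tasks = list(self.results)
        # Worst performers first, so the cap ejects the sickest tasks.
        failing = sorted((t for t in tasks
                          if self.success_rate(t) < self.min_success_rate),
                         key=self.success_rate)
        max_ejections = int(len(tasks) * self.max_eject_fraction)
        ejected = set(failing[:max_ejections])
        return [t for t in tasks if t not in ejected]


if __name__ == "__main__":
    tracker = PassiveHealthTracker(["a", "b", "c", "d"])
    for task in "abcd":  # fleet-wide slowdown: every task is failing
        for _ in range(10):
            tracker.record(task, False)
    print(tracker.routable_tasks())  # ['c', 'd'], half the fleet keeps serving
```

The cap is the design choice that matters here. When every task looks unhealthy, the likely cause is load rather than individual bad tasks, so the sketch keeps serving slowly instead of removing everything.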

**Rationale:** 
- Better to serve slow requests than mark everything dead
- 5s is too tight for real-world load spikes
- Current approach creates cascading failures instead of preventing them

### Questions
- [ ] What's an acceptable `/health` response time under load?
- [ ] Should we separate the health check endpoint from the main API?
- [ ] Should we adjust failure/success thresholds too?
- [ ] Can we use passive health monitoring instead?

[SaaS] Increase Health Check Timeout to Prevent Cascade Failures #6251