Skip to content

Commit 2173d65

Browse files
technicalpicklesclaudeDumbris
authored
feat(health): Unified Health Status Implementation (#192)
* docs: add unified health status design for OAuth UX consistency * docs: add unified health status specification (012) * docs: add unified health status specification and implementation plan Add complete specification and implementation plan for consistent server health status across CLI, tray, web UI, and MCP tools. Artifacts: - spec.md: Feature specification with user stories and requirements - plan.md: Implementation plan with technical context and constitution check - research.md: Research findings and decisions - data-model.md: HealthStatus entity definition and state transitions - contracts/api.yaml: OpenAPI schema additions - quickstart.md: Implementation guide and verification checklist Related #191 * /speckit:analyze fixes * feat(health): add unified health status calculation for upstream servers Implement unified health status that provides consistent server health information across all interfaces (CLI, REST API, MCP tools). Core changes: - Add internal/health package with CalculateHealth() function - Add HealthStatus struct with level, admin_state, summary, detail, action - Integrate health calculation into runtime.GetAllServers() - Add health field to MCP handleListUpstreams() response - Update CLI upstream list to show health status and action hints - Add HealthStatus schema to OpenAPI spec - Add oauth_expiry_warning_hours config option Health levels: healthy, degraded, unhealthy Admin states: enabled, disabled, quarantined Actions: login, restart, enable, approve, view_logs The health calculator uses a priority-based algorithm: 1. Admin state (disabled/quarantined) short-circuits 2. Connection state (error/disconnected/connecting) 3. OAuth state (expired/error/expiring soon) 4. Healthy connected state Includes 20+ unit tests covering all health scenarios including FR-016 verification (token with refresh returns healthy). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * validate what is complete * feat(health): implement unified health status across all interfaces Complete implementation of the unified health status feature (012): ## Backend (already committed) - HealthStatus struct with level, admin_state, summary, detail, action - CalculateHealth() function in internal/health/calculator.go - Health field added to contracts.Server ## CLI - Updated upstream list to use health.level for status emoji - Added action hints column showing CLI commands for fixes - Distinct indicators for disabled/quarantined admin states ## Tray App - Updated getServerStatusDisplay() to use unified health status - Health-based status indicators (green/orange/red/paused/locked) - Added restart action menu items based on health.action ## Web UI - ServerCard.vue uses health.level for badge color - Added action buttons (Login, Restart, Enable, Approve, View Logs) - HealthStatus TypeScript interface in contracts.ts ## Dashboard - X servers need attention banner for degraded/unhealthy servers - Quick-fix action buttons in banner - Filters out disabled/quarantined from attention list ## Documentation - Updated CLAUDE.md with health status documentation - Added HealthStatus schema to oas/swagger.yaml - Created followup doc for verify-oas-coverage.sh issues All 44 tasks complete. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * fix(frontend): add HealthStatus type to api.ts for TypeScript compilation The Server interface in api.ts was missing the health field and HealthStatus interface that were added to contracts.ts. This caused TypeScript compilation errors in ServerCard.vue and Dashboard.vue. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * fix(health): improve unified health status consistency across interfaces - Refactor CLI auth status to use unified health status from backend (FR-006, FR-007) - CLI upstream list now uses health.CalculateHealth() for DRY principle (I-003) - Add tooltip in ServerCard.vue showing health.detail for context (M-004) - Add defensive null check in Dashboard.vue for backward compatibility (I-004) - Include exact token expiration time in Detail field (M-002) - Add debug logging for health status calculation (M-005) - Add E2E tests verifying health field structure in MCP responses (FR-017, FR-018) - Add unit tests ensuring Summary is never empty (FR-004, I-002) - Extract health status in management service ListServers 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * fix(health): use ordered slice for error pattern matching The formatErrorSummary function used a map for error pattern matching, but Go map iteration order is non-deterministic. This caused flaky test failures when error messages matched multiple patterns (e.g., 'dial tcp: no such host' matches both 'dial tcp' and 'no such host'). Changed to an ordered slice where more specific patterns (like 'no such host') are checked before generic ones (like 'dial tcp'). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * fix(tests): E2E tests use isolated config and instance-scoped container cleanup Two issues were causing E2E tests to fail: 1. Tests loaded user's real ~/.mcpproxy/mcp_config.json instead of a clean test config, causing connection attempts to 15+ real upstream servers. Fixed by creating minimal config files in test temp directories. 2. Docker container cleanup affected ALL mcpproxy instances on the machine, not just the test instance. This caused 15+ second shutdown delays as the test server tried to clean up containers from other running instances. Fixed by filtering container operations by instance ID label. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * fix: repair merge conflict in types.go and regenerate OpenAPI The GitHub UI merge resolution accidentally deleted the closing brace for the HealthStatus struct. This fix adds the missing `}` and regenerates the OpenAPI artifacts. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * test: update upstream_cmd tests for unified health status format The table output format was changed to use unified health status instead of separate ENABLED/CONNECTED columns. Updated tests to: - Check for new headers: NAME, PROTOCOL, TOOLS, STATUS, ACTION - Verify health status emojis (✅, ⏸️, 🔒, ❌) based on health level and admin state instead of yes/no boolean values 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> --------- Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: Dumbris <a.dumbris@gmail.com>
1 parent 0500460 commit 2173d65

34 files changed

Lines changed: 3354 additions & 243 deletions

CLAUDE.md

Lines changed: 90 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -159,8 +159,98 @@ See [docs/configuration.md](docs/configuration.md) for complete reference.
159159

160160
**Authentication**: Use `X-API-Key` header or `?apikey=` query parameter.
161161

162+
**Real-time Updates**:
163+
- `GET /events` - Server-Sent Events (SSE) stream for live updates
164+
- Streams both status changes and runtime events (`servers.changed`, `config.reloaded`)
165+
- Used by web UI and tray for real-time synchronization
166+
167+
**API Authentication Examples**:
168+
```bash
169+
# Using X-API-Key header (recommended for curl)
170+
curl -H "X-API-Key: your-api-key" http://127.0.0.1:8080/api/v1/servers
171+
172+
# Using query parameter (for browser/SSE)
173+
curl "http://127.0.0.1:8080/api/v1/servers?apikey=your-api-key"
174+
175+
# SSE with API key
176+
curl "http://127.0.0.1:8080/events?apikey=your-api-key"
177+
178+
# Open Web UI with API key (tray app does this automatically)
179+
open "http://127.0.0.1:8080/ui/?apikey=your-api-key"
180+
```
181+
182+
**Security Notes**:
183+
- **MCP endpoints (`/mcp`, `/mcp/`)** remain **unprotected** for client compatibility
184+
- **REST API** requires authentication - API key is always enforced (auto-generated if not provided)
185+
- **Secure by default**: Empty or missing API keys trigger automatic generation and persistence to config
186+
162187
See [docs/api/rest-api.md](docs/api/rest-api.md) and `oas/swagger.yaml` for API reference.
163188

189+
### Unified Health Status
190+
191+
All server responses include a `health` field that provides consistent status information across all interfaces (CLI, web UI, tray, MCP tools):
192+
193+
```json
194+
{
195+
"health": {
196+
"level": "healthy|degraded|unhealthy",
197+
"admin_state": "enabled|disabled|quarantined",
198+
"summary": "Human-readable status summary",
199+
"detail": "Additional context about the status",
200+
"action": "login|restart|enable|approve|view_logs|"
201+
}
202+
}
203+
```
204+
205+
**Health Levels**:
206+
- `healthy`: Server is connected and functioning normally
207+
- `degraded`: Server has warnings (e.g., OAuth token expiring soon)
208+
- `unhealthy`: Server has errors or is not functioning
209+
210+
**Admin States**:
211+
- `enabled`: Normal operation
212+
- `disabled`: User disabled the server
213+
- `quarantined`: Server pending security approval
214+
215+
**Actions**: Suggested remediation action for the current state. Empty when no action is needed.
216+
217+
**Configuration**: Token expiry warning threshold can be configured:
218+
```json
219+
{
220+
"oauth_expiry_warning_hours": 24
221+
}
222+
```
223+
224+
## JavaScript Code Execution
225+
226+
The `code_execution` tool enables orchestrating multiple upstream MCP tools in a single request using sandboxed JavaScript (ES5.1+).
227+
228+
### Configuration
229+
230+
```json
231+
{
232+
"enable_code_execution": true,
233+
"code_execution_timeout_ms": 120000,
234+
"code_execution_max_tool_calls": 0,
235+
"code_execution_pool_size": 10
236+
}
237+
```
238+
239+
### CLI Usage
240+
241+
```bash
242+
mcpproxy code exec --code="({ result: input.value * 2 })" --input='{"value": 21}'
243+
mcpproxy code exec --code="call_tool('github', 'get_user', {username: input.user})" --input='{"user":"octocat"}'
244+
```
245+
246+
### Documentation
247+
248+
See `docs/code_execution/` for complete guides:
249+
- `overview.md` - Architecture and best practices
250+
- `examples.md` - 13 working code samples
251+
- `api-reference.md` - Complete schema documentation
252+
- `troubleshooting.md` - Common issues and solutions
253+
164254
## Security Model
165255

166256
- **Localhost-only by default**: Core server binds to `127.0.0.1:8080`

cmd/mcpproxy/auth_cmd.go

Lines changed: 45 additions & 42 deletions
Original file line numberDiff line numberDiff line change
@@ -228,10 +228,7 @@ func runAuthStatusClientMode(ctx context.Context, dataDir, serverName string, al
228228
name, _ := srv["name"].(string)
229229
oauth, _ := srv["oauth"].(map[string]interface{})
230230
authenticated, _ := srv["authenticated"].(bool)
231-
connected, _ := srv["connected"].(bool)
232231
lastError, _ := srv["last_error"].(string)
233-
enabled, _ := srv["enabled"].(bool)
234-
userLoggedOut, _ := srv["user_logged_out"].(bool)
235232

236233
// Check if this is an OAuth server by:
237234
// 1. Has oauth config OR
@@ -247,50 +244,56 @@ func runAuthStatusClientMode(ctx context.Context, dataDir, serverName string, al
247244

248245
hasOAuthServers = true
249246

250-
// Determine status emoji and text using oauth_status for accurate state
251-
oauthStatus, _ := srv["oauth_status"].(string)
252-
var status string
253-
254-
// Check priority states first
255-
if !enabled {
256-
// Server is disabled - no reconnection attempts
257-
status = "⏸️ Disabled"
258-
} else if userLoggedOut {
259-
// User explicitly logged out - no auto-reconnection
260-
status = "🚪 Logged Out (Login Required)"
261-
} else if connected {
262-
// Connected states
263-
if authenticated {
264-
status = "✅ Authenticated & Connected"
265-
} else {
266-
status = "⚠️ Connected (No OAuth Token)"
267-
}
268-
} else {
269-
// Disconnected states - use oauth_status for clarity
270-
switch oauthStatus {
271-
case "authenticated":
272-
// Token valid but not connected - likely reconnecting
273-
status = "⏳ Reconnecting (Token Valid)"
274-
case "expired":
275-
// Token expired - needs re-authentication
276-
status = "⚠️ Token Expired (Login Required)"
277-
case "error":
278-
status = "❌ Authentication Error"
247+
// Use unified health status from backend (FR-006, FR-007)
248+
var healthLevel, adminState, healthSummary, healthAction string
249+
if health, ok := srv["health"].(map[string]interface{}); ok && health != nil {
250+
healthLevel, _ = health["level"].(string)
251+
adminState, _ = health["admin_state"].(string)
252+
healthSummary, _ = health["summary"].(string)
253+
healthAction, _ = health["action"].(string)
254+
}
255+
256+
// Determine status emoji based on admin_state first, then health level
257+
var statusEmoji string
258+
switch adminState {
259+
case "disabled":
260+
statusEmoji = "⏸️"
261+
case "quarantined":
262+
statusEmoji = "🔒"
263+
default:
264+
// Use health level for enabled servers
265+
switch healthLevel {
266+
case "healthy":
267+
statusEmoji = "✅"
268+
case "degraded":
269+
statusEmoji = "⚠️"
270+
case "unhealthy":
271+
statusEmoji = "❌"
279272
default:
280-
// No token or oauth_status not set
281-
if lastError != "" {
282-
status = "❌ Authentication Failed"
283-
} else if authenticated {
284-
// Fallback: has token but no oauth_status
285-
status = "⏳ Reconnecting"
286-
} else {
287-
status = "⏳ Pending Authentication"
288-
}
273+
statusEmoji = "❓"
289274
}
290275
}
291276

292277
fmt.Printf("Server: %s\n", name)
293-
fmt.Printf(" Status: %s\n", status)
278+
fmt.Printf(" Health: %s %s\n", statusEmoji, healthSummary)
279+
if adminState != "" && adminState != "enabled" {
280+
fmt.Printf(" Admin State: %s\n", adminState)
281+
}
282+
// Show action as command hint (FR-007)
283+
if healthAction != "" {
284+
switch healthAction {
285+
case "login":
286+
fmt.Printf(" Action: mcpproxy auth login --server=%s\n", name)
287+
case "restart":
288+
fmt.Printf(" Action: mcpproxy upstream restart %s\n", name)
289+
case "enable":
290+
fmt.Printf(" Action: mcpproxy upstream enable %s\n", name)
291+
case "approve":
292+
fmt.Printf(" Action: Approve via Web UI or tray menu\n")
293+
case "view_logs":
294+
fmt.Printf(" Action: mcpproxy upstream logs %s\n", name)
295+
}
296+
}
294297

295298
// Display OAuth configuration details (if available)
296299
// Check if this is an autodiscovery server (no explicit OAuth config, but has token)

cmd/mcpproxy/upstream_cmd.go

Lines changed: 73 additions & 77 deletions
Original file line numberDiff line numberDiff line change
@@ -18,6 +18,7 @@ import (
1818

1919
"mcpproxy-go/internal/cliclient"
2020
"mcpproxy-go/internal/config"
21+
"mcpproxy-go/internal/health"
2122
"mcpproxy-go/internal/logs"
2223
"mcpproxy-go/internal/reqcontext"
2324
"mcpproxy-go/internal/socket"
@@ -177,13 +178,37 @@ func runUpstreamListFromConfig(globalConfig *config.Config) error {
177178
// Convert config servers to output format
178179
servers := make([]map[string]interface{}, len(globalConfig.Servers))
179180
for i, srv := range globalConfig.Servers {
181+
// I-003: Use health.CalculateHealth() instead of inline logic for DRY principle
182+
healthInput := health.HealthCalculatorInput{
183+
Name: srv.Name,
184+
Enabled: srv.Enabled,
185+
Quarantined: srv.Quarantined,
186+
State: "disconnected", // Daemon not running
187+
Connected: false,
188+
ToolCount: 0,
189+
}
190+
healthStatus := health.CalculateHealth(healthInput, health.DefaultHealthConfig())
191+
192+
// Override summary for config-only mode to indicate daemon status
193+
summary := healthStatus.Summary
194+
if healthStatus.AdminState == health.StateEnabled {
195+
summary = "Daemon not running"
196+
}
197+
180198
servers[i] = map[string]interface{}{
181199
"name": srv.Name,
182200
"enabled": srv.Enabled,
183201
"protocol": srv.Protocol,
184202
"connected": false,
185203
"tool_count": 0,
186-
"status": "unknown (daemon not running)",
204+
"status": summary,
205+
"health": map[string]interface{}{
206+
"level": healthStatus.Level,
207+
"admin_state": healthStatus.AdminState,
208+
"summary": summary,
209+
"detail": healthStatus.Detail,
210+
"action": healthStatus.Action,
211+
},
187212
}
188213
}
189214

@@ -206,73 +231,65 @@ func outputServers(servers []map[string]interface{}) error {
206231
}
207232
fmt.Println(string(output))
208233
case "table", "":
209-
// Table format (default) with OAuth token validity column
210-
fmt.Printf("%-25s %-10s %-10s %-12s %-10s %-20s %s\n",
211-
"NAME", "ENABLED", "PROTOCOL", "CONNECTED", "TOOLS", "OAUTH TOKEN", "STATUS")
212-
fmt.Printf("━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\n")
234+
// Table format (default) with unified health status
235+
fmt.Printf("%-4s %-25s %-10s %-10s %-30s %s\n",
236+
"", "NAME", "PROTOCOL", "TOOLS", "STATUS", "ACTION")
237+
fmt.Printf("━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\n")
213238

214239
for _, srv := range servers {
215240
name := getStringField(srv, "name")
216-
enabled := getBoolField(srv, "enabled")
217241
protocol := getStringField(srv, "protocol")
218-
connected := getBoolField(srv, "connected")
219242
toolCount := getIntField(srv, "tool_count")
220-
status := getStringField(srv, "status")
221243

222-
enabledStr := "no"
223-
if enabled {
224-
enabledStr = "yes"
244+
// Extract unified health status
245+
healthData, _ := srv["health"].(map[string]interface{})
246+
healthLevel := "unknown"
247+
healthAdminState := "enabled"
248+
healthSummary := getStringField(srv, "status") // fallback to old status
249+
healthAction := ""
250+
251+
if healthData != nil {
252+
healthLevel = getStringField(healthData, "level")
253+
healthAdminState = getStringField(healthData, "admin_state")
254+
healthSummary = getStringField(healthData, "summary")
255+
healthAction = getStringField(healthData, "action")
225256
}
226257

227-
connectedStr := "no"
228-
if connected {
229-
connectedStr = "yes"
258+
// Status emoji based on health level and admin state
259+
statusEmoji := "⚪" // unknown
260+
switch healthAdminState {
261+
case "disabled":
262+
statusEmoji = "⏸️ " // paused
263+
case "quarantined":
264+
statusEmoji = "🔒" // locked
265+
default:
266+
switch healthLevel {
267+
case "healthy":
268+
statusEmoji = "✅"
269+
case "degraded":
270+
statusEmoji = "⚠️ "
271+
case "unhealthy":
272+
statusEmoji = "❌"
273+
}
230274
}
231275

232-
// Extract OAuth token validity info
233-
oauthStatus := "-"
234-
oauth, _ := srv["oauth"].(map[string]interface{})
235-
authenticated, _ := srv["authenticated"].(bool)
236-
lastError, _ := srv["last_error"].(string)
237-
238-
// Check if this is an OAuth-related server
239-
isOAuthServer := (oauth != nil) ||
240-
containsIgnoreCase(lastError, "oauth") ||
241-
authenticated
242-
243-
if isOAuthServer {
244-
if oauth != nil {
245-
if tokenExpiresAt, ok := oauth["token_expires_at"].(string); ok && tokenExpiresAt != "" {
246-
if expiryTime, err := time.Parse(time.RFC3339, tokenExpiresAt); err == nil {
247-
timeUntilExpiry := time.Until(expiryTime)
248-
if timeUntilExpiry > 0 {
249-
oauthStatus = formatDurationShort(timeUntilExpiry)
250-
} else {
251-
oauthStatus = "⚠️ EXPIRED"
252-
}
253-
}
254-
} else if tokenValid, ok := oauth["token_valid"].(bool); ok {
255-
if tokenValid {
256-
oauthStatus = "✅ Valid"
257-
} else {
258-
oauthStatus = "⚠️ Invalid"
259-
}
260-
} else if authenticated {
261-
oauthStatus = "✅ Active"
262-
} else {
263-
oauthStatus = "⏳ Pending"
264-
}
265-
} else if authenticated {
266-
// OAuth server without config (DCR) but authenticated
267-
oauthStatus = "✅ Active"
268-
} else {
269-
// OAuth required but not authenticated yet
270-
oauthStatus = "⏳ Pending"
271-
}
276+
// Format action as CLI command hint
277+
actionHint := "-"
278+
switch healthAction {
279+
case "login":
280+
actionHint = fmt.Sprintf("auth login --server=%s", name)
281+
case "restart":
282+
actionHint = fmt.Sprintf("upstream restart %s", name)
283+
case "enable":
284+
actionHint = fmt.Sprintf("upstream enable %s", name)
285+
case "approve":
286+
actionHint = "Approve in Web UI"
287+
case "view_logs":
288+
actionHint = fmt.Sprintf("upstream logs %s", name)
272289
}
273290

274-
fmt.Printf("%-25s %-10s %-10s %-12s %-10d %-20s %s\n",
275-
name, enabledStr, protocol, connectedStr, toolCount, oauthStatus, status)
291+
fmt.Printf("%-4s %-25s %-10s %-10d %-30s %s\n",
292+
statusEmoji, name, protocol, toolCount, healthSummary, actionHint)
276293
}
277294
default:
278295
return fmt.Errorf("unknown output format: %s", upstreamOutputFormat)
@@ -686,24 +703,3 @@ func runUpstreamBulkAction(action string, force bool) error {
686703

687704
return nil
688705
}
689-
690-
// formatDurationShort formats a duration into a short human-readable string for table display
691-
func formatDurationShort(d time.Duration) string {
692-
if d < 0 {
693-
return "expired"
694-
}
695-
696-
days := int(d.Hours() / 24)
697-
hours := int(d.Hours()) % 24
698-
699-
if days > 30 {
700-
return fmt.Sprintf("%dd", days)
701-
} else if days > 0 {
702-
return fmt.Sprintf("%dd %dh", days, hours)
703-
} else if hours > 0 {
704-
return fmt.Sprintf("%dh", hours)
705-
} else {
706-
minutes := int(d.Minutes())
707-
return fmt.Sprintf("%dm", minutes)
708-
}
709-
}

0 commit comments

Comments
 (0)