feat: queue monitoring system (extracted from PR #335) #338
Extracted queue-related features only (excluding Bun→Node runtime switch):

**Queue Infrastructure:**
- PendingMessageStore: SQLite-based persistent message queue
- pending_messages table (migration 12) in SessionStore
- Queue status tracking (pending/processing/processed/failed)
- Retry logic with configurable max retries

**Queue UI:**
- QueueButton: Header button with badge showing queue count
- QueueDrawer: Slide-out panel for queue management
- QueueMessageCard: Individual message display with actions
- useQueue hook: Real-time queue updates via SSE
- useNotifications hook: Toast notifications for queue events

**Auto-Recovery:**
- WatchdogService: 30-second polling for stuck messages
- Automatic reset of stuck processing messages
- Manual retry/abort controls in UI
- Session orphan recovery

**API Endpoints:**
- GET /api/queue: Fetch all pending/processing messages
- GET /api/queue/recently-processed: Show recent completions
- POST /api/queue/retry/:id: Retry specific message
- POST /api/queue/abort/:id: Remove message from queue
- POST /api/queue/recover/:sessionId: Force-restart stuck session

**Integration:**
- SessionManager tracks pending message IDs during processing
- worker-service initializes WatchdogService and QueueRoutes
- Original timestamp preservation for accurate observation dating

**Kept on Bun runtime:** All sqlite imports use 'bun:sqlite' (not sqlite-compat)

This enables visible queue monitoring, manual intervention for stuck messages, and automatic recovery from worker crashes - solving real reliability issues documented in PR #315.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
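For readers following along, the status tracking and retry behaviour listed in the commit message could look roughly like the sketch below. It is an in-memory stand-in, not the PR's SQLite-backed PendingMessageStore; the class name `InMemoryPendingQueue`, its methods, and the default of 3 retries are all hypothetical and chosen only for illustration.

```typescript
// Hypothetical in-memory sketch of the pending -> processing -> processed/failed
// lifecycle. The real PendingMessageStore in the PR persists to SQLite.
type MessageStatus = "pending" | "processing" | "processed" | "failed";

interface PendingMessage {
  id: number;
  payload: string;
  status: MessageStatus;
  retryCount: number;
}

class InMemoryPendingQueue {
  private messages = new Map<number, PendingMessage>();
  private nextId = 1;
  private maxRetries: number;

  constructor(maxRetries: number = 3) {
    this.maxRetries = maxRetries; // assumed default, not taken from the PR
  }

  // Add a new message in the "pending" state and return its id.
  enqueue(payload: string): number {
    const id = this.nextId++;
    this.messages.set(id, { id, payload, status: "pending", retryCount: 0 });
    return id;
  }

  // Claim the oldest pending message (Map preserves insertion order).
  claimNext(): PendingMessage | undefined {
    for (const msg of this.messages.values()) {
      if (msg.status === "pending") {
        msg.status = "processing";
        return msg;
      }
    }
    return undefined;
  }

  // Mark a message that is currently being processed as done.
  markProcessed(id: number): void {
    const msg = this.messages.get(id);
    if (msg && msg.status === "processing") msg.status = "processed";
  }

  // On failure, re-queue until the retry budget is spent, then mark as failed.
  markFailed(id: number): MessageStatus | undefined {
    const msg = this.messages.get(id);
    if (!msg || msg.status !== "processing") return undefined;
    msg.retryCount += 1;
    msg.status = msg.retryCount > this.maxRetries ? "failed" : "pending";
    return msg.status;
  }
}
```

With `new InMemoryPendingQueue(1)`, a message that fails once goes back to pending, and a second failure moves it to failed.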
Pull Request Review: Queue Monitoring System

I've completed a comprehensive review of PR #338. This is a well-architected feature that addresses real operational problems with stuck messages. Here's my detailed feedback:

✅ Strengths

Architecture & Design
Database Schema
Error Handling & Recovery
UI Implementation
🔍 Issues & Recommendations

🟡 Medium Priority

1. SQL Injection Risk in QueueRoutes

Location: src/services/worker/http/routes/QueueRoutes.ts:88-89

The …

2. Race Condition in WatchdogService Recovery

Location: src/services/worker/WatchdogService.ts:95-116

```typescript
for (const sessionDbId of sessionsWithPending) {
  const session = this.sessionManager.getSession(sessionDbId);
  if (session) {
    if (!session.generatorPromise) {
      this.startGeneratorForSession(session);
    }
  }
}
```

Issue: Between checking …

Recommendation: Add a lock or atomic check-and-set:

```typescript
if (session && !session.generatorPromise) {
  session.generatorPromise = this.sdkAgent.startSession(session, this.workerRef)...
  // Rest of logic
}
```

3. Unbounded Retry Loop Potential

Location: src/services/sqlite/PendingMessageStore.ts:243-268

The …

Recommendation: Consider adding a delay or exponential backoff before re-queuing to …

4. Memory Leak Risk in Debug Log

Location: src/services/worker/SessionManager.ts:60-72

The debug log ring buffer ( …

Recommendation: This is likely fine as-is, but monitor in production. Consider using a circular buffer implementation if this becomes a hotspot.

5. Type Safety:
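As a rough illustration of the check-and-set recommendation in item 2 and the backoff suggestion in item 3, here is a minimal sketch. The names `SessionLike`, `ensureGeneratorStarted`, and `backoffDelayMs`, as well as the 1-second base and 30-second cap, are assumptions made for this example and are not taken from the PR's code.

```typescript
// Hypothetical sketch: names and defaults are illustrative, not the PR's API.
interface SessionLike {
  id: number;
  generatorPromise: Promise<void> | null;
}

// Check and set in the same synchronous step, so that a second caller in the
// same event-loop turn sees the promise and does not start a duplicate.
function ensureGeneratorStarted(
  session: SessionLike,
  start: (s: SessionLike) => Promise<void>
): boolean {
  if (session.generatorPromise) return false; // already running
  session.generatorPromise = start(session).finally(() => {
    session.generatorPromise = null; // allow a later restart once finished
  });
  return true;
}

// Exponential backoff with a cap: base, 2x base, 4x base, ... up to maxMs.
function backoffDelayMs(
  retryCount: number,
  baseMs: number = 1000,
  maxMs: number = 30000
): number {
  return Math.min(maxMs, baseMs * 2 ** retryCount);
}
```

This only guards against duplicate starts within a single process. If `start` can reject, callers would also want to attach their own error handling to the stored promise.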
Summary
Extracts the queue monitoring system from PR #335, excluding the Bun→Node runtime switch and Windows PowerShell fixes.
This PR focuses solely on the queue infrastructure that provides visibility and recovery for stuck messages.
What's Included
Queue Infrastructure
pending→processing→processed/failed

Queue UI
API Endpoints (QueueRoutes)
- GET /api/queue - Fetch pending/processing messages
- GET /api/queue/recently-processed - Show recent completions
- POST /api/queue/retry/:id - Retry specific message
- POST /api/queue/abort/:id - Remove from queue
- POST /api/queue/recover/:sessionId - Force-restart stuck session

Integration
What's Excluded
❌ Bun → Node.js runtime switch - Kept Bun runtime with bun:sqlite imports
❌ Windows PowerShell fixes - Will be addressed in separate PR (see PR #335 conversation)
❌ sqlite-compat.ts - Runtime abstraction layer not needed
Rationale
Based on PR #335 review:
The Windows PowerShell fixes are needed regardless of runtime (affects both Bun and Node) and will be submitted as a higher-priority separate PR.
Testing
✅ Build passes successfully
✅ All TypeScript compiles
✅ Queue UI components render
✅ PendingMessageStore database operations work
Related
🤖 Generated with Claude Code
Co-Authored-By: Claude Sonnet 4.5 [email protected]