Inbox messages stuck in PENDING when the receiving agent is already idle #131

## Summary

Inbox messages can get stuck in `PENDING` indefinitely when the receiving agent is already idle at the time the message is posted. This affects all providers (Kiro CLI, Claude Code, etc.) because the issue is in the delivery architecture, not in any specific provider's status detection.

Provider: Kiro CLI (where it was observed; likely provider-agnostic, see Root Cause)

## Impact

* Agent-to-agent messaging silently fails: messages stay `PENDING` forever
* Multi-agent workflows stall waiting for callbacks that were sent but never delivered
* Requires manual intervention (resending the message) to unblock

## Reproduction

1. Start `cao-server` and a multi-agent session with 3+ agents
2. Agent A finishes work and goes idle (no more log output)
3. Agent B calls `send_message` to Agent A
4. Message stays PENDING — Agent A never receives it

This happens intermittently in long-running sessions (4-8 hours) with multiple concurrent agents. We observe it several times per session.

## Root Cause

The inbox has two delivery paths:

**Path 1 — Immediate delivery (on POST):** `POST /terminals/{id}/inbox/messages` calls `check_and_send_pending_messages(receiver_id)`, which calls `provider.get_status()`. If IDLE or COMPLETED, delivers immediately. **This is a single-shot attempt with no retry.** If `get_status()` returns a stale or incorrect status at that moment, delivery is skipped.
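To make the single-shot behavior concrete, here is a minimal Python model of Path 1. Everything in it is hypothetical: `Inbox`, `Status`, `post_message`, and `deliver_if_idle_once` are illustrative stand-ins for the POST handler, `check_and_send_pending_messages()`, and `provider.get_status()`, not the actual `cao-server` code. It assumes the check flushes all pending messages when the status is `IDLE` or `COMPLETED`.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class Status(Enum):
    IDLE = "idle"
    COMPLETED = "completed"
    PROCESSING = "processing"


@dataclass
class Message:
    body: str
    state: str = "PENDING"


@dataclass
class Inbox:
    """Hypothetical in-memory stand-in for the server's inbox storage."""
    pending: List[Message] = field(default_factory=list)


def deliver_if_idle_once(inbox: Inbox, get_status: Callable[[], Status]) -> bool:
    """Models check_and_send_pending_messages(): one status read, no retry."""
    if not inbox.pending:
        return False
    if get_status() not in (Status.IDLE, Status.COMPLETED):
        # Skipped. Nothing on this path schedules another attempt, so the
        # message only moves if some other trigger (Path 2) calls this again.
        return False
    for msg in inbox.pending:
        msg.state = "DELIVERED"  # This model flushes every pending message at once.
    inbox.pending.clear()
    return True


def post_message(inbox: Inbox, body: str, get_status: Callable[[], Status]) -> Message:
    """Models POST /terminals/{id}/inbox/messages: queue, then one delivery attempt."""
    msg = Message(body)
    inbox.pending.append(msg)
    deliver_if_idle_once(inbox, get_status)
    return msg


# Agent A is actually idle, but the single status read returns a stale PROCESSING.
inbox = Inbox()
msg = post_message(inbox, "callback from B", lambda: Status.PROCESSING)
print(msg.state)  # PENDING
```

The message only leaves `PENDING` if something calls the check again while the status reads correctly; Path 1 itself never does.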

**Path 2 — PollingObserver:** Monitors `TERMINAL_LOG_DIR` for `.log` file changes every 5 seconds. On change → check pending → check idle → deliver. But if the agent is already idle and not producing output, the log file doesn't change, so the observer never fires again.
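The Path 2 blind spot can be illustrated with a stdlib-only sketch of snapshot-based change detection, which is roughly the principle behind watchdog's `PollingObserver` (this is not its implementation, and `snapshot`/`changed_logs` are hypothetical helpers). It assumes, as described above, that queuing an inbox message does not write to the receiver's `.log` file, so an idle agent produces identical snapshots on every poll.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple

Snapshot = Dict[str, Tuple[int, int]]  # file name -> (mtime_ns, size)


def snapshot(log_dir: Path) -> Snapshot:
    """Record mtime and size for every .log file in the directory."""
    result: Snapshot = {}
    for path in log_dir.glob("*.log"):
        st = path.stat()
        result[path.name] = (st.st_mtime_ns, st.st_size)
    return result


def changed_logs(before: Snapshot, after: Snapshot) -> List[str]:
    """Files that are new or whose (mtime, size) differ between two snapshots."""
    return sorted(name for name, sig in after.items() if before.get(name) != sig)


with tempfile.TemporaryDirectory() as tmp:
    log_dir = Path(tmp)  # stands in for TERMINAL_LOG_DIR
    log = log_dir / "agent-a.log"
    log.write_text("task finished\n")

    # Agent A is idle and writes nothing. Assumption: queuing an inbox message
    # for it does not touch this file, so two polls see identical snapshots.
    before = snapshot(log_dir)
    idle_changes = changed_logs(before, snapshot(log_dir))

    # Only new output from the agent produces a change, and hence a delivery check.
    with log.open("a") as f:
        f.write("new output\n")
    active_changes = changed_logs(before, snapshot(log_dir))

print(idle_changes)    # []
print(active_changes)  # ['agent-a.log']
```

However long the poll interval, an idle agent's log never differs between snapshots, so no event fires and no delivery check runs.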

**The gap:** If Path 1 fails (stale status at the wrong moment) and the agent is already idle (Path 2 never triggers), the message is permanently orphaned. There is no fallback mechanism.

## Possible Directions

- A periodic background check for orphaned PENDING messages (similar to the existing `flow_daemon()` pattern)
- Retry logic on the immediate delivery path (e.g., a few attempts with short delays)
- A fallback poll triggered when a new message is queued but the watcher hasn't fired within N seconds
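The three directions could be combined roughly as follows. This is a sketch under stated assumptions, not a proposed patch: `InMemoryInbox`, `try_deliver`, `sweep_once`, `orphan_sweeper`, and `schedule_fallback` are hypothetical names, the status callable stands in for `provider.get_status()`, and the 10 s sweep interval, 3 attempts, and 0.5 s retry delay are arbitrary placeholders. The sweeper borrows only the general idea of a background daemon like `flow_daemon()`, not its actual code.

```python
import threading
import time
from typing import Callable, Dict, List, Optional, Tuple

StatusFn = Callable[[str], Optional[str]]  # receiver_id -> status name (hypothetical)
DELIVERABLE = {"IDLE", "COMPLETED"}


class InMemoryInbox:
    """Hypothetical thread-safe stand-in for the server's message store."""

    def __init__(self) -> None:
        self._pending: Dict[str, List[str]] = {}
        self.delivered: List[Tuple[str, str]] = []
        self._lock = threading.Lock()

    def post(self, receiver_id: str, body: str) -> None:
        with self._lock:
            self._pending.setdefault(receiver_id, []).append(body)

    def receivers_with_pending(self) -> List[str]:
        with self._lock:
            return [r for r, msgs in self._pending.items() if msgs]

    def deliver_all(self, receiver_id: str) -> int:
        with self._lock:
            msgs = self._pending.pop(receiver_id, [])
            self.delivered.extend((receiver_id, m) for m in msgs)
            return len(msgs)


def try_deliver(inbox: InMemoryInbox, receiver_id: str, get_status: StatusFn,
                attempts: int = 3, delay: float = 0.5) -> int:
    """Direction 2: re-read the status a few times instead of once."""
    for attempt in range(attempts):
        if get_status(receiver_id) in DELIVERABLE:
            return inbox.deliver_all(receiver_id)
        if attempt < attempts - 1:
            time.sleep(delay)
    return 0


def sweep_once(inbox: InMemoryInbox, get_status: StatusFn) -> int:
    """Direction 1: one pass over every receiver that still has PENDING messages."""
    return sum(try_deliver(inbox, r, get_status, attempts=1)
               for r in inbox.receivers_with_pending())


def orphan_sweeper(inbox: InMemoryInbox, get_status: StatusFn,
                   stop: threading.Event, interval: float = 10.0) -> None:
    """Run sweep_once every `interval` seconds until `stop` is set."""
    while not stop.wait(interval):
        sweep_once(inbox, get_status)


def schedule_fallback(inbox: InMemoryInbox, receiver_id: str, get_status: StatusFn,
                      after: float = 5.0) -> threading.Timer:
    """Direction 3: one delayed re-check; a no-op if the message was already delivered."""
    timer = threading.Timer(after, try_deliver, args=(inbox, receiver_id, get_status),
                            kwargs={"attempts": 1})
    timer.daemon = True
    timer.start()
    return timer


# A stale PROCESSING read defeats the retries; a later sweep recovers the message.
inbox = InMemoryInbox()
statuses = {"agent-a": "PROCESSING"}
inbox.post("agent-a", "callback from B")
first = try_deliver(inbox, "agent-a", statuses.get, attempts=2, delay=0.01)
statuses["agent-a"] = "IDLE"
swept = sweep_once(inbox, statuses.get)
print(first, swept, inbox.delivered)  # 0 1 [('agent-a', 'callback from B')]
```

The example shows why retries alone are bounded: if the status stays stale for the whole retry window, the message is still orphaned. The periodic sweep does not depend on any new trigger, so it picks the message up once the status reads correctly; the fallback timer is a cheaper middle ground that re-checks each message once.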

## Related Issues

- #104: stale `PROCESSING` detection in Claude Code specifically, which PR #106 appears to fix
- PR #62: added position-based status comparison to Kiro CLI / Q CLI

Both improve `get_status()` accuracy, but this issue is distinct: even with perfect status detection, the single-shot immediate delivery can miss due to timing, and there is no fallback when it does.

## Environment

* `cao-server` at commit `331e8d7` 
* macOS, Kiro CLI provider
* Observed across multiple multi-day sessions
