fix(security): harden approval thread safety (TOCTOU + error handling) by zmanian · Pull Request #2366 · nearai/ironclaw

zmanian · 2026-04-12T10:47:30Z

Summary

Consolidates two security fixes for the approval processing flow in thread_ops.rs:

TOCTOU race ([CRITICAL] TOCTOU race condition in approval thread resolution #1486): Hold session lock for the entire take-verify sequence in process_approval() so pending approval cannot be lost if a concurrent operation modifies the thread between take and restore. Previously, the lock was dropped after take_pending_approval() and re-acquired for request_id verification, creating a window where the approval could be permanently lost.
Silent error fallback ([HIGH] Incomplete fallback logic for non-existent approval threads #1487): Replace 10 silent if let Some(thread) patterns with explicit match arms. Critical paths (state transitions, deferred approval setup) return errors when threads disappear. Non-critical paths (tool result recording, auth mode, rejection) log debug messages but continue.

Files changed

src/agent/thread_ops.rs — both fixes + 2 regression tests

Test plan

Regression test: test_approval_request_id_mismatch_restores_pending
Regression test: test_approval_on_missing_thread_should_error
cargo check clean
Zero clippy warnings
CI: full test suite (local env too memory-constrained for test binary linking)

Supersedes #1591 (branch had no merge base with current staging after repo restructuring).
Closes #1486, Closes #1487

https://claude.ai/code/session_01X86EZxqXEFiU9VetyhPKjM

gemini-code-assist

Code Review

This pull request addresses TOCTOU races in the process_approval function by ensuring that pending approvals are taken and verified under a single lock. It also improves error handling across several thread operations by explicitly checking for missing threads and providing appropriate logging or error returns. Regression tests were added to verify the fix for the race condition and the handling of missing threads. Feedback suggests enhancing visibility into unexpected states by adding warning logs when a thread has no turns during tool result recording.

gemini-code-assist · 2026-04-12T10:52:56Z

src/agent/thread_ops.rs

+                match sess.threads.get_mut(&thread_id) {
+                    Some(thread) => {
+                        if let Some(turn) = thread.last_turn_mut() {
+                            if is_tool_error {
+                                turn.record_tool_error_for(
+                                    &pending.tool_call_id,
+                                    result_content.clone(),
+                                );
+                            } else {
+                                turn.record_tool_result_for(
+                                    &pending.tool_call_id,
+                                    serde_json::json!(result_content),
+                                );
+                            }
+                        }
+                    }
+                    None => {
+                        tracing::debug!(
+                            %thread_id,
+                            "Thread disappeared before tool result could be recorded"
                        );
                    }


While the change correctly handles the case where the thread is missing, it still silently ignores the case where thread.last_turn_mut() returns None. Given the objective to harden error handling and avoid silent fallbacks, this unexpected state should be surfaced via tracing::warn! to aid in troubleshooting, consistent with the repository's practice for handling persistence and task failures.

match sess.threads.get_mut(&thread_id) { Some(thread) => { match thread.last_turn_mut() { Some(turn) => { if is_tool_error { turn.record_tool_error_for( &pending.tool_call_id, result_content.clone(), ); } else { turn.record_tool_result_for( &pending.tool_call_id, serde_json::json!(result_content), ); } } None => { tracing::warn!( %thread_id, "Thread has no turns; tool result could not be recorded" ); } } } None => { tracing::warn!( %thread_id, "Thread disappeared before tool result could be recorded" ); } }

References

Errors and unexpected states in persistence or task execution should be surfaced via tracing::warn! to ensure reliability and visibility.

gemini-code-assist · 2026-04-12T10:52:56Z

src/agent/thread_ops.rs

+                    match sess.threads.get_mut(&thread_id) {
+                        Some(thread) => {
+                            if let Some(turn) = thread.last_turn_mut() {
+                                if is_deferred_error {
+                                    turn.record_tool_error_for(&tc.id, deferred_content.clone());
+                                } else {
+                                    turn.record_tool_result_for(
+                                        &tc.id,
+                                        serde_json::json!(deferred_content),
+                                    );
+                                }
+                            }
+                        }
+                        None => {
+                            tracing::debug!(
+                                %thread_id,
+                                tool = %tc.name,
+                                "Thread disappeared before deferred tool result could be recorded"
                            );
                        }


Similar to the tool result recording path above, the case where thread.last_turn_mut() is None is currently handled silently. Adding a warning log for this scenario ensures consistency in error visibility across the codebase and avoids silent failures during tool result recording.

match sess.threads.get_mut(&thread_id) { Some(thread) => { match thread.last_turn_mut() { Some(turn) => { if is_deferred_error { turn.record_tool_error_for(&tc.id, deferred_content.clone()); } else { turn.record_tool_result_for( &tc.id, serde_json::json!(deferred_content), ); } } None => { tracing::warn!( %thread_id, tool = %tc.name, "Thread has no turns; deferred tool result could not be recorded" ); } } } None => { tracing::warn!( %thread_id, tool = %tc.name, "Thread disappeared before deferred tool result could be recorded" ); } }

References

Surface unexpected failures or state inconsistencies via tracing::warn! to ensure reliability and visibility, as seen in rules for audit persistence and task joining.

Consolidates two security fixes for the approval processing flow in thread_ops.rs: **TOCTOU race (#1486):** Hold session lock for the entire take-verify sequence in process_approval() so pending approval cannot be lost if a concurrent operation modifies the thread between take and restore. Previously, the lock was dropped after take_pending_approval() and re-acquired for request_id verification, creating a window where the approval could be permanently lost. **Silent error fallback (#1487):** Replace 10 silent `if let Some(thread)` patterns with explicit `match` arms. Critical paths (state transitions, deferred approval setup) return errors when threads disappear. Non-critical paths (tool result recording, auth mode, rejection) log debug messages but continue. Regression tests: - test_approval_request_id_mismatch_restores_pending - test_approval_on_missing_thread_should_error Supersedes #1591 (branch had no merge base with current staging). Closes #1486, Closes #1487 https://claude.ai/code/session_01X86EZxqXEFiU9VetyhPKjM

serrrfirat · 2026-04-13T07:55:25Z

src/agent/thread_ops.rs

+                    None => {
+                        tracing::debug!(
+                            %thread_id,
+                            "Thread disappeared before rejection could be recorded"


High — Rejection path returns success when thread is gone

When the thread disappears before rejection can be recorded, the function logs a debug message but still sends StatusUpdate::Status("Rejected") and returns SubmissionResult::response(rejection). The caller and UI see a successful rejection for a thread that no longer exists.

More critically, the pending approval was already take()-n at line 1181 and consumed. If the thread disappears between take and the rejection path, the approval is permanently lost with no error surfaced.

Suggested fix: The None arm for the rejection branch should return an error (like the approval-path critical sites at lines 1267-1274 and 1691-1697), e.g., SubmissionResult::error("Thread disappeared during rejection").

serrrfirat · 2026-04-13T07:55:30Z

src/agent/thread_ops.rs

+                    None => {
+                        tracing::debug!(
+                            %thread_id,
+                            "Thread disappeared before tool result could be recorded"


Medium — Tool result recording silently continues when thread gone

When the thread disappears before the tool result can be recorded, the code logs debug and continues. But the tool has already been executed (line 1307-1308). The context_messages still get the tool result appended (line 1389-1393), so the agentic loop may resume with context referencing a tool call whose result was never persisted in the thread.

This is partially mitigated by the error at line 1769-1770 but could cause a mismatch between LLM context and persisted state. A comment explaining why proceeding is safe would clarify intent.

serrrfirat · 2026-04-13T07:55:35Z

src/agent/thread_ops.rs

+                None => {
+                    tracing::debug!(
+                        %thread_id,
+                        "Thread disappeared before auth intercept could be applied"


Medium — Orphaned SSE emission for dead thread

When the thread disappears, the None arm logs debug but execution continues past the block to emit_auth_required_status(). This sends an AuthRequired SSE event to the client for a thread that no longer exists. The client will show an auth prompt that can never be completed. The caller (process_approval) subsequently returns SubmissionResult::auth_pending() for a dead thread.

Suggested fix: Return early from handle_auth_intercept when the thread is gone, or suppress the SSE emission.

serrrfirat · 2026-04-13T07:55:38Z

src/agent/thread_ops.rs

+                        None => {
+                            tracing::debug!(
+                                %thread_id,
+                                "Thread disappeared before auth mode could be re-entered"


Medium — Orphaned SSE emission in process_auth_token

Two code paths in process_auth_token (the Ok(result) and Err(e) arms) re-enter auth mode. When the thread is gone, the None arm logs debug but execution continues to emit_auth_required_status(). This sends auth-required SSE events for a nonexistent thread.

Suggested fix: Add early return or skip the emit_auth_required_status call when the thread was not found.

serrrfirat

Paranoid Architect Review — REQUEST CHANGES

1 High, 4 Medium findings.

The core TOCTOU fix (holding the lock for take-verify) is correct and the regression test validates it well. However, the silent-error-fallback fix is inconsistently applied:

High: The rejection path returns success to the caller when the thread is gone, permanently losing the approval with no error.
Medium: Two handle_auth_intercept / process_auth_token paths emit SSE events for dead threads — clients get stuck in auth prompts that can never resolve.
Medium (Finding 5 — not in diff, line ~374): One if let Some(thread) pattern at line ~374 in process_user_input (the Processing/queue path) was not converted, contrary to the PR's stated scope of "Replace 10 silent if let Some(thread) patterns." If the thread disappears between the state snapshot (line 344) and the mutable re-lock (line 373), the code silently falls through to the Idle | Completed match arm, which could incorrectly start a new turn on a removed thread. Suggested fix: Convert this to a match with explicit None handling, consistent with the other 9 sites.

The rejection path (finding #1) is the ship-blocker — it should return an error, not success, matching the approval path's behavior.

github-actions bot added scope: agent Agent core (agent loop, router, scheduler) size: L 200-499 changed lines risk: medium Business logic, config, or moderate-risk modules contributor: core 20+ merged PRs labels Apr 12, 2026

gemini-code-assist bot reviewed Apr 12, 2026

View reviewed changes

github-actions bot added the scope: dependencies Dependency updates label Apr 12, 2026

zmanian force-pushed the fix/approval-thread-safety-v2 branch from dd9528b to 1e93133 Compare April 12, 2026 19:19

zmanian mentioned this pull request Apr 13, 2026

fix(security): harden approval thread safety (TOCTOU + error handling) #1591

Closed

4 tasks

zmanian requested review from ilblackdragon and serrrfirat April 13, 2026 00:30

github-actions bot mentioned this pull request Apr 13, 2026

🦞 OpenClaw 生态日报 2026-04-13 gsscsd/big_model_radar#178

Open

serrrfirat reviewed Apr 13, 2026

View reviewed changes

serrrfirat requested changes Apr 13, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(security): harden approval thread safety (TOCTOU + error handling)#2366

fix(security): harden approval thread safety (TOCTOU + error handling)#2366
zmanian wants to merge 1 commit intostagingfrom
fix/approval-thread-safety-v2

zmanian commented Apr 12, 2026

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot Apr 12, 2026

Uh oh!

gemini-code-assist bot Apr 12, 2026

Uh oh!

serrrfirat Apr 13, 2026

Uh oh!

serrrfirat Apr 13, 2026

Uh oh!

serrrfirat Apr 13, 2026

Uh oh!

serrrfirat Apr 13, 2026

Uh oh!

serrrfirat left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

zmanian commented Apr 12, 2026

Summary

Files changed

Test plan

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Apr 12, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Apr 12, 2026

Choose a reason for hiding this comment

Uh oh!

serrrfirat Apr 13, 2026

Choose a reason for hiding this comment

Uh oh!

serrrfirat Apr 13, 2026

Choose a reason for hiding this comment

Uh oh!

serrrfirat Apr 13, 2026

Choose a reason for hiding this comment

Uh oh!

serrrfirat Apr 13, 2026

Choose a reason for hiding this comment

Uh oh!

serrrfirat left a comment

Choose a reason for hiding this comment

Paranoid Architect Review — REQUEST CHANGES

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants