fix(agent): eliminate TOCTOU race in approval thread resolution#1543
fix(agent): eliminate TOCTOU race in approval thread resolution#1543
Conversation
Restructure process_approval() to hold the session lock for the entire check-take-verify-mutate sequence. Previously, the lock was dropped after taking the pending approval and re-acquired to verify request_id and set state, creating a window where concurrent operations could modify thread state or lose the pending approval. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a critical race condition in the agent's approval process, ensuring that session state is modified atomically. It also improves error handling and adds regression tests to prevent future occurrences of the issue. The changes enhance the reliability and security of the agent. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
The pull request effectively addresses the TOCTOU (Time-of-Check to Time-of-Use) race condition in the process_approval function by consolidating multiple lock acquisitions into a single atomic scope. This significantly improves the reliability and correctness of thread state management during approval flows. The addition of a warning for disappeared threads during rejection paths enhances observability, and the new regression tests provide solid coverage for the atomic operations, ensuring the fix is robust.
| let pending = match thread.take_pending_approval() { | ||
| Some(p) => p, | ||
| None => { | ||
| tracing::debug!( | ||
| %thread_id, | ||
| "Ignoring stale approval: no pending approval found" | ||
| ); | ||
| return Ok(SubmissionResult::ok_with_message("")); | ||
| } | ||
| }; | ||
|
|
||
| // Verify request ID if provided | ||
| if let Some(req_id) = request_id | ||
| && req_id != pending.request_id | ||
| { | ||
| // Put it back and return error | ||
| let mut sess = session.lock().await; | ||
| if let Some(thread) = sess.threads.get_mut(&thread_id) { | ||
| // Verify request ID while still holding the lock | ||
| if let Some(req_id) = request_id | ||
| && req_id != pending.request_id | ||
| { | ||
| // Restore the pending approval atomically without releasing the lock | ||
| thread.await_approval(pending); | ||
| return Ok(SubmissionResult::error( | ||
| "Request ID mismatch. Use the correct request ID.", | ||
| )); | ||
| } | ||
| return Ok(SubmissionResult::error( | ||
| "Request ID mismatch. Use the correct request ID.", | ||
| )); | ||
| } | ||
|
|
||
| if approved { | ||
| // If always, add to auto-approved set | ||
| if always { | ||
| let mut sess = session.lock().await; | ||
| sess.auto_approve_tool(&pending.tool_name); | ||
| // If approved with "always", set auto-approve while we still have the lock. | ||
| // Drop the thread borrow before calling sess methods to satisfy the borrow checker. | ||
| if approved && always { | ||
| let tool_name = pending.tool_name.clone(); | ||
| sess.auto_approve_tool(&tool_name); | ||
| tracing::info!( | ||
| "Auto-approved tool '{}' for session {}", | ||
| pending.tool_name, | ||
| tool_name, | ||
| sess.id | ||
| ); | ||
| } | ||
|
|
||
| // Reset thread state to processing | ||
| // Set thread state to Processing for approved requests. | ||
| // Re-borrow the thread since the previous borrow was dropped. | ||
| if approved | ||
| && let Some(thread) = sess.threads.get_mut(&thread_id) | ||
| { | ||
| let mut sess = session.lock().await; | ||
| if let Some(thread) = sess.threads.get_mut(&thread_id) { | ||
| thread.state = ThreadState::Processing; | ||
| } | ||
| thread.state = ThreadState::Processing; | ||
| } | ||
|
|
||
| pending |
There was a problem hiding this comment.
This refactoring correctly consolidates multiple state-modifying operations under a single lock acquisition. This is a critical change that eliminates the TOCTOU race condition, ensuring atomicity for pending approval management, request ID verification, auto-approval, and thread state transitions. This directly addresses the core issue outlined in the pull request.
References
- To prevent a time-of-check-to-time-of-use (TOCTOU) race condition when checking a condition under a read lock before performing a write, re-verify the condition after acquiring the write lock but before executing the write. This is known as the double-checked locking pattern.
| tracing::warn!( | ||
| %thread_id, | ||
| "Thread disappeared during approval rejection" | ||
| ); |
There was a problem hiding this comment.
| /// Regression test for #1486: verify that request_id mismatch atomically | ||
| /// restores the pending approval (no TOCTOU window where it could be lost). | ||
| #[test] | ||
| fn test_request_id_mismatch_preserves_pending_state() { | ||
| use crate::agent::session::{PendingApproval, Session, Thread, ThreadState}; | ||
| use uuid::Uuid; | ||
|
|
||
| let session_id = Uuid::new_v4(); | ||
| let thread_id = Uuid::new_v4(); | ||
| let correct_request_id = Uuid::new_v4(); | ||
| let wrong_request_id = Uuid::new_v4(); | ||
|
|
||
| let mut thread = Thread::with_id(thread_id, session_id); | ||
| let pending = PendingApproval { | ||
| request_id: correct_request_id, | ||
| tool_name: "shell".to_string(), | ||
| parameters: serde_json::json!({"command": "rm -rf /"}), | ||
| display_parameters: serde_json::json!({"command": "[REDACTED]"}), | ||
| description: "Execute dangerous command".to_string(), | ||
| tool_call_id: "call_42".to_string(), | ||
| context_messages: vec![], | ||
| deferred_tool_calls: vec![], | ||
| user_timezone: None, | ||
| allow_always: false, | ||
| }; | ||
| thread.await_approval(pending); | ||
|
|
||
| let mut session = Session::new("test-user"); | ||
| session.threads.insert(thread_id, thread); | ||
|
|
||
| // Verify initial state | ||
| assert_eq!( | ||
| session.threads[&thread_id].state, | ||
| ThreadState::AwaitingApproval | ||
| ); | ||
| assert!(session.threads[&thread_id].pending_approval.is_some()); | ||
|
|
||
| // Simulate the atomic check-take-verify sequence from the fixed code: | ||
| // take the approval, then verify request_id, and restore on mismatch. | ||
| let thread = session.threads.get_mut(&thread_id).unwrap(); | ||
| let taken = thread.take_pending_approval().unwrap(); | ||
| assert!(thread.pending_approval.is_none(), "take should clear it"); | ||
|
|
||
| // Wrong request_id -- restore atomically (within same "lock scope" in production) | ||
| assert_ne!(wrong_request_id, taken.request_id); | ||
| thread.await_approval(taken); | ||
|
|
||
| // Verify the pending approval is fully restored | ||
| assert_eq!( | ||
| session.threads[&thread_id].state, | ||
| ThreadState::AwaitingApproval | ||
| ); | ||
| let restored = session.threads[&thread_id] | ||
| .pending_approval | ||
| .as_ref() | ||
| .expect("pending approval must be restored after request_id mismatch"); | ||
| assert_eq!(restored.request_id, correct_request_id); | ||
| assert_eq!(restored.tool_name, "shell"); | ||
| assert_eq!(restored.tool_call_id, "call_42"); |
There was a problem hiding this comment.
The new regression test test_request_id_mismatch_preserves_pending_state is excellent. It specifically validates the atomic restoration of pending approval state when a request ID mismatch occurs, directly confirming the fix for the TOCTOU race condition. This test is crucial for ensuring the stability of the approval mechanism.
References
- To prevent a time-of-check-to-time-of-use (TOCTOU) race condition when checking a condition under a read lock before performing a write, re-verify the condition after acquiring the write lock but before executing the write. This is known as the double-checked locking pattern.
| /// Regression test for #1486: verify that approval with "always" flag | ||
| /// correctly transitions thread state and registers auto-approve in a | ||
| /// single lock scope. | ||
| #[test] | ||
| fn test_approval_with_always_sets_auto_approve_and_processing() { | ||
| use crate::agent::session::{PendingApproval, Session, Thread, ThreadState}; | ||
| use uuid::Uuid; | ||
|
|
||
| let session_id = Uuid::new_v4(); | ||
| let thread_id = Uuid::new_v4(); | ||
| let request_id = Uuid::new_v4(); | ||
|
|
||
| let mut thread = Thread::with_id(thread_id, session_id); | ||
| let pending = PendingApproval { | ||
| request_id, | ||
| tool_name: "web_fetch".to_string(), | ||
| parameters: serde_json::json!({"url": "https://example.com"}), | ||
| display_parameters: serde_json::json!({"url": "https://example.com"}), | ||
| description: "Fetch a web page".to_string(), | ||
| tool_call_id: "call_99".to_string(), | ||
| context_messages: vec![], | ||
| deferred_tool_calls: vec![], | ||
| user_timezone: None, | ||
| allow_always: true, | ||
| }; | ||
| thread.await_approval(pending); | ||
|
|
||
| let mut session = Session::new("test-user"); | ||
| session.threads.insert(thread_id, thread); | ||
|
|
||
| // Simulate the combined lock scope: take + verify + auto_approve + state transition | ||
| // Simulate the combined lock scope: take + verify + auto_approve + state transition. | ||
| // Use separate scopes to mirror the borrow pattern in production code. | ||
| let taken = { | ||
| let thread = session.threads.get_mut(&thread_id).unwrap(); | ||
| thread.take_pending_approval().unwrap() | ||
| }; | ||
|
|
||
| // Request ID matches | ||
| assert_eq!(taken.request_id, request_id); | ||
|
|
||
| // Auto-approve the tool (thread borrow dropped, so session is accessible) | ||
| session.auto_approve_tool(&taken.tool_name); | ||
|
|
||
| // Re-borrow thread and set state to Processing | ||
| { | ||
| let thread = session.threads.get_mut(&thread_id).unwrap(); | ||
| thread.state = ThreadState::Processing; | ||
| } | ||
|
|
||
| // Verify both mutations happened atomically | ||
| assert!(session.is_tool_auto_approved("web_fetch")); | ||
| assert_eq!( | ||
| session.threads[&thread_id].state, | ||
| ThreadState::Processing | ||
| ); | ||
| // Pending approval should be consumed (not restored) | ||
| assert!(session.threads[&thread_id].pending_approval.is_none()); |
There was a problem hiding this comment.
The test_approval_with_always_sets_auto_approve_and_processing regression test provides strong validation for the atomic execution of auto-approval and thread state transition. This test is well-designed to cover the combined effects of the changes, reinforcing the robustness of the process_approval function.
References
- To prevent a time-of-check-to-time-of-use (TOCTOU) race condition when checking a condition under a read lock before performing a write, re-verify the condition after acquiring the write lock but before executing the write. This is known as the double-checked locking pattern.
Apply cargo fmt to resolve CI formatting check failure. https://claude.ai/code/session_013ZCQWoFHv2hASgHEGHzptg
|
Superseded by #1591 (consolidated TOCTOU + error handling fix) |
Summary
process_approval()into a single atomic scope: check thread state, take pending approval, verify request_id, set auto-approve, and transition to Processing all happen under one lockCloses #1486
Test plan
--all-features