fix: Agent thread safety: unsynchronized chat_history and shared mutable state #1175
Changes from all commits
`src/praisonai-agents/praisonaiagents/agent/agent.py`

```diff
@@ -6,8 +6,12 @@
 import contextlib
 import threading
 from typing import List, Optional, Any, Dict, Union, Literal, TYPE_CHECKING, Callable, Generator
+from collections import OrderedDict
 import inspect

+# Module-level logger for thread safety errors and debugging
+logger = logging.getLogger(__name__)
+
 # ============================================================================
 # Performance: Lazy imports for heavy dependencies
 # Rich, LLM, and display utilities are only imported when needed (output=verbose)
@@ -1511,9 +1515,12 @@ def __init__(
         self.embedder_config = embedder_config
         self.knowledge = knowledge
         self.use_system_prompt = use_system_prompt
-        # Thread-safe chat_history with lazy lock for concurrent access
+        # Thread-safe chat_history with eager lock initialization
         self.chat_history = []
-        self.__history_lock = None  # Lazy initialized
+        self.__history_lock = threading.Lock()  # Eager initialization to prevent race conditions
+
+        # Thread-safe snapshot/redo stack lock - always available even when autonomy is disabled
+        self.__snapshot_lock = threading.Lock()
         self.markdown = markdown
         self.stream = stream
         self.metrics = metrics
```
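A note on why the lazy-to-eager change matters: lazy lock creation is itself an unsynchronized check-then-act, so two threads can each install a different `Lock` and then "synchronize" against different objects. A minimal, self-contained sketch of that failure mode (the `LazyAgent` class is hypothetical, for illustration only):

```python
import threading

class LazyAgent:
    """Reproduces the check-then-act race in lazy lock creation."""

    def __init__(self):
        self._lock = None  # lazily created, as in the pre-PR code

    @property
    def lock(self):
        if self._lock is None:             # two threads can both pass this check...
            self._lock = threading.Lock()  # ...and each install a different Lock
        return self._lock

agent = LazyAgent()
seen = set()

threads = [threading.Thread(target=lambda: seen.add(id(agent.lock))) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Under unlucky scheduling len(seen) > 1: threads that believe they hold
# "the" history lock actually hold different locks, i.e. no mutual exclusion.
print("distinct lock objects observed:", len(seen))
```

Eager initialization in `__init__` sidesteps this entirely, because `__init__` finishes before the instance is shared across threads.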
```diff
@@ -1632,10 +1639,11 @@ def __init__(
         # P8/G11: Tool timeout - prevent slow tools from blocking
         self._tool_timeout = tool_timeout

-        # Cache for system prompts and formatted tools with lazy thread-safe lock
-        self._system_prompt_cache = {}
-        self._formatted_tools_cache = {}
-        self.__cache_lock = None  # Lazy initialized RLock
+        # Cache for system prompts and formatted tools with eager thread-safe lock
+        # Use OrderedDict for LRU behavior
+        self._system_prompt_cache = OrderedDict()
+        self._formatted_tools_cache = OrderedDict()
+        self.__cache_lock = threading.RLock()  # Eager initialization to prevent race conditions
         # Limit cache size to prevent unbounded growth
         self._max_cache_size = 100
```
```diff
@@ -1747,20 +1755,106 @@ def _telemetry(self):
     @property
     def _history_lock(self):
-        """Lazy-loaded history lock for thread-safe chat history access."""
-        if self.__history_lock is None:
-            import threading
-            self.__history_lock = threading.Lock()
+        """Thread-safe chat history lock."""
         return self.__history_lock

     @property
     def _cache_lock(self):
-        """Lazy-loaded cache lock for thread-safe cache access."""
-        if self.__cache_lock is None:
-            import threading
-            self.__cache_lock = threading.RLock()
+        """Thread-safe cache lock."""
         return self.__cache_lock

+    @property
+    def _snapshot_lock(self):
+        """Thread-safe snapshot/redo stack lock."""
+        return self.__snapshot_lock
```
**Comment on lines +1768 to +1769**

Suggested change:

```diff
-        """Thread-safe snapshot/redo stack lock."""
-        return self.__snapshot_lock
+        """Thread-safe snapshot/redo stack lock.
+
+        This property is defensive: if the underlying lock was not
+        initialized (e.g., because autonomy was disabled and
+        `_init_autonomy` exited early), it is created on first access.
+        """
+        try:
+            return self.__snapshot_lock
+        except AttributeError:
+            # Fallback initialization to avoid AttributeError when autonomy is disabled
+            self.__snapshot_lock = threading.Lock()
+            return self.__snapshot_lock
```
The LRU cache update currently uses `del` plus re-assignment to move an existing key to the end. The idiomatic and more efficient way to do this with `OrderedDict` is to assign the value and then call `move_to_end()`, which is an O(1) operation.

Suggested change:

```diff
-        if key in cache_dict:
-            del cache_dict[key]
-        # Add new entry
-        cache_dict[key] = value
+        cache_dict[key] = value
+        cache_dict.move_to_end(key)
```
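For context, a sketch of the full bounded put path this enables. The helper name `_add_to_cache` echoes the helpers discussed in this PR, but the body here is illustrative, not the merged code:

```python
from collections import OrderedDict
import threading

_cache_lock = threading.RLock()
_max_cache_size = 100

def _add_to_cache(cache_dict: OrderedDict, key, value) -> None:
    """Insert or refresh a key as most-recently-used, evicting the oldest entry when full."""
    with _cache_lock:
        cache_dict[key] = value
        cache_dict.move_to_end(key)         # O(1) recency update, no del/reinsert
        if len(cache_dict) > _max_cache_size:
            cache_dict.popitem(last=False)  # drop the least-recently-used entry
```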
**Copilot AI** · Mar 30, 2026
Thread-safety for `chat_history` is only enforced if all call sites use these helpers (or take `_history_lock`). There are still many direct `self.chat_history.append(...)` usages elsewhere in this file (e.g., at lines 5994, 6030, 6475, 6530, 6809, 8131, etc.), which means races can still occur even with the new lock. To fully address #1164, replace the remaining direct mutations/rollbacks with `_add_to_chat_history` / `_truncate_chat_history`, or wrap them in `with self._history_lock:` blocks, as sketched below.
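A sketch of what those locked helpers could look like (the names `_add_to_chat_history` / `_truncate_chat_history` come from the review; the bodies are assumptions):

```python
import threading

class Agent:  # minimal stand-in; the real class already exposes _history_lock
    def __init__(self):
        self._history_lock = threading.Lock()
        self.chat_history: list[dict] = []

    def _add_to_chat_history(self, message: dict) -> None:
        """Append under the history lock so concurrent runs cannot lose turns."""
        with self._history_lock:
            self.chat_history.append(message)

    def _truncate_chat_history(self, length: int) -> None:
        """Atomically roll the history back to `length` entries."""
        with self._history_lock:
            del self.chat_history[length:]
```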
When retrieving an item from the LRU cache, the code likewise uses `del` and re-assignment to move it to the end. `OrderedDict.move_to_end()` is more efficient here too: it is O(1) and states the intent directly.

Suggested change:

```diff
-            value = cache_dict[key]
-            del cache_dict[key]
-            cache_dict[key] = value
-            return value
+            value = cache_dict[key]
+            cache_dict.move_to_end(key)
+            return value
```
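And the matching read path, again only a sketch under the same assumptions as the put-path sketch above:

```python
from collections import OrderedDict
import threading

_cache_lock = threading.RLock()  # same lock as in the put-path sketch

def _get_from_cache(cache_dict: OrderedDict, key, default=None):
    """Fetch a cached value and refresh its LRU position in O(1)."""
    with _cache_lock:
        if key not in cache_dict:
            return default
        cache_dict.move_to_end(key)  # mark as most recently used, no del/reinsert
        return cache_dict[key]
```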
**`undo()`/`redo()` error handling currently raises `NameError`.**

`logger` is never defined in this module, so any failure inside these blocks hits a second exception on the debug line instead of returning `False`. The adjacent `logger.debug(...)` calls at lines 2258 and 2362 have the same problem.

🐛 Minimal fix:

```diff
-            logger.debug(f"Undo failed: {e}")
+            logging.debug(f"Undo failed: {e}")
             return False
@@
-            logger.debug(f"Redo failed: {e}")
+            logging.debug(f"Redo failed: {e}")
             return False
```

Also applies to: 2318-2330

🧰 Tools: 🪛 Ruff (0.15.7)

- [warning] 2303: Do not catch blind exception: `Exception` (BLE001)
- [error] 2304: Undefined name `logger` (F821)
🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `src/praisonai-agents/praisonaiagents/agent/agent.py` around lines 2292-2305, the debug calls in `undo()` and `redo()` (and the nearby `logger.debug` calls at other locations) reference the undefined name `logger`, causing a `NameError` on exception. Fix this by introducing a module-level logger (e.g., `import logging` and `logger = logging.getLogger(__name__)`) at the top of the module and leaving the existing `logger.debug(...)` calls as-is, so exceptions inside the `with self._snapshot_lock` blocks (and surrounding handlers) log correctly and return `False`. Ensure the new logger is defined before methods like `undo()`, `redo()`, and any other routines that call `logger.debug`.
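A minimal sketch of the module-level logger pattern the prompt describes (the `Agent` stand-in and its snapshot list are invented for illustration):

```python
import logging
import threading

# Module-level logger: bound at import time, so logger.debug() inside
# exception handlers can never raise NameError.
logger = logging.getLogger(__name__)

class Agent:  # minimal stand-in for the real class
    def __init__(self):
        self._snapshot_lock = threading.Lock()
        self._snapshots: list = []

    def undo(self) -> bool:
        try:
            with self._snapshot_lock:
                self._snapshots.pop()  # hypothetical restore step
            return True
        except Exception as e:  # broad by design: undo must never propagate
            logger.debug(f"Undo failed: {e}")
            return False
```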
**The port can be marked started before any server is actually available.**

`_server_started[port]` flips to `True` before the background thread is even created, and long before uvicorn is listening. A concurrent `launch()` will then skip startup even if thread creation or server boot fails, leaving that port wedged until restart. The shared `_registered_agents` / `_shared_apps` maps are also still read outside `_server_lock`, so the global launch state remains racy. Based on learnings: never maintain shared mutable global state between agents; use explicit channels (EventBus, handoff) for multi-agent communication.
🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `src/praisonai-agents/praisonaiagents/agent/agent.py` around lines 8673-8679, the code sets `_server_started[port] = True` before the background server actually begins, causing race conditions in `launch()` and leaving ports wedged if the thread or startup fails. Fix by marking the port started only after the server is confirmed listening (e.g., set `_server_started[port] = True` inside the background thread once uvicorn reports ready, or after a readiness `Event` is set), have the background thread report success or failure back to the launcher via a `threading.Event` or similar handoff, and protect all accesses to the shared maps `_registered_agents` and `_shared_apps` with `_server_lock` (or eliminate these globals in favor of explicit event/handoff channels) so reads and writes are atomic and the global launch state is no longer racy. A readiness-handshake sketch follows below.
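A sketch of that readiness handshake, assuming hypothetical `run_server`/`on_ready` wiring; the real uvicorn hookup will differ:

```python
import threading

_server_lock = threading.Lock()
_server_started: dict[int, bool] = {}

def _launch_server(port: int, run_server) -> bool:
    """Mark a port as started only after the server thread confirms it is listening."""
    ready = threading.Event()
    ok = {"started": False}

    def _serve():
        try:
            # run_server is assumed to invoke its callback once the socket is bound
            run_server(port, on_ready=lambda: (ok.update(started=True), ready.set()))
        except Exception:
            pass
        finally:
            ready.set()  # always unblock the launcher, success or not

    threading.Thread(target=_serve, daemon=True).start()
    ready.wait(timeout=30)

    with _server_lock:  # all reads/writes of shared launch state go through the lock
        _server_started[port] = ok["started"]
    return ok["started"]
```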
**`chat_history` is still a racy public list.**

These helpers only protect the custom-LLM path. The standard OpenAI, async, streaming, and restore/autosave paths in this file still read/append/replace `self.chat_history` directly, and `src/praisonai-agents/praisonaiagents/session.py` / `src/praisonai-agents/praisonaiagents/llm/llm.py` also bypass `_history_lock`. In concurrent runs that still allows lost turns, mid-iteration mutation, and stale references after slice assignment. To actually fix this, `chat_history` needs to be fully encapsulated behind a thread-safe API/container instead of exposing the raw list; one possible shape is sketched below.

Also applies to: 1785-1807, 6362-6412
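One shape such an encapsulation could take; the class name and method set are invented for illustration:

```python
import threading
from typing import Any

class ThreadSafeChatHistory:
    """Owns the message list: callers never touch the raw list, so every
    mutation and every snapshot happens under one lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._messages: list[dict[str, Any]] = []

    def append(self, message: dict[str, Any]) -> None:
        with self._lock:
            self._messages.append(message)

    def snapshot(self) -> list[dict[str, Any]]:
        """Return a copy: safe to iterate while other threads keep appending."""
        with self._lock:
            return list(self._messages)

    def truncate(self, length: int) -> None:
        """Atomic rollback, replacing racy slice assignment at call sites."""
        with self._lock:
            del self._messages[length:]
```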