fix: clean up failed conversation data to avoid conflict on retry#2984
fix: clean up failed conversation data to avoid conflict on retry#2984
Conversation
When an agent conversation fails unexpectedly (leaving state as RUNNING or FAILED), subsequent attempts to start a new conversation would cause a database conflict error due to duplicate conv_id. This fix: - Adds _cleanup_failed_conversation method to remove failed conversation records and associated messages/plans - Detects RUNNING or FAILED state in last conversation before creating a new one - Cleans up residual data before generating a new conversation ID Co-Authored-By: Claude (GLM-5) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR fixes a database conflict error that occurred when retrying an agent conversation after an unexpected failure. When a conversation ends in RUNNING or FAILED state, subsequent new conversations would fail with a duplicate conv_id error because the ID generation was count-based and didn't account for the leftover failed records.
Changes:
- Added
_cleanup_failed_conversationmethod toMultiAgentsthat deletes messages, plans, and the conversation record for a failed conversation - Modified the
agent_chat_v2method to detectRUNNING/FAILEDstates in the last conversation and clean up before creating a new one - Added an
agent_conv_id = Noneinitialization and moved the conversation ID generation logic to after the cleanup block
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| try: | ||
| # Delete messages associated with this conversation | ||
| self.gpts_messages_dao.delete_chat_message(conv_id) | ||
| logger.info(f"Deleted messages for failed conversation: {conv_id}") | ||
| except Exception as e: | ||
| logger.warning(f"Failed to delete messages for {conv_id}: {e}") | ||
|
|
||
| try: | ||
| # Delete plans associated with this conversation | ||
| self.memory.plans_memory.remove_by_conv_id(conv_id) | ||
| logger.info(f"Deleted plans for failed conversation: {conv_id}") | ||
| except Exception as e: | ||
| logger.warning(f"Failed to delete plans for {conv_id}: {e}") | ||
|
|
||
| try: | ||
| # Delete the conversation record | ||
| self.gpts_conversations.delete_chat_message(conv_id) | ||
| logger.info(f"Deleted failed conversation record: {conv_id}") | ||
| except Exception as e: | ||
| logger.warning(f"Failed to delete conversation record {conv_id}: {e}") |
There was a problem hiding this comment.
The gpts_conversations.delete_chat_message(conv_id) call (and similarly gpts_messages_dao.delete_chat_message(conv_id)) uses a LIKE '%conv_id%' pattern inside the DAO, where conv_id is the failed conversation's ID in the format {base_conv_id}_{N} (e.g., "abc123_1").
This LIKE pattern (%abc123_1%) will match not only the target conversation abc123_1, but also any record whose conv_id contains abc123_1 as a substring — for example abc123_10, abc123_11, abc123_19, etc. If the user has had 10+ conversation turns under the same base_conv_id, those higher-numbered conversations (and their associated messages) will be silently deleted when only the failed _1 record should be cleaned up.
To fix this, instead of relying on the LIKE-based delete_chat_message method, a new or existing exact-match deletion method should be used, or the cleanup should use an equality filter (conv_id == failed_conv_id) rather than a substring match.
| try: | ||
| # Delete the conversation record | ||
| self.gpts_conversations.delete_chat_message(conv_id) | ||
| logger.info(f"Deleted failed conversation record: {conv_id}") |
There was a problem hiding this comment.
The method called to delete the conversation record is self.gpts_conversations.delete_chat_message(conv_id). This is the GptsConversationsDao.delete_chat_message() method, whose name strongly implies it deletes messages — but it actually deletes entries from the gpts_conversations table. The comment on line 187 correctly says "Delete the conversation record", but the method name creates confusion for readers and could lead to maintainability issues. A more appropriately named method (e.g., delete_by_conv_id) should be used or added to the DAO.
Summary
_cleanup_failed_conversationmethod to remove failed conversation records and associated messages/plansRUNNINGorFAILEDstate in last conversation before creating a new oneProblem
When an agent conversation fails unexpectedly (leaving state as
RUNNINGorFAILED), subsequent attempts to start a new conversation would cause a database conflict error due to duplicateconv_id. The conversation ID generation was based on counting existing records, which didn't account for failed conversations.Solution
Added
_cleanup_failed_conversationmethod that:Modified conversation ID generation logic to:
RUNNINGorFAILEDstateTest plan
🤖 Generated with Claude Code