Python: Fix ChatHistoryTruncationReducer deleting system prompt#13610
Python: Fix ChatHistoryTruncationReducer deleting system prompt#13610roli-lpci wants to merge 2 commits intomicrosoft:mainfrom
Conversation
…osoft#12612) The truncation reducer used `extract_range()` which unconditionally filters out system/developer messages — a function designed for the summarization use case, not truncation. This caused system prompts to be silently deleted when chat history was reduced. Fix: detect the first system/developer message before truncation, pass `has_system_message=True` to `locate_safe_reduction_index` so target_count accounts for the preserved message, use a simple slice instead of `extract_range`, and prepend the system message if it was truncated away. This matches the .NET SDK behavior (PR microsoft#10344, already merged). Also adds a guard for `target_count <= 0` after the system message adjustment to prevent IndexError when target_count=1. Note: The summarization reducer (`ChatHistorySummarizationReducer`) has the same bug — it also uses `extract_range` and loses system messages. That should be addressed in a follow-up PR. Closes microsoft#12612
df8d4a9 to
cd25e3a
Compare
moonbox3
left a comment
There was a problem hiding this comment.
Automated Code Review
Reviewers: 3 | Confidence: 85%
✗ Correctness
The diff correctly preserves system/developer messages during truncation by adjusting
target_countand re-prepending the system message when it gets sliced away. The approach is sound for the common case. However, there is a correctness issue in the edge case wheretarget_count=1with a system message:locate_safe_reduction_indexreturnsNone(meaning 'no reduction needed'), so the history is never reduced even when it contains many messages. The test for this case is lenient and acceptsNone, masking the bug. Additionally, theChatHistorySummarizationReduceris not updated with the same system-message preservation logic, which may lead to inconsistent behavior between the two reducers.
✓ Security Reliability
The diff preserves system/developer messages during chat history truncation, aligning with .NET SDK behavior. The changes are generally sound from a security perspective — no injection risks, unsafe deserialization, or secrets. The main reliability concerns are: (1) only the first system/developer message is preserved, silently dropping any subsequent ones; (2) when target_count=1 with a system message, reduction is silently disabled forever, allowing unbounded history growth; and (3) the identity-based membership check for re-prepending the system message could behave unexpectedly if message equality is value-based rather than identity-based.
✗ Test Coverage
The diff adds system/developer message preservation to the truncation reducer and includes several new tests. Overall coverage is good, but there is one test (
test_truncation_target_count_1_with_system_message) whose key assertion is unreachable dead code — theif result is not Noneguard means the test always passes vacuously. Additionally, there are no direct unit tests for the modifiedlocate_safe_reduction_indexutility with the newhas_system_messageparameter, and no tests cover edge cases like multiple system/developer messages or theChatHistorySummarizationReducerwhich also calls the utility.
Blocking Issues
- When
target_count=1andhas_system_message=True,locate_safe_reduction_indexreturnsNone(no reduction), leaving the entire history intact even when it far exceeds the target. The caller should still be able to reduce to just the system message in this scenario. The earlyreturn Nonewhentarget_count <= 0prevents a crash (the scan loop would IndexError athistory[total_count]), but the behavior is wrong — the reducer should produce[system_message]with 1 total message, not silently skip reduction. - test_truncation_target_count_1_with_system_message (lines 161-170): The assertion inside
if result is not Noneis dead code. With target_count=1 and a system message,locate_safe_reduction_indexdecrements target_count to 0 and returns None, soreduce()always returns None. The test gives a false sense of coverage. It should explicitly assertresult is Noneand, ideally, add a separate case that actually exercises the preservation path.
Suggestions
- The
ChatHistorySummarizationReduceralso callslocate_safe_reduction_indexbut is not updated to passhas_system_messageor to preserve system messages. Consider applying the same fix there for consistency. - The check
system_message not in truncated_listuses value equality (__eq__). IfChatMessageContenthas deep equality semantics (e.g., Pydantic models), a different message with identical fields could cause a false match. Consider using identity comparison (all(msg is not system_message for msg in truncated_list)) for robustness. - The test
test_truncation_target_count_1_with_system_messageis overly lenient — it acceptsNone(no reduction) as valid, which masks the bug. The test should assert that reduction occurs and the result contains only the system message. - Only the first system/developer message is found via
next(...). If a history legitimately contains multiple system or developer messages, all but the first will be silently dropped on truncation. Consider collecting all such messages or documenting this as a known limitation. - When
target_count=1and a system message exists,locate_safe_reduction_indexreturnsNoneunconditionally, meaning truncation never fires and history grows unboundedly. Consider logging a warning so users can detect this misconfiguration. - The check
system_message not in truncated_listrelies on the equality semantics of message objects. IfChatMessageContent.__eq__is value-based, a different message with identical content could prevent the system message from being re-prepended. Using identity comparison (all(msg is not system_message for msg in truncated_list)) would be more robust. - Add direct unit tests for
locate_safe_reduction_indexwithhas_system_message=True, covering the target_count adjustment and the target_count<=0 early return. - Add a test with multiple system/developer messages to verify only the first is preserved (current
next()semantics). - Add a test where the system message falls within the retained tail (i.e., system message is NOT truncated away) to exercise the
system_message not in truncated_listfalse branch. - Verify whether
ChatHistorySummarizationReduceralso needs updating to passhas_system_messagetolocate_safe_reduction_index, and add corresponding tests if so.
Automated review by moonbox3's agents
| The index that identifies the starting point for a reduced history that does not orphan | ||
| sensitive content. Returns None if reduction is not needed. | ||
| """ | ||
| # Adjust target_count to account for the system message that will be preserved separately. |
There was a problem hiding this comment.
When has_system_message=True and target_count was 1, this decrements to 0 and returns None (no reduction). But the history may have many messages that should be reduced — the reducer should produce a result containing only the system message. Returning None here causes the truncation reducer to silently skip reduction entirely. Consider handling target_count == 0 as 'truncate everything' (return len(history)) and let the caller prepend the system message, or handle this edge case in the truncation reducer itself.
| truncated_list = history[truncation_index:] | ||
|
|
||
| # Prepend the system/developer message if it was truncated away | ||
| if system_message is not None and system_message not in truncated_list: |
There was a problem hiding this comment.
This uses value equality (__eq__) via not in. Since system_message is a reference into the original history list and truncated_list is a slice of the same list, identity comparison would be more precise and avoid a potential false match if another message happens to have identical fields.
| if system_message is not None and system_message not in truncated_list: | |
| if system_message is not None and all(msg is not system_message for msg in truncated_list): |
|
|
||
| async def test_truncation_target_count_1_with_system_message(): | ||
| """Verify target_count=1 with system message does not crash (edge case from review).""" | ||
| reducer = ChatHistoryTruncationReducer(target_count=1, system_message="System prompt") | ||
| reducer.add_message(ChatMessageContent(role=AuthorRole.USER, content="Hello")) | ||
| reducer.add_message(ChatMessageContent(role=AuthorRole.ASSISTANT, content="Hi")) | ||
| reducer.add_message(ChatMessageContent(role=AuthorRole.USER, content="How are you?")) | ||
| reducer.add_message(ChatMessageContent(role=AuthorRole.ASSISTANT, content="Good")) | ||
|
|
||
| # Should not crash. Either returns None (no reduction possible) | ||
| # or returns just the system message. |
There was a problem hiding this comment.
This test accepts None (no reduction) as a valid outcome for target_count=1 with a system message, but that behavior is a bug — the user asked for at most 1 message and the history has 5. The test should assert that reduction happens and the result contains exactly the system message. As written, this test will pass even when the reducer fails to reduce.
| The index that identifies the starting point for a reduced history that does not orphan | ||
| sensitive content. Returns None if reduction is not needed. | ||
| """ | ||
| # Adjust target_count to account for the system message that will be preserved separately. |
There was a problem hiding this comment.
When target_count is 1 and has_system_message is True, this permanently prevents any reduction. Consider emitting a warning log so callers can detect this silent no-op, since an ever-growing history could eventually cause OOM or token-limit failures.
| # Adjust target_count to account for the system message that will be preserved separately. | |
| target_count -= 1 | |
| if target_count <= 0: | |
| import logging | |
| logging.getLogger(__name__).warning( | |
| "target_count after accounting for system message is %d; reduction disabled.", target_count | |
| ) | |
| return None # Cannot reduce further; only system message would remain |
| truncated_list = extract_range(history, start=truncation_index) | ||
| truncated_list = history[truncation_index:] | ||
|
|
||
| # Prepend the system/developer message if it was truncated away |
There was a problem hiding this comment.
Using not in performs value-based equality checks. If ChatMessageContent.__eq__ compares by content, a duplicate message elsewhere in the list could shadow the original system message and prevent it from being re-prepended. An identity check would be safer here.
| # Prepend the system/developer message if it was truncated away | |
| if system_message is not None and not any(m is system_message for m in truncated_list): |
| reducer.add_message(ChatMessageContent(role=AuthorRole.USER, content="How are you?")) | ||
| reducer.add_message(ChatMessageContent(role=AuthorRole.ASSISTANT, content="Good")) | ||
|
|
||
| # Should not crash. Either returns None (no reduction possible) | ||
| # or returns just the system message. | ||
| result = await reducer.reduce() | ||
| if result is not None: | ||
| # System message must be preserved | ||
| assert any(m.role == AuthorRole.SYSTEM for m in result.messages) | ||
|
|
There was a problem hiding this comment.
Dead assertion: with target_count=1 and a system message, locate_safe_reduction_index reduces target to 0 and returns None, so reduce() returns None. The if result is not None branch never executes, meaning the system-message assertion is never checked. Assert the expected behavior explicitly.
| reducer.add_message(ChatMessageContent(role=AuthorRole.USER, content="How are you?")) | |
| reducer.add_message(ChatMessageContent(role=AuthorRole.ASSISTANT, content="Good")) | |
| # Should not crash. Either returns None (no reduction possible) | |
| # or returns just the system message. | |
| result = await reducer.reduce() | |
| if result is not None: | |
| # System message must be preserved | |
| assert any(m.role == AuthorRole.SYSTEM for m in result.messages) | |
| # target_count=1 with system message means adjusted target becomes 0, | |
| # so no reduction is possible. | |
| result = await reducer.reduce() | |
| assert result is None |
1. target_count=1 with system message: return len(history) instead of None so the reducer produces [system_message] instead of silently skipping reduction (unbounded history growth). 2. Dead test assertion: replace vacuously-passing `if result is not None` guard with explicit assertions on result count, role, and content. 3. Identity comparison: use `all(msg is not system_message ...)` instead of `not in` to avoid false matches from Pydantic value-based equality. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Thanks for the thorough review — all three blocking issues were valid. Pushed fixes: 1.
Note: this actually improves on the .NET SDK's handling of the same edge case — the .NET implementation has a latent issue where 2. Dead test assertion Replaced the vacuous assert result is not None
assert len(result.messages) == 1
assert result.messages[0].role == AuthorRole.SYSTEM
assert result.messages[0].content == "System prompt"3. Identity comparison Switched from Regarding the non-blocking suggestions:
|
Summary
ChatHistoryTruncationReducer.reduce()silently deletes system/developer messages when truncating chat history. This is because it callsextract_range(), which unconditionally filters out system/developer messages — a function designed for the summarization use case, not truncation.Fixes #12612.
Approach
Port the .NET SDK fix (PR #10344) to Python:
has_system_message=Truetolocate_safe_reduction_indexsotarget_countaccounts for the preserved message (matches .NET'stargetCount -= hasSystemMessage ? 1 : 0)history[truncation_index:]slice instead ofextract_range(which strips system messages)Also adds a guard for
target_count <= 0after the system message adjustment to preventIndexErrorwhentarget_count=1.Changes
chat_history_reducer_utils.pyhas_system_messageparameter, adjusttarget_count, guard against<= 0chat_history_truncation_reducer.pyextract_range, prepend if truncatedtest_chat_history_truncation_reducer.pyNote
The summarization reducer (
ChatHistorySummarizationReducer) has the same bug — it also usesextract_rangeand loses system messages during summarization. The .NET summarization reducer preserves system messages viaAssemblySummarizedHistory. This should be addressed in a follow-up PR to keep this change focused.Test plan
target_count=1with system message does not crash