Skip to content

Python: Fix ChatHistoryTruncationReducer deleting system prompt#13610

Open
roli-lpci wants to merge 2 commits intomicrosoft:mainfrom
roli-lpci:fix/python-history-reducer-system-prompt-12612
Open

Python: Fix ChatHistoryTruncationReducer deleting system prompt#13610
roli-lpci wants to merge 2 commits intomicrosoft:mainfrom
roli-lpci:fix/python-history-reducer-system-prompt-12612

Conversation

@roli-lpci
Copy link

Summary

ChatHistoryTruncationReducer.reduce() silently deletes system/developer messages when truncating chat history. This is because it calls extract_range(), which unconditionally filters out system/developer messages — a function designed for the summarization use case, not truncation.

Fixes #12612.

Approach

Port the .NET SDK fix (PR #10344) to Python:

  1. Detect the first system/developer message before truncation
  2. Pass has_system_message=True to locate_safe_reduction_index so target_count accounts for the preserved message (matches .NET's targetCount -= hasSystemMessage ? 1 : 0)
  3. Use a simple history[truncation_index:] slice instead of extract_range (which strips system messages)
  4. Prepend the system message if it was truncated away

Also adds a guard for target_count <= 0 after the system message adjustment to prevent IndexError when target_count=1.

Changes

File Change
chat_history_reducer_utils.py Add has_system_message parameter, adjust target_count, guard against <= 0
chat_history_truncation_reducer.py Detect system message, bypass extract_range, prepend if truncated
test_chat_history_truncation_reducer.py Update 2 existing tests + 4 new tests (system, developer, no-system, target_count=1)

Note

The summarization reducer (ChatHistorySummarizationReducer) has the same bug — it also uses extract_range and loses system messages during summarization. The .NET summarization reducer preserves system messages via AssemblySummarizedHistory. This should be addressed in a follow-up PR to keep this change focused.

Test plan

@roli-lpci roli-lpci requested a review from a team as a code owner March 1, 2026 09:04
@moonbox3 moonbox3 added the python Pull requests for the Python Semantic Kernel label Mar 1, 2026
…osoft#12612)

The truncation reducer used `extract_range()` which unconditionally
filters out system/developer messages — a function designed for the
summarization use case, not truncation. This caused system prompts to
be silently deleted when chat history was reduced.

Fix: detect the first system/developer message before truncation,
pass `has_system_message=True` to `locate_safe_reduction_index` so
target_count accounts for the preserved message, use a simple slice
instead of `extract_range`, and prepend the system message if it was
truncated away.

This matches the .NET SDK behavior (PR microsoft#10344, already merged).

Also adds a guard for `target_count <= 0` after the system message
adjustment to prevent IndexError when target_count=1.

Note: The summarization reducer (`ChatHistorySummarizationReducer`)
has the same bug — it also uses `extract_range` and loses system
messages. That should be addressed in a follow-up PR.

Closes microsoft#12612
@roli-lpci roli-lpci force-pushed the fix/python-history-reducer-system-prompt-12612 branch from df8d4a9 to cd25e3a Compare March 3, 2026 06:44
Copy link
Collaborator

@moonbox3 moonbox3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review

Reviewers: 3 | Confidence: 85%

✗ Correctness

The diff correctly preserves system/developer messages during truncation by adjusting target_count and re-prepending the system message when it gets sliced away. The approach is sound for the common case. However, there is a correctness issue in the edge case where target_count=1 with a system message: locate_safe_reduction_index returns None (meaning 'no reduction needed'), so the history is never reduced even when it contains many messages. The test for this case is lenient and accepts None, masking the bug. Additionally, the ChatHistorySummarizationReducer is not updated with the same system-message preservation logic, which may lead to inconsistent behavior between the two reducers.

✓ Security Reliability

The diff preserves system/developer messages during chat history truncation, aligning with .NET SDK behavior. The changes are generally sound from a security perspective — no injection risks, unsafe deserialization, or secrets. The main reliability concerns are: (1) only the first system/developer message is preserved, silently dropping any subsequent ones; (2) when target_count=1 with a system message, reduction is silently disabled forever, allowing unbounded history growth; and (3) the identity-based membership check for re-prepending the system message could behave unexpectedly if message equality is value-based rather than identity-based.

✗ Test Coverage

The diff adds system/developer message preservation to the truncation reducer and includes several new tests. Overall coverage is good, but there is one test (test_truncation_target_count_1_with_system_message) whose key assertion is unreachable dead code — the if result is not None guard means the test always passes vacuously. Additionally, there are no direct unit tests for the modified locate_safe_reduction_index utility with the new has_system_message parameter, and no tests cover edge cases like multiple system/developer messages or the ChatHistorySummarizationReducer which also calls the utility.

Blocking Issues

  • When target_count=1 and has_system_message=True, locate_safe_reduction_index returns None (no reduction), leaving the entire history intact even when it far exceeds the target. The caller should still be able to reduce to just the system message in this scenario. The early return None when target_count <= 0 prevents a crash (the scan loop would IndexError at history[total_count]), but the behavior is wrong — the reducer should produce [system_message] with 1 total message, not silently skip reduction.
  • test_truncation_target_count_1_with_system_message (lines 161-170): The assertion inside if result is not None is dead code. With target_count=1 and a system message, locate_safe_reduction_index decrements target_count to 0 and returns None, so reduce() always returns None. The test gives a false sense of coverage. It should explicitly assert result is None and, ideally, add a separate case that actually exercises the preservation path.

Suggestions

  • The ChatHistorySummarizationReducer also calls locate_safe_reduction_index but is not updated to pass has_system_message or to preserve system messages. Consider applying the same fix there for consistency.
  • The check system_message not in truncated_list uses value equality (__eq__). If ChatMessageContent has deep equality semantics (e.g., Pydantic models), a different message with identical fields could cause a false match. Consider using identity comparison (all(msg is not system_message for msg in truncated_list)) for robustness.
  • The test test_truncation_target_count_1_with_system_message is overly lenient — it accepts None (no reduction) as valid, which masks the bug. The test should assert that reduction occurs and the result contains only the system message.
  • Only the first system/developer message is found via next(...). If a history legitimately contains multiple system or developer messages, all but the first will be silently dropped on truncation. Consider collecting all such messages or documenting this as a known limitation.
  • When target_count=1 and a system message exists, locate_safe_reduction_index returns None unconditionally, meaning truncation never fires and history grows unboundedly. Consider logging a warning so users can detect this misconfiguration.
  • The check system_message not in truncated_list relies on the equality semantics of message objects. If ChatMessageContent.__eq__ is value-based, a different message with identical content could prevent the system message from being re-prepended. Using identity comparison (all(msg is not system_message for msg in truncated_list)) would be more robust.
  • Add direct unit tests for locate_safe_reduction_index with has_system_message=True, covering the target_count adjustment and the target_count<=0 early return.
  • Add a test with multiple system/developer messages to verify only the first is preserved (current next() semantics).
  • Add a test where the system message falls within the retained tail (i.e., system message is NOT truncated away) to exercise the system_message not in truncated_list false branch.
  • Verify whether ChatHistorySummarizationReducer also needs updating to pass has_system_message to locate_safe_reduction_index, and add corresponding tests if so.

Automated review by moonbox3's agents

Comment on lines 92 to +95
The index that identifies the starting point for a reduced history that does not orphan
sensitive content. Returns None if reduction is not needed.
"""
# Adjust target_count to account for the system message that will be preserved separately.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When has_system_message=True and target_count was 1, this decrements to 0 and returns None (no reduction). But the history may have many messages that should be reduced — the reducer should produce a result containing only the system message. Returning None here causes the truncation reducer to silently skip reduction entirely. Consider handling target_count == 0 as 'truncate everything' (return len(history)) and let the caller prepend the system message, or handle this edge case in the truncation reducer itself.

truncated_list = history[truncation_index:]

# Prepend the system/developer message if it was truncated away
if system_message is not None and system_message not in truncated_list:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This uses value equality (__eq__) via not in. Since system_message is a reference into the original history list and truncated_list is a slice of the same list, identity comparison would be more precise and avoid a potential false match if another message happens to have identical fields.

Suggested change
if system_message is not None and system_message not in truncated_list:
if system_message is not None and all(msg is not system_message for msg in truncated_list):

Comment on lines +155 to +165

async def test_truncation_target_count_1_with_system_message():
"""Verify target_count=1 with system message does not crash (edge case from review)."""
reducer = ChatHistoryTruncationReducer(target_count=1, system_message="System prompt")
reducer.add_message(ChatMessageContent(role=AuthorRole.USER, content="Hello"))
reducer.add_message(ChatMessageContent(role=AuthorRole.ASSISTANT, content="Hi"))
reducer.add_message(ChatMessageContent(role=AuthorRole.USER, content="How are you?"))
reducer.add_message(ChatMessageContent(role=AuthorRole.ASSISTANT, content="Good"))

# Should not crash. Either returns None (no reduction possible)
# or returns just the system message.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test accepts None (no reduction) as a valid outcome for target_count=1 with a system message, but that behavior is a bug — the user asked for at most 1 message and the history has 5. The test should assert that reduction happens and the result contains exactly the system message. As written, this test will pass even when the reducer fails to reduce.

The index that identifies the starting point for a reduced history that does not orphan
sensitive content. Returns None if reduction is not needed.
"""
# Adjust target_count to account for the system message that will be preserved separately.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When target_count is 1 and has_system_message is True, this permanently prevents any reduction. Consider emitting a warning log so callers can detect this silent no-op, since an ever-growing history could eventually cause OOM or token-limit failures.

Suggested change
# Adjust target_count to account for the system message that will be preserved separately.
target_count -= 1
if target_count <= 0:
import logging
logging.getLogger(__name__).warning(
"target_count after accounting for system message is %d; reduction disabled.", target_count
)
return None # Cannot reduce further; only system message would remain

truncated_list = extract_range(history, start=truncation_index)
truncated_list = history[truncation_index:]

# Prepend the system/developer message if it was truncated away
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using not in performs value-based equality checks. If ChatMessageContent.__eq__ compares by content, a duplicate message elsewhere in the list could shadow the original system message and prevent it from being re-prepended. An identity check would be safer here.

Suggested change
# Prepend the system/developer message if it was truncated away
if system_message is not None and not any(m is system_message for m in truncated_list):

Comment on lines +161 to +170
reducer.add_message(ChatMessageContent(role=AuthorRole.USER, content="How are you?"))
reducer.add_message(ChatMessageContent(role=AuthorRole.ASSISTANT, content="Good"))

# Should not crash. Either returns None (no reduction possible)
# or returns just the system message.
result = await reducer.reduce()
if result is not None:
# System message must be preserved
assert any(m.role == AuthorRole.SYSTEM for m in result.messages)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dead assertion: with target_count=1 and a system message, locate_safe_reduction_index reduces target to 0 and returns None, so reduce() returns None. The if result is not None branch never executes, meaning the system-message assertion is never checked. Assert the expected behavior explicitly.

Suggested change
reducer.add_message(ChatMessageContent(role=AuthorRole.USER, content="How are you?"))
reducer.add_message(ChatMessageContent(role=AuthorRole.ASSISTANT, content="Good"))
# Should not crash. Either returns None (no reduction possible)
# or returns just the system message.
result = await reducer.reduce()
if result is not None:
# System message must be preserved
assert any(m.role == AuthorRole.SYSTEM for m in result.messages)
# target_count=1 with system message means adjusted target becomes 0,
# so no reduction is possible.
result = await reducer.reduce()
assert result is None

1. target_count=1 with system message: return len(history) instead of
   None so the reducer produces [system_message] instead of silently
   skipping reduction (unbounded history growth).

2. Dead test assertion: replace vacuously-passing `if result is not None`
   guard with explicit assertions on result count, role, and content.

3. Identity comparison: use `all(msg is not system_message ...)` instead
   of `not in` to avoid false matches from Pydantic value-based equality.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@roli-lpci
Copy link
Author

Thanks for the thorough review — all three blocking issues were valid. Pushed fixes:

1. target_count=1 with system message (unbounded growth)

locate_safe_reduction_index now returns len(history) instead of None when target_count <= 0 after the system message adjustment. This produces an empty tail (history[len(history):] = []), and the caller prepends the system message, resulting in [system_message]. The reducer now correctly reduces to just the system message instead of silently skipping reduction.

Note: this actually improves on the .NET SDK's handling of the same edge case — the .NET implementation has a latent issue where target_count=1 with a system message can produce unexpected behavior.

2. Dead test assertion

Replaced the vacuous if result is not None guard with explicit assertions:

assert result is not None
assert len(result.messages) == 1
assert result.messages[0].role == AuthorRole.SYSTEM
assert result.messages[0].content == "System prompt"

3. Identity comparison

Switched from system_message not in truncated_list (value equality) to all(msg is not system_message for msg in truncated_list) (identity) to avoid false matches from Pydantic's value-based __eq__.


Regarding the non-blocking suggestions:

  • ChatHistorySummarizationReducer consistency: Agreed this should be updated too. Happy to add it in this PR or as a follow-up — let me know your preference.
  • Multiple system/developer messages: The current next() semantics preserve only the first. This matches the .NET SDK behavior. Worth documenting as a known limitation.
  • Direct unit tests for locate_safe_reduction_index: Can add these if desired.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

python Pull requests for the Python Semantic Kernel

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Python: Bug: Python 1.34.0 ChatHistoryTruncationReducer deletes the System Prompt

2 participants