
fix: handle Pydantic MockValSer bug in streaming responses (#18801) #24298

Open

AudreyKj wants to merge 1 commit into BerriAI:main from AudreyKj:fix/streaming-response-bug

Conversation

@AudreyKj

Problem

TypeError: 'MockValSer' object cannot be converted to 'SchemaSerializer' when handling streaming responses with SAP AI Core and other providers (vLLM, etc).

Related GitHub Issue: #18801
Related Pydantic Issue: pydantic/pydantic#7713

Root Cause

Pydantic 2.11+ has a bug where the internal MockValSer sentinel is not properly converted to a real SchemaSerializer in certain streaming scenarios. When LiteLLM tries to serialize streaming chunks using model_dump(), it hits this corrupted serializer state and crashes.

The bug occurs when:

  1. A chunk is created from a dictionary and properly serialized
  2. LiteLLM modifies the chunk (e.g., stripping usage data)
  3. A new chunk is reconstructed from the modified dictionary
  4. Pydantic fails to fully initialize the serializer on the new object
  5. Subsequent model_dump() calls crash with MockValSer TypeError
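The five steps above can be sketched with a toy Pydantic model (a stand-in for LiteLLM's actual streaming chunk types). On a healthy Pydantic install every step succeeds; under the 2.11+ bug, the final model_dump() in step 5 is where the MockValSer TypeError surfaces:

```python
# Toy reproduction of the reconstruction flow; `Chunk` is illustrative,
# not LiteLLM's real chunk class.
from typing import Optional

from pydantic import BaseModel


class Chunk(BaseModel):
    content: str
    usage: Optional[dict] = None


# 1-2. A chunk is created and serialized to a dict
chunk = Chunk(content="hello", usage={"total_tokens": 7})
data = chunk.model_dump()

# 3. LiteLLM modifies the dict (e.g., stripping usage data)
data.pop("usage", None)

# 4. A new chunk is reconstructed from the modified dict
rebuilt = Chunk(**data)

# 5. Under the Pydantic 2.11+ bug, this call raises
#    TypeError: 'MockValSer' object cannot be converted to 'SchemaSerializer'
print(rebuilt.model_dump())  # {'content': 'hello', 'usage': None}
```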

Solution

Added a try/except fallback that extracts fields via __dict__ when model_dump() fails with TypeError. This bypasses Pydantic's broken serializer entirely while preserving all functionality.

The fix is:

  • Minimal: Only activates when the bug occurs
  • Backward compatible: Normal path still uses Pydantic serialization
  • Robust: Tested with complex nested objects and streaming scenarios
  • Low overhead: Exception handling only triggered when bug occurs
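The pattern can be sketched as follows; the names are illustrative (the real call sites live in streaming_handler.py and core_helpers.py), and BrokenChunk stands in for a chunk whose serializer is stuck in the MockValSer state:

```python
# Minimal sketch of the fallback pattern, with a stub object simulating
# the corrupted serializer state from pydantic/pydantic#7713.
class BrokenChunk:
    def __init__(self, content: str):
        self.content = content
        self.usage = {"total_tokens": 7}

    def model_dump(self) -> dict:
        # Simulate the Pydantic 2.11+ MockValSer failure mode
        raise TypeError(
            "'MockValSer' object cannot be converted to 'SchemaSerializer'"
        )


def dump_chunk(chunk) -> dict:
    try:
        return chunk.model_dump()
    except TypeError:
        # Fallback: extract fields from __dict__, bypassing the serializer
        return dict(chunk.__dict__) if hasattr(chunk, "__dict__") else {}


obj_dict = dump_chunk(BrokenChunk("hello"))
obj_dict.pop("usage", None)  # the usage-stripping step still works
print(obj_dict)  # {'content': 'hello'}
```

Note that __dict__ returns raw field values rather than recursively serialized ones, so the two paths are equivalent only when downstream code can accept nested model instances.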

Changes

Modified Files

  1. litellm/litellm_core_utils/streaming_handler.py

    • Added fallback in 2 locations where model_dump() is called on streaming chunks
    • Location 1 (~line 1859): When stripping usage from response chunks
    • Location 2 (~line 2047): When stripping usage from processed chunks
  2. litellm/litellm_core_utils/core_helpers.py

    • Added fallback in preserve_upstream_non_openai_attributes() function (~line 273)
    • Ensures non-OpenAI attributes are preserved even when serializer is corrupted
  3. tests/test_litellm/litellm_core_utils/test_streaming_handler.py

    • Added regression test test_model_dump_fallback_handles_pydantic_serializer_bug
    • Simulates the MockValSer bug and verifies fallback behavior

Testing

All 49 streaming handler tests pass

python3 -m pytest tests/test_litellm/litellm_core_utils/test_streaming_handler.py -v
# 49 passed, 70 warnings in 3.65s

Regression test verifies fallback behavior

  • Mocks model_dump() to raise MockValSer TypeError
  • Confirms fallback to __dict__ extraction works correctly
  • Validates that usage stripping still functions properly
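The regression test's shape can be sketched as below, under stated assumptions: BrokenChunk and strip_usage are hypothetical stand-ins (the real test lives in test_streaming_handler.py and drives LiteLLM's return_processed_chunk_logic):

```python
# Hedged sketch of the regression test: model_dump() raises the MockValSer
# TypeError, and the __dict__ fallback plus usage stripping are verified.
class BrokenChunk:
    """Chunk whose serializer is in the MockValSer state: model_dump() raises."""

    def __init__(self, content, usage):
        self.content = content
        self.usage = usage

    def model_dump(self):
        raise TypeError(
            "'MockValSer' object cannot be converted to 'SchemaSerializer'"
        )


def strip_usage(chunk):
    # Stand-in for the handler code path that serializes then strips usage
    try:
        obj_dict = chunk.model_dump()
    except TypeError:
        obj_dict = dict(chunk.__dict__) if hasattr(chunk, "__dict__") else {}
    obj_dict.pop("usage", None)
    return obj_dict


def test_model_dump_fallback_handles_pydantic_serializer_bug():
    chunk = BrokenChunk("test content", {"total_tokens": 3})
    result = strip_usage(chunk)
    # No TypeError escapes, and the stripped dict keeps the content field
    assert result == {"content": "test content"}


test_model_dump_fallback_handles_pydantic_serializer_bug()
```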

Why Not Alternative Solutions?

  • Downgrade Pydantic: Creates dependency conflicts with LiteLLM 1.82.4+, which requires Pydantic 2.11+
  • Downgrade LiteLLM: Older versions don't support the SAP AI Core provider
  • model_dump(mode='python'): Still uses the broken serializer internally
  • Wait for Pydantic fix: pydantic/pydantic#7713 has been open since Oct 2023 with no timeline
  • __dict__ fallback: Bypasses serialization entirely, works immediately

Impact

This fix resolves streaming issues for:

  • SAP AI Core provider
  • vLLM provider
  • Any other provider that reconstructs chunks during streaming

Users experiencing the MockValSer error will now have streaming work correctly without any configuration changes.

@vercel

vercel bot commented Mar 21, 2026

The latest updates on your projects.

Project: litellm
Deployment: Ready
Actions: Preview, Comment
Updated (UTC): Mar 21, 2026 3:47pm

@codspeed-hq
Contributor

codspeed-hq bot commented Mar 21, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing AudreyKj:fix/streaming-response-bug (34fb901) with main (d8e4fc4)


@greptile-apps
Contributor

greptile-apps bot commented Mar 21, 2026

Greptile Summary

This PR adds try/except TypeError fallbacks around three model_dump() call sites to work around a Pydantic 2.11+ bug where an internal MockValSer sentinel is not properly promoted to a real SchemaSerializer during streaming chunk reconstruction. When the bug occurs, the code falls back to dict(obj.__dict__) to extract field data.

Key changes:

  • streaming_handler.py (×2): Fallback in __next__ and __anext__ when stripping usage before returning a chunk to the caller
  • core_helpers.py (×1): Fallback in preserve_upstream_non_openai_attributes when copying non-OpenAI fields to the response
  • test_streaming_handler.py: New regression test that mocks model_dump() to raise TypeError and verifies the fallback path in return_processed_chunk_logic

Issues found:

  • The except TypeError guard is broader than necessary — it catches every TypeError from model_dump(), not only the specific MockValSer message. A real type error (e.g., from a custom serializer or a programming mistake) will silently redirect to __dict__ extraction, potentially returning structurally different data since model_dump() recursively serializes nested Pydantic objects while __dict__ returns them as raw model instances.
  • The exception variable e is captured but never used or logged in both streaming_handler.py locations, discarding diagnostic information.
  • The regression test only exercises the core_helpers.py fallback; the two fallback paths inside __next__ / __anext__ in streaming_handler.py are not covered by any test.
  • The test stores original_model_dump for a teardown that was never implemented, leaving the mock permanently on the chunk object for the duration of the test.

Confidence Score: 3/5

  • The fix addresses a real production bug but the overly broad TypeError catch and incomplete test coverage introduce new risks that should be addressed before merging.
  • The approach is pragmatic and the happy path is unchanged, but catching all TypeErrors without re-raising non-MockValSer ones risks silently masking unrelated bugs; the dict fallback is also not semantically equivalent to model_dump() for nested Pydantic objects. The test doesn't cover the two main code paths changed in streaming_handler.py.
  • litellm/litellm_core_utils/streaming_handler.py and litellm/litellm_core_utils/core_helpers.py — both need the TypeError guard narrowed to the specific MockValSer message.

Important Files Changed

Filename Overview
litellm/litellm_core_utils/streaming_handler.py Added try/except TypeError fallback around two model_dump() calls (lines 1862 and 2054) when stripping usage from streaming chunks; the broad catch may swallow unrelated TypeErrors and the __dict__ output is not structurally equivalent to model_dump() output.
litellm/litellm_core_utils/core_helpers.py Added identical try/except TypeError fallback in preserve_upstream_non_openai_attributes; same broad-catch concern applies — any TypeError from model_dump() silently redirects to __dict__ extraction.
tests/test_litellm/litellm_core_utils/test_streaming_handler.py New regression test validates the core_helpers fallback but does not exercise the two fallback paths inside __next__/__anext__ in streaming_handler.py; also contains an unused original_model_dump variable.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant CustomStreamWrapper
    participant Pydantic

    Caller->>CustomStreamWrapper: __next__() / __anext__()
    CustomStreamWrapper->>Pydantic: response.model_dump()
    alt Normal path (Pydantic ≤ 2.10 or no bug)
        Pydantic-->>CustomStreamWrapper: obj_dict (fully serialized)
    else MockValSer bug (Pydantic 2.11+)
        Pydantic-->>CustomStreamWrapper: raises TypeError
        CustomStreamWrapper->>CustomStreamWrapper: obj_dict = dict(response.__dict__)
    end
    CustomStreamWrapper->>CustomStreamWrapper: del obj_dict["usage"]
    CustomStreamWrapper->>CustomStreamWrapper: model_response_creator(chunk=obj_dict)
    CustomStreamWrapper-->>Caller: processed chunk (no usage)

    Caller->>CustomStreamWrapper: return_processed_chunk_logic(...)
    CustomStreamWrapper->>Pydantic: original_chunk.model_dump() [in preserve_upstream_non_openai_attributes]
    alt Normal path
        Pydantic-->>CustomStreamWrapper: obj_dict
    else MockValSer bug
        Pydantic-->>CustomStreamWrapper: raises TypeError
        CustomStreamWrapper->>CustomStreamWrapper: obj_dict = dict(original_chunk.__dict__)
    end
    CustomStreamWrapper->>CustomStreamWrapper: setattr non-OpenAI fields onto model_response
    CustomStreamWrapper-->>Caller: model_response

Last reviewed commit: "fix: handle Pydantic..."

Comment on lines +1864 to +1866
except TypeError as e:
    # Fallback: manually extract dict from __dict__ to bypass Pydantic serializer
    obj_dict = dict(response.__dict__) if hasattr(response, '__dict__') else {}

P1 Overly broad TypeError catch swallows unrelated errors

The fallback catches every TypeError, not just the MockValSer one. If model_dump() raises a TypeError for a different reason (e.g., a genuine type mismatch in a custom serializer or a programming mistake), the code will silently fall back to __dict__, potentially returning subtly wrong/incomplete data instead of surfacing the real bug.

Additionally, the __dict__ of a Pydantic v2 model and the output of model_dump() are not equivalent: model_dump() recursively serializes nested models to plain dicts/primitives, while __dict__ returns the raw Python objects (nested Pydantic model instances, enums, etc.). Passing this mixed-type dict to model_response_creator could produce unexpected results depending on how the creator handles nested objects.

Consider narrowing the guard to only the known error string:

Suggested change

Before:

    except TypeError as e:
        # Fallback: manually extract dict from __dict__ to bypass Pydantic serializer
        obj_dict = dict(response.__dict__) if hasattr(response, '__dict__') else {}

After:

    try:
        obj_dict = response.model_dump()
    except TypeError as e:
        if "MockValSer" not in str(e):
            raise
        # Fallback: manually extract dict from __dict__ to bypass Pydantic serializer
        obj_dict = dict(response.__dict__) if hasattr(response, '__dict__') else {}

Same pattern applies to the equivalent catch at line 2056 and in core_helpers.py.

Comment on lines +2056 to +2058
except TypeError as e:
    # Fallback: manually extract dict from __dict__ to bypass Pydantic serializer
    obj_dict = dict(processed_chunk.__dict__) if hasattr(processed_chunk, '__dict__') else {}

P2 Unused exception variable e

The exception is captured as e but never used or logged. This silently discards the exception information, making future debugging harder if a different TypeError accidentally gets swallowed here.

The same issue exists at line 1864.

Suggested change

Before:

    except TypeError as e:
        # Fallback: manually extract dict from __dict__ to bypass Pydantic serializer
        obj_dict = dict(processed_chunk.__dict__) if hasattr(processed_chunk, '__dict__') else {}

After:

    except TypeError:
        # Fallback: manually extract dict from __dict__ to bypass Pydantic serializer
        obj_dict = dict(processed_chunk.__dict__) if hasattr(processed_chunk, '__dict__') else {}

)

# Mock model_dump to raise TypeError (simulating MockValSer bug)
original_model_dump = chunk_with_usage.model_dump

P2 Dead variable original_model_dump

original_model_dump is assigned but never referenced again. It was likely intended for restoring the method after the test (to avoid leaking the mock), but the cleanup was omitted.

If this is intentional, remove the assignment. If teardown was intended, add it:

try:
    result = initialized_custom_stream_wrapper.return_processed_chunk_logic(...)
finally:
    chunk_with_usage.model_dump = original_model_dump

Comment on lines +1721 to +1730
# Process the chunk through return_processed_chunk_logic which calls model_dump
result = initialized_custom_stream_wrapper.return_processed_chunk_logic(
    completion_obj={"content": "test content"},
    response_obj={"original_chunk": chunk_with_usage},
    model_response=chunk_with_usage,
)

# Should not raise TypeError and should successfully process the chunk
assert result is not None
assert result.choices[0].delta.content == "test content"

P1 Test doesn't cover the streaming_handler.py fallback paths

The test exercises return_processed_chunk_logic, which invokes the fallback in core_helpers.preserve_upstream_non_openai_attributes. However, the two new except TypeError blocks added in streaming_handler.py (lines 1862–1866 and 2054–2058) live inside the synchronous __next__ and asynchronous __anext__ iterators respectively — neither of which is called by return_processed_chunk_logic.

As a result, the regression test does not actually verify the two most direct code paths changed by this PR. Consider adding a test that iterates the wrapper (e.g., via list(wrapper) or async for chunk in wrapper) with model_dump patched to raise, to confirm those paths also survive the bug.

