RFC: explore sending content parts to LiteLLM instead of always sending strings to be deserialized for custom types #8280

dimroc · 2025-05-26T16:11:44Z

TL:DR Remove CUSTOM_TYPE_START_IDENTIFIER serialization and instead pass through array of content parts to avoid serialization bug.

For custom types like dspy.Image, we unfortunately serialize to them deserialize:

class BaseType(pydantic.BaseModel):
    """Base class to support creating custom types for DSPy signatures.

    This is the parent class of DSPy custom types, e.g, dspy.Image. Subclasses must implement the `format` method to
    return a list of dictionaries (same as the Array of content parts in the OpenAI API user message's content field).
   """
    @pydantic.model_serializer()
    def serialize_model(self):
        return f"{CUSTOM_TYPE_START_IDENTIFIER}{self.format()}{CUSTOM_TYPE_END_IDENTIFIER}" # <--problem

Let's skip this and just pass the parts ({type: text, text: "string"}) to LiteLLM.

Issue: Text deserialization breaks on some escaped quotes in custom types.

When sending strings with lots of escaped quote, the ChatAdapter deserialize step can break. See below.

This happens because we try to deserialize for custom types:

dspy/dspy/adapters/types/base_type.py

Lines 68 to 86 in 6270e95

    
           pattern = rf"{CUSTOM_TYPE_START_IDENTIFIER}(.*?){CUSTOM_TYPE_END_IDENTIFIER}" 
        
           result = [] 
        
           last_end = 0 
        
           # DSPy adapter always formats user input into a string content before custom type splitting 
        
           content: str = message["content"] 
        
           for match in re.finditer(pattern, content, re.DOTALL): 
        
               start, end = match.span() 
        
               # Add text before the current block 
        
               if start > last_end: 
        
                   result.append({"type": "text", "text": content[last_end:start]}) 
        
               # Parse the JSON inside the block 
        
               custom_type_content = match.group(1).strip() 
        
               try: 
        
                   parsed = json_repair.loads(custom_type_content) 
        
                   for custom_type_content in parsed: 
        
                       result.append(custom_type_content)

As shown above, we serialize and then deserialize content parts to support custom types like dspy.Image. It would be nice if there was more direct support for content parts for DSPy types.

Request for comment (RFC): Passing Through a Custom Type's `format()` parts

Rather than serialize and deserialize, let's allow the parts to pass all the way through to LiteLLM. This would be more efficient, skip serialization issues, and allow more powerful types.

Implementation

Instead of the adapter always returning a string, return the list of parts

        return "\n\n".join(output).strip() # <-- to be replaced

            if k in inputs:
                value = inputs.get(k)
                normalized_value = format_field_value(value)
                messages.extend(normalized_value)

Wrap raw string values in a part.

def format_field_value(value) -> list[dict]:
    if isinstance(value, str):
        return [{"type": "text", "text": value}] # <-- turn string into part
    elif isinstance(value, list):
        formatted_list = [format_field_value(v) for v in value]
        flattened = list(itertools.chain.from_iterable(formatted_list))
        return flattened
    elif isinstance(value, BaseType) or hasattr(
        value, "format"
    ):  # Check if Custom Type
        return value.format()  # WARN: assumes a list. Dangerous.
    else:
        return value

This is a rough implementation to help drive the conversation. I'm not sure I understand the implications in other adapters or parts of the system. If we like this direction, I can clean it up.

RFC: explore sending LiteLLM parts instead of serialize/deserialze dance

27ab178

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

RFC: explore sending content parts to LiteLLM instead of always sending strings to be deserialized for custom types #8280

RFC: explore sending content parts to LiteLLM instead of always sending strings to be deserialized for custom types #8280

dimroc commented May 26, 2025 •

edited

Loading

Uh oh!

Uh oh!

	pattern = rf"{CUSTOM_TYPE_START_IDENTIFIER}(.*?){CUSTOM_TYPE_END_IDENTIFIER}"
	result = []
	last_end = 0
	# DSPy adapter always formats user input into a string content before custom type splitting
	content: str = message["content"]

	for match in re.finditer(pattern, content, re.DOTALL):
	start, end = match.span()

	# Add text before the current block
	if start > last_end:
	result.append({"type": "text", "text": content[last_end:start]})

	# Parse the JSON inside the block
	custom_type_content = match.group(1).strip()
	try:
	parsed = json_repair.loads(custom_type_content)
	for custom_type_content in parsed:
	result.append(custom_type_content)

RFC: explore sending content parts to LiteLLM instead of always sending strings to be deserialized for custom types #8280

Are you sure you want to change the base?

RFC: explore sending content parts to LiteLLM instead of always sending strings to be deserialized for custom types #8280

Conversation

dimroc commented May 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Issue: Text deserialization breaks on some escaped quotes in custom types.

Request for comment (RFC): Passing Through a Custom Type's format() parts

Implementation

Uh oh!

Uh oh!

dimroc commented May 26, 2025 •

edited

Loading

Request for comment (RFC): Passing Through a Custom Type's `format()` parts