Add support for OpenAI tool calling API responses (NVIDIA NemoRetriever Parse) #2440

@pbrady

Description

Docling's api_image_request() fails when a VLM API returns its response in OpenAI's tool calling format, as NVIDIA NemoRetriever Parse does. OpenAiChatMessage.content is a required field, but tool calling responses do not include it, so validation of the response fails.

Error:

pydantic_core.ValidationError: 1 validation error for OpenAiApiResponse
choices.0.message.content
  Field required [type=missing]

Tool calling response format:

{
  "choices": [{
    "message": {
      "role": "assistant",
      "tool_calls": [{
        "function": {
          "name": "markdown_no_bbox",
          "arguments": "[{\"text\": \"Extracted text\"}]"
        }
      }]
      // No "content" field
    }
  }]
}
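
Note that arguments is itself a JSON-encoded string, so it needs a second decode after the response body has been parsed. A minimal sketch of that step, assuming the decoded value is always a list of objects with a "text" key as in the sample above (text_from_arguments is a hypothetical helper name, not part of Docling, and other tools may return other shapes):

```python
import json


def text_from_arguments(arguments: str) -> str:
    """Decode a tool call's JSON-encoded `arguments` string into plain text.

    Assumption: the decoded value is a list of {"text": ...} objects,
    matching the sample response in this issue.
    """
    items = json.loads(arguments)  # second decode: `arguments` is a JSON string
    if not isinstance(items, list):
        raise ValueError("expected a JSON list of text items")
    return "\n".join(str(item["text"]) for item in items)


# The sample response from this issue, written as a Python dict.
sample = {
    "choices": [{
        "message": {
            "role": "assistant",
            "tool_calls": [{
                "function": {
                    "name": "markdown_no_bbox",
                    "arguments": '[{"text": "Extracted text"}]',
                }
            }],
        }
    }]
}

tool_call = sample["choices"][0]["message"]["tool_calls"][0]
print(text_from_arguments(tool_call["function"]["arguments"]))  # prints: Extracted text
```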

Current Workaround (Monkey Patch)

We have to replace the entire api_image_request() function to handle tool_calls:

def nvidia_parse_compatible_api_request(image, prompt, url, timeout=20, headers=None, **params):
    """Replacement that handles both content and tool_calls responses."""
    # ... standard request code ...

    r = requests.post(str(url), headers=headers, json=payload, timeout=timeout)
    r.raise_for_status()

    response_data = json.loads(r.text)
    message = response_data['choices'][0]['message']

    # Handle tool_calls (the key difference)
    if message.get('tool_calls'):
        arguments = json.loads(message['tool_calls'][0]['function']['arguments'])
        # Extract text from tool response...
        return extracted_text
    elif message.get('content') is not None:
        # Guard against "content": null, which would break .strip()
        return message['content'].strip()
    else:
        raise ValueError("Response has neither content nor tool_calls")

# Must patch before importing any Docling module that imports api_image_request by name
import docling.utils.api_image_request as api_module
api_module.api_image_request = nvidia_parse_compatible_api_request

I think this could be fixed with minimal changes (untested) by making content optional and adding tool_calls extraction similar to our monkey patch:

File: docling/datamodel/base_models.py (line 335)

class OpenAiChatMessage(BaseModel):
    role: str
    content: Optional[str] = None      # Make optional
    tool_calls: Optional[List[dict]] = None  # Add tool_calls field
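
Once content is optional, any caller that assumes a string would need a None check before using it. A rough sketch of how that fallback might look, untested against Docling: SimpleNamespace stands in for a validated OpenAiChatMessage so the snippet runs on its own, message_text is a hypothetical helper name, and the tool call arguments are assumed to have the list-of-"text" shape from the sample response.

```python
import json
from types import SimpleNamespace


def message_text(message) -> str:
    """Return the text of a message whose `content` may be None.

    `message` is any object with `content` and `tool_calls` attributes;
    here it is a stand-in for the proposed OpenAiChatMessage.
    """
    if message.content is not None:
        return message.content.strip()
    if message.tool_calls:
        # Assumption: only the first tool call carries the extracted text.
        raw = message.tool_calls[0]["function"]["arguments"]
        return "\n".join(str(item["text"]) for item in json.loads(raw))
    raise ValueError("Response has neither content nor tool_calls")


# A plain chat response and a tool calling response go through the same path.
plain = SimpleNamespace(content="  Extracted text  ", tool_calls=None)
tool = SimpleNamespace(
    content=None,
    tool_calls=[{
        "function": {
            "name": "markdown_no_bbox",
            "arguments": '[{"text": "Extracted text"}]',
        }
    }],
)

print(message_text(plain))  # prints: Extracted text
print(message_text(tool))   # prints: Extracted text
```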

Are there any downsides to this? If not, I can submit a PR.

Labels

enhancement (New feature or request)
