feat: add Ollama native API support for vision requests #49

yyfl2zju wants to merge 3 commits into zai-org:main
Conversation
Force-pushed from 5650185 to c956842
Pull request overview
This PR adds support for Ollama's native `/api/generate` endpoint, which provides more stable vision support than Ollama's OpenAI-compatible API in some versions. The implementation introduces a new `api_mode` configuration option that lets users choose between the OpenAI-compatible format and Ollama's native format, with automatic request/response conversion.
Changes:
- Added `api_mode` configuration field to support "openai" (default) and "ollama_generate" modes (a minimal dispatch sketch follows this list)
- Implemented `_convert_to_ollama_generate()` method to transform OpenAI-format requests to Ollama format
- Updated `connect()` method to test the appropriate endpoint based on the selected API mode
- Added comprehensive documentation in both English and Chinese READMEs with configuration examples and troubleshooting tips
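For orientation, the mode switch itself is small. The sketch below is illustrative only: `OcrClient`, `build_request`, and the endpoint paths are assumptions about the PR's shape rather than its exact code; only `api_mode` and `_convert_to_ollama_generate()` come from the changes listed above.

```python
from typing import Dict, Optional, Tuple


class OcrClient:
    """Illustrative sketch only -- class and attribute names are assumptions, not the PR's code."""

    def __init__(self, base_url: str, api_mode: str = "openai", model: Optional[str] = None):
        self.base_url = base_url.rstrip("/")
        self.api_mode = api_mode  # "openai" (default) or "ollama_generate"
        self.model = model

    def build_request(self, request_data: Dict) -> Tuple[str, Dict]:
        """Return (url, payload) for the configured API mode."""
        if self.api_mode == "ollama_generate":
            # Ollama's native endpoint expects {"model", "prompt", "images", "stream", ...}
            return f"{self.base_url}/api/generate", self._convert_to_ollama_generate(request_data)
        # Default path: forward the OpenAI-style chat payload unchanged.
        return f"{self.base_url}/v1/chat/completions", request_data

    def _convert_to_ollama_generate(self, request_data: Dict) -> Dict:
        # The real conversion appears further down in this review.
        raise NotImplementedError
```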
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| glmocr/config.py | Added api_mode and model fields to OCRApiConfig class |
| glmocr/config.yaml | Added api_mode configuration option with documentation |
| glmocr/ocr_client.py | Implemented Ollama format conversion, updated request/response handling for both API modes |
| README.md | Added Ollama deployment guide with configuration instructions and troubleshooting |
| README_zh.md | Added Chinese version of Ollama deployment guide |
Comments suppressed due to low confidence (2)
glmocr/ocr_client.py:273
- The model field is injected into the request_data twice. Lines 264-265 inject the model field if configured, and then lines 272-273 inject it again with a check for whether it already exists. This duplicate logic should be consolidated into a single injection point to avoid confusion and potential bugs.
```python
if self.model:
    request_data["model"] = self.model
headers = {"Content-Type": "application/json", **self.extra_headers}
if self.api_key:
    headers["Authorization"] = f"Bearer {self.api_key}"
# Inject model if configured
if self.model and "model" not in request_data:
    request_data["model"] = self.model
```
glmocr/config.py:39
- The field 'model' is defined twice in this class. Line 35 defines it as 'Optional model name (required by Ollama/MLX)' and line 39 defines it again with a comment 'Model name included in API requests.' This creates ambiguity and the second definition will override the first. Remove one of the duplicate definitions.
```python
model: Optional[str] = None  # Optional model name (required by Ollama/MLX)
api_key: Optional[str] = None
# Model name included in API requests.
model: Optional[str] = None
```
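A deduplicated version of the affected fields might look like the following; this is a sketch that assumes `OCRApiConfig` is a dataclass, which may not match the actual class definition:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OCRApiConfig:
    api_key: Optional[str] = None
    # Model name included in API requests (required by Ollama/MLX backends).
    model: Optional[str] = None
    # "openai" (default) or "ollama_generate".
    api_mode: str = "openai"
```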
glmocr/ocr_client.py (Outdated)
```python
# Extract prompt and images from the last user message
prompt = ""
images = []

for msg in messages:
    if msg.get("role") == "user":
        content = msg.get("content", "")

        if isinstance(content, str):
            prompt = content
        elif isinstance(content, list):
            for item in content:
                if item.get("type") == "text":
                    prompt = item.get("text", "")
                elif item.get("type") == "image_url":
                    # Extract base64 from data URI
                    image_url = item.get("image_url", "")
                    if isinstance(image_url, dict):
                        image_url = image_url.get("url", "")

                    # Parse data:image/...;base64,<data>
                    if image_url.startswith("data:"):
                        parts = image_url.split(",", 1)
                        if len(parts) == 2:
                            images.append(parts[1])
                    else:
                        images.append(image_url)
```
There's an inconsistency in how messages are processed. The comment on line 385 states "Extract prompt and images from the last user message", but the code accumulates images from all user messages (line 409, 411) while only keeping the text from the last user message (line 394, 398). Consider either: 1) Processing only the last user message for both prompt and images, or 2) Updating the comment to reflect that images are accumulated from all user messages.
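If the first option is taken, a minimal sketch (reusing the structure from the hunk above) would look only at the final user message:

```python
# Process only the last user message for both prompt and images.
user_messages = [m for m in messages if m.get("role") == "user"]
prompt, images = "", []

if user_messages:
    content = user_messages[-1].get("content", "")
    if isinstance(content, str):
        prompt = content
    elif isinstance(content, list):
        for item in content:
            if item.get("type") == "text":
                prompt = item.get("text", "")
            elif item.get("type") == "image_url":
                image_url = item.get("image_url", "")
                if isinstance(image_url, dict):
                    image_url = image_url.get("url", "")
                # Keep only the base64 payload from a data URI.
                if image_url.startswith("data:"):
                    images.append(image_url.split(",", 1)[-1])
                else:
                    images.append(image_url)
```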
glmocr/ocr_client.py (Outdated)
```python
# Parse response based on API mode
if self.api_mode == "ollama_generate":
    output = result.get("response", "")
```
When api_mode is 'ollama_generate', the code expects the response to have a 'response' field. However, if the API returns an error or unexpected response format, result.get('response', '') will return an empty string, which might mask the actual error. Consider checking for error conditions in the response before extracting the 'response' field, similar to how the OpenAI format path handles errors.
Suggested change (replacing `output = result.get("response", "")`):

```python
# Handle possible error or unexpected response format
if isinstance(result, dict) and "error" in result:
    logger.error(
        "OCR API returned error in ollama_generate mode: %s",
        result.get("error"),
    )
    return {
        "error": f"OCR API error: {result.get('error')}"
    }, 500
if not isinstance(result, dict) or "response" not in result:
    logger.error(
        "OCR API returned unexpected response format in "
        "ollama_generate mode: %s",
        result,
    )
    return {
        "error": "Unexpected OCR API response format",
        "response": result,
    }, 500
output = result["response"] or ""
```
```python
def _convert_to_ollama_generate(self, request_data: Dict) -> Dict:
    """Convert OpenAI chat format to Ollama generate format.

    OpenAI format:
    {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "..."},
                    {"type": "image_url", "image_url": "data:image/...;base64,..."}
                ]
            }
        ],
        "max_tokens": 100,
        ...
    }

    Ollama generate format:
    {
        "model": "glm-ocr:latest",
        "prompt": "...",
        "images": ["base64_string"],
        "stream": false,
        "options": {
            "num_predict": 100,
            ...
        }
    }
    """
    messages = request_data.get("messages", [])

    # Extract prompt and images from the last user message
    prompt = ""
    images = []

    for msg in messages:
        if msg.get("role") == "user":
            content = msg.get("content", "")

            if isinstance(content, str):
                prompt = content
            elif isinstance(content, list):
                for item in content:
                    if item.get("type") == "text":
                        prompt = item.get("text", "")
                    elif item.get("type") == "image_url":
                        # Extract base64 from data URI
                        image_url = item.get("image_url", "")
                        if isinstance(image_url, dict):
                            image_url = image_url.get("url", "")

                        # Parse data:image/...;base64,<data>
                        if image_url.startswith("data:"):
                            parts = image_url.split(",", 1)
                            if len(parts) == 2:
                                images.append(parts[1])
                        else:
                            images.append(image_url)

    # Build Ollama generate request
    ollama_request = {
        "model": self.model or "glm-ocr:latest",
        "prompt": prompt,
        "stream": False,
    }

    if images:
        ollama_request["images"] = images

    # Map parameters to Ollama options
    options = {}
    if "max_tokens" in request_data:
        options["num_predict"] = request_data["max_tokens"]
    if "temperature" in request_data:
        options["temperature"] = request_data["temperature"]
    if "top_p" in request_data:
        options["top_p"] = request_data["top_p"]
    if "top_k" in request_data:
        options["top_k"] = request_data["top_k"]
    if "repetition_penalty" in request_data:
        options["repeat_penalty"] = request_data["repetition_penalty"]

    if options:
        ollama_request["options"] = options

    return ollama_request
```
The new Ollama native API support functionality (api_mode, _convert_to_ollama_generate method, and related request/response handling) lacks test coverage. Given the complexity of the format conversion logic and multiple code paths, consider adding unit tests to verify: 1) Correct conversion from OpenAI to Ollama format with various input structures, 2) Proper response parsing for both modes, 3) Model field injection behavior, 4) Edge cases like missing fields or malformed data.
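As an illustration of what such coverage might look like (a sketch only; the import path and class name `OcrClient` are assumptions, and the test bypasses `__init__` rather than relying on the real constructor):

```python
import base64

from glmocr.ocr_client import OcrClient  # hypothetical import path / class name


def test_convert_to_ollama_generate_basic():
    client = OcrClient.__new__(OcrClient)  # bypass __init__ for the sketch
    client.model = "glm-ocr:latest"

    png_b64 = base64.b64encode(b"\x89PNG fake bytes").decode()
    request = {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Read this page"},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{png_b64}"}},
                ],
            }
        ],
        "max_tokens": 256,
        "temperature": 0.0,
    }

    converted = client._convert_to_ollama_generate(request)

    # Prompt and image should be lifted out of the OpenAI-style message,
    # and sampling parameters mapped into Ollama's "options" block.
    assert converted["prompt"] == "Read this page"
    assert converted["images"] == [png_b64]
    assert converted["stream"] is False
    assert converted["options"] == {"num_predict": 256, "temperature": 0.0}
```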
glmocr/ocr_client.py (Outdated)
```python
for msg in messages:
    if msg.get("role") == "user":
        content = msg.get("content", "")

        if isinstance(content, str):
            prompt = content
```
The _convert_to_ollama_generate method only processes messages with role='user' and ignores system or assistant messages. If the OpenAI-format request contains system messages or conversation history, this information will be lost in the conversion. Consider documenting this limitation or adding a warning log if non-user messages are present.
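If the warning-log route is preferred, a small addition at the top of the conversion (a sketch; it reuses the `messages` and `logger` names visible elsewhere in this review) would make the dropped data visible:

```python
# Warn when the OpenAI-format request carries messages that the Ollama
# generate conversion will drop (system prompts, assistant turns, etc.).
dropped = [m.get("role") for m in messages if m.get("role") != "user"]
if dropped:
    logger.warning(
        "Ollama generate conversion ignores non-user messages: %s", dropped
    )
```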
Force-pushed from f7e4e4c to 7ac7751
- Add api_mode config option to support both OpenAI and Ollama formats
- Implement _convert_to_ollama_generate() to transform requests
- Update connect() method to test appropriate endpoint based on mode
- Add model field injection for Ollama/MLX compatibility
- Update README.md and README_zh.md with Ollama configuration guide
- Include troubleshooting tips for 502 errors and layout dependencies

This change enables GLM-OCR to work with Ollama's /api/generate endpoint, which provides more stable vision support than the OpenAI-compatible API in some Ollama versions.
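The `connect()` change is easy to picture: the health check has to hit a different path depending on the mode. A rough sketch only, assuming `requests` is used and that the probe endpoints are Ollama's `/api/tags` and the OpenAI-compatible `/v1/models` (the PR's actual probe targets are not shown in this review):

```python
import requests


def connect(self) -> bool:
    """Sketch of a mode-aware health check (method body only; `self` is the OCR client)."""
    if self.api_mode == "ollama_generate":
        # Ollama's native API: listing local models is a cheap liveness probe.
        probe_url = f"{self.base_url}/api/tags"
    else:
        # OpenAI-compatible servers expose a models listing.
        probe_url = f"{self.base_url}/v1/models"

    try:
        response = requests.get(probe_url, headers=self.extra_headers, timeout=5)
        return response.ok
    except requests.RequestException:
        return False
```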
Force-pushed from 7ac7751 to c35c0d9