
Commit 7ac7751

Author: Admin (committed)
feat: add Ollama native API support for vision requests
- Add api_mode config option to support both OpenAI and Ollama formats
- Implement _convert_to_ollama_generate() to transform requests
- Update connect() method to test appropriate endpoint based on mode
- Add model field injection for Ollama/MLX compatibility
- Update README.md and README_zh.md with Ollama configuration guide
- Include troubleshooting tips for 502 errors and layout dependencies

This change enables GLM-OCR to work with Ollama's /api/generate endpoint, which provides more stable vision support than the OpenAI-compatible API in some Ollama versions.
1 parent 8e6fd70 commit 7ac7751
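The conversion routine itself is not shown in this commit view. As a rough sketch of what `_convert_to_ollama_generate()` plausibly does (an assumption, not the committed implementation), it would map an OpenAI-style chat-completions payload onto Ollama's native `/api/generate` format:

```python
# Illustrative sketch only: the actual helper added by this commit is not shown
# in the diff below. It assumes the client builds OpenAI-style chat payloads
# internally and needs to reshape them for Ollama's native endpoint.
def _convert_to_ollama_generate(openai_payload: dict, model: str) -> dict:
    prompt_parts, images = [], []
    for message in openai_payload.get("messages", []):
        content = message.get("content", [])
        if isinstance(content, str):
            prompt_parts.append(content)
            continue
        for part in content:
            if part.get("type") == "text":
                prompt_parts.append(part["text"])
            elif part.get("type") == "image_url":
                url = part["image_url"]["url"]
                # Ollama expects raw base64, so strip any "data:image/...;base64," prefix.
                images.append(url.split(",", 1)[-1])
    return {
        "model": model,  # Ollama (and MLX) require an explicit model name
        "prompt": "\n".join(prompt_parts),
        "images": images,
        "stream": False,
    }
```

Ollama's native format carries images as a top-level list of base64 strings, which avoids the OpenAI-compatible vision path that some Ollama versions handle unreliably.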

File tree: 6 files changed (+388, -20 lines)


README.md

Lines changed: 5 additions & 2 deletions
```diff
@@ -130,9 +130,12 @@ pipeline:
     api_port: 8080
 ```
 
-#### Option 3: Apple Silicon with mlx-vlm
+#### Option 3: Ollama/MLX
 
-See the **[MLX Detailed Deployment Guide](examples/mlx-deploy/README.md)** for full setup instructions, including environment isolation and troubleshooting.
+For specialized deployment scenarios, see the detailed guides:
+
+- **[Apple Silicon with mlx-vlm](examples/mlx-deploy/README.md)** - Optimized for Apple Silicon Macs
+- **[Ollama Deployment](examples/ollama-deploy/README.md)** - Simple local deployment with Ollama
 
 ### SDK Usage Guide
```
README_zh.md

Lines changed: 7 additions & 0 deletions
```diff
@@ -133,6 +133,13 @@ pipeline:
     api_port: 8080
 ```
 
+#### 方式 3: 其他部署选项
+
+针对特定部署场景,请查看详细指南:
+
+- **[Apple Silicon 使用 mlx-vlm](examples/mlx-deploy/README.md)** - 针对 Apple Silicon Mac 优化
+- **[Ollama 部署](examples/ollama-deploy/README.md)** - 使用 Ollama 进行简单的本地部署
+
 ### SDK 使用指南
 
 #### CLI
```
examples/ollama-deploy/README.md

Lines changed: 191 additions & 0 deletions
@@ -0,0 +1,191 @@

# Ollama Deployment Guide for GLM-OCR

This guide provides detailed instructions for deploying GLM-OCR using Ollama.

## Overview

Ollama provides a simple local deployment option for running GLM-OCR. However, due to limitations in Ollama's OpenAI-compatible API for vision requests, we recommend using Ollama's native `/api/generate` endpoint.
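To make the native format concrete, here is a minimal hand-rolled vision request against `/api/generate` (a sketch using Python's `requests`; the image path and prompt are placeholders, and the prompt GLM-OCR actually sends is determined by its pipeline, not shown here):

```python
import base64

import requests

# Read a page image and base64-encode it; Ollama's native API takes images as a
# top-level list of base64 strings rather than OpenAI-style image_url parts.
with open("page.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("ascii")

payload = {
    "model": "glm-ocr:latest",
    "prompt": "Extract the text from this image.",  # placeholder prompt
    "images": [image_b64],
    "stream": False,
}
resp = requests.post("http://localhost:11434/api/generate", json=payload, timeout=300)
resp.raise_for_status()
print(resp.json()["response"])
```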
## Installation

### 1. Install Ollama

Download and install Ollama from the official website:

**macOS / Linux:**
```bash
curl -fsSL https://ollama.ai/install.sh | sh
```

**Windows:**
Download the installer from https://ollama.ai/download

### 2. Verify Installation

```bash
ollama --version
```

### 3. Pull the GLM-OCR Model

```bash
ollama pull glm-ocr:latest
```

This will download the GLM-OCR model.

### 4. Start Ollama Service

The Ollama service should start automatically after installation. If not:

```bash
ollama serve
```

The service will run on `http://localhost:11434` by default.
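To confirm the service is actually reachable before pointing GLM-OCR at it, Ollama exposes a version endpoint (shown here via Python's `requests`; `curl http://localhost:11434/api/version` works just as well):

```python
import requests

# Prints the running server version, e.g. {'version': '...'}, when the service is up.
print(requests.get("http://localhost:11434/api/version", timeout=5).json())
```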
## Configuration

### SDK Configuration

Create or update your `config.yaml`:

```yaml
pipeline:
  maas:
    enabled: false

  ocr_api:
    api_host: localhost
    api_port: 11434
    api_path: /api/generate # Use Ollama native endpoint
    model: glm-ocr:latest # Required: specify model name
    api_mode: ollama_generate # Required: use Ollama native format

  enable_layout: false # Recommended for initial testing
```

### Configuration Options Explained

- **api_path**: `/api/generate` - Ollama's native endpoint (more stable for vision)
- **model**: `glm-ocr:latest` - Model name (required by Ollama)
- **api_mode**: `ollama_generate` - Enables Ollama-specific request/response format
- **enable_layout**: `false` - Disable layout detection if dependencies not installed
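Before the first parse, it can also help to confirm that the configured host, port, and model name line up with what Ollama actually has installed (a sketch assuming the defaults used above):

```python
import requests

# /api/tags lists locally installed models; the configured "model" value must
# match one of these names exactly (e.g. "glm-ocr:latest").
tags = requests.get("http://localhost:11434/api/tags", timeout=10).json()
installed = [m["name"] for m in tags.get("models", [])]
if "glm-ocr:latest" not in installed:
    raise SystemExit(f"glm-ocr:latest is not pulled; found: {installed}")
print("Model available; the configuration above should work.")
```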
## Usage

### Command Line

```bash
# Parse a single image
glmocr parse examples/source/code.png --config config.yaml

# Parse with custom output directory
glmocr parse examples/source/code.png --output ./results/

# Enable debug logging
glmocr parse examples/source/code.png --log-level DEBUG
```

### Python API

```python
from glmocr import GlmOcr

# Initialize with custom config
with GlmOcr(config_path="config.yaml") as parser:
    result = parser.parse("image.png")
    print(result.markdown_result)
    result.save(output_dir="./results")
```

## Troubleshooting

### Issue: 502 Bad Gateway Errors

**Symptom:**
```
API server returned status code: 502, response: no body
```

**Solution:**
Ensure you're using Ollama's native API mode:

```yaml
ocr_api:
  api_path: /api/generate
  api_mode: ollama_generate
```

## Verification

### Check Model Status

```bash
# List installed models
ollama list

# View model details
ollama show glm-ocr:latest

# Check running models
ollama ps
```

### Test the API

```bash
# Test with a simple request (Linux/Mac)
curl http://localhost:11434/api/generate -d '{
  "model": "glm-ocr:latest",
  "prompt": "Hello",
  "stream": false
}'

# Windows PowerShell
Invoke-RestMethod -Uri http://localhost:11434/api/generate -Method Post -Body '{"model":"glm-ocr:latest","prompt":"Hello","stream":false}' -ContentType "application/json"
```

### Recommendations

- **For Testing/Personal Use**: Ollama is perfect
- **For Production**: Consider vLLM or SGLang for better performance and stability
- **For CPU-only**: Ollama is a good choice
## Advanced Configuration

### Custom Model Path

If you have a custom GLM-OCR model:

```bash
# Create a Modelfile
cat > Modelfile <<EOF
FROM /path/to/your/model
TEMPLATE {{ .Prompt }}
RENDERER glm-ocr
PARSER glm-ocr
PARAMETER temperature 0
EOF

# Create the model
ollama create my-glm-ocr -f Modelfile

# Use it in config
model: my-glm-ocr
```

## Uninstallation

```bash
# Remove the model
ollama rm glm-ocr:latest

# Uninstall Ollama (varies by OS)
# macOS/Linux: Remove /usr/local/bin/ollama
# Windows: Use the uninstaller
```

## Additional Resources

- [Ollama Official Documentation](https://github.com/ollama/ollama/blob/main/docs/api.md)
- [GLM-OCR GitHub Repository](https://github.com/zai-org/GLM-OCR)
- [GLM-OCR API Documentation](https://docs.z.ai/guides/vlm/glm-ocr)

glmocr/config.py

Lines changed: 5 additions & 0 deletions
```diff
@@ -32,13 +32,18 @@ class OCRApiConfig(_BaseConfig):
     api_scheme: Optional[str] = None
     api_path: str = "/v1/chat/completions"
     api_url: Optional[str] = None
+    model: Optional[str] = None  # Optional model name (required by Ollama/MLX)
     api_key: Optional[str] = None
 
     # Model name included in API requests.
     model: Optional[str] = None
     headers: Dict[str, str] = Field(default_factory=dict)
     verify_ssl: bool = False
 
+    # API mode: "openai" (default) or "ollama_generate"
+    # Use "ollama_generate" for Ollama's native /api/generate endpoint
+    api_mode: str = "openai"
+
     connect_timeout: int = 300
     request_timeout: int = 300
```

glmocr/config.yaml

Lines changed: 14 additions & 6 deletions
```diff
@@ -57,23 +57,31 @@ pipeline:
   # - You need offline/air-gapped operation
   # - You want to customize the pipeline (layout detection, prompts, etc.)
 
-  # OCR API client configuration (for self-hosted vLLM/SGLang)
+  # OCR API client configuration (for self-hosted vLLM/SGLang/Ollama)
   ocr_api:
     # Basic connection
     api_host: 127.0.0.1
     api_port: 8080
 
-    # Model name included in API requests.
-    # Required for mlx_vlm.server (e.g. "mlx-community/GLM-OCR-bf16").
-    # Set to null when using vLLM/SGLang (model is selected at server startup).
-    model: null
-
     # URL construction: {api_scheme}://{api_host}:{api_port}{api_path}
     # Or set api_url directly to override
     api_scheme: null # null = auto (https if port 443, else http)
     api_path: /v1/chat/completions
+    # Note: If using Ollama and encountering 502 errors with vision requests,
+    # try switching to "ollama_generate" mode with api_path: /api/generate
     api_url: null # full URL override (optional)
 
+    # Model name (optional)
+    # Required by some runtimes (e.g., Ollama/MLX). Not required for vLLM/SGLang.
+    # For mlx_vlm.server, use e.g. "mlx-community/GLM-OCR-bf16".
+    # If not set, the "model" field will not be included in the request payload.
+    model: null
+
+    # API mode: "openai" (default) or "ollama_generate"
+    # - "openai": Use OpenAI-compatible /v1/chat/completions endpoint (vLLM/SGLang/Ollama)
+    # - "ollama_generate": Use Ollama's native /api/generate endpoint
+    api_mode: openai
+
     # Authentication (for MaaS providers like Zhipu, OpenAI, etc.)
     api_key: null # or set GLMOCR_API_KEY env var
     headers: {} # additional HTTP headers
```
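The URL construction rule described in the comments above reduces to a few lines (a sketch that mirrors the documented behavior, not the library's actual code):

```python
from typing import Optional


def build_api_url(host: str, port: int, path: str, scheme: Optional[str] = None) -> str:
    """Documented rule: {api_scheme}://{api_host}:{api_port}{api_path}."""
    if scheme is None:
        scheme = "https" if port == 443 else "http"  # null = auto
    return f"{scheme}://{host}:{port}{path}"


# Default OpenAI-compatible endpoint:
print(build_api_url("127.0.0.1", 8080, "/v1/chat/completions"))  # http://127.0.0.1:8080/v1/chat/completions
# Ollama native mode:
print(build_api_url("localhost", 11434, "/api/generate"))  # http://localhost:11434/api/generate
```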
