
Commit 7ac7751

Author: Admin (committed)
feat: add Ollama native API support for vision requests
- Add api_mode config option to support both OpenAI and Ollama formats
- Implement _convert_to_ollama_generate() to transform requests
- Update connect() method to test appropriate endpoint based on mode
- Add model field injection for Ollama/MLX compatibility
- Update README.md and README_zh.md with Ollama configuration guide
- Include troubleshooting tips for 502 errors and layout dependencies

This change enables GLM-OCR to work with Ollama's /api/generate endpoint, which provides more stable vision support than the OpenAI-compatible API in some Ollama versions.
1 parent 8e6fd70 commit 7ac7751
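The conversion routine itself is not shown in this commit view. As a rough sketch of what `_convert_to_ollama_generate()` plausibly does (an assumption, not the committed implementation), it would map an OpenAI-style chat-completions payload onto Ollama's native `/api/generate` format:

```python
# Illustrative sketch only: the actual helper added by this commit is not shown
# in the diff below. It assumes the client builds OpenAI-style chat payloads
# internally and needs to reshape them for Ollama's native endpoint.
def _convert_to_ollama_generate(openai_payload: dict, model: str) -> dict:
    prompt_parts, images = [], []
    for message in openai_payload.get("messages", []):
        content = message.get("content", [])
        if isinstance(content, str):
            prompt_parts.append(content)
            continue
        for part in content:
            if part.get("type") == "text":
                prompt_parts.append(part["text"])
            elif part.get("type") == "image_url":
                url = part["image_url"]["url"]
                # Ollama expects raw base64, so strip any "data:image/...;base64," prefix.
                images.append(url.split(",", 1)[-1])
    return {
        "model": model,  # Ollama (and MLX) require an explicit model name
        "prompt": "\n".join(prompt_parts),
        "images": images,
        "stream": False,
    }
```

Ollama's native format carries images as a top-level list of base64 strings, which avoids the OpenAI-compatible vision path that some Ollama versions handle unreliably.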

File tree: 6 files changed (+388, -20 lines)


README.md

Lines changed: 5 additions & 2 deletions
```diff
@@ -130,9 +130,12 @@ pipeline:
     api_port: 8080
 ```
 
-#### Option 3: Apple Silicon with mlx-vlm
+#### Option 3: Ollama/MLX
 
-See the **[MLX Detailed Deployment Guide](examples/mlx-deploy/README.md)** for full setup instructions, including environment isolation and troubleshooting.
+For specialized deployment scenarios, see the detailed guides:
+
+- **[Apple Silicon with mlx-vlm](examples/mlx-deploy/README.md)** - Optimized for Apple Silicon Macs
+- **[Ollama Deployment](examples/ollama-deploy/README.md)** - Simple local deployment with Ollama
 
 ### SDK Usage Guide
```
README_zh.md

Lines changed: 7 additions & 0 deletions
```diff
@@ -133,6 +133,13 @@ pipeline:
     api_port: 8080
 ```
 
+#### 方式 3: 其他部署选项
+
+针对特定部署场景,请查看详细指南:
+
+- **[Apple Silicon 使用 mlx-vlm](examples/mlx-deploy/README.md)** - 针对 Apple Silicon Mac 优化
+- **[Ollama 部署](examples/ollama-deploy/README.md)** - 使用 Ollama 进行简单的本地部署
+
 ### SDK 使用指南
 
 #### CLI
```
examples/ollama-deploy/README.md

Lines changed: 191 additions & 0 deletions
@@ -0,0 +1,191 @@

# Ollama Deployment Guide for GLM-OCR

This guide provides detailed instructions for deploying GLM-OCR using Ollama.

## Overview

Ollama provides a simple local deployment option for running GLM-OCR. However, due to limitations in Ollama's OpenAI-compatible API for vision requests, we recommend using Ollama's native `/api/generate` endpoint.
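To make the native format concrete, here is a minimal hand-rolled vision request against `/api/generate` (a sketch using Python's `requests`; the image path and prompt are placeholders, and the prompt GLM-OCR actually sends is determined by its pipeline, not shown here):

```python
import base64

import requests

# Read a page image and base64-encode it; Ollama's native API takes images as a
# top-level list of base64 strings rather than OpenAI-style image_url parts.
with open("page.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("ascii")

payload = {
    "model": "glm-ocr:latest",
    "prompt": "Extract the text from this image.",  # placeholder prompt
    "images": [image_b64],
    "stream": False,
}
resp = requests.post("http://localhost:11434/api/generate", json=payload, timeout=300)
resp.raise_for_status()
print(resp.json()["response"])
```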
## Installation

### 1. Install Ollama

Download and install Ollama from the official website:

**macOS / Linux:**
```bash
curl -fsSL https://ollama.ai/install.sh | sh
```

**Windows:**
Download the installer from https://ollama.ai/download

### 2. Verify Installation

```bash
ollama --version
```

### 3. Pull the GLM-OCR Model

```bash
ollama pull glm-ocr:latest
```

This will download the GLM-OCR model.

### 4. Start Ollama Service

The Ollama service should start automatically after installation. If not:

```bash
ollama serve
```

The service will run on `http://localhost:11434` by default.
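To confirm the service is actually reachable before pointing GLM-OCR at it, Ollama exposes a version endpoint (shown here via Python's `requests`; `curl http://localhost:11434/api/version` works just as well):

```python
import requests

# Prints the running server version, e.g. {'version': '...'}, when the service is up.
print(requests.get("http://localhost:11434/api/version", timeout=5).json())
```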
## Configuration

### SDK Configuration

Create or update your `config.yaml`:

```yaml
pipeline:
  maas:
    enabled: false

  ocr_api:
    api_host: localhost
    api_port: 11434
    api_path: /api/generate # Use Ollama native endpoint
    model: glm-ocr:latest # Required: specify model name
    api_mode: ollama_generate # Required: use Ollama native format

  enable_layout: false # Recommended for initial testing
```

### Configuration Options Explained

- **api_path**: `/api/generate` - Ollama's native endpoint (more stable for vision)
- **model**: `glm-ocr:latest` - Model name (required by Ollama)
- **api_mode**: `ollama_generate` - Enables Ollama-specific request/response format
- **enable_layout**: `false` - Disable layout detection if dependencies not installed
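Before the first parse, it can also help to confirm that the configured host, port, and model name line up with what Ollama actually has installed (a sketch assuming the defaults used above):

```python
import requests

# /api/tags lists locally installed models; the configured "model" value must
# match one of these names exactly (e.g. "glm-ocr:latest").
tags = requests.get("http://localhost:11434/api/tags", timeout=10).json()
installed = [m["name"] for m in tags.get("models", [])]
if "glm-ocr:latest" not in installed:
    raise SystemExit(f"glm-ocr:latest is not pulled; found: {installed}")
print("Model available; the configuration above should work.")
```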
## Usage

### Command Line

```bash
# Parse a single image
glmocr parse examples/source/code.png --config config.yaml

# Parse with custom output directory
glmocr parse examples/source/code.png --output ./results/

# Enable debug logging
glmocr parse examples/source/code.png --log-level DEBUG
```

### Python API

```python
from glmocr import GlmOcr

# Initialize with custom config
with GlmOcr(config_path="config.yaml") as parser:
    result = parser.parse("image.png")
    print(result.markdown_result)
    result.save(output_dir="./results")
```

## Troubleshooting

### Issue: 502 Bad Gateway Errors

**Symptom:**
```
API server returned status code: 502, response: no body
```

**Solution:**
Ensure you're using Ollama's native API mode:

```yaml
ocr_api:
  api_path: /api/generate
  api_mode: ollama_generate
```

## Verification

### Check Model Status

```bash
# List installed models
ollama list

# View model details
ollama show glm-ocr:latest

# Check running models
ollama ps
```

### Test the API

```bash
# Test with a simple request (Linux/Mac)
curl http://localhost:11434/api/generate -d '{
  "model": "glm-ocr:latest",
  "prompt": "Hello",
  "stream": false
}'

# Windows PowerShell
Invoke-RestMethod -Uri http://localhost:11434/api/generate -Method Post -Body '{"model":"glm-ocr:latest","prompt":"Hello","stream":false}' -ContentType "application/json"
```

### Recommendations

- **For Testing/Personal Use**: Ollama is perfect
- **For Production**: Consider vLLM or SGLang for better performance and stability
- **For CPU-only**: Ollama is a good choice
## Advanced Configuration

### Custom Model Path

If you have a custom GLM-OCR model:

```bash
# Create a Modelfile
cat > Modelfile <<EOF
FROM /path/to/your/model
TEMPLATE {{ .Prompt }}
RENDERER glm-ocr
PARSER glm-ocr
PARAMETER temperature 0
EOF

# Create the model
ollama create my-glm-ocr -f Modelfile

# Use it in config
model: my-glm-ocr
```

## Uninstallation

```bash
# Remove the model
ollama rm glm-ocr:latest

# Uninstall Ollama (varies by OS)
# macOS/Linux: Remove /usr/local/bin/ollama
# Windows: Use the uninstaller
```

## Additional Resources

- [Ollama Official Documentation](https://github.com/ollama/ollama/blob/main/docs/api.md)
- [GLM-OCR GitHub Repository](https://github.com/zai-org/GLM-OCR)
- [GLM-OCR API Documentation](https://docs.z.ai/guides/vlm/glm-ocr)

glmocr/config.py

Lines changed: 5 additions & 0 deletions
```diff
@@ -32,13 +32,18 @@ class OCRApiConfig(_BaseConfig):
     api_scheme: Optional[str] = None
     api_path: str = "/v1/chat/completions"
     api_url: Optional[str] = None
+    model: Optional[str] = None  # Optional model name (required by Ollama/MLX)
     api_key: Optional[str] = None
 
     # Model name included in API requests.
     model: Optional[str] = None
     headers: Dict[str, str] = Field(default_factory=dict)
     verify_ssl: bool = False
 
+    # API mode: "openai" (default) or "ollama_generate"
+    # Use "ollama_generate" for Ollama's native /api/generate endpoint
+    api_mode: str = "openai"
+
     connect_timeout: int = 300
     request_timeout: int = 300
```

glmocr/config.yaml

Lines changed: 14 additions & 6 deletions
```diff
@@ -57,23 +57,31 @@ pipeline:
   # - You need offline/air-gapped operation
   # - You want to customize the pipeline (layout detection, prompts, etc.)
 
-  # OCR API client configuration (for self-hosted vLLM/SGLang)
+  # OCR API client configuration (for self-hosted vLLM/SGLang/Ollama)
   ocr_api:
     # Basic connection
     api_host: 127.0.0.1
     api_port: 8080
 
-    # Model name included in API requests.
-    # Required for mlx_vlm.server (e.g. "mlx-community/GLM-OCR-bf16").
-    # Set to null when using vLLM/SGLang (model is selected at server startup).
-    model: null
-
     # URL construction: {api_scheme}://{api_host}:{api_port}{api_path}
     # Or set api_url directly to override
     api_scheme: null # null = auto (https if port 443, else http)
     api_path: /v1/chat/completions
+    # Note: If using Ollama and encountering 502 errors with vision requests,
+    # try switching to "ollama_generate" mode with api_path: /api/generate
     api_url: null # full URL override (optional)
 
+    # Model name (optional)
+    # Required by some runtimes (e.g., Ollama/MLX). Not required for vLLM/SGLang.
+    # For mlx_vlm.server, use e.g. "mlx-community/GLM-OCR-bf16".
+    # If not set, the "model" field will not be included in the request payload.
+    model: null
+
+    # API mode: "openai" (default) or "ollama_generate"
+    # - "openai": Use OpenAI-compatible /v1/chat/completions endpoint (vLLM/SGLang/Ollama)
+    # - "ollama_generate": Use Ollama's native /api/generate endpoint
+    api_mode: openai
+
     # Authentication (for MaaS providers like Zhipu, OpenAI, etc.)
     api_key: null # or set GLMOCR_API_KEY env var
     headers: {} # additional HTTP headers
```
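The URL construction rule described in the comments above reduces to a few lines (a sketch that mirrors the documented behavior, not the library's actual code):

```python
from typing import Optional


def build_api_url(host: str, port: int, path: str, scheme: Optional[str] = None) -> str:
    """Documented rule: {api_scheme}://{api_host}:{api_port}{api_path}."""
    if scheme is None:
        scheme = "https" if port == 443 else "http"  # null = auto
    return f"{scheme}://{host}:{port}{path}"


# Default OpenAI-compatible endpoint:
print(build_api_url("127.0.0.1", 8080, "/v1/chat/completions"))  # http://127.0.0.1:8080/v1/chat/completions
# Ollama native mode:
print(build_api_url("localhost", 11434, "/api/generate"))  # http://localhost:11434/api/generate
```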
