
Commit c956842

Authored and committed by Admin
feat: add Ollama native API support for vision requests
- Add api_mode config option to support both OpenAI and Ollama formats
- Implement _convert_to_ollama_generate() to transform requests
- Update connect() method to test the appropriate endpoint based on mode
- Add model field injection for Ollama/MLX compatibility
- Update README.md and README_zh.md with an Ollama configuration guide
- Include troubleshooting tips for 502 errors and layout dependencies

This change enables GLM-OCR to work with Ollama's /api/generate endpoint, which provides more stable vision support than the OpenAI-compatible API in some Ollama versions.
1 parent 8e6fd70 commit c956842

5 files changed (+261 lines, -18 lines)

README.md

Lines changed: 54 additions & 0 deletions
@@ -134,6 +134,60 @@ pipeline:
 
 See the **[MLX Detailed Deployment Guide](examples/mlx-deploy/README.md)** for full setup instructions, including environment isolation and troubleshooting.
 
+#### Option 4: Self-hosting with Ollama
+
+Ollama provides a simple local deployment option, but requires special configuration for vision support.
+
+##### Install and Set Up Ollama
+
+1. Install Ollama from https://ollama.ai
+
+2. Pull the GLM-OCR model:
+
+```bash
+ollama pull glm-ocr:latest
+```
+
+3. Start the Ollama service:
+
+```bash
+ollama serve
+```
+
+##### Configure the SDK for Ollama
+
+**Important**: Ollama's OpenAI-compatible API (`/v1/chat/completions`) has unstable vision support in some versions and may return 502 errors. We recommend using Ollama's native API.
+
+Configure `config.yaml`:
+
+```yaml
+pipeline:
+  maas:
+    enabled: false
+  ocr_api:
+    api_host: localhost
+    api_port: 11434            # Ollama default port
+    api_path: /api/generate    # Use Ollama's native endpoint
+    model: glm-ocr:latest      # Required: specify the model name
+    api_mode: ollama_generate  # Required: use Ollama's native format
+  enable_layout: false         # Recommended: disable layout mode for testing
+```
+
+**Configuration Notes**:
+- `api_mode: ollama_generate` - use Ollama's native `/api/generate` endpoint
+- `model: glm-ocr:latest` - Ollama requires the model name to be specified
+- `enable_layout: false` - disable if the layout dependencies are not installed
+
+**Troubleshooting**:
+- If you encounter 502 errors, make sure `api_mode: ollama_generate` is set
+- If you see "Layout detection dependencies" errors, set `enable_layout: false`
+- Use `ollama ps` to check whether the model is loaded
+- Use `ollama show glm-ocr:latest` to view model information
+
+**Performance Recommendations**:
+- vLLM/SGLang provide better performance and stability for production use
+- Ollama is suitable for quick testing and personal use
+
 ### SDK Usage Guide
 
 #### CLI

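As a quick sanity check outside the SDK, the native endpoint described above can be exercised directly. The following is a minimal sketch (not part of the shipped examples), assuming Ollama is serving `glm-ocr:latest` on the default port; `sample.png` is a placeholder path:

```python
import base64
import json
from urllib import request

# Read a local test image and base64-encode it. Ollama's native API expects
# raw base64 strings, without the "data:image/...;base64," prefix.
with open("sample.png", "rb") as f:  # placeholder path
    image_b64 = base64.b64encode(f.read()).decode("ascii")

payload = {
    "model": "glm-ocr:latest",   # must match the model pulled via `ollama pull`
    "prompt": "Extract the text from this image.",
    "images": [image_b64],
    "stream": False,
}

req = request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with request.urlopen(req, timeout=300) as resp:
    result = json.loads(resp.read())

# The native endpoint returns the generated text in the "response" field.
print(result.get("response", ""))
```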
README_zh.md

Lines changed: 55 additions & 0 deletions
@@ -133,6 +133,61 @@ pipeline:
     api_port: 8080
 ```
 
+#### Option 3: Apple Silicon with mlx-vlm
+
+See the **[MLX Detailed Deployment Guide](examples/mlx-deploy/README.md)** for full setup instructions, including environment isolation and troubleshooting.
+
+#### Option 4: Self-hosting with Ollama
+
+Ollama provides a simple local deployment option, but requires special configuration for vision support.
+
+##### Install and Set Up Ollama
+
+1. Install Ollama from https://ollama.ai
+
+2. Pull the GLM-OCR model:
+
+```bash
+ollama pull glm-ocr:latest
+```
+
+3. Start the Ollama service:
+
+```bash
+ollama serve
+```
+
+##### Configure the SDK for Ollama
+
+**Important**: Ollama's OpenAI-compatible API (`/v1/chat/completions`) has unstable support for vision requests in some versions and may return 502 errors. Using Ollama's native API is recommended.
+
+Configure `config.yaml`:
+
+```yaml
+pipeline:
+  maas:
+    enabled: false
+  ocr_api:
+    api_host: localhost
+    api_port: 11434            # Ollama default port
+    api_path: /api/generate    # Use Ollama's native endpoint
+    model: glm-ocr:latest      # Required: specify the model name
+    api_mode: ollama_generate  # Required: use Ollama's native format
+  enable_layout: false         # Recommended: disable layout mode first when testing
+```
+
+**Configuration Notes**:
+- `api_mode: ollama_generate` - use Ollama's native `/api/generate` endpoint
+- `model: glm-ocr:latest` - Ollama requires the model name to be specified
+- `enable_layout: false` - disable if the layout dependencies are not installed
+
+**Troubleshooting**:
+- If you encounter 502 errors, confirm that `api_mode: ollama_generate` is set
+- If you see "Layout detection dependencies" errors, set `enable_layout: false`
+- Use `ollama ps` to check whether the model is loaded
+- Use `ollama show glm-ocr:latest` to view model information
+
 ### SDK Usage Guide
 
 #### CLI

glmocr/config.py

Lines changed: 5 additions & 0 deletions
@@ -32,13 +32,18 @@ class OCRApiConfig(_BaseConfig):
     api_scheme: Optional[str] = None
     api_path: str = "/v1/chat/completions"
     api_url: Optional[str] = None
+    model: Optional[str] = None  # Optional model name (required by Ollama/MLX)
     api_key: Optional[str] = None
 
     # Model name included in API requests.
     model: Optional[str] = None
     headers: Dict[str, str] = Field(default_factory=dict)
    verify_ssl: bool = False
 
+    # API mode: "openai" (default) or "ollama_generate"
+    # Use "ollama_generate" for Ollama's native /api/generate endpoint
+    api_mode: str = "openai"
+
     connect_timeout: int = 300
     request_timeout: int = 300

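For programmatic configuration, the same fields can be set on `OCRApiConfig` directly. A hedged sketch, assuming the config class accepts keyword arguments in the usual pydantic style (the exact construction path in the SDK may differ):

```python
from glmocr.config import OCRApiConfig

# Hypothetical direct construction; the field names mirror the diff above.
ollama_cfg = OCRApiConfig(
    api_host="localhost",
    api_port=11434,
    api_path="/api/generate",    # Ollama's native endpoint
    model="glm-ocr:latest",      # Ollama requires an explicit model name
    api_mode="ollama_generate",  # switch the client to Ollama's request format
)

# Leaving api_mode at its default keeps the OpenAI-compatible behaviour,
# so existing vLLM/SGLang configurations are unaffected.
vllm_cfg = OCRApiConfig(api_host="127.0.0.1", api_port=8080)
assert vllm_cfg.api_mode == "openai"
```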
glmocr/config.yaml

Lines changed: 14 additions & 6 deletions
@@ -57,23 +57,31 @@ pipeline:
   # - You need offline/air-gapped operation
   # - You want to customize the pipeline (layout detection, prompts, etc.)
 
-  # OCR API client configuration (for self-hosted vLLM/SGLang)
+  # OCR API client configuration (for self-hosted vLLM/SGLang/Ollama)
   ocr_api:
     # Basic connection
     api_host: 127.0.0.1
     api_port: 8080
 
-    # Model name included in API requests.
-    # Required for mlx_vlm.server (e.g. "mlx-community/GLM-OCR-bf16").
-    # Set to null when using vLLM/SGLang (model is selected at server startup).
-    model: null
-
     # URL construction: {api_scheme}://{api_host}:{api_port}{api_path}
     # Or set api_url directly to override
     api_scheme: null  # null = auto (https if port 443, else http)
     api_path: /v1/chat/completions
+    # Note: If using Ollama and encountering 502 errors with vision requests,
+    # try switching to "ollama_generate" mode with api_path: /api/generate
    api_url: null  # full URL override (optional)
 
+    # Model name (optional)
+    # Required by some runtimes (e.g., Ollama/MLX). Not required for vLLM/SGLang.
+    # For mlx_vlm.server, use e.g. "mlx-community/GLM-OCR-bf16".
+    # If not set, the "model" field will not be included in the request payload.
+    model: null
+
+    # API mode: "openai" (default) or "ollama_generate"
+    # - "openai": Use OpenAI-compatible /v1/chat/completions endpoint (vLLM/SGLang/Ollama)
+    # - "ollama_generate": Use Ollama's native /api/generate endpoint
+    api_mode: openai
+
     # Authentication (for MaaS providers like Zhipu, OpenAI, etc.)
     api_key: null  # or set GLMOCR_API_KEY env var
     headers: {}  # additional HTTP headers

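The URL-construction rule restated in the comments above is easy to illustrate. A minimal sketch of the documented behaviour (an illustrative helper, not the SDK's actual code):

```python
from typing import Optional

def build_api_url(
    api_host: str,
    api_port: int,
    api_path: str,
    api_scheme: Optional[str] = None,
    api_url: Optional[str] = None,
) -> str:
    """Illustrative restatement of the config.yaml comments above."""
    if api_url:                    # explicit full-URL override wins
        return api_url
    if api_scheme is None:         # null = auto: https on port 443, else http
        api_scheme = "https" if api_port == 443 else "http"
    return f"{api_scheme}://{api_host}:{api_port}{api_path}"

# Ollama native mode from the README example above:
print(build_api_url("localhost", 11434, "/api/generate"))
# -> http://localhost:11434/api/generate
```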
glmocr/ocr_client.py

Lines changed: 133 additions & 12 deletions
@@ -54,6 +54,10 @@ def __init__(self, config: "OCRApiConfig"):
 
         self.api_key = config.api_key or os.getenv("GLMOCR_API_KEY")
         self.extra_headers = config.headers or {}
+        self.model = config.model  # Optional model name
+
+        # API mode: "openai" or "ollama_generate"
+        self.api_mode = getattr(config, "api_mode", "openai")
 
         # SSL verification
         self.verify_ssl = config.verify_ssl
@@ -168,18 +172,33 @@ def connect(self):
         sock.settimeout(10)
         result = sock.connect_ex((self.api_host, self.api_port))
         if result == 0:
-            # Send a test request to the chat/completions endpoint
+            # Send a test request
             try:
-                test_payload = {
-                    "messages": [
-                        {
-                            "role": "user",
-                            "content": [{"type": "text", "text": "hello"}],
-                        }
-                    ],
-                    "max_tokens": 10,
-                    "temperature": 0.1,
-                }
+                # Build test payload based on API mode
+                if self.api_mode == "ollama_generate":
+                    test_payload = {
+                        "model": self.model or "glm-ocr:latest",
+                        "prompt": "hello",
+                        "stream": False,
+                        "options": {"num_predict": 10},
+                    }
+                else:
+                    test_payload = {
+                        "messages": [
+                            {
+                                "role": "user",
+                                "content": [
+                                    {"type": "text", "text": "hello"}
+                                ],
+                            }
+                        ],
+                        "max_tokens": 10,
+                        "temperature": 0.1,
+                    }
+                    # Inject model field if configured (required by Ollama/MLX)
+                    if self.model:
+                        test_payload["model"] = self.model
+
                 headers = {
                     "Content-Type": "application/json",
                     **self.extra_headers,
@@ -237,6 +256,14 @@ def process(self, request_data: Dict) -> Tuple[Dict, int]:
         if self._session is None:
             self._session = self._make_session()
 
+        # Convert request format based on API mode
+        if self.api_mode == "ollama_generate":
+            request_data = self._convert_to_ollama_generate(request_data)
+        else:
+            # Inject model field if configured (required by Ollama/MLX)
+            if self.model:
+                request_data["model"] = self.model
+
         headers = {"Content-Type": "application/json", **self.extra_headers}
         if self.api_key:
             headers["Authorization"] = f"Bearer {self.api_key}"
@@ -261,7 +288,13 @@ def process(self, request_data: Dict) -> Tuple[Dict, int]:
 
             if response.status_code == 200:
                 result = response.json()
-                output = result["choices"][0]["message"]["content"]
+
+                # Parse response based on API mode
+                if self.api_mode == "ollama_generate":
+                    output = result.get("response", "")
+                else:
+                    output = result["choices"][0]["message"]["content"]
+
                 return {"choices": [{"message": {"content": output.strip()}}]}, 200
 
             status = int(response.status_code)
@@ -316,3 +349,91 @@ def process(self, request_data: Dict) -> Tuple[Dict, int]:
             "error": f"API request failed after {total_attempts} attempts",
             "detail": last_error,
         }, 500
+
+    def _convert_to_ollama_generate(self, request_data: Dict) -> Dict:
+        """Convert OpenAI chat format to Ollama generate format.
+
+        OpenAI format:
+            {
+                "messages": [
+                    {
+                        "role": "user",
+                        "content": [
+                            {"type": "text", "text": "..."},
+                            {"type": "image_url", "image_url": "data:image/...;base64,..."}
+                        ]
+                    }
+                ],
+                "max_tokens": 100,
+                ...
+            }
+
+        Ollama generate format:
+            {
+                "model": "glm-ocr:latest",
+                "prompt": "...",
+                "images": ["base64_string"],
+                "stream": false,
+                "options": {
+                    "num_predict": 100,
+                    ...
+                }
+            }
+        """
+        messages = request_data.get("messages", [])
+
+        # Extract prompt and images from the last user message
+        prompt = ""
+        images = []
+
+        for msg in messages:
+            if msg.get("role") == "user":
+                content = msg.get("content", "")
+
+                if isinstance(content, str):
+                    prompt = content
+                elif isinstance(content, list):
+                    for item in content:
+                        if item.get("type") == "text":
+                            prompt = item.get("text", "")
+                        elif item.get("type") == "image_url":
+                            # Extract base64 from data URI
+                            image_url = item.get("image_url", "")
+                            if isinstance(image_url, dict):
+                                image_url = image_url.get("url", "")
+
+                            # Parse data:image/...;base64,<data>
+                            if image_url.startswith("data:"):
+                                parts = image_url.split(",", 1)
+                                if len(parts) == 2:
+                                    images.append(parts[1])
+                            else:
+                                images.append(image_url)
+
+        # Build Ollama generate request
+        ollama_request = {
+            "model": self.model or "glm-ocr:latest",
+            "prompt": prompt,
+            "stream": False,
+        }
+
+        if images:
+            ollama_request["images"] = images
+
+        # Map parameters to Ollama options
+        options = {}
+        if "max_tokens" in request_data:
+            options["num_predict"] = request_data["max_tokens"]
+        if "temperature" in request_data:
+            options["temperature"] = request_data["temperature"]
+        if "top_p" in request_data:
+            options["top_p"] = request_data["top_p"]
+        if "top_k" in request_data:
+            options["top_k"] = request_data["top_k"]
+        if "repetition_penalty" in request_data:
+            options["repeat_penalty"] = request_data["repetition_penalty"]
+
+        if options:
+            ollama_request["options"] = options
+
+        return ollama_request

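To make the OpenAI-to-Ollama mapping concrete, here is a standalone sketch of the same transformation applied to a small request. It is a simplified restatement for illustration, not the SDK method itself; the base64 string is truncated placeholder data:

```python
import json

# An OpenAI-style chat request as the pipeline would send it.
openai_request = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Extract the text from this image."},
                {"type": "image_url",
                 "image_url": {"url": "data:image/png;base64,iVBORw0KGgo..."}},
            ],
        }
    ],
    "max_tokens": 512,
    "temperature": 0.1,
}

# Simplified version of the mapping performed above.
prompt, images = "", []
for item in openai_request["messages"][-1]["content"]:
    if item["type"] == "text":
        prompt = item["text"]
    elif item["type"] == "image_url":
        url = item["image_url"]
        if isinstance(url, dict):
            url = url["url"]
        # Strip the data-URI header; Ollama expects bare base64 strings.
        images.append(url.split(",", 1)[1] if url.startswith("data:") else url)

ollama_request = {
    "model": "glm-ocr:latest",
    "prompt": prompt,
    "images": images,
    "stream": False,
    "options": {
        "num_predict": openai_request["max_tokens"],
        "temperature": openai_request["temperature"],
    },
}
print(json.dumps(ollama_request, indent=2))
```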