7 changes: 5 additions & 2 deletions README.md
@@ -133,9 +133,12 @@ pipeline:
api_port: 8080
```

#### Option 3: Apple Silicon with mlx-vlm
#### Option 3: Ollama/MLX

See the **[MLX Detailed Deployment Guide](examples/mlx-deploy/README.md)** for full setup instructions, including environment isolation and troubleshooting.
For specialized deployment scenarios, see the detailed guides:

- **[Apple Silicon with mlx-vlm](examples/mlx-deploy/README.md)** - Optimized for Apple Silicon Macs
- **[Ollama Deployment](examples/ollama-deploy/README.md)** - Simple local deployment with Ollama

### SDK Usage Guide

7 changes: 7 additions & 0 deletions README_zh.md
@@ -134,6 +134,13 @@ pipeline:
api_port: 8080
```

#### Option 3: Other Deployment Options

For specific deployment scenarios, see the detailed guides:

- **[Apple Silicon with mlx-vlm](examples/mlx-deploy/README.md)** - Optimized for Apple Silicon Macs
- **[Ollama Deployment](examples/ollama-deploy/README.md)** - Simple local deployment with Ollama

### SDK Usage Guide

#### CLI
191 changes: 191 additions & 0 deletions examples/ollama-deploy/README.md
@@ -0,0 +1,191 @@
# Ollama Deployment Guide for GLM-OCR

This guide provides detailed instructions for deploying GLM-OCR using Ollama.

## Overview

Ollama provides a simple local deployment option for running GLM-OCR. However, due to limitations in Ollama's OpenAI-compatible API for vision requests, we recommend using Ollama's native `/api/generate` endpoint.
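
For reference, a vision request against the native endpoint looks roughly like the sketch below; the prompt and image file name are placeholders, and the `images` field carries base64-encoded image data:

```bash
# Minimal sketch of a native /api/generate vision request.
# "page.png" and the prompt are placeholders; the base64 | tr pipeline
# strips line wrapping and works on both macOS and Linux.
curl http://localhost:11434/api/generate -d '{
  "model": "glm-ocr:latest",
  "prompt": "Extract all text from the image as Markdown.",
  "images": ["'"$(base64 < page.png | tr -d '\n')"'"],
  "stream": false
}'
```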

## Installation

### 1. Install Ollama

Download and install Ollama from the official website:

**macOS / Linux:**
```bash
curl -fsSL https://ollama.ai/install.sh | sh
```

**Windows:**
Download the installer from https://ollama.ai/download

### 2. Verify Installation

```bash
ollama --version
```

### 3. Pull the GLM-OCR Model

```bash
ollama pull glm-ocr:latest
```

This will download the GLM-OCR model.

### 4. Start Ollama Service

The Ollama service should start automatically after installation. If not:

```bash
ollama serve
```

The service will run on `http://localhost:11434` by default.
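
Before continuing, you can confirm the service is reachable (the version endpoint is part of Ollama's standard API):

```bash
# Should return a small JSON object with the installed Ollama version.
curl http://localhost:11434/api/version
```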

## Configuration

### SDK Configuration

Create or update your `config.yaml`:

```yaml
pipeline:
  maas:
    enabled: false

  ocr_api:
    api_host: localhost
    api_port: 11434
    api_path: /api/generate      # Use Ollama native endpoint
    model: glm-ocr:latest        # Required: specify model name
    api_mode: ollama_generate    # Required: use Ollama native format

  enable_layout: false           # Recommended for initial testing
```

### Configuration Options Explained

- **api_path**: `/api/generate` - Ollama's native endpoint (more stable for vision)
- **model**: `glm-ocr:latest` - Model name (required by Ollama)
- **api_mode**: `ollama_generate` - Enables Ollama-specific request/response format
- **enable_layout**: `false` - Disables layout detection; useful if the layout-detection dependencies are not installed
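
As a quick sanity check that these settings point at a live server with the model installed, you can list the models Ollama knows about (adjust host and port if you changed them):

```bash
# glm-ocr:latest should appear in the returned model list.
curl -s http://localhost:11434/api/tags
```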

## Usage

### Command Line

```bash
# Parse a single image
glmocr parse examples/source/code.png --config config.yaml

# Parse with custom output directory
glmocr parse examples/source/code.png --output ./results/

# Enable debug logging
glmocr parse examples/source/code.png --log-level DEBUG
```

### Python API

```python
from glmocr import GlmOcr

# Initialize with custom config
with GlmOcr(config_path="config.yaml") as parser:
result = parser.parse("image.png")
print(result.markdown_result)
result.save(output_dir="./results")
```
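
For multi-page jobs the same parser instance can be reused. A minimal sketch, assuming a directory of PNG files and using only the `parse()`/`save()` calls shown above:

```python
from pathlib import Path

from glmocr import GlmOcr

# Reuse a single parser for a directory of images; paths are placeholders.
with GlmOcr(config_path="config.yaml") as parser:
    for image_path in sorted(Path("./pages").glob("*.png")):
        result = parser.parse(str(image_path))
        result.save(output_dir=f"./results/{image_path.stem}")
```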

## Troubleshooting

### Issue: 502 Bad Gateway Errors

**Symptom:**
```
API server returned status code: 502, response: no body
```

**Solution:**
Ensure you're using Ollama's native API mode:
```yaml
ocr_api:
  api_path: /api/generate
  api_mode: ollama_generate
```
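
After updating the config, re-run the original command with debug logging (the same CLI flags shown above) to confirm that requests now go to `/api/generate`:

```bash
glmocr parse examples/source/code.png --config config.yaml --log-level DEBUG
```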

## Verification

### Check Model Status

```bash
# List installed models
ollama list

# View model details
ollama show glm-ocr:latest

# Check running models
ollama ps
```

### Test the API

```bash
# Test with a simple request (Linux/Mac)
curl http://localhost:11434/api/generate -d '{
  "model": "glm-ocr:latest",
  "prompt": "Hello",
  "stream": false
}'

# Windows PowerShell
Invoke-RestMethod -Uri http://localhost:11434/api/generate -Method Post -Body '{"model":"glm-ocr:latest","prompt":"Hello","stream":false}' -ContentType "application/json"
```

### Recommendations

- **For Testing/Personal Use**: Ollama is perfect
- **For Production**: Consider vLLM or SGLang for better performance and stability
- **For CPU-only**: Ollama is a good choice

## Advanced Configuration

### Custom Model Path

If you have a custom GLM-OCR model:

```bash
# Create a Modelfile
cat > Modelfile <<EOF
FROM /path/to/your/model
TEMPLATE {{ .Prompt }}
RENDERER glm-ocr
PARSER glm-ocr
PARAMETER temperature 0
EOF

# Create the model
ollama create my-glm-ocr -f Modelfile

# Use it in config
model: my-glm-ocr
```
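
Before pointing the SDK at the custom model, it helps to confirm that Ollama can load it and produce a response (a plain-text smoke test; `my-glm-ocr` matches the name used with `ollama create` above):

```bash
# Inspect the registered model and run a quick text-only generation.
ollama show my-glm-ocr
ollama run my-glm-ocr "hello"
```
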
## Uninstallation

```bash
# Remove the model
ollama rm glm-ocr:latest

# Uninstall Ollama (varies by OS)
# macOS/Linux: Remove /usr/local/bin/ollama
# Windows: Use the uninstaller
```

## Additional Resources

- [Ollama Official Documentation](https://github.com/ollama/ollama/blob/main/docs/api.md)
- [GLM-OCR GitHub Repository](https://github.com/zai-org/GLM-OCR)
- [GLM-OCR API Documentation](https://docs.z.ai/guides/vlm/glm-ocr)
5 changes: 5 additions & 0 deletions glmocr/config.py
@@ -74,13 +74,18 @@ class OCRApiConfig(_BaseConfig):
    api_scheme: Optional[str] = None
    api_path: str = "/v1/chat/completions"
    api_url: Optional[str] = None
    api_key: Optional[str] = None

    # Model name included in API requests (required by Ollama/MLX).
    model: Optional[str] = None
    headers: Dict[str, str] = Field(default_factory=dict)
    verify_ssl: bool = False

    # API mode: "openai" (default) or "ollama_generate".
    # Use "ollama_generate" for Ollama's native /api/generate endpoint.
    api_mode: str = "openai"

    connect_timeout: int = 300
    request_timeout: int = 300

12 changes: 10 additions & 2 deletions glmocr/config.yaml
@@ -57,23 +57,31 @@ pipeline:
  # - You need offline/air-gapped operation
  # - You want to customize the pipeline (layout detection, prompts, etc.)

  # OCR API client configuration (for self-hosted vLLM/SGLang)
  # OCR API client configuration (for self-hosted vLLM/SGLang/Ollama)
  ocr_api:
    # Basic connection
    api_host: 127.0.0.1
    api_port: 8080

    # Model name included in API requests.
    # Required for mlx_vlm.server (e.g. "mlx-community/GLM-OCR-bf16").
    # Set to `glm-ocr` to match `--served-model-name` when using vLLM/SGLang
    # Set to `glm-ocr` to match `--served-model-name` when using vLLM/SGLang.
    # For Ollama, set to your model name (e.g. "glm-ocr:latest").
    model: glm-ocr

    # URL construction: {api_scheme}://{api_host}:{api_port}{api_path}
    # Or set api_url directly to override
    api_scheme: null             # null = auto (https if port 443, else http)
    api_path: /v1/chat/completions
    # Note: If using Ollama and encountering 502 errors with vision requests,
    # try switching to "ollama_generate" mode with api_path: /api/generate
    api_url: null                # full URL override (optional)

    # API mode: "openai" (default) or "ollama_generate"
    # - "openai": Use OpenAI-compatible /v1/chat/completions endpoint (vLLM/SGLang/Ollama)
    # - "ollama_generate": Use Ollama's native /api/generate endpoint
    api_mode: openai

    # Authentication (for MaaS providers like Zhipu, OpenAI, etc.)
    api_key: null                # or set GLMOCR_API_KEY env var
    headers: {}                  # additional HTTP headers