Commit b3fd96e

Merge pull request #68 from donvito/feature/zai-provider-ocr
Feature/zai provider ocr
2 parents dd324bd + b97ddcd commit b3fd96e

27 files changed: 2,190 additions and 8 deletions

.env.example

Lines changed: 2 additions & 0 deletions
@@ -12,6 +12,8 @@ OPENAI_API_KEY=
 ANTHROPIC_API_KEY=
 OPENROUTER_API_KEY=
 AI_GATEWAY_API_KEY=
+LLM_GATEWAY_API_KEY=
+ZAI_API_KEY=
 
 # Base URL for Ollama (local LLM server)
 OLLAMA_BASE_URL=http://localhost:11434
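
The two keys added to `.env.example` are empty by default, so a quick pre-flight check can catch a missing value before the server starts. The following is a minimal sketch, not part of this commit: the helper `missing_keys` is hypothetical, and only the two variable names come from the hunk above.

```python
import os

# Variable names taken from the .env.example hunk in this commit.
NEW_KEYS = ["LLM_GATEWAY_API_KEY", "ZAI_API_KEY"]


def missing_keys(env, names=NEW_KEYS):
    """Return the names that are absent or blank in the given mapping.

    Hypothetical helper for illustration; AIBackends may validate its
    configuration differently.
    """
    return [name for name in names if not env.get(name, "").strip()]


if __name__ == "__main__":
    for name in missing_keys(os.environ):
        print(f"{name} is not set")
```

Passing the environment as a plain mapping keeps the check easy to exercise without touching the real process environment.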

README.md

Lines changed: 44 additions & 2 deletions
@@ -2,7 +2,7 @@
 
 AIBackends is an API server that you can use to integrate AI into your applications. You can run it locally or self-host it.
 
-The project supports running open models locally with Ollama, LM Studio or LlamaCpp. It also supports LLM Gateway, OpenRouter, OpenAI, Anthropic and Google AI Studio, Baseten providers.
+The project supports running open models locally with Ollama, LM Studio or LlamaCpp. It also supports LLM Gateway, OpenRouter, OpenAI, Anthropic, Google AI Studio, Baseten and ZAI providers.
 
 ## Why AI Backends?
 
@@ -31,16 +31,51 @@ Since APIs are ready to use, you don't need to understand prompt engineering. Ju
 | **/api/pdf-summarizer** | Extract and summarize content from PDF documents with AI |
 | **/api/web-search** | Perform web searches and get AI-powered summaries of results |
 
+#### Text Processing Examples
+
+**Web Search** - Search the web and get AI-powered summaries:
+
+![Web Search Example](images/websearch-example.png)
+
+**Keywords Extraction** - Extract important keywords from text:
+
+![Keywords Example](images/keywords-example.png)
+
+**Sentiment Analysis** - Analyze emotional tone of text:
+
+![Sentiment Example](images/sentiment-example.png)
+
+**Translation** - Translate text between languages:
+
+![Translate Example](images/translate-example.png)
+
 ### Data Generation
 | Endpoint | Description |
 |----------|-------------|
 | **/api/synthetic-data** | Generate realistic synthetic data based on prompts with optional JSON schema validation |
 
+**Synthetic Data Generation** - Generate realistic test data with custom schemas:
+
+![Synthetic Data Example](images/synthetic-data-example.png)
+
 ### Image Processing
 
 | Endpoint | Description |
 |----------|-------------|
-| **/api/describe-image** | Describe an image (work in progress) |
+| **/api/vision** | Analyze images with vision AI - ask questions, detect objects, get coordinates (ZAI GLM-4.6v) |
+| **/api/ocr** | Extract structured data from images using OCR with optional JSON schema output (ZAI GLM-4.6V) |
+
+#### Vision AI Examples
+
+AIBackends includes powerful vision capabilities powered by ZAI GLM-4.6v models. You can ask questions about images, detect objects, and extract structured data.
+
+**Vision Q&A** - Ask questions about any image:
+
+![Vision Example](images/vision-example.png)
+
+**OCR Extraction** - Extract structured data from documents, receipts, and invoices:
+
+![OCR Example](images/ocr-example.png)
 
 More to come...check swagger docs for updated endpoints.
 
@@ -63,6 +98,7 @@ More to come...check swagger docs for updated endpoints.
 | [Vercel AI Gateway](https://vercel.com/ai-gateway) | Open source and private models | Available |
 | [Google AI Studio](https://ai.google.dev/) | Gemini models via OpenAI-compatible interface | Available |
 | [Baseten](https://baseten.co/) | Cloud-hosted ML models with OpenAI-compatible API | Available |
+| [ZAI](https://z.ai/) | GLM models with vision/OCR capabilities | Available |
 
 
 ## Run the project
@@ -189,6 +225,9 @@ BASETEN_BASE_URL=https://inference.baseten.co/v1
 
 # LLM Gateway Configuration (Recommended)
 LLM_GATEWAY_API_KEY=your-llm-gateway-api-key
+
+# ZAI Configuration (for Vision/OCR endpoints)
+ZAI_API_KEY=your-zai-api-key
 ```
 
 ### LLM Gateway Setup (Recommended for Cloud Providers)
@@ -273,6 +312,9 @@ curl --location 'http://localhost:3000/api/v1/summarize' \
 - Home Page: `http://localhost:3000/`
 - Swagger Docs: `http://localhost:3000/api/ui`. You can test the API endpoints here.
 - JSON Editor: `http://localhost:3000/api/jsoneditor`
+- LLM-Friendly API Docs: `http://localhost:3000/api/llms.txt`. Copy the contents of this file and paste it into AI builder tools like Bolt.new, v0, Lovable, or AI coding assistants to help them understand and use the AIBackends API endpoints.
+
+![LLMs.txt Example](images/llms-txt-example.png)
 
 ## Testing Examples
 
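
The README change above suggests copying the contents of `/api/llms.txt` into an AI builder tool. As a rough sketch of doing that from a script rather than a browser, the snippet below downloads the file with the Python standard library. The function names are hypothetical, the base URL is the local address shown in the README, and UTF-8 encoding of the response is an assumption.

```python
from urllib.parse import urljoin
from urllib.request import urlopen

# Local server address as shown in the README; adjust for a self-hosted deployment.
BASE_URL = "http://localhost:3000"


def llms_txt_url(base_url=BASE_URL):
    """Build the URL of the LLM-friendly docs for a given server base URL."""
    return urljoin(base_url.rstrip("/") + "/", "api/llms.txt")


def fetch_llms_txt(base_url=BASE_URL, timeout=10.0):
    """Download the llms.txt contents as text (assumes a UTF-8 response)."""
    with urlopen(llms_txt_url(base_url), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the server running locally, `print(fetch_llms_txt())` would write the document to the terminal, from where it can be pasted into the tool of choice.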
images/keywords-example.png (580 KB)
images/llms-txt-example.png (480 KB)
images/ocr-example.png (2.78 MB)
images/sentiment-example.png (609 KB)
images/synthetic-data-example.png (791 KB)
images/translate-example.png (895 KB)
images/vision-ask-example.png (1.5 MB)
images/vision-example.png (1.53 MB)
