| 1 | +# MinerU API Reference |
| 2 | + |
| 3 | +Official docs: https://mineru.net/apiManage/docs · Token: https://mineru.net/apiManage/token |
| 4 | + |
| 5 | +MinerU exposes **two** document-parsing APIs. This skill auto-routes between them. |
| 6 | + |
| 7 | +| | 🎯 Standard API | ⚡ Agent API (lightweight) | |
| 8 | +|---|---|---| |
| 9 | +| Base URL | `https://mineru.net/api/v4` | `https://mineru.net/api/v1/agent` | |
| 10 | +| Token | **required** (`Bearer`) | **none** (IP rate-limited) | |
| 11 | +| Models | `pipeline` / `vlm` / `MinerU-HTML` | fixed lightweight `pipeline` | |
| 12 | +| File size | ≤ 200 MB | ≤ 10 MB | |
| 13 | +| Pages | ≤ 200 | ≤ 20 | |
| 14 | +| Batch | ≤ 50 per request | single file only | |
| 15 | +| Output | zip (Markdown + JSON, optional DOCX/HTML/LaTeX) | Markdown only (CDN link) | |
| 16 | +| Designed for | high-accuracy / complex / batch | AI-agent / quick / no-login | |
| 17 | + |
| 18 | +Free Standard-API quota: **1000 pages/day at highest priority**; pages beyond that are processed at lower priority. |
| 19 | + |
| 20 | +--- |
| 21 | + |
| 22 | +## Authentication (Standard API) |
| 23 | + |
| 24 | +``` |
| 25 | +Authorization: Bearer YOUR_API_TOKEN |
| 26 | +``` |
| 27 | + |
| 28 | +Get a token at https://mineru.net/apiManage/token. |
| 29 | + |
| 30 | +> **Response envelopes.** Business endpoints return `{"code":0,"data":{…},"msg":"ok"}`. |
| 31 | +> The auth/gateway layer returns a *different* shape on failure: |
| 32 | +> `{"success":false,"msgCode":"A0202","msg":"user authenticate failed"}`. |
| 33 | +> Clients must handle both shapes; this skill maps `msgCode` values to the same error hints it uses for `code`. |
| 34 | + |
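The two envelope shapes described above can be folded into one result type before any error handling runs. The following is a minimal sketch, not the skill's actual code: `ApiResult` and `normalize_envelope` are hypothetical names, and the sketch assumes a response missing `code` should count as a failure.

```python
from typing import Any, NamedTuple, Optional


class ApiResult(NamedTuple):
    ok: bool
    code: Optional[str]  # business `code` or gateway `msgCode`, as a string
    message: str
    data: Any


def normalize_envelope(body: dict) -> ApiResult:
    """Fold the business and gateway response shapes into one result."""
    if "success" in body or "msgCode" in body:
        # Gateway/auth shape: {"success": false, "msgCode": "A0202", "msg": "..."}
        return ApiResult(
            ok=bool(body.get("success")),
            code=body.get("msgCode"),
            message=body.get("msg", ""),
            data=None,
        )
    # Business shape: {"code": 0, "data": {...}, "msg": "ok"}
    code = body.get("code")
    return ApiResult(
        ok=(code == 0),  # assumption: a missing `code` is treated as failure
        code=None if code is None else str(code),
        message=body.get("msg", ""),
        data=body.get("data"),
    )
```

Converting both `code` and `msgCode` to strings lets a single lookup table serve either shape.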
| 35 | +--- |
| 36 | + |
| 37 | +## Standard API endpoints (`/api/v4`) |
| 38 | + |
| 39 | +### Single URL — `POST /extract/task` |
| 40 | + |
| 41 | +```json |
| 42 | +{ |
| 43 | + "url": "https://example.com/doc.pdf", |
| 44 | + "model_version": "vlm", |
| 45 | + "is_ocr": false, |
| 46 | + "enable_formula": true, |
| 47 | + "enable_table": true, |
| 48 | + "language": "ch", |
| 49 | + "page_ranges": "1-10", |
| 50 | + "extra_formats": ["docx", "html"], |
| 51 | + "data_id": "my-document" |
| 52 | +} |
| 53 | +``` |
| 54 | +Response → `{ "code": 0, "data": { "task_id": "…" } }`. HTML inputs require `model_version: "MinerU-HTML"`. |
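As an illustration of the request above, this sketch builds (but does not send) the `POST /extract/task` call with Python's standard library. `build_extract_request` is a hypothetical helper, and the `Content-Type: application/json` header is an assumption based on the JSON body, not something this reference states.

```python
import json
import urllib.request

STANDARD_BASE = "https://mineru.net/api/v4"


def build_extract_request(token: str, url: str, **options) -> urllib.request.Request:
    """Build the single-URL extraction request; options are passed through as-is."""
    payload = {"url": url, **options}
    return urllib.request.Request(
        f"{STANDARD_BASE}/extract/task",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",  # assumption, see lead-in
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; keeping construction separate makes the payload easy to inspect first.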
| 55 | + |
| 56 | +### Get task result — `GET /extract/task/{task_id}` |
| 57 | + |
| 58 | +```json |
| 59 | +{ "code": 0, "data": { "task_id": "…", "state": "done", "full_zip_url": "https://…", "err_msg": "" } } |
| 60 | +``` |
| 61 | + |
| 62 | +### Batch local upload — `POST /file-urls/batch` |
| 63 | + |
| 64 | +Returns signed upload URLs. Upload each file with an HTTP `PUT` and do not set a `Content-Type` header. Up to **50** files per request. |
| 65 | + |
| 66 | +```json |
| 67 | +{ "files": [ { "name": "doc.pdf", "data_id": "doc" } ], "model_version": "vlm" } |
| 68 | +``` |
| 69 | +Response → `{ "code": 0, "data": { "batch_id": "…", "file_urls": ["https://…"] } }`. |
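The upload step might look like the sketch below. It uses `http.client` because that module does not add a `Content-Type` header on its own. Both helper names are hypothetical, and the sketch assumes `file_urls` comes back in the same order as the `files` array in the request, which this reference does not state.

```python
import http.client
from urllib.parse import urlsplit


def pair_uploads(paths, file_urls):
    """Pair local paths with signed URLs (assumes matching order)."""
    if len(paths) != len(file_urls):
        raise ValueError("expected one signed URL per file")
    return list(zip(paths, file_urls))


def put_file(signed_url: str, data: bytes, timeout: float = 60.0) -> int:
    """PUT raw bytes to a signed URL without a Content-Type header."""
    parts = urlsplit(signed_url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        # Keep the query string: it carries the signature.
        target = parts.path + ("?" + parts.query if parts.query else "")
        conn.request("PUT", target, body=data)  # only Content-Length is added
        return conn.getresponse().status
    finally:
        conn.close()
```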
| 70 | + |
| 71 | +### Batch URL — `POST /extract/task/batch` |
| 72 | + |
| 73 | +```json |
| 74 | +{ "files": [ { "url": "https://…/doc.pdf", "data_id": "doc" } ], "model_version": "vlm" } |
| 75 | +``` |
| 76 | + |
| 77 | +### Batch results — `GET /extract-results/batch/{batch_id}` |
| 78 | + |
| 79 | +```json |
| 80 | +{ "code": 0, "data": { "batch_id": "…", "extract_result": [ |
| 81 | + { "file_name": "doc.pdf", "state": "done", "full_zip_url": "https://…" } |
| 82 | +] } } |
| 83 | +``` |
| 84 | + |
| 85 | +--- |
| 86 | + |
| 87 | +## Agent API endpoints (`/api/v1/agent`) — no token |
| 88 | + |
| 89 | +### URL — `POST /parse/url` |
| 90 | + |
| 91 | +```json |
| 92 | +{ "url": "https://…/doc.pdf", "language": "ch", "enable_table": true, "is_ocr": false, "enable_formula": true, "page_range": "1-10" } |
| 93 | +``` |
| 94 | +`page_range` accepts `from-to` or a single page only (no commas). Returns `{ "code": 0, "data": { "task_id": "…" } }`. |
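A client can reject an invalid `page_range` before calling the Agent API. This is a minimal sketch with a hypothetical helper name; it additionally assumes pages are 1-based and that the start must not exceed the end, neither of which is stated here.

```python
import re

# A single page ("5") or one from-to range ("1-10"); no commas.
_AGENT_PAGE_RANGE = re.compile(r"(\d+)(?:-(\d+))?")


def is_valid_agent_page_range(value: str) -> bool:
    """Check the Agent API `page_range` format."""
    match = _AGENT_PAGE_RANGE.fullmatch(value)
    if match is None:
        return False
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) else start
    return 1 <= start <= end  # assumption: 1-based, ascending
```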
| 95 | + |
| 96 | +### File — `POST /parse/file` |
| 97 | + |
| 98 | +```json |
| 99 | +{ "file_name": "doc.pdf", "language": "ch" } |
| 100 | +``` |
| 101 | +Response → `{ "data": { "task_id": "…", "file_url": "https://oss…" } }`; PUT the file to `file_url`. |
| 102 | + |
| 103 | +### Result — `GET /parse/{task_id}` |
| 104 | + |
| 105 | +```json |
| 106 | +{ "code": 0, "data": { "task_id": "…", "state": "done", "markdown_url": "https://cdn…/full.md" } } |
| 107 | +``` |
| 108 | + |
| 109 | +--- |
| 110 | + |
| 111 | +## Task states |
| 112 | + |
| 113 | +`pending` (queued) · `running` (parsing) · `converting` (format conversion) · |
| 114 | +`uploading` (source file being downloaded, Agent API only) · `waiting-file` (awaiting upload) · |
| 115 | +`done` (complete) · `failed` (error). |
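The states above split into terminal (`done`, `failed`) and in-progress ones, which suggests a simple polling loop. The sketch below is one way to write it: `wait_for_task` is a hypothetical helper, `fetch_state` stands in for whatever function returns the `data` object of a result endpoint, and the interval and timeout values are arbitrary assumptions.

```python
import time
from typing import Callable

TERMINAL_STATES = {"done", "failed"}
IN_PROGRESS_STATES = {"pending", "running", "converting", "uploading", "waiting-file"}


def wait_for_task(
    fetch_state: Callable[[], dict],
    interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the task reaches a terminal state, then return its data."""
    deadline = time.monotonic() + timeout
    while True:
        data = fetch_state()
        state = data.get("state")
        if state in TERMINAL_STATES:
            return data
        if state not in IN_PROGRESS_STATES:
            raise ValueError(f"unexpected task state: {state!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not finish in time")
        sleep(interval)
```

The caller still has to check whether the returned state is `done` or `failed`; the loop only decides when to stop polling.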
| 116 | + |
| 117 | +--- |
| 118 | + |
| 119 | +## Parameters |
| 120 | + |
| 121 | +| Parameter | Type | Default | Notes | |
| 122 | +|-----------|------|---------|-------| |
| 123 | +| `model_version` | string | `pipeline` | `pipeline`, `vlm` (recommended), `MinerU-HTML` (HTML only) | |
| 124 | +| `is_ocr` | bool | `false` | OCR for scanned docs (pipeline/vlm) | |
| 125 | +| `enable_formula` | bool | `true` | Formula recognition | |
| 126 | +| `enable_table` | bool | `true` | Table recognition | |
| 127 | +| `language` | string | `ch` | OCR language (see official `language` table) | |
| 128 | +| `page_ranges` | string | all | Standard: comma-separated pages and ranges, e.g. `"2,4-6"`; Agent uses `page_range` instead, a single page or one `from-to` range, e.g. `"1-10"` | |
| 129 | +| `extra_formats` | array | `[]` | `docx` / `html` / `latex` (Standard only) | |
| 130 | +| `data_id` | string | – | `[A-Za-z0-9_.-]`, ≤ 128 chars | |
| 131 | +| `no_cache` | bool | `false` | Bypass URL cache (Standard) | |
| 132 | +| `cache_tolerance` | int | `900` | Cache TTL seconds (Standard) | |
| 133 | + |
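Several constraints in the parameter table can be checked client-side before a Standard API request is sent. The following sketch uses a hypothetical function name, covers only three of the parameters, and assumes an empty `data_id` is invalid.

```python
import re

MODEL_VERSIONS = {"pipeline", "vlm", "MinerU-HTML"}
EXTRA_FORMATS = {"docx", "html", "latex"}
_DATA_ID = re.compile(r"[A-Za-z0-9_.-]{1,128}")  # assumption: at least 1 char


def check_standard_options(options: dict) -> list:
    """Return a list of human-readable problems; empty means no issue found."""
    problems = []
    model = options.get("model_version", "pipeline")
    if model not in MODEL_VERSIONS:
        problems.append(f"unknown model_version: {model!r}")
    unknown = set(options.get("extra_formats", [])) - EXTRA_FORMATS
    if unknown:
        problems.append(f"unsupported extra_formats: {sorted(unknown)}")
    data_id = options.get("data_id")
    if data_id is not None and not _DATA_ID.fullmatch(data_id):
        problems.append("data_id must use [A-Za-z0-9_.-] and be at most 128 chars")
    return problems
```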
| 134 | +--- |
| 135 | + |
| 136 | +## Limits |
| 137 | + |
| 138 | +| | Standard | Agent | |
| 139 | +|---|---|---| |
| 140 | +| File size | 200 MB | 10 MB | |
| 141 | +| Pages | 200 | 20 | |
| 142 | +| Batch | 50 / request | 1 | |
| 143 | +| Quota | 1000 pages/day priority | IP rate-limited (HTTP 429) | |
| 144 | + |
| 145 | +Supported types: PDF, images (png/jpg/jpeg/jp2/webp/gif/bmp), DOC/DOCX, PPT/PPTX, XLS/XLSX; HTML is supported by the Standard API only. |
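The limits above are enough to sketch a routing decision between the two APIs. This is an illustrative policy only, not the skill's actual routing logic: `choose_api` is a hypothetical function, it prefers the Agent API whenever the input fits, and it assumes 1 MB means 1024 × 1024 bytes.

```python
from typing import Optional

MB = 1024 * 1024  # assumption: binary megabytes


def choose_api(
    size_bytes: int,
    pages: Optional[int],
    is_html: bool = False,
    file_count: int = 1,
) -> str:
    """Return "agent" or "standard"; raise if neither API accepts the input."""
    if size_bytes > 200 * MB:
        raise ValueError("file exceeds the 200 MB Standard limit")
    if pages is not None and pages > 200:
        raise ValueError("document exceeds the 200-page Standard limit")
    if file_count > 50:
        raise ValueError("batch exceeds 50 files per request")
    fits_agent = (
        size_bytes <= 10 * MB
        and (pages is None or pages <= 20)
        and file_count == 1
        and not is_html  # HTML is Standard-only
    )
    return "agent" if fits_agent else "standard"
```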
| 146 | + |
| 147 | +--- |
| 148 | + |
| 149 | +## Error codes |
| 150 | + |
| 151 | +| Code | Meaning | |
| 152 | +|------|---------| |
| 153 | +| `A0202` | Invalid token | |
| 154 | +| `A0211` | Token expired | |
| 155 | +| `-500` | Parameter error | |
| 156 | +| `-10001` / `-10002` | Service error / invalid params | |
| 157 | +| `-60002` | Unsupported file format | |
| 158 | +| `-60003` / `-60004` | File read failed / empty file | |
| 159 | +| `-60005` | File too large (> 200 MB) | |
| 160 | +| `-60006` | Too many pages (> 200) | |
| 161 | +| `-60008` | File read timeout (URL unreachable) | |
| 162 | +| `-60010` | Parse failed | |
| 163 | +| `-60015` / `-60016` | File / format conversion failed | |
| 164 | +| `-60018` | Daily quota reached | |
| 165 | +| `-60022` | Web page read failed (rate-limited) | |
| 166 | +| **Agent API** | | |
| 167 | +| `-30001` | Exceeds Agent 10 MB limit → use Standard API | |
| 168 | +| `-30002` | Unsupported file type for Agent | |
| 169 | +| `-30003` | Exceeds Agent 20-page limit → use Standard API or `--pages` | |
| 170 | +| `-30004` | Invalid request parameters | |
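A client can turn part of this table into a lookup that also flags when a retry on the Standard API is worthwhile. The sketch below covers only a few codes and uses hypothetical names; the choice of `-30001` and `-30003` as fallback triggers follows the "use Standard API" hints in the table.

```python
ERROR_HINTS = {
    "A0202": "Invalid token",
    "A0211": "Token expired",
    "-60005": "File too large (> 200 MB)",
    "-60006": "Too many pages (> 200)",
    "-60018": "Daily quota reached",
    "-30001": "Exceeds Agent 10 MB limit",
    "-30003": "Exceeds Agent 20-page limit",
}

# Agent errors whose table entry suggests switching to the Standard API.
STANDARD_FALLBACK_CODES = {"-30001", "-30003"}


def describe_error(code) -> str:
    """Map a `code` or `msgCode` (int or str) to a short hint."""
    return ERROR_HINTS.get(str(code), f"Unknown error code {code}")


def should_retry_on_standard(code) -> bool:
    """True when the Agent error indicates the input only fits the Standard API."""
    return str(code) in STANDARD_FALLBACK_CODES
```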