Commit 9473ae2

refactor(adk): replace go-fitz with go-pdfium (WASM) for MultiModalRead (#869)
Switch PDF processing in the agentkit and local backends from github.com/gen2brain/go-fitz to github.com/klippa-app/go-pdfium with the WebAssembly backend, aligning with aisandbox_v2.go. This removes the CGO/MuPDF dependency entirely. The change introduces a lazily initialised, process-global pdfium worker pool and a new PDFiumPoolConfig with MinIdle/MaxIdle/MaxTotal/AcquireTimeout knobs exposed via MultiModalReadConfig. Each Go module maintains its own pool, because an internal package cannot be shared across module boundaries.
1 parent bde4f89 commit 9473ae2
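
The lazy, process-global pool described in the commit message can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code from this commit: `getPool`, `workerPool` and `PoolConfig` are hypothetical names, and the first-config-wins behaviour with a warning is modelled on the README notes in the diff.

```go
package main

import (
	"fmt"
	"log"
	"sync"
)

// PoolConfig mirrors the MinIdle/MaxIdle/MaxTotal knobs named in the commit
// message. The type name is hypothetical.
type PoolConfig struct {
	MinIdle, MaxIdle, MaxTotal int
}

// workerPool is a hypothetical stand-in for the real pdfium WASM worker pool.
type workerPool struct {
	cfg PoolConfig
}

var (
	poolOnce sync.Once
	pool     *workerPool
)

// getPool returns the process-global pool, creating it on first use.
// A later caller that passes a different config still receives the existing
// pool; its config is ignored and a warning is logged.
func getPool(cfg PoolConfig) *workerPool {
	poolOnce.Do(func() {
		pool = &workerPool{cfg: cfg}
	})
	if pool.cfg != cfg {
		log.Printf("WARN: pdfium pool already initialised with %+v; ignoring %+v", pool.cfg, cfg)
	}
	return pool
}

func main() {
	first := getPool(PoolConfig{MinIdle: 1, MaxIdle: 2, MaxTotal: 4})
	second := getPool(PoolConfig{MinIdle: 1, MaxIdle: 4, MaxTotal: 8})
	fmt.Println(first == second) // prints true: both callers share one pool
}
```

Because the package-level variables live inside one Go module, two modules that each carry such a package would each end up with their own instance, which matches the per-module pools mentioned above.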

14 files changed

Lines changed: 1029 additions & 264 deletions

.github/workflows/tests.yml

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ jobs:
       - uses: actions/checkout@v4
       - uses: actions/setup-go@v5
         with:
-          go-version: "1.22"
+          go-version: "1.25.6"
       # - name: Go test for every module
       # run: |
       # modules=`find . -name "go.mod" -exec dirname {} \;`

adk/backend/agentkit/README.md

Lines changed: 36 additions & 12 deletions
@@ -10,16 +10,24 @@ A secure filesystem backend for EINO ADK that executes operations in Volcengine'
 go get github.com/cloudwego/eino-ext/adk/backend/agentkit
 ```
 
-#### Native dependency for `MultiModalRead` (PDF rendering)
+#### PDF rendering for `MultiModalRead`
 
-`MultiModalRead` rasterises PDF pages via [`go-fitz`](https://github.com/gen2brain/go-fitz),
-which loads MuPDF at runtime through `purego`. Install MuPDF before running:
+`MultiModalRead` rasterises PDF pages via [`klippa-app/go-pdfium`](https://github.com/klippa-app/go-pdfium)
+on its **WebAssembly** backend (executed in-process by [`wazero`](https://github.com/tetratelabs/wazero)).
+No CGO toolchain or system-level MuPDF/PDFium libraries are required; it works out of the box
+across Linux, macOS and Windows.
 
-- macOS: `brew install mupdf`
-- Ubuntu/Debian: `sudo apt-get install -y libmupdf-dev`
-- CentOS/RHEL: `sudo yum install -y mupdf-devel`
+Behaviour notes:
 
-If you don't use `MultiModalRead`, MuPDF is not required at runtime.
+- A process-global PDFium worker pool is initialised lazily on the first paged-PDF request
+  (a few hundred ms one-time cost) and reused thereafter. Each WASM worker uses tens of MB
+  of memory; default `MaxTotal` is `max(NumCPU, 2)`.
+- The pool is a single shared instance per process; if a second backend passes a different
+  `PDFiumPool` sizing, the second config is ignored and a `WARN` log is emitted.
+- The `agentkit` and `local` backends live in independent Go modules and therefore each
+  maintain their **own** process-global pool. Apps importing both will run two pdfium WASM
+  runtimes concurrently.
+- Sizing can be tuned via `MultiModalReadConfig.PDFiumPool` (see below).
 
 ### Basic Usage
 
@@ -85,11 +93,27 @@ type Config struct {
 }
 
 type MultiModalReadConfig struct {
-    MaxImageSizeMB int // image read size limit (MB). Default 10, hard-cap 2048
-    MaxPDFSizeMB int // full PDF read size limit (MB). Default 20, hard-cap 2048
-    MaxPagedPDFSizeMB int // paged PDF read size limit (MB). Default 100, hard-cap 2048
-    MaxPDFPagesPerRequest int // max pages per paged read. Default 20, hard-cap 1000
-    PDFRenderDPI float64 // DPI when rasterising PDF pages. Default 150, hard-cap 600
+    MaxImageSizeMB int // image read size limit (MB). Default 10, hard-cap 2048
+    MaxPDFSizeMB int // full PDF read size limit (MB). Default 20, hard-cap 2048
+    MaxPagedPDFSizeMB int // paged PDF read size limit (MB). Default 100, hard-cap 2048
+    MaxPDFPagesPerRequest int // max pages per paged read. Default 20, hard-cap 1000
+    PDFRenderDPI int // DPI when rasterising PDF pages. Default 150, hard-cap 600
+
+    // PDFiumPool tunes the process-global PDFium worker pool used for paged PDF rendering.
+    // Only honoured on the first lazy initialisation; subsequent callers passing a different
+    // sizing trigger a WARN log and continue with the existing pool.
+    PDFiumPool PDFiumPoolConfig
+
+    // PDFiumAcquireTimeout caps how long MultiModalRead waits for a pdfium worker
+    // when the caller's ctx has no deadline. Per-read setting (different callers
+    // may use different values). Default 30s.
+    PDFiumAcquireTimeout time.Duration
+}
+
+type PDFiumPoolConfig struct {
+    MinIdle int // minimum idle workers kept alive. Default 1
+    MaxIdle int // maximum idle workers kept alive. Default 2
+    MaxTotal int // maximum total workers (>= 2). Default max(2, NumCPU)
 }
 ```
 
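
The defaults documented for `PDFiumPoolConfig` could be applied along these lines. This is a sketch under stated assumptions, not the backend's actual normalisation code: `withDefaults` is a hypothetical helper, and treating zero or negative fields as "unset" is an assumption. The CPU count is passed in explicitly so the result is deterministic.

```go
package main

import (
	"fmt"
	"runtime"
)

// PDFiumPoolConfig copies the field set shown in the README diff.
type PDFiumPoolConfig struct {
	MinIdle  int
	MaxIdle  int
	MaxTotal int
}

// withDefaults is a hypothetical helper. Fields that are zero or negative are
// treated as unset (an assumption) and replaced by the documented defaults:
// MinIdle 1, MaxIdle 2, MaxTotal max(2, NumCPU). MaxTotal is never below 2.
func withDefaults(c PDFiumPoolConfig, numCPU int) PDFiumPoolConfig {
	if c.MinIdle <= 0 {
		c.MinIdle = 1
	}
	if c.MaxIdle <= 0 {
		c.MaxIdle = 2
	}
	if c.MaxTotal <= 0 {
		c.MaxTotal = numCPU
	}
	if c.MaxTotal < 2 {
		c.MaxTotal = 2
	}
	return c
}

func main() {
	// An empty config falls back to the documented defaults.
	fmt.Printf("%+v\n", withDefaults(PDFiumPoolConfig{}, runtime.NumCPU()))
	// An explicit MaxTotal is kept as long as it is at least 2.
	fmt.Printf("%+v\n", withDefaults(PDFiumPoolConfig{MaxTotal: 16}, runtime.NumCPU()))
}
```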

adk/backend/agentkit/README_zh.md

Lines changed: 32 additions & 12 deletions
@@ -10,16 +10,22 @@
 go get github.com/cloudwego/eino-ext/adk/backend/agentkit
 ```
 
-#### Native dependency for `MultiModalRead` (PDF rendering)
+#### PDF rendering for `MultiModalRead`
 
-`MultiModalRead` renders PDF pages via [`go-fitz`](https://github.com/gen2brain/go-fitz),
-which loads MuPDF at runtime through `purego`. Install MuPDF before running:
+`MultiModalRead` rasterises PDF pages via [`klippa-app/go-pdfium`](https://github.com/klippa-app/go-pdfium)
+on its **WebAssembly** backend (executed in-process by [`wazero`](https://github.com/tetratelabs/wazero)).
+**No CGO toolchain and no system-level native libraries such as MuPDF/PDFium are required**; it works out of the box on Linux,
+macOS and Windows.
 
-- macOS: `brew install mupdf`
-- Ubuntu/Debian: `sudo apt-get install -y libmupdf-dev`
-- CentOS/RHEL: `sudo yum install -y mupdf-devel`
+Behaviour notes:
 
-If you don't use `MultiModalRead`, MuPDF is not required at runtime.
+- A process-global PDFium worker pool is initialised lazily on the first paged-PDF request (a few hundred
+  milliseconds the first time) and reused by later calls. Each WASM worker uses roughly tens of MB of memory; default `MaxTotal=max(NumCPU, 2)`.
+- The pool is a singleton for the whole process; if a second backend passes a different `PDFiumPool` sizing, the second config
+  is ignored and a `WARN` log is printed.
+- The `agentkit` and `local` backends belong to independent Go modules, so **each maintains its own** process-level pool.
+  Applications that import both backends run two pdfium WASM runtimes.
+- The pool size can be tuned via `MultiModalReadConfig.PDFiumPool` (see below).
 
 ### Basic Usage
 
@@ -85,11 +91,25 @@ type Config struct {
 }
 
 type MultiModalReadConfig struct {
-    MaxImageSizeMB int // image read size limit (MB). Default 10, hard cap 2048
-    MaxPDFSizeMB int // full PDF read size limit (MB). Default 20, hard cap 2048
-    MaxPagedPDFSizeMB int // paged PDF read size limit (MB). Default 100, hard cap 2048
-    MaxPDFPagesPerRequest int // max pages per paged read. Default 20, hard cap 1000
-    PDFRenderDPI float64 // PDF page rendering DPI. Default 150, hard cap 600
+    MaxImageSizeMB int // image read size limit (MB). Default 10, hard cap 2048
+    MaxPDFSizeMB int // full PDF read size limit (MB). Default 20, hard cap 2048
+    MaxPagedPDFSizeMB int // paged PDF read size limit (MB). Default 100, hard cap 2048
+    MaxPDFPagesPerRequest int // max pages per paged read. Default 20, hard cap 1000
+    PDFRenderDPI int // PDF page rendering DPI. Default 150, hard cap 600
+
+    // PDFiumPool tunes the process-level PDFium worker pool used for paged PDF rendering.
+    // Only takes effect on the first lazy initialisation; later callers passing a different sizing trigger a WARN log and keep the existing pool.
+    PDFiumPool PDFiumPoolConfig
+
+    // PDFiumAcquireTimeout caps the wait for a pdfium worker when the caller's ctx has no deadline.
+    // This is a per-read setting (different callers may use different values). Default 30s.
+    PDFiumAcquireTimeout time.Duration
+}
+
+type PDFiumPoolConfig struct {
+    MinIdle int // minimum number of idle workers kept alive. Default 1
+    MaxIdle int // maximum number of idle workers kept alive. Default 2
+    MaxTotal int // maximum number of workers (>= 2). Default max(2, NumCPU)
 }
 ```
 

adk/backend/agentkit/go.mod

Lines changed: 7 additions & 8 deletions
@@ -1,13 +1,11 @@
 module github.com/cloudwego/eino-ext/adk/backend/agentkit
 
-go 1.23.0
-
-toolchain go1.23.12
+go 1.25.0
 
 require (
     github.com/bytedance/sonic v1.15.0
     github.com/cloudwego/eino v0.9.1
-    github.com/gen2brain/go-fitz v1.24.15
+    github.com/klippa-app/go-pdfium v1.19.3
     github.com/slongfield/pyfmt v0.0.0-20220222012616-ea85ff4c361f
     github.com/stretchr/testify v1.11.1
 )
@@ -21,12 +19,11 @@ require (
     github.com/cloudwego/base64x v0.1.6 // indirect
     github.com/davecgh/go-spew v1.1.1 // indirect
     github.com/dustin/go-humanize v1.0.1 // indirect
-    github.com/ebitengine/purego v0.8.4 // indirect
     github.com/eino-contrib/jsonschema v1.0.3 // indirect
     github.com/google/uuid v1.6.0 // indirect
     github.com/goph/emperror v0.17.2 // indirect
+    github.com/jolestar/go-commons-pool/v2 v2.1.2 // indirect
     github.com/json-iterator/go v1.1.12 // indirect
-    github.com/jupiterrider/ffi v0.5.0 // indirect
     github.com/klauspost/cpuid/v2 v2.2.9 // indirect
     github.com/kr/pretty v0.2.0 // indirect
     github.com/mailru/easyjson v0.7.7 // indirect
@@ -37,13 +34,15 @@ require (
     github.com/pkg/errors v0.9.1 // indirect
     github.com/pmezard/go-difflib v1.0.0 // indirect
     github.com/sirupsen/logrus v1.9.3 // indirect
+    github.com/tetratelabs/wazero v1.11.0 // indirect
     github.com/twitchyliquid64/golang-asm v0.15.1 // indirect
     github.com/wk8/go-ordered-map/v2 v2.1.8 // indirect
     github.com/yargevad/filepathx v1.0.0 // indirect
     golang.org/x/arch v0.11.0 // indirect
-    golang.org/x/crypto v0.32.0 // indirect
     golang.org/x/exp v0.0.0-20230713183714-613f0c0eb8a1 // indirect
-    golang.org/x/sys v0.33.0 // indirect
+    golang.org/x/net v0.53.0 // indirect
+    golang.org/x/sys v0.43.0 // indirect
+    golang.org/x/text v0.37.0 // indirect
     gopkg.in/check.v1 v1.0.0-20190902080502-41f04d3bba15 // indirect
     gopkg.in/yaml.v3 v3.0.1 // indirect
 )
