
Commit 033b9e2

[OpenCLIP] Add ONNX export tooling and runbook (#69)
* feat(openclip): add ONNX export tooling and runbook
* fix(openclip): address PR portability and auth feedback
* chore(openclip): apply low-priority exporter feedback
1 parent d2cb214 commit 033b9e2

5 files changed

Lines changed: 816 additions & 0 deletions


.gitignore

Lines changed: 3 additions & 0 deletions
```diff
@@ -43,6 +43,9 @@ vendor/
 
 # Environment files
 .env
+.env.*
+!.env.example
+!.env.sample
 
 # OS specific files
 .DS_Store
```
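
The new rules depend on gitignore's last-match-wins ordering: `.env.*` ignores every `.env` variant, and the two `!` rules re-include the example files. The Python sketch below approximates that ordering for bare file names only. `RULES` and `is_ignored` are illustrative names. Because `fnmatch` only approximates gitignore globbing, `git check-ignore -v <file>` remains the authoritative check.

```python
from fnmatch import fnmatchcase

# The environment-file rules added to .gitignore, in file order.
RULES = [".env", ".env.*", "!.env.example", "!.env.sample"]


def is_ignored(filename: str, rules: list[str] = RULES) -> bool:
    """Approximate gitignore matching for slash-free names: the last matching rule wins.

    Real gitignore semantics (directories, anchoring, `**`) are richer than fnmatch.
    """
    ignored = False
    for rule in rules:
        negated = rule.startswith("!")
        pattern = rule[1:] if negated else rule
        if fnmatchcase(filename, pattern):
            # A later "!" rule re-includes a file that an earlier rule ignored.
            ignored = not negated
    return ignored
```

With these rules, `.env.local` stays ignored while `.env.example` and `.env.sample` remain committable.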

README.md

Lines changed: 45 additions & 0 deletions
````diff
@@ -219,6 +219,51 @@ func main() {
 }
 ```
 
+### OpenCLIP ONNX Export Tooling (`tools/openclip_export_onnx.py`)
+
+To generate pinned OpenCLIP ONNX artifacts (split text + vision encoders):
+
+Detailed runbook: [`docs/openclip-export.md`](docs/openclip-export.md)
+
+```bash
+pip install torch transformers huggingface_hub numpy onnxruntime==1.23.1
+
+python3 ./tools/openclip_export_onnx.py \
+  --output-dir ./build/openclip-vit-b-32-laion2b-s34b-b79k-onnx
+```
+
+Defaults are pinned to:
+- model: `laion/CLIP-ViT-B-32-laion2B-s34B-b79K`
+- revision: `1a25a446712ba5ee05982a381eed697ef9b435cf`
+
+The export writes:
+- `text_model.onnx`
+- `vision_model.onnx`
+- `config.json`
+- `tokenizer.json`
+- `tokenizer_config.json`
+- `preprocessor_config.json`
+- `manifest.json` (SHA256 + export metadata)
+
+Optional copied files when available upstream:
+- `special_tokens_map.json`
+- `open_clip_config.json`
+- `vocab.json`
+- `merges.txt`
+- `README.md`
+
+If `onnxruntime` is unavailable in your Python env, add `--skip-verify`.
+
+Optional: upload generated artifacts directly to Hugging Face Hub:
+
+```bash
+export HF_TOKEN=<your_hf_token>
+
+python3 ./tools/openclip_export_onnx.py \
+  --output-dir ./build/openclip-vit-b-32-laion2b-s34b-b79k-onnx \
+  --push-to-hub-repo <your-org-or-user>/<repo-name>
+```
+
 ## Project Status
 
 This project is under active development. See our [GitHub Issues](https://github.com/amikos-tech/pure-onnx/issues) for the development roadmap.
````

docs/openclip-export.md

Lines changed: 142 additions & 0 deletions
# OpenCLIP Export Runbook

This runbook documents how to export a pinned OpenCLIP checkpoint into ONNX
artifacts used by this repository.

The tooling lives at:
- `tools/openclip_export_onnx.py`

## Scope

The exporter produces a split ONNX contract:
- `text_model.onnx` (`input_ids`, `attention_mask` -> `text_embeds`)
- `vision_model.onnx` (`pixel_values` -> `image_embeds`)

It also copies model metadata from Hugging Face and writes a checksum manifest.
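
Here is a rough sketch of consuming this contract from Python. The helpers feed tokenizer output to `text_model.onnx` and a preprocessed image tensor to `vision_model.onnx`, then compare the two embeddings. The sketch makes several assumptions that the exporter does not confirm:
- Both models accept a batch of one.
- Text inputs are `int64`.
- The sequence length is dynamic.
- `tokenizer.json` loads with the `tokenizers` package that `transformers` already depends on.
- The default output directory is used.

`embed_text`, `embed_image`, and `cosine_similarity` are illustrative names. Image preprocessing per `preprocessor_config.json` is left out.

```python
import math
from pathlib import Path

# Assumption: the default --output-dir used throughout this runbook.
ARTIFACT_DIR = Path("./build/openclip-vit-b-32-laion2b-s34b-b79k-onnx")


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two embedding vectors, in plain Python."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def embed_text(text: str) -> list[float]:
    """Tokenize one string and run text_model.onnx (input_ids, attention_mask -> text_embeds)."""
    # Heavy imports stay local so cosine_similarity works without them installed.
    import numpy as np
    import onnxruntime as ort
    from tokenizers import Tokenizer

    encoding = Tokenizer.from_file(str(ARTIFACT_DIR / "tokenizer.json")).encode(text)
    feeds = {
        # Assumption: batch of one, int64 ids and mask.
        "input_ids": np.array([encoding.ids], dtype=np.int64),
        "attention_mask": np.array([encoding.attention_mask], dtype=np.int64),
    }
    session = ort.InferenceSession(
        str(ARTIFACT_DIR / "text_model.onnx"), providers=["CPUExecutionProvider"]
    )
    (text_embeds,) = session.run(["text_embeds"], feeds)
    return text_embeds[0].tolist()


def embed_image(pixel_values) -> list[float]:
    """Run vision_model.onnx on an already preprocessed float32 array of shape (1, 3, H, W)."""
    import onnxruntime as ort

    session = ort.InferenceSession(
        str(ARTIFACT_DIR / "vision_model.onnx"), providers=["CPUExecutionProvider"]
    )
    (image_embeds,) = session.run(["image_embeds"], {"pixel_values": pixel_values})
    return image_embeds[0].tolist()
```

With real pixels, `cosine_similarity(embed_text("a photo of a cat"), embed_image(pixels))` gives the CLIP image-text similarity before any logit scaling.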

## Default Source Model (Pinned)

- model id: `laion/CLIP-ViT-B-32-laion2B-s34B-b79K`
- revision: `1a25a446712ba5ee05982a381eed697ef9b435cf`

This corresponds to OpenCLIP:
- `model_name="ViT-B-32"`
- `checkpoint="laion2b_s34b_b79k"`

## Prerequisites

Use Python 3.10+ and install the dependencies:

```bash
pip install -r ./tools/requirements-openclip.txt
# or:
pip install torch transformers huggingface_hub numpy onnxruntime==1.23.1
```

Optional, for private or gated models:
- `HF_TOKEN` with read access

Optional, for publishing:
- `HF_TOKEN` with write access to the target repo/user/org
- or local cached auth via `huggingface-cli login`
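
An optional preflight check can catch a missing dependency before a long export starts. This is a minimal sketch: `missing_modules` and `preflight` are hypothetical helpers, not part of the exporter. The module list mirrors the `pip install` line above. With `need_token=True`, it checks only the `HF_TOKEN` variable, so skip that flag if you rely on cached `huggingface-cli login` auth.

```python
import importlib.util
import os
import sys

# Import names of the packages from the pip install line above.
REQUIRED_MODULES = ["torch", "transformers", "huggingface_hub", "numpy", "onnxruntime"]


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level modules that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def preflight(need_token: bool = False) -> list[str]:
    """Collect human-readable problems; an empty list means ready to export."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append(f"Python 3.10+ required, found {sys.version.split()[0]}")
    for name in missing_modules(REQUIRED_MODULES):
        problems.append(f"missing module: {name}")
    # Only the environment variable is checked; cached CLI login is not detected.
    if need_token and not os.environ.get("HF_TOKEN"):
        problems.append("HF_TOKEN is not set")
    return problems
```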

## Export Locally

```bash
python3 ./tools/openclip_export_onnx.py \
  --output-dir ./build/openclip-vit-b-32-laion2b-s34b-b79k-onnx \
  --clean-output-dir
```

## Publish to Hugging Face Hub

```bash
export HF_TOKEN=<hf_token_with_model_write>

python3 ./tools/openclip_export_onnx.py \
  --output-dir ./build/openclip-vit-b-32-laion2b-s34b-b79k-onnx \
  --clean-output-dir \
  --push-to-hub-repo <org-or-user>/<repo-name>
```

You can also omit `HF_TOKEN` if you have already authenticated locally:

```bash
huggingface-cli login
python3 ./tools/openclip_export_onnx.py \
  --output-dir ./build/openclip-vit-b-32-laion2b-s34b-b79k-onnx \
  --clean-output-dir \
  --push-to-hub-repo <org-or-user>/<repo-name>
```

Optional flags:
- `--push-to-hub-private` to create a private repo if it does not exist.
- `--push-to-hub-revision <branch>` (default: `main`).
- `--push-to-hub-commit-message <message>`.
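
When the export runs from CI or a larger Python script, the flags above can be assembled programmatically. This is a hedged sketch: `build_export_argv` and `run_export` are hypothetical wrappers that use only the flags documented in this commit. Rejecting publish-only options when no repo is given is this sketch's own choice, not documented exporter behavior.

```python
import subprocess
import sys
from typing import Optional

# Path from this runbook; adjust if you run from another working directory.
EXPORTER = "./tools/openclip_export_onnx.py"


def build_export_argv(
    output_dir: str,
    *,
    clean: bool = True,
    skip_verify: bool = False,
    push_repo: Optional[str] = None,
    private: bool = False,
    revision: Optional[str] = None,
    commit_message: Optional[str] = None,
) -> list[str]:
    """Assemble exporter arguments from the documented flags."""
    argv = [sys.executable, EXPORTER, "--output-dir", output_dir]
    if clean:
        argv.append("--clean-output-dir")
    if skip_verify:
        argv.append("--skip-verify")
    push_only = private or revision is not None or commit_message is not None
    if push_repo is None:
        if push_only:
            raise ValueError("publishing flags require push_repo")
        return argv
    argv += ["--push-to-hub-repo", push_repo]
    if private:
        argv.append("--push-to-hub-private")
    if revision is not None:
        argv += ["--push-to-hub-revision", revision]
    if commit_message is not None:
        argv += ["--push-to-hub-commit-message", commit_message]
    return argv


def run_export(**kwargs) -> None:
    """Run the exporter; subprocess raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_export_argv(**kwargs), check=True)
```

For example, `run_export(output_dir="./build/out", push_repo="<org-or-user>/<repo-name>", private=True)` mirrors the publish command with `--push-to-hub-private` added.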

## Expected Artifact Layout

After a successful run, the output directory contains:
- `text_model.onnx`
- `vision_model.onnx`
- `config.json`
- `tokenizer.json`
- `tokenizer_config.json`
- `preprocessor_config.json`
- `manifest.json`

Optional copied files (when present upstream):
- `special_tokens_map.json`
- `open_clip_config.json`
- `vocab.json`
- `merges.txt`
- `README.md`
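
A quick layout check suits CI or a pre-publish step. This sketch compares an output directory against the two lists above. `check_layout` is a hypothetical helper, not part of the exporter.

```python
from pathlib import Path

# File names copied from the artifact lists in this runbook.
REQUIRED = [
    "text_model.onnx",
    "vision_model.onnx",
    "config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "preprocessor_config.json",
    "manifest.json",
]
OPTIONAL = ["special_tokens_map.json", "open_clip_config.json", "vocab.json", "merges.txt", "README.md"]


def check_layout(output_dir) -> tuple[list[str], list[str]]:
    """Return (missing required files, optional files that are present)."""
    root = Path(output_dir)
    missing = [name for name in REQUIRED if not (root / name).is_file()]
    present_optional = [name for name in OPTIONAL if (root / name).is_file()]
    return missing, present_optional
```

A non-empty first list means the run did not complete. The second list only reports which upstream extras were copied.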

## Manifest Contract

`manifest.json` includes:
- `schema_version`
- `generated_at_utc`
- `generator`
- source model id + requested/resolved revision
- exporter settings (opset, torch version)
- text/vision model input/output contract and dimensions
- artifact checksum list (path, size, SHA256)
- publish metadata (`repo_id`, `revision`, `base_url`) when push is used
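
The checksum list can be re-verified locally after a copy or download. This is a minimal sketch under an assumed schema: entries under an `artifacts` key, each with `path`, `size`, and `sha256` fields. Those key names are guesses, so check them against a real `manifest.json`. `sha256_file` and `verify_manifest` are illustrative names.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large .onnx files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(output_dir) -> list[str]:
    """Return mismatch descriptions; an empty list means every entry matches.

    Assumed (hypothetical) schema:
      {"artifacts": [{"path": "...", "size": 123, "sha256": "..."}]}
    """
    root = Path(output_dir)
    manifest = json.loads((root / "manifest.json").read_text(encoding="utf-8"))
    problems = []
    for entry in manifest["artifacts"]:
        target = root / entry["path"]
        if not target.is_file():
            problems.append(f"{entry['path']}: missing")
        elif target.stat().st_size != entry["size"]:
            # Size is cheap to compare, so check it before hashing.
            problems.append(f"{entry['path']}: size mismatch")
        elif sha256_file(target) != entry["sha256"].lower():
            problems.append(f"{entry['path']}: sha256 mismatch")
    return problems
```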
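
## Verify Published Artifacts

```bash
curl -sL https://huggingface.co/<org-or-user>/<repo-name>/resolve/main/manifest.json
```

If you published with `--push-to-hub-revision`, replace `main` with that branch. For a private repo, add `-H "Authorization: Bearer $HF_TOKEN"`.

Optional: compute a local checksum and compare it with the SHA256 recorded in the manifest (`sha256sum` is the usual equivalent on Linux):

```bash
shasum -a 256 ./build/openclip-vit-b-32-laion2b-s34b-b79k-onnx/text_model.onnx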
```
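
To confirm that what landed on the Hub matches the local build, a sketch along these lines fetches the published manifest from the same `resolve` URL as the `curl` command and compares checksum entries. It rests on two assumptions:
- Manifest entries live under an `artifacts` key with `path` and `sha256` fields. These are hypothetical names, so check them against a real `manifest.json`.
- `HF_TOKEN`, when set, is sent as a bearer token so private repos resolve.

Only checksums are compared, because fields such as `generated_at_utc` can legitimately differ.

```python
import json
import os
import urllib.request


def resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Hub download URL, same shape as the curl command above."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


def fetch_remote_manifest(repo_id: str, revision: str = "main") -> dict:
    """Download and parse the published manifest.json."""
    request = urllib.request.Request(resolve_url(repo_id, "manifest.json", revision))
    token = os.environ.get("HF_TOKEN")
    if token:
        # Needed for private repos; public repos resolve without it.
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def checksum_map(manifest: dict) -> dict[str, str]:
    """Map artifact path to lowercase SHA256 (assumed artifacts/path/sha256 schema)."""
    return {a["path"]: a["sha256"].lower() for a in manifest["artifacts"]}


def diff_manifests(local: dict, remote: dict) -> list[str]:
    """List paths whose checksum differs or that exist on only one side."""
    left, right = checksum_map(local), checksum_map(remote)
    return sorted(p for p in left.keys() | right.keys() if left.get(p) != right.get(p))
```

An empty result from `diff_manifests(local_manifest, fetch_remote_manifest("<org-or-user>/<repo-name>"))` means every published artifact matches the local build.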

## Troubleshooting

### NumPy / ONNX Runtime ABI mismatch

Symptom:
- `AttributeError: _ARRAY_API not found`
- a message about a module compiled with NumPy 1.x being run under NumPy 2.x

Fix:
1. Pin ONNX Runtime to `1.23.1` (project baseline):
   ```bash
   pip install --upgrade onnxruntime==1.23.1
   ```
2. Re-run the export.

Workaround:
- Add `--skip-verify` to skip the ONNX Runtime verification phase.
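
Before reinstalling, a small sketch like this can report which versions are actually installed. It reads distribution metadata through `importlib.metadata`, so it works even when `import onnxruntime` itself fails with the error above. `installed_version` and `diagnose` are illustrative helpers. GPU builds ship as the separate `onnxruntime-gpu` distribution, which this sketch does not check.

```python
from importlib import metadata
from typing import Optional

PINNED_ORT = "1.23.1"  # project baseline from the Fix steps above


def installed_version(dist: str) -> Optional[str]:
    """Installed distribution version, or None if the package is absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def diagnose(versions: dict) -> list[str]:
    """Turn {"numpy": ..., "onnxruntime": ...} versions into actionable hints."""
    hints = []
    ort = versions.get("onnxruntime")
    if ort is None:
        hints.append("onnxruntime not installed: install it or pass --skip-verify")
    elif ort != PINNED_ORT:
        hints.append(
            f"onnxruntime {ort} differs from baseline {PINNED_ORT}: "
            f"pip install --upgrade onnxruntime=={PINNED_ORT}"
        )
    if versions.get("numpy") is None:
        hints.append("numpy not installed")
    return hints
```

Typical use: `diagnose({d: installed_version(d) for d in ("numpy", "onnxruntime")})`.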

## Notes

- ONNX export may print `TracerWarning` messages from PyTorch; these are expected
  for this export path.
- ONNX conversion changes only the file format; the upstream model license still
  applies (`MIT` for the default source model).
