Commit a8f7c8c

Add Windows-specific setup notes (#301)
1 parent 849dbf7 commit a8f7c8c

4 files changed

Lines changed: 133 additions & 0 deletions

README.md

Lines changed: 2 additions & 0 deletions
@@ -66,6 +66,8 @@ brew install cairo pango gdk-pixbuf libffi
 uv pip install weasyprint
 ```

+You can also run MMORE on Windows by following our [Windows setup notes](docs/source/getting_started/windows.md).
+
 #### Step 1 – Install MMORE

 Dependencies are split by pipeline stage. Install only what you need:

docs/source/getting_started/installation.md

Lines changed: 1 addition & 0 deletions
@@ -216,3 +216,4 @@ For a manual non-Docker setup, use either the standard installation or the `uv`
 - [Quickstart](quickstart.md)
 - [Process](process.md)
 - [uv workflow](../advanced_usage/uv.md)
+- [Running on Windows](windows.md): what differs on Windows and how to fix it

docs/source/getting_started/windows.md

Lines changed: 128 additions & 0 deletions
@@ -0,0 +1,128 @@
# 🪟 Running MMORE on Windows

## Overview

MMORE was developed and tested mainly on Linux. It runs on Windows too, but a few things behave differently. This page lists those differences and the fix for each one.

If you work on Linux or macOS, you can skip this page.

## 1. Install the prerequisites

Unlike most Linux distributions, Windows does not ship Python, Git, or FFmpeg.
Install them, together with `uv`, using
[winget](https://learn.microsoft.com/windows/package-manager/winget/):

```powershell
winget install Python.Python.3.11
winget install Git.Git
winget install astral-sh.uv
winget install Gyan.FFmpeg
```

Then clone the repo and install MMORE into a virtual environment:

```powershell
git clone https://github.com/swiss-ai/mmore.git
cd mmore
uv venv
.venv\Scripts\activate
uv pip install -e ".[all,cu126]"
```

Keep the `cu126` extra if you have an NVIDIA GPU; otherwise replace it with
`cpu`. See the
[README](https://github.com/swiss-ai/mmore#step-1--install-mmore) for the full
list of extras.
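
As a quick sanity check, the sketch below looks for the four command-line tools on `PATH`. It is a minimal illustration and not part of MMORE: the helper name `missing_tools` is hypothetical, and the executable names (`python`, `git`, `uv`, `ffmpeg`) are assumed from the winget packages listed above.

```python
import shutil

# Executable names assumed from the winget packages installed above.
REQUIRED_TOOLS = ("python", "git", "uv", "ffmpeg")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH (hypothetical helper)."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All prerequisites found.")
```

If a tool you just installed is reported as missing, try a new terminal first, since `PATH` changes usually apply only to shells opened after the installation.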

## 2. `milvus-lite` is not available on Windows

Every example config whose `db.uri` is `./proc_demo.db` relies on `milvus-lite`
(`examples/index/config.yaml`, `examples/retriever_api/config.yaml`,
`examples/rag/config.yaml`, `examples/rag/config_api.yaml`). There is no Windows
build of `milvus-lite`, so any of them fails with:

```
ModuleNotFoundError: No module named 'milvus_lite'
```
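
To detect this situation up front instead of hitting the traceback, a script can check whether the module is importable. The following is a sketch under stated assumptions: `has_module` and `default_milvus_uri` are hypothetical helpers, not MMORE functions, and the server URI is the local Milvus address used elsewhere on this page.

```python
import importlib.util


def has_module(name):
    """Return True if the top-level module `name` can be found."""
    return importlib.util.find_spec(name) is not None


def default_milvus_uri(local_uri="./proc_demo.db",
                       server_uri="http://127.0.0.1:19530",
                       lite_available=None):
    """Pick the local file URI when milvus-lite is present, else a server URI."""
    if lite_available is None:
        lite_available = has_module("milvus_lite")
    return local_uri if lite_available else server_uri
```

On a machine without `milvus-lite`, `default_milvus_uri()` returns the server URI, which only helps once a Milvus server is actually listening there.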

### Fix: run Milvus in Docker

This repo ships no Compose file, so download the official Milvus standalone
Compose file for the release that matches your installed `pymilvus` version
(see the
[Milvus install docs](https://milvus.io/docs/install_standalone-docker-compose.md)):

```powershell
# Download the Milvus docker compose file from GitHub
Invoke-WebRequest `
  -Uri "https://github.com/milvus-io/milvus/releases/download/v2.6.6/milvus-standalone-docker-compose.yml" `
  -OutFile "milvus-docker-compose.yml"
# Start Milvus containers
docker compose -f milvus-docker-compose.yml up -d
```

Wait about a minute, then check that `docker ps` shows the three containers
(`etcd`, `minio`, `milvus-standalone`) as `(healthy)`.
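
Instead of waiting a fixed minute, a script can poll the Milvus port until it accepts connections. This is a minimal sketch: `wait_for_port` is a hypothetical helper, the timeout and interval values are arbitrary, and port 19530 is taken from the URI used on this page.

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=19530, timeout=120.0, interval=2.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect only proves that something is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

An open port does not guarantee that Milvus has finished starting, so the `(healthy)` status in `docker ps` remains the more reliable signal.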

### Create the database

MMORE does not create the database automatically when connecting to a remote Milvus. Run this once:

```powershell
python -c "from pymilvus import connections, db; connections.connect(uri='http://127.0.0.1:19530'); db.create_database('my_db')"
```
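
If you prefer a step that is safe to run more than once, the same call can be guarded by a lookup of the existing databases. The sketch below assumes that your `pymilvus` version exposes `db.list_database` next to `db.create_database`; both function names `ensure_database` and `ensure_milvus_database` are hypothetical.

```python
def ensure_database(name, list_databases, create_database):
    """Create `name` unless it is already listed. Return True if it was created."""
    if name in list_databases():
        return False
    create_database(name)
    return True


def ensure_milvus_database(name="my_db", uri="http://127.0.0.1:19530"):
    """Wire ensure_database to pymilvus (needs pymilvus and a running Milvus)."""
    # Imported lazily so the pure helper above works without pymilvus.
    from pymilvus import connections, db

    connections.connect(uri=uri)
    return ensure_database(name, db.list_database, db.create_database)
```

Calling `ensure_milvus_database()` a second time should return `False` and leave the existing database untouched.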

### Point the configs at the Docker instance

The `db` block lives at a different level depending on the config. Change
`uri: ./proc_demo.db` to `uri: http://127.0.0.1:19530` in each one you use.

In `examples/retriever_api/config.yaml` (and `examples/rag/config*.yaml`), `db`
is at the root:

```yaml
db:
  uri: http://127.0.0.1:19530
  name: my_db
```

In `examples/index/config.yaml`, `db` is nested under `indexer`:

```yaml
indexer:
  db:
    uri: http://127.0.0.1:19530
    name: my_db
```
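
Because the change is the same one-line substitution in every file, it can be scripted. The sketch below does a plain text replacement, which keeps indentation and comments intact whichever level `db` sits at. It assumes the URI is written exactly as `uri: ./proc_demo.db`, it does not touch `name`, and the helper names are hypothetical.

```python
from pathlib import Path

OLD_URI = "uri: ./proc_demo.db"
NEW_URI = "uri: http://127.0.0.1:19530"


def point_at_docker(text, old=OLD_URI, new=NEW_URI):
    """Return the config text with the local URI swapped for the server URI."""
    return text.replace(old, new)


def update_config(path):
    """Rewrite one config file in place. Return True if it changed."""
    path = Path(path)
    original = path.read_text(encoding="utf-8")
    updated = point_at_docker(original)
    if updated != original:
        path.write_text(updated, encoding="utf-8")
    return updated != original
```

For example, `update_config("examples/index/config.yaml")` would rewrite that file once and report `False` on later runs.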

### Check that the setup works

Once Milvus is running, confirm the connection:

```powershell
python -c "from pymilvus import MilvusClient; c = MilvusClient(uri='http://127.0.0.1:19530', db_name='my_db'); print(c.list_collections())"
```

This prints the list of collections, which is empty until you index something.

## 3. Surya OCR can crash the process on large PDFs

When processing large PDFs, the surya-based OCR may crash with:

```
Process finished with exit code 0xC0000005
```

Exit code `0xC0000005` is a Windows access violation, that is, a hard crash inside a native dependency that Python cannot catch. On Windows, use the fast processors instead, which rely on PyMuPDF rather than surya.

In your `process` config, `use_fast_processors` goes under `dispatcher_config`:

```yaml
dispatcher_config:
  use_fast_processors: true
```

You lose some accuracy on heavily scanned PDFs, but the pipeline no longer crashes.
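
If one config has to serve several operating systems, the setting can be chosen at runtime. This is only a sketch: `dispatcher_overrides` is a hypothetical helper, and how the returned dictionary is merged into the `dispatcher_config` block depends on how you load your config.

```python
import sys


def dispatcher_overrides(platform=sys.platform):
    """Return dispatcher_config overrides for the given platform string."""
    # sys.platform is "win32" on Windows, including 64-bit builds.
    if platform == "win32":
        return {"use_fast_processors": True}
    # Elsewhere, leave whatever the config file already says.
    return {}
```

Returning an empty dictionary on other platforms keeps the value from the config file rather than forcing the slower path off.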

## See also

- [Installation](installation.md)
- [Quickstart](quickstart.md)

docs/source/index.md

Lines changed: 2 additions & 0 deletions
@@ -41,6 +41,7 @@ getting_started/architecture
 getting_started/process
 getting_started/indexing
 getting_started/rag
+getting_started/windows
 ```

 ```{toctree}
@@ -75,6 +76,7 @@ developer_documentation/index_api
 Here is a quick overview of the main pages:

 - [Installation](getting_started/installation.md): set up MMORE and prepare your environment
+- [Running on Windows](getting_started/windows.md): what differs on Windows and how to fix it
 - [Quickstart](getting_started/quickstart.md): run a first minimal workflow end to end
 - [Architecture](getting_started/architecture.md): understand the main system components and how they interact
 - [Processing pipeline](getting_started/process.md): understand how documents are ingested and transformed
