
Commit 04d6eea

Merge remote-tracking branch 'origin/master' into v2
2 parents 7f3f77d + 6931183 commit 04d6eea

22 files changed

Lines changed: 11064 additions & 4933 deletions

.github/workflows/pyright.yml

Lines changed: 12 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -32,3 +32,15 @@ jobs:
3232
run: |
3333
source .venv/bin/activate
3434
pyright
35+
36+
- name: Install dependencies - colvision
37+
run: |
38+
uv venv .venv-colvision
39+
source .venv-colvision/bin/activate
40+
uv pip install -e ".[colvision,cpu,dev]"
41+
42+
- name: Run Pyright - colvision
43+
continue-on-error: true
44+
run: |
45+
source .venv-colvision/bin/activate
46+
pyright src/mmore/colvision

.github/workflows/tests.yml

Lines changed: 20 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -11,25 +11,27 @@ jobs:
1111
runs-on: ubuntu-latest
1212

1313
strategy:
14+
fail-fast: false
1415
matrix:
1516
python-version: ["3.11", "3.12"]
1617

18+
name: test (py${{ matrix.python-version }})
19+
1720
steps:
1821
- name: Checkout code
1922
uses: actions/checkout@v6
2023

21-
- name: Install uv and create venv
22-
run: |
23-
pipx install uv
24-
uv venv .venv
24+
- name: Install uv
25+
run: pipx install uv
2526

2627
- name: Set up Python ${{ matrix.python-version }}
2728
uses: actions/setup-python@v6
2829
with:
2930
python-version: ${{ matrix.python-version }}
3031

31-
- name: Install dependencies (using uv)
32+
- name: Install dependencies - process (using uv)
3233
run: |
34+
uv venv .venv
3335
source .venv/bin/activate
3436
uv pip install -e ".[process,index,rag,api,cpu,dev,websearch]"
3537
@@ -39,7 +41,18 @@ jobs:
3941
uv pip show cohere || echo "Cohere not installed"
4042
uv pip show langchain-cohere || echo "Langchain-cohere not installed"
4143
42-
- name: Run tests
44+
- name: Run tests - process
4345
run: |
4446
source .venv/bin/activate
45-
pytest
47+
pytest --ignore=tests/test_colvision.py
48+
49+
- name: Install dependencies - colvision
50+
run: |
51+
uv venv .venv-colvision
52+
source .venv-colvision/bin/activate
53+
uv pip install -e ".[colvision,cpu,dev]"
54+
55+
- name: Run tests - colvision
56+
run: |
57+
source .venv-colvision/bin/activate
58+
pytest tests/test_colvision.py
Lines changed: 76 additions & 64 deletions
Original file line numberDiff line numberDiff line change
@@ -1,10 +1,49 @@
1-
# 🖼️ ColPali Integration
1+
# 🖼️ ColVision Integration
22

3-
## Overview
3+
PDF retrieval pipeline using ColVision embeddings, stored in Milvus.
44

5-
This module provides a complete pipeline for processing PDF documents with ColPali embeddings, storing them in a Milvus vector database, and performing semantic search.
5+
## Installation
66

7-
It is designed for efficient document retrieval and RAG applications.
7+
The `[colvision]` extra is mutually exclusive with `[process]` — use a dedicated venv.
8+
9+
```bash
10+
uv sync --extra colvision
11+
```
12+
13+
## Supported Models
14+
15+
| Model | `model_name` |
16+
|---|---|
17+
| ColPali v1.3 | `vidore/colpali-v1.3` |
18+
| ColQwen2 v1.0 | `vidore/colqwen2-v1.0` |
19+
| ColQwen2.5 v0.2 | `vidore/colqwen2.5-v0.2` |
20+
| ColGemma3 | `Cognitive-Lab/ColNetraEmbed` |
21+
| ColSmol 256M | `vidore/colSmol-256M` |
22+
| ColSmol 500M | `vidore/colSmol-500M` |
23+
24+
All models are installed with the single `[colvision]` extra.
25+
26+
The model/processor class is auto-detected from `model_name`, and the embedding dimension is inferred at every stage (from the loaded model at `process` / `retrieve` time, from the parquet contents at `index` time).
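One way to picture the auto-detection is a substring lookup on `model_name`. The sketch below is only an illustration and is not the code in `model_utils.py`: the function name `resolve_family`, the family labels, and the matching rules are all assumptions. Only the model identifiers come from the table above.

```python
# Hypothetical sketch: map a model_name to a family label by substring.
# The labels and the function name are illustrative assumptions.
# Order matters: "colqwen2.5" must be tested before "colqwen2",
# because the shorter pattern is a prefix of the longer one.
_FAMILY_PATTERNS = [
    ("colqwen2.5", "colqwen2.5"),
    ("colqwen2", "colqwen2"),
    ("colpali", "colpali"),
    ("colsmol", "colsmol"),
    ("colnetraembed", "colgemma3"),
]


def resolve_family(model_name: str) -> str:
    """Return the family label for a model identifier, or raise if unknown."""
    name = model_name.lower()
    for pattern, family in _FAMILY_PATTERNS:
        if pattern in name:
            return family
    raise ValueError(f"Unsupported ColVision model: {model_name!r}")
```

Matching on the lowercased name keeps identifiers such as `vidore/colSmol-256M` working regardless of their capitalisation.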
27+
28+
## Choosing a Model
29+
30+
Set `model_name` in the YAML config, or override it via the `-m` / `--model` CLI flag on the `process` and `retrieve` commands.
31+
32+
The pipeline runs in three steps — `process`, then `index`, then `retrieve` — and the
33+
`-m` / `--model` flag, when used, must be passed with the same value to both `process` and `retrieve`:
34+
35+
```bash
36+
# 1. Process PDFs into embeddings
37+
python3 -m mmore colvision process --config-file examples/colvision/config_process.yml -m vidore/colqwen2.5-v0.2
38+
39+
# 2. Index the embeddings into Milvus (no model needed here)
40+
python3 -m mmore colvision index --config-file examples/colvision/config_index.yml
41+
42+
# 3. Retrieve with the same model used at processing time
43+
python3 -m mmore colvision retrieve --config-file examples/colvision/config_retrieval.yml -m vidore/colqwen2.5-v0.2
44+
```
45+
46+
> **Important:** the same model must be used for `process` and `retrieve`; mixing models produces incorrect results.
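A small pre-flight check can catch a mismatch before any query is run. This is a sketch under stated assumptions: the helper names `effective_model` and `check_same_model` are hypothetical, the configs are shown as plain dicts rather than loaded YAML, and the rule that a CLI override takes precedence over `model_name` is inferred from the description above.

```python
# Hypothetical helpers; they are not part of mmore.
from typing import Optional


def effective_model(cfg: dict, override: Optional[str] = None) -> str:
    """Model actually used by a command: the -m/--model value if given, else the config."""
    return override or cfg["model_name"]


def check_same_model(
    process_cfg: dict,
    retrieval_cfg: dict,
    process_override: Optional[str] = None,
    retrieve_override: Optional[str] = None,
) -> str:
    """Raise if `process` and `retrieve` would run with different models."""
    used_for_process = effective_model(process_cfg, process_override)
    used_for_retrieve = effective_model(retrieval_cfg, retrieve_override)
    if used_for_process != used_for_retrieve:
        raise ValueError(
            f"process used {used_for_process!r} but retrieve uses {used_for_retrieve!r}"
        )
    return used_for_process
```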
847
948
## 🧭 Architecture
1049

@@ -17,28 +56,32 @@ The system consists of three main components:
1756
## 📁 File Structure
1857

1958
```
20-
src/mmore/colpali/
21-
├── milvuscolpali.py # Milvus database management
59+
src/mmore/colvision/
60+
├── model_utils.py # Model/processor class resolution
61+
├── milvuscolvision.py # Milvus database management
2262
├── run_index.py # Indexing pipeline
23-
├── run_process.py # PDF processing pipeline
63+
├── run_process.py # PDF processing pipeline
2464
├── run_retriever.py # Search and retrieval API
25-
└── retriever.py # ColPaliRetriever class for RAG integration
65+
└── retriever.py # ColVisionRetriever class for RAG integration
2666
```
2767

2868
## 🚀 Quick Start
2969

3070
### 1. Process PDFs into embeddings
3171

3272
```bash
33-
python3 -m mmore colpali process --config-file examples/colpali/config_process.yml
73+
python3 -m mmore colvision process --config-file examples/colvision/config_process.yml
74+
75+
# Or override the model from the command line
76+
python3 -m mmore colvision process --config-file examples/colvision/config_process.yml --model vidore/colqwen2.5-v0.2
3477
```
3578

3679
**Example config (`config_process.yml`):**
3780
```yaml
3881
data_path:
3982
- 'examples/sample_data/pdf'
4083
output_path: "./output"
41-
model_name: "vidore/colpali-v1.3"
84+
model_name: "vidore/colqwen2.5-v0.2"
4285
skip_already_processed: true
4386
num_workers: 5
4487
batch_size: 8
@@ -47,7 +90,7 @@ batch_size: 8
4790
### 2. Index embeddings into Milvus
4891
4992
```bash
50-
python3 -m mmore colpali index --config-file examples/colpali/config_index.yml
93+
python3 -m mmore colvision index --config-file examples/colvision/config_index.yml
5194
```
5295

5396
**Example config (`config_index.yml`):**
@@ -57,7 +100,6 @@ milvus:
57100
db_path: ./output/milvus_data.db
58101
collection_name: pdf_pages
59102
create_collection: true
60-
dim: 128
61103
metric_type: IP
62104
```
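With `dim` removed from the config, the dimension has to come from the stored embeddings themselves. The following sketch shows the idea on plain Python lists. It assumes each page is stored as a list of per-token vectors (a multi-vector layout); the function name `infer_embedding_dim` is hypothetical, and the actual parquet schema is not shown in this diff.

```python
# Hypothetical sketch: infer the embedding dimension from the data itself.
# Assumes one entry per page, each entry being a list of equally sized vectors.
from typing import Sequence


def infer_embedding_dim(pages: Sequence[Sequence[Sequence[float]]]) -> int:
    """Return the single vector length shared by all pages, or raise."""
    dims = {len(vector) for page in pages for vector in page}
    if not dims:
        raise ValueError("no embeddings found")
    if len(dims) != 1:
        raise ValueError(f"inconsistent embedding dimensions: {sorted(dims)}")
    return dims.pop()
```

Rejecting mixed dimensions is a deliberate choice in this sketch: it surfaces the case where embeddings produced by different models ended up in the same output directory.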
63105
@@ -66,47 +108,31 @@ milvus:
66108
#### Retrieval Server Mode
67109
```bash
68110
# Start the retrieval API server
69-
python3 -m mmore colpali retrieve --config-file examples/colpali/config_retrieval.yml
111+
python3 -m mmore colvision retrieve --config-file examples/colvision/config_retrieval.yml
70112
```
71113

72114
Or with a custom host and port:
73115
```bash
74-
python3 -m mmore colpali retrieve --config-file examples/colpali/config_retrieval.yml --host 0.0.0.0 --port 8001
116+
python3 -m mmore colvision retrieve --config-file examples/colvision/config_retrieval.yml --host 0.0.0.0 --port 8001
75117
```
76118

77119
**Example config (`config_retrieval.yml`):**
78120
```yaml
79-
db_path: "./milvus_data"
121+
db_path: "./output/milvus_data.db"
80122
collection_name: "pdf_pages"
81-
model_name: "vidore/colpali-v1.3"
123+
model_name: "vidore/colqwen2.5-v0.2"
82124
top_k: 3
83-
dim: 128
84-
max_workers: 16
85125
metric_type: "IP"
126+
max_workers: 16
86127
text_parquet_path: "./output/pdf_page_text.parquet"
87128
```
88129
89-
#### Single Query Mode
90-
```bash
91-
# Run retrieval for a single query defined in the config file
92-
python3 -m mmore colpali retrieve --config-file examples/colpali/config_retrieval_single.yml
93-
```
94-
95-
**Example config (`config_retrieval_single.yml`):**
96-
```yaml
97-
mode: "single"
98-
db_path: "./milvus_data"
99-
collection_name: "pdf_pages"
100-
model_name: "vidore/colpali-v1.3"
101-
query: "What may lead to dysbiosis and inflammation?"
102-
top_k: 5
103-
```
104130
Host and port are specified via CLI flags (`--host` and `--port`), not in the config file.
105131

106132
#### Batch Mode
107133
```bash
108134
# Process queries from file
109-
python3 -m mmore colpali retrieve --config-file examples/colpali/config_retrieval.yml --input-file queries.jsonl --output-file results.json
135+
python3 -m mmore colvision retrieve --config-file examples/colvision/config_retrieval.yml --input-file queries.jsonl --output-file results.json
110136
```
111137

112138
**Example queries file (`queries.jsonl`):**
@@ -119,20 +145,9 @@ Each line should be a JSON-encoded string (one query per line):
119145

120146
Each line must be a valid JSON string, including quotes, since the file is parsed line by line with `json.loads()`.
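A queries file in this shape can be produced with `json.dumps`, which adds the surrounding quotes and escapes special characters. The helper names `dump_queries` and `load_queries` below are illustrative and not part of the CLI; the reader mirrors the line-by-line `json.loads()` parsing described above.

```python
# Illustrative helpers for the one-JSON-string-per-line format.
import json
from typing import List


def dump_queries(queries: List[str]) -> str:
    """Serialise queries as JSONL text: one JSON-encoded string per line."""
    return "".join(json.dumps(query) + "\n" for query in queries)


def load_queries(text: str) -> List[str]:
    """Parse JSONL text line by line, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

Writing the result of `dump_queries([...])` to `queries.jsonl` gives a file that can be passed to `--input-file`.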
121147

122-
**Example config (`config_retrieval.yml`):**
123-
```yaml
124-
db_path: "./milvus_data"
125-
collection_name: "pdf_pages"
126-
model_name: "vidore/colpali-v1.3"
127-
top_k: 5
128-
dim: 128
129-
max_workers: 16
130-
text_parquet_path: "./output/pdf_page_text.parquet"
131-
```
132-
133148
## 🔧 Core Components
134149

135-
### MilvusColpaliManager
150+
### MilvusColvisionManager
136151
- manages local Milvus database operations
137152
- handles collection creation and indexing
138153
- provides efficient batch insertion
@@ -146,14 +161,14 @@ text_parquet_path: "./output/pdf_page_text.parquet"
146161

147162
### PDF Processor
148163
- converts PDF pages to images
149-
- generates ColPali embeddings
164+
- generates ColVision embeddings
150165
- handles parallel processing
151166
- supports stop-and-resume workflows for large datasets
152167

153168
**Processing Flow:**
154169
1. Crawl PDF files from specified directories
155170
2. Convert each page to high-resolution PNG
156-
3. Generate embeddings using ColPali model
171+
3. Generate embeddings using the configured model
157172
4. Store results in Parquet format
158173

159174
### Retriever
@@ -193,28 +208,25 @@ curl -X POST "http://localhost:8001/v1/retrieve" \
193208

194209
### RAG Pipeline Integration
195210
```python
196-
from mmore.colpali.retriever import ColPaliRetriever, ColPaliRetrieverConfig
197-
from mmore.rag.pipeline import RAGPipeline, RAGConfig
211+
from mmore.colvision.retriever import ColVisionRetriever, ColVisionRetrieverConfig
198212
199-
# Create ColPali retriever with text support
200-
colpali_config = ColPaliRetrieverConfig(
213+
config = ColVisionRetrieverConfig(
201214
db_path="./output/milvus_data.db",
202215
collection_name="pdf_pages",
203-
model_name="vidore/colpali-v1.3",
216+
model_name="vidore/colqwen2.5-v0.2",
204217
text_parquet_path="./output/pdf_page_text.parquet",
205218
top_k=3,
206-
dim=128,
207219
max_workers=16,
208220
metric_type="IP",
209221
)
210-
colpali_retriever = ColPaliRetriever.from_config(colpali_config)
222+
retriever = ColVisionRetriever.from_config(config)
211223
212224
# Use with RAG pipeline (requires LLM config)
213-
# rag_config = RAGConfig(retriever=colpali_retriever, ...)
225+
# rag_config = RAGConfig(retriever=retriever, ...)
214226
# rag_pipeline = RAGPipeline.from_config(rag_config)
215227
```
216228

217-
The `ColPaliRetriever` is a LangChain-compatible `BaseRetriever` that returns `Document` objects with:
229+
The `ColVisionRetriever` is a LangChain-compatible `BaseRetriever` that returns `Document` objects with:
218230
- `page_content`: the text content from the PDF page, if `text_parquet_path` is provided
219231
- `metadata`: contains `pdf_name`, `pdf_path`, `page_number`, `rank`, and `similarity` score
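To show how these fields can be consumed, here is a minimal sketch that turns retrieved documents into a readable hit list. It uses a stand-in `PageDoc` dataclass so that it runs without LangChain or a Milvus database; `PageDoc` and `format_hits` are hypothetical names, and only the metadata keys are taken from the list above.

```python
# Stand-in for a retrieved document; only the two attributes listed above are modelled.
from dataclasses import dataclass, field
from typing import List


@dataclass
class PageDoc:
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_hits(docs: List[PageDoc]) -> List[str]:
    """Return one summary line per document, ordered by the `rank` metadata field."""
    lines = []
    for doc in sorted(docs, key=lambda d: d.metadata["rank"]):
        meta = doc.metadata
        lines.append(
            f"{meta['rank']}. {meta['pdf_name']} p.{meta['page_number']} "
            f"(similarity={meta['similarity']:.3f})"
        )
    return lines
```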
220232

@@ -282,13 +294,13 @@ The `ColPaliRetriever` is a LangChain-compatible `BaseRetriever` that returns `D
282294
### Complete Workflow
283295
```bash
284296
# 1. Process all PDFs in a directory
285-
python3 -m mmore colpali process --config-file examples/colpali/config_process.yml
297+
python3 -m mmore colvision process --config-file examples/colvision/config_process.yml
286298
287299
# 2. Index the embeddings
288-
python3 -m mmore colpali index --config-file examples/colpali/config_index.yml
300+
python3 -m mmore colvision index --config-file examples/colvision/config_index.yml
289301
290302
# 3. Start the API server
291-
python3 -m mmore colpali retrieve --config-file examples/colpali/config_retrieval.yml
303+
python3 -m mmore colvision retrieve --config-file examples/colvision/config_retrieval.yml
292304
293305
# 4. Query the system
294306
curl -X POST "http://localhost:8001/v1/retrieve" \
@@ -299,13 +311,13 @@ curl -X POST "http://localhost:8001/v1/retrieve" \
299311
### Alternative: Batch processing
300312
```bash
301313
# 1. Process PDFs (same as above)
302-
python3 -m mmore colpali process --config-file examples/colpali/config_process.yml
314+
python3 -m mmore colvision process --config-file examples/colvision/config_process.yml
303315
304316
# 2. Index embeddings (same as above)
305-
python3 -m mmore colpali index --config-file examples/colpali/config_index.yml
317+
python3 -m mmore colvision index --config-file examples/colvision/config_index.yml
306318
307319
# 3. Run batch retrieval
308-
python3 -m mmore colpali retrieve --config-file examples/colpali/config_retrieval.yml \
320+
python3 -m mmore colvision retrieve --config-file examples/colvision/config_retrieval.yml \
309321
--input-file queries.jsonl \
310322
--output-file results.json
311323
```
@@ -319,7 +331,7 @@ python3 -m mmore colpali retrieve --config-file examples/colpali/config_retrieva
319331
### For better accuracy
320332
- use higher DPI in PDF conversion, default is 200
321333
- increase `top_k` in retrieval to inspect more candidate pages
322-
- consider using larger ColPali models if available
334+
- consider using more recent ColVision models (ColQwen2.5, ColGemma3)
323335

324336
### For production
325337
- run Milvus in distributed mode for larger datasets

docs/source/getting_started/architecture.md

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -16,7 +16,7 @@ MMORE is designed as a multimodal ingestion and retrieval framework for heteroge
1616
MMORE is organized around three main executable stages:
1717

1818
- `run_process`, which handles ingestion, crawling, dispatching, and document processing
19-
- `run_indexer`, which builds the searchable index and can integrate multimodal retrieval components such as ColPali
19+
- `run_indexer`, which builds the searchable index and can integrate multimodal retrieval components such as ColVision models
2020
- `run_rag`, which serves retrieval and RAG workflows through interfaces such as the API and CLI
2121

2222
These stages interact with intermediate outputs, the vector database, and optional external components such as hosted LLM endpoints, WebRAG, or Live RAG.
@@ -104,9 +104,9 @@ That means the framework may work with:
104104
- plain text documents
105105
- structured metadata
106106
- images or layout-aware representations
107-
- multimodal retrieval models such as ColPali-related components
107+
- multimodal retrieval models such as ColVision-related components
108108

109-
See [ColPali](../core_features/colpali.md) for the multimodal retrieval side.
109+
See [ColVision](../core_features/colvision.md) for the multimodal retrieval side.
110110

111111
### 6. Distributed execution
112112

docs/source/getting_started/installation.md

Lines changed: 7 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -102,8 +102,14 @@ cd mmore
102102

103103
### Step 4: Install the project and dependencies
104104

105+
Pick the pipeline you intend to use. The standard pipeline (`[process]`, with text/document extraction) and the ColVision pipeline (`[colvision]`, with vision-based RAG) are **mutually exclusive** — set up a separate venv if you need both.
106+
105107
```bash
106-
uv sync
108+
# Standard pipeline (document processing + text RAG)
109+
uv sync --extra process --extra rag
110+
111+
# OR ColVision pipeline (vision-based RAG: ColPali, ColQwen2/2.5, ColGemma3, ColSmol)
112+
uv sync --extra colvision
107113
```
108114

109115
For a CPU-only installation, use:
