Commit 0ad5be3

Merge pull request #2 from Kitware/add-vector-search
Add vector search capabilities
2 parents e2a5ff1 + 6518848 commit 0ad5be3

22 files changed

Lines changed: 1150 additions & 32 deletions

.github/workflows/ci.yml

Lines changed: 1 addition & 1 deletion
@@ -43,7 +43,7 @@ jobs:
       - name: Run Flake8
         run: |
           source venv/bin/activate
-          flake8
+          flake8 src/

   actionlint:
     runs-on: ubuntu-latest

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -70,3 +70,5 @@ venv.bak/
 ehthumbs.db
 Thumbs.db
 CLAUDE.md
+db/
+vtk-examples.json

.gitmodules

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+[submodule "rag-components"]
+	path = rag-components
+	url = git@gitlab.kitware.com:christos.tsolakis/rag-components.git

Dockerfile

Lines changed: 0 additions & 28 deletions
This file was deleted.

README.md

Lines changed: 41 additions & 1 deletion
@@ -51,10 +51,50 @@ vtk-mcp-client --host localhost --port 8000 info-cpp vtkActor

 ## MCP Tools

-The server provides three MCP tools:
+The server provides four MCP tools:
 - `get_vtk_class_info_cpp(class_name)` - Get detailed C++ documentation for a VTK class from online documentation
 - `get_vtk_class_info_python(class_name)` - Get Python API documentation using help() function
 - `search_vtk_classes(search_term)` - Search for VTK classes containing a term
+- `vector_search_vtk_examples(query)` - Search VTK examples using vector similarity (requires embeddings database)
+
+## Vector Search with RAG
+
+The server supports semantic search over VTK Python examples using vector embeddings. This requires the embeddings database.
+
+### Downloading the Embeddings Database
+
+The pre-built embeddings database is available as a container image on GitHub Container Registry:
+
+```bash
+# Using Docker
+docker create --name vtk-embeddings ghcr.io/kitware/vtk-mcp/embeddings-database:latest
+docker cp vtk-embeddings:/vtk-examples-embeddings.tar.gz .
+docker rm vtk-embeddings
+
+# Using Podman
+podman create --name vtk-embeddings ghcr.io/kitware/vtk-mcp/embeddings-database:latest
+podman cp vtk-embeddings:/vtk-examples-embeddings.tar.gz .
+podman rm vtk-embeddings
+
+# Extract the database
+tar -xzf vtk-examples-embeddings.tar.gz
+```
+
+### Using Vector Search
+
+After downloading and extracting the database, start the server with the database path:
+
+```bash
+# Install RAG dependencies
+pip install -r rag-components/requirements.txt
+
+# Start server with vector search enabled
+vtk-mcp-server --transport http --database-path ./db/vtk-examples
+
+# Use vector search with the client
+vtk-mcp-client vector-search "render a sphere"
+vtk-mcp-client vector-search "read DICOM files" --top-k 10
+```

 ## Docker
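The `--top-k` flag in the client example above suggests a nearest-neighbour lookup over embedding vectors. As a rough illustration only (the actual search code lives in `rag-components` and is not shown in this diff), a top-k cosine-similarity ranking could look like the sketch below. The function names, the toy three-dimensional vectors, and the example titles are all hypothetical; a real deployment would rely on the embedding model and vector store the database was built with.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k_by_cosine(
    query: Sequence[float],
    corpus: Dict[str, Sequence[float]],
    k: int = 5,
) -> List[Tuple[str, float]]:
    """Return the k (name, score) pairs most similar to `query`, best first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in corpus.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy data: hypothetical example names with made-up embeddings.
corpus = {
    "Sphere": [1.0, 0.0, 0.0],
    "ReadDICOM": [0.0, 1.0, 0.0],
    "SphereWithLighting": [0.8, 0.6, 0.0],
}
results = top_k_by_cosine([1.0, 0.05, 0.0], corpus, k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

A brute-force scan like this is linear in the number of stored examples; vector databases typically replace it with an approximate index, but the ranking idea is the same.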

deploy.Dockerfile

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
+FROM python:3.12-slim AS embeddings
+
+# Download embeddings database from GHCR
+COPY --from=ghcr.io/kitware/vtk-mcp/embeddings-database:latest /vtk-examples-embeddings.tar.gz /tmp/
+
+# Extract the database
+RUN mkdir -p /app/db && \
+    tar -xzf /tmp/vtk-examples-embeddings.tar.gz -C /app/db && \
+    rm /tmp/vtk-examples-embeddings.tar.gz
+
+FROM python:3.12-slim
+
+LABEL org.opencontainers.image.title="VTK MCP Server with Embeddings"
+LABEL org.opencontainers.image.description="Model Context Protocol server for VTK with vector search embeddings"
+LABEL org.opencontainers.image.source="https://github.com/kitware/vtk-mcp"
+LABEL org.opencontainers.image.authors="Vicente Adolfo Bolea Sanchez <vicente.bolea@kitware.com>"
+LABEL org.opencontainers.image.licenses="MIT"
+LABEL org.opencontainers.image.documentation="https://github.com/kitware/vtk-mcp/blob/main/README.md"
+
+ENV PIP_DISABLE_PIP_VERSION_CHECK=1 \
+    PIP_NO_CACHE_DIR=1 \
+    PYTHONDONTWRITEBYTECODE=1 \
+    PYTHONUNBUFFERED=1
+
+# Install system dependencies for VTK
+RUN apt update && \
+    apt install --no-install-recommends --no-install-suggests -y \
+    libgl1-mesa-dev \
+    libxrender-dev/stable && \
+    rm -rf /var/lib/apt/lists/*
+
+WORKDIR /app
+
+# Copy application code
+COPY . .
+
+# Copy embeddings database from first stage
+COPY --from=embeddings /app/db /app/db
+
+# Install Python dependencies (including RAG dependencies)
+RUN pip install --upgrade pip && \
+    pip install --verbose . && \
+    pip install -r rag-components/requirements.txt
+
+EXPOSE 8000
+
+# Start server with database path configured
+CMD ["vtk-mcp-server", "--transport", "http", "--host", "0.0.0.0", "--port", "8000", "--database-path", "/app/db/vtk-examples"]
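Once a container built from this file is running with port 8000 published, a quick reachability check can confirm that something is listening before pointing a client at it. The sketch below only tests that a TCP connection can be opened; it does not speak MCP or verify that vector search is enabled. The helper name `port_is_open` and the host and port in the usage comment are assumptions for illustration.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example (assumes the container publishes port 8000 on localhost):
#   port_is_open("localhost", 8000)
```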

pyproject.toml

Lines changed: 1 addition & 0 deletions
@@ -60,6 +60,7 @@ markers = [
     "integration: Integration tests that require server/client interaction",
     "http: HTTP transport integration tests",
     "stdio: Stdio transport integration tests",
+    "vector_search: Vector search integration tests (requires podman and embeddings database)",
     "slow: Tests that take longer to run",
 ]
 filterwarnings = [
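The new `vector_search` marker notes that these tests need podman and the embeddings database. One way to make that requirement explicit is a small helper that reports what is missing, which a test module could feed into a skip condition. This is a sketch under assumptions: the helper name `missing_vector_search_prereqs` and the default path `./db/vtk-examples` (borrowed from the README example) are illustrative, and the tests in this commit may detect their prerequisites differently.

```python
import shutil
from pathlib import Path
from typing import List


def missing_vector_search_prereqs(
    db_path: str = "./db/vtk-examples",  # assumed location, taken from the README example
    tool: str = "podman",
) -> List[str]:
    """Return human-readable reasons why the vector search tests cannot run."""
    reasons = []
    if shutil.which(tool) is None:
        reasons.append(f"{tool} not found on PATH")
    if not Path(db_path).exists():
        reasons.append(f"embeddings database not found at {db_path}")
    return reasons


# Possible use in a test module (pytest is deliberately not imported here):
#   pytestmark = [
#       pytest.mark.vector_search,
#       pytest.mark.skipif(
#           bool(missing_vector_search_prereqs()),
#           reason="vector search prerequisites missing",
#       ),
#   ]
```

Tests tagged with the marker can then be selected with `pytest -m vector_search` or excluded with `pytest -m "not vector_search"`.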

rag-components/.gitignore

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+# OS files
+.DS_Store
+
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# Environments
+.env
+.venv
+env/
+venv/

rag-components/LICENSE

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+Copyright 2025 Kitware Inc.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.

rag-components/README.md

Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
+# A simple RAG for VTK
+
+This project builds a database from the existing VTK Python examples and lets you ask questions about VTK.
+
+## Set up
+1. By default it uses the OpenAI API. Make sure you have an API key and set
+the corresponding environment variable. To use another model, see
+[below](#supported-llm-models).
+
+2. Get the vtk-examples source code. We will use it to generate our database.
+
+```bash
+git clone https://gitlab.kitware.com/vtk/vtk-examples
+```
+
+3. Create a virtual environment and install the dependencies.
+
+```bash
+git clone https://gitlab.kitware.com/vtk/vtk-examples
+python -m venv env
+source env/bin/activate
+pip install -r requirements.txt
+```
+
+4. Populate the database. This is needed only once, or again if you want to experiment with a different embedding function.
+It will take some time, depending on the hardware you are using.
+
+```bash
+python populate_db.py --dir ./vtk-examples/src/Python
+```
+
+5. Now ask your question!
+
+```bash
+$ python chat.py --database ./db/codesage-codesage-large-v2
+User: How to read a vti file
+To read a VTK image data file (.vti), you can use the `vtkXMLImageDataReader` class. Here is a basic example:
+
+import vtk
+
+# Create a reader for your vti file
+reader = vtk.vtkXMLImageDataReader()
+reader.SetFileName('your_file.vti')
+reader.Update()
+
+# The output of reader.GetOutput() is your vtkImageData object
+image_data = reader.GetOutput()
+
+In this code, replace `'your_file.vti'` with the path to your .vti file. The
+`reader.Update()` call is necessary to actually perform the reading operation.
+After this, you can use `reader.GetOutput()` to get the `vtkImageData` object
+that was read from the file.
+
+References:
+https://examples.vtk.org/site/Python/Medical/GenerateModelsFromLabels
+https://examples.vtk.org/site/Python/ImageData/WriteReadVtkImageData
+...
+```
+
+### Supported LLM models
+`chat.py` uses the "gpt-4" model by default. To switch to a different one, pass its name via the `--model=<model name>` parameter.
+Currently supported models:
+- OpenAI models. See exact model names [here](https://platform.openai.com/docs/models#current-model-aliases). To use them you need an OpenAI API [key](https://help.openai.com/en/articles/4936850-where-do-i-find-my-openai-api-key).
+- Anthropic models. See exact names [here](https://docs.anthropic.com/en/docs/about-claude/models/all-models#model-names). To use them you need an Anthropic API [key](https://docs.anthropic.com/en/api/getting-started#accessing-the-api).
+- Models supported by the Ollama framework. To use these models, make sure you have [ollama](https://github.com/ollama/ollama) installed, that it
+is running in another terminal (via `ollama serve`), and that you have already
+pulled the model you want to use (via `ollama pull <model-name>`). You can find available models [here](https://ollama.com/).
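The two options this README mentions, `--database` in the sample session and `--model` with its "gpt-4" default, map naturally onto a small argument parser. The sketch below is not the actual `chat.py`, which is not part of this diff; `build_parser` is a hypothetical name, and treating `--database` as a required option is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the two options described in the README (illustrative only)."""
    parser = argparse.ArgumentParser(description="Ask questions about VTK (sketch)")
    # Assumption: the database path is mandatory; the real script may provide a default.
    parser.add_argument("--database", required=True,
                        help="path to the populated embeddings database")
    parser.add_argument("--model", default="gpt-4",
                        help="name of the LLM to use")
    return parser


# Parse an explicit argument list so the sketch runs without real CLI input.
args = build_parser().parse_args(["--database", "./db/codesage-codesage-large-v2"])
print(args.database, args.model)
```

Both `--model name` and `--model=name` are accepted by `argparse`, so the `--model=<model name>` form shown in the README works with a parser like this.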
