Commit bbd34b5

Update (#218)
Fix typing errors + better documentation (#214)
1 parent 94ad49e commit bbd34b5

18 files changed

Lines changed: 409 additions & 122 deletions

.github/workflows/ruff.yml

Lines changed: 18 additions & 3 deletions
@@ -9,6 +9,21 @@ jobs:
     runs-on: ubuntu-latest
     steps:
       - uses: actions/checkout@v6
-      - uses: astral-sh/ruff-action@v3
-      - run: ruff check
-      - run: ruff format --check
+      - name: Install dependencies
+        run: |
+          pip install pre-commit ruff
+
+      # 4️⃣ Run pre-commit on all files
+      - name: Run pre-commit hooks
+        run: |
+          pre-commit run --all-files || (
+            echo ""
+            echo "❌ Pre-commit checks failed!"
+            echo "This project REQUIRES pre-commit to run formatting and lint fixes."
+            echo ""
+            echo "To fix locally, run:"
+            echo "  pip install pre-commit"
+            echo "  pre-commit install"
+            echo "  pre-commit run --all-files"
+            exit 1
+          )
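
The new workflow step relies on the shell idiom `command || ( ...; exit 1 )`: the hint is printed only when `pre-commit` exits with a nonzero status, and the step then fails with status 1. A minimal Python sketch of the same control flow follows. The helper name `run_with_hint` and the stand-in child process are assumptions for illustration and are not part of this commit.

```python
import subprocess
import sys

# Same guidance the workflow echoes when the hooks fail.
HINT = (
    "Pre-commit checks failed!\n"
    "To fix locally, run:\n"
    "  pip install pre-commit\n"
    "  pre-commit install\n"
    "  pre-commit run --all-files"
)


def run_with_hint(cmd: list[str]) -> int:
    """Run cmd; on a nonzero exit status, print the hint and return 1."""
    result = subprocess.run(cmd, check=False)
    if result.returncode == 0:
        return 0
    print(HINT, file=sys.stderr)
    return 1


# Stand-in for `pre-commit run --all-files`: a child process that exits with status 3.
status = run_with_hint([sys.executable, "-c", "raise SystemExit(3)"])
```

As in the workflow, the original exit status (3 here) is not propagated; any failure is collapsed to 1.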

.gitignore

Lines changed: 3 additions & 1 deletion
@@ -129,4 +129,6 @@ test*.sh
 
 # Examples
 examples/outputs
-outputs/
+outputs/
+
+paper/

.pre-commit-config.yaml

Lines changed: 3 additions & 15 deletions
@@ -1,20 +1,8 @@
 repos:
-  - repo: https://github.com/PyCQA/isort
-    rev: 6.0.1
-    hooks:
-      - id: isort
   - repo: https://github.com/astral-sh/ruff-pre-commit
     rev: v0.12.11
     hooks:
-      - id: ruff-check
-        args: [
-          --fix, # auto-fix lint + style issues
-          --unsafe-fixes, # allows formatting & import sorting
-        ]
+      - id: ruff
+        args: ["--fix"]
 
-  - repo: https://github.com/codespell-project/codespell
-    rev: v2.4.1
-    hooks:
-      - id: codespell # See pyproject.toml for args
-        additional_dependencies:
-          - tomli
+      - id: ruff-format

README.md

Lines changed: 3 additions & 35 deletions
@@ -50,7 +50,7 @@ If `weasyprint` fails to find GTK or Cairo, also run:
 
 ```bash
 brew install cairo pango gdk-pixbuf libffi
-pip install weasyprint
+uv pip install weasyprint
 ```
 
 #### Step 1 – Install MMORE
@@ -61,15 +61,10 @@ To install the latest release of the package, simply run:
 uv pip install mmore
 ```
 
-To install the package for development, simply run:
-```bash
-uv venv .venv
-source .venv/bin/activate
-uv pip install -e .
-```
-
 > :warning: This package requires many big dependencies and requires a dependency override, so it has to be installed with `uv` to handle `pip` installations. [Check our tutorial on uv](https://github.com/swiss-ai/mmore/blob/master/docs/uv.md).
 
+> :warning: **Check the instructions for contributors directly at [`docs/for_devs.md`](./docs/for_devs.md)**
+
 ### Minimal Example
 
 You can use our predefined CLI commands to execute parts of the pipeline. Note that you might need to prepend `python -m` to the command if the package does not properly create bash aliases.
@@ -142,33 +137,6 @@ See [the `/docs` directory](https://github.com/swiss-ai/mmore/blob/master/docs)
 | **Media Files** | MP4, MOV, AVI, MKV, MP3, WAV, AAC | GPU/CPU | :white_check_mark:
 | **Web Content** | HTML | CPU | :x:
 
-
-## Contributing
-
-We welcome contributions to improve the current state of the pipeline, feel free to:
-
-- Open an issue to report a bug or ask for a new feature
-- Open a pull request to fix a bug or add a new feature
-- You can find ongoing new features and bugs in the [Issues]
-
-Don't hesitate to star the project :star: if you find it interesting! (you would be our star).
-
-### To make sure your code is pretty, this repo has a `pre-commit` configuration file that runs linters (`isort`, `black`)
-
-1. Install pre-commit if you haven't already
-
-`uv pip install pre-commit`
-
-2. Set up the git hook scripts
-
-`pre-commit install`
-
-3. Run the checks manually (optional but good before first commit)
-
-`pre-commit run --all-files`
-
-We also use `pyright` to type-check the code base, please make sure your Pull Requests are type-checked.
-
 ## License
 
 This project is licensed under the Apache 2.0 License, see the [LICENSE :mortar_board:](LICENSE) file for details.

docs/for_devs.md

Lines changed: 221 additions & 0 deletions
@@ -0,0 +1,221 @@
+# Developer Documentation
+
+Welcome to the MMORE developer documentation! This guide will help you set up your development environment and contribute to the project.
+
+## Table of Contents
+
+- [Developer Documentation](#developer-documentation)
+  - [Table of Contents](#table-of-contents)
+  - [Development Setup](#development-setup)
+    - [System Dependencies](#system-dependencies)
+      - [Linux (Ubuntu/Debian)](#linux-ubuntudebian)
+      - [MacOS](#macos)
+    - [Installing MMORE for Development](#installing-mmore-for-development)
+    - [Code Quality-Tools](#code-quality-tools)
+      - [Pre-commit Hooks](#pre-commit-hooks)
+      - [Type Checking](#type-checking)
+  - [Contributing Guidelines](#contributing-guidelines)
+    - [Reporting Issues](#reporting-issues)
+    - [Code Contributions](#code-contributions)
+  - [Project Structure](#project-structure)
+  - [Testing](#testing)
+    - [Running tests in the terminal](#running-tests-in-the-terminal)
+    - [Writing tests](#writing-tests)
+  - [Pull Request Process](#pull-request-process)
+    - [PR Checklist](#pr-checklist)
+  - [Development Tips](#development-tips)
+    - [Working with UV](#working-with-uv)
+  - [Questions?](#questions)
+
+---
+
+## Development Setup
+
+### System Dependencies
+
+Before installing MMORE for development, ensure you have the required system dependencies installed.
+
+#### Linux (Ubuntu/Debian)
+
+```bash
+sudo apt update
+sudo apt install -y ffmpeg libsm6 libxext6 chromium-browser libnss3 \
+  libgconf-2-4 libxi6 libxrandr2 libxcomposite1 libxcursor1 libxdamage1 \
+  libxext6 libxfixes3 libxrender1 libasound2 libatk1.0-0 libgtk-3-0 libreoffice \
+  libpango-1.0-0 libpangoft2-1.0-0 weasyprint
+```
+
+> **Note:** On Ubuntu 24.04, replace `libasound2` with `libasound2t64`. You may also need to add the repository for Ubuntu 20.04 focal to have access to a few of the sources (e.g., create `/etc/apt/sources.list.d/mmore.list` with the contents `deb http://cz.archive.ubuntu.com/ubuntu focal main universe`).
+
+#### MacOS
+
+```bash
+brew update
+brew install ffmpeg chromium gtk+3 pango cairo \
+  gobject-introspection libffi pkg-config libx11 libxi \
+  libxrandr libxcomposite libxcursor libxdamage libxext \
+  libxrender libasound2 atk libreoffice weasyprint
+```
+
+If `weasyprint` fails to find GTK or Cairo, also run:
+
+```bash
+brew install cairo pango gdk-pixbuf libffi
+uv pip install weasyprint
+```
+
+### Installing MMORE for Development
+
+**1. Clone the repository:**
+
+```bash
+git clone https://github.com/swiss-ai/mmore.git
+cd mmore
+```
+
+**2. Create a virtual environment and install dependencies:**
+
+```bash
+uv venv .venv
+source .venv/bin/activate
+uv pip install -e .
+uv pip install ".[dev]"
+```
+
+> **Important:** This package requires many big dependencies and requires a dependency override, so it must be installed with `uv` to handle `pip` installations. Check our [tutorial on uv](./uv.md) for more information.
+
+### Code Quality-Tools
+
+MMORE uses several tools to maintain code quality and consistency.
+
+#### Pre-commit Hooks
+
+We use `pre-commit` to automatically run code formatters and linters before each commit.
+
+**Setup**
+
+**1. Install pre-commit** (if not already installed):
+
+```bash
+uv pip install pre-commit
+```
+
+**2. Set up the git hook scripts:**
+
+```bash
+pre-commit install
+```
+
+**3. Run the checks manually** (optional but recommended before your first commit):
+
+```bash
+pre-commit run --all-files
+```
+
+**Configured Hooks**
+
+The pre-commit configuration runs `ruff` (lint checks with `--fix`) and `ruff-format` (code formatting) for consistent style.
+
+#### Type Checking
+
+We use pyright for static type checking. Please ensure your Pull Requests are type-checked.
+
+To run type checking manually:
+
+```bash
+pyright
+```
+
+## Contributing Guidelines
+
+We welcome contributions! Here's how you can help:
+
+### Reporting Issues
+
+- **Bug Reports:** Open an issue with a clear description, steps to reproduce, and expected vs. actual behavior
+- **Feature Requests:** Open an issue describing the feature, its use case, and potential implementation approach
+- Check the [Issues](https://github.com/swiss-ai/mmore/issues) page for ongoing work
+
+### Code Contributions
+
+1. **Fork the repository** and create a new branch for your feature/fix
+2. **Write clear, documented code** following the existing style
+3. **Add tests** if applicable
+4. **Ensure all pre-commit hooks pass**
+5. **Run type checking** with `pyright`
+6. **Submit a Pull Request** with a clear description
+
+## Project Structure
+
+mmore/
+├── mmore/
+│   ├── process/           # Document processing pipeline
+│   │   ├── processors/    # Individual file type processors
+│   │   └── ...
+│   ├── postprocess/       # Post-processing utilities
+│   ├── index/             # Indexing and vector DB
+│   ├── rag/               # RAG implementation
+│   └── type/              # Type definitions and data models
+├── docs/                  # Documentation
+├── examples/              # Example configurations and data
+├── tests/                 # Test suite
+├── .pre-commit-config.yaml
+├── pyproject.toml
+└── README.md
+
+Key Modules
+- **`mmore.process`**: Handles extraction from various file formats
+- **`mmore.index`**: Manages hybrid dense+sparse indexing with Milvus
+- **`mmore.rag`**: RAG system with LangChain integration
+- **`mmore.type`**: Core data structures like `MultimodalSample`
+
+## Testing
+
+### Running tests in the terminal
+
+```bash
+pytest tests/
+```
+
+### Writing tests
+
+- Place tests in the `tests/` directory
+- Use descriptive test names
+- Cover edge cases and error conditions
+- Mock external dependencies when appropriate
+
+## Pull Request Process
+
+1. **Update documentation** if you're adding new features
+2. **Add examples** for new functionality
+3. **Ensure all tests pass** and pre-commit hooks succeed
+4. **Update the changelog** if applicable
+5. **Request review** from maintainers
+
+### PR Checklist
+
+- [ ] Code follows project style guidelines
+- [ ] Pre-commit hooks pass (`pre-commit run --all-files`)
+- [ ] Type checking passes (`pyright`)
+- [ ] Tests added/updated as needed
+- [ ] Documentation updated
+- [ ] Examples provided for new features
+- [ ] Commit messages are clear and descriptive
+
+## Development Tips
+
+### Working with UV
+
+- Use `uv pip` instead of `pip` for all package installations
+- The project uses dependency overrides that are handled automatically by `uv`
+- See the UV tutorial for more details
+
+## Questions?
+
+If you have questions about contributing, feel free to:
+
+- Open a discussion on GitHub
+- Reach out to the maintainers
+- Check existing issues for similar questions
+
+Thank you for contributing to MMORE! 🎉
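
The "Writing tests" bullets in `docs/for_devs.md` (descriptive names, edge cases, mocked external dependencies) could look roughly like the sketch below. The function `first_line` and its `fetch_text` parameter are hypothetical stand-ins, not MMORE code; a real test would live under `tests/` and import from `mmore`.

```python
from unittest.mock import Mock


def first_line(fetch_text, url: str) -> str:
    """Hypothetical function under test: return the first line of fetched text."""
    text = fetch_text(url)
    if not text:
        raise ValueError(f"no content at {url}")
    return text.splitlines()[0]


def test_first_line_returns_title_of_fetched_document():
    # The external dependency (a network fetch) is replaced by a mock.
    fetch = Mock(return_value="title\nbody")
    assert first_line(fetch, "https://example.com") == "title"
    fetch.assert_called_once_with("https://example.com")


def test_first_line_rejects_empty_content():
    # Edge case: the dependency returns nothing.
    fetch = Mock(return_value="")
    try:
        first_line(fetch, "https://example.com")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for empty content")
```

Because the test functions use plain `assert` statements and the `test_` prefix, `pytest tests/` would collect them without extra configuration.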

examples/postprocessor/config.yaml

Lines changed: 1 addition & 16 deletions
@@ -1,23 +1,8 @@
 pp_modules:
-  - type: file_namer
   - type: chunker
     args:
       chunking_strategy: sentence
-  - type: translator
-    args:
-      target_language: en
-      attachment_tag: <attachment>
-      confidence_threshold: 0.7
-      constrained_languages:
-        - fr
-        - en
-  - type: metafuse
-    args:
-      metadata_keys:
-        - file_name
-      content_template: Content from {file_name}
-      position: beginning
 
 output:
-  output_path: examples/postprocessor/outputs/merged/
+  output_path: examples/postprocessor/outputs/merged/results.jsonl
   save_each_step: True
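
With this change `output_path` names a file (`results.jsonl`) rather than a directory. The `.jsonl` extension conventionally means one JSON object per line; the writer MMORE actually uses is not shown in this diff, so the sketch below only illustrates that convention. The helper names and the record fields (`text`, `metadata`) are assumptions.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(records, path):
    """Write one JSON object per line, creating parent directories as needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical post-processed samples; field names are illustrative only.
samples = [
    {"text": "First chunk.", "metadata": {"file_name": "a.pdf"}},
    {"text": "Second chunk.", "metadata": {"file_name": "b.pdf"}},
]

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "outputs" / "merged" / "results.jsonl"
    write_jsonl(samples, out)
    restored = read_jsonl(out)
```

A file path also means the parent directory has to exist before writing, which is why the sketch creates it explicitly.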
