Merged
21 changes: 18 additions & 3 deletions .github/workflows/ruff.yml
@@ -9,6 +9,21 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: astral-sh/ruff-action@v3
- run: ruff check
- run: ruff format --check
- name: Install dependencies
run: |
pip install pre-commit ruff

# 4️⃣ Run pre-commit on all files
- name: Run pre-commit hooks
run: |
pre-commit run --all-files || (
echo ""
echo "❌ Pre-commit checks failed!"
echo "This project REQUIRES pre-commit to run formatting and lint fixes."
echo ""
echo "To fix locally, run:"
echo " pip install pre-commit"
echo " pre-commit install"
echo " pre-commit run --all-files"
exit 1
)
4 changes: 3 additions & 1 deletion .gitignore
@@ -129,4 +129,6 @@ test*.sh

# Examples
examples/outputs
outputs/
outputs/

paper/
18 changes: 3 additions & 15 deletions .pre-commit-config.yaml
@@ -1,20 +1,8 @@
repos:
- repo: https://github.com/PyCQA/isort
rev: 6.0.1
hooks:
- id: isort
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.12.11
hooks:
- id: ruff-check
args: [
--fix, # auto-fix lint + style issues
--unsafe-fixes, # allows formatting & import sorting
]
- id: ruff
args: ["--fix"]

- repo: https://github.com/codespell-project/codespell
rev: v2.4.1
hooks:
- id: codespell # See pyproject.toml for args
additional_dependencies:
- tomli
- id: ruff-format
38 changes: 3 additions & 35 deletions README.md
@@ -50,7 +50,7 @@ If `weasyprint` fails to find GTK or Cairo, also run:

```bash
brew install cairo pango gdk-pixbuf libffi
pip install weasyprint
uv pip install weasyprint
```

#### Step 1 – Install MMORE
@@ -61,15 +61,10 @@ To install the latest release of the package, simply run:
uv pip install mmore
```

To install the package for development, simply run:
```bash
uv venv .venv
source .venv/bin/activate
uv pip install -e .
```

> :warning: This package requires many big dependencies and requires a dependency override, so it has to be installed with `uv` to handle `pip` installations. [Check our tutorial on uv](https://github.com/swiss-ai/mmore/blob/master/docs/uv.md).

> :warning: **Check the instructions for contributors directly at [`docs/for_devs.md`](./docs/for_devs.md)**

### Minimal Example

You can use our predefined CLI commands to execute parts of the pipeline. Note that you might need to prepend `python -m` to the command if the package does not properly create bash aliases.
@@ -142,33 +137,6 @@ See [the `/docs` directory](https://github.com/swiss-ai/mmore/blob/master/docs)
| **Media Files** | MP4, MOV, AVI, MKV, MP3, WAV, AAC | GPU/CPU | :white_check_mark:
| **Web Content** | HTML | CPU | :x:


## Contributing

We welcome contributions to improve the current state of the pipeline, feel free to:

- Open an issue to report a bug or ask for a new feature
- Open a pull request to fix a bug or add a new feature
- You can find ongoing new features and bugs in the [Issues]

Don't hesitate to star the project :star: if you find it interesting! (you would be our star).

### To make sure your code is pretty, this repo has a `pre-commit` configuration file that runs linters (`isort`, `black`)

1. Install pre-commit if you haven't already

`uv pip install pre-commit`

2. Set up the git hook scripts

`pre-commit install`

3. Run the checks manually (optional but good before first commit)

`pre-commit run --all-files`

We also use `pyright` to type-check the code base, please make sure your Pull Requests are type-checked.

## License

This project is licensed under the Apache 2.0 License, see the [LICENSE :mortar_board:](LICENSE) file for details.
221 changes: 221 additions & 0 deletions docs/for_devs.md
@@ -0,0 +1,221 @@
# Developer Documentation

Welcome to the MMORE developer documentation! This guide will help you set up your development environment and contribute to the project.

## Table of Contents

- [Developer Documentation](#developer-documentation)
- [Table of Contents](#table-of-contents)
- [Development Setup](#development-setup)
- [System Dependencies](#system-dependencies)
- [Linux (Ubuntu/Debian)](#linux-ubuntudebian)
- [macOS](#macos)
- [Installing MMORE for Development](#installing-mmore-for-development)
- [Code Quality Tools](#code-quality-tools)
- [Pre-commit Hooks](#pre-commit-hooks)
- [Type Checking](#type-checking)
- [Contributing Guidelines](#contributing-guidelines)
- [Reporting Issues](#reporting-issues)
- [Code Contributions](#code-contributions)
- [Project Structure](#project-structure)
- [Testing](#testing)
- [Running tests in the terminal](#running-tests-in-the-terminal)
- [Writing tests](#writing-tests)
- [Pull Request Process](#pull-request-process)
- [PR Checklist](#pr-checklist)
- [Development Tips](#development-tips)
- [Working with UV](#working-with-uv)
- [Questions?](#questions)

---

## Development Setup

### System Dependencies

Before installing MMORE for development, ensure you have the required system dependencies installed.

#### Linux (Ubuntu/Debian)

```bash
sudo apt update
sudo apt install -y ffmpeg libsm6 libxext6 chromium-browser libnss3 \
libgconf-2-4 libxi6 libxrandr2 libxcomposite1 libxcursor1 libxdamage1 \
libxext6 libxfixes3 libxrender1 libasound2 libatk1.0-0 libgtk-3-0 libreoffice \
libpango-1.0-0 libpangoft2-1.0-0 weasyprint
```

> **Note:** On Ubuntu 24.04, replace `libasound2` with `libasound2t64`. You may also need to add the Ubuntu 20.04 (focal) repository to access some of the packages (e.g., create `/etc/apt/sources.list.d/mmore.list` with the contents `deb http://cz.archive.ubuntu.com/ubuntu focal main universe`).

#### macOS

```bash
brew update
brew install ffmpeg chromium gtk+3 pango cairo \
gobject-introspection libffi pkg-config libx11 libxi \
libxrandr libxcomposite libxcursor libxdamage libxext \
libxrender libasound2 atk libreoffice weasyprint
```

If `weasyprint` fails to find GTK or Cairo, also run:

```bash
brew install cairo pango gdk-pixbuf libffi
uv pip install weasyprint
```

### Installing MMORE for Development

**1. Clone the repository:**

```bash
git clone https://github.com/swiss-ai/mmore.git
cd mmore
```

**2. Create a virtual environment and install dependencies:**

```bash
uv venv .venv
source .venv/bin/activate
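# Editable install with the dev extras in one step; a second non-editable
# `uv pip install .[dev]` would replace the editable install, and the quotes
# keep shells such as zsh from expanding the brackets.
uv pip install -e ".[dev]"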
```

> **Important:** MMORE has many large dependencies and needs a dependency override, so it must be installed with `uv` (via `uv pip`) rather than plain `pip`. Check our [tutorial on uv](./uv.md) for more information.

### Code Quality Tools

MMORE uses several tools to maintain code quality and consistency.

#### Pre-commit Hooks

We use `pre-commit` to automatically run code formatters and linters before each commit.

**Setup**

**1. Install pre-commit** (if not already installed):

```bash
uv pip install pre-commit
```

**2. Set up the git hook scripts:**

```bash
pre-commit install
```

**3. Run the checks manually** (optional but recommended before your first commit):

```bash
pre-commit run --all-files
```

**Configured Hooks**

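The pre-commit configuration runs two hooks from `ruff-pre-commit`: `ruff` with `--fix`, which lints the code and applies automatic fixes, and `ruff-format`, which formats it for a consistent style.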

#### Type Checking

We use pyright for static type checking. Please ensure your Pull Requests are type-checked.

To run type checking manually:

```bash
pyright
```
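
As a rough illustration of the kind of issue static type checking catches, here is a minimal sketch. The function `parse_page_count` is hypothetical and not part of MMORE; it only shows how an `Optional` value has to be narrowed before use for pyright to accept the code.

```python
from typing import Optional


def parse_page_count(raw: Optional[str]) -> int:
    """Hypothetical helper (not an MMORE API): turn an optional string into a page count."""
    # Without this guard, pyright reports an optional member access error
    # on `raw.strip()`, because `raw` may be None at that point.
    if raw is None:
        return 0
    # After the guard, pyright narrows `raw` to `str`, so `.strip()` is accepted.
    return int(raw.strip())
```

Handling the `None` case explicitly keeps the declared return type `int` honest and leaves no ambiguity for callers.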

## Contributing Guidelines

We welcome contributions! Here's how you can help:

### Reporting Issues

- **Bug Reports:** Open an issue with a clear description, steps to reproduce, and expected vs. actual behavior
- **Feature Requests:** Open an issue describing the feature, its use case, and potential implementation approach
- Check the [Issues](https://github.com/swiss-ai/mmore/issues) page for ongoing work

### Code Contributions

1. **Fork the repository** and create a new branch for your feature/fix
2. **Write clear, documented code** following the existing style
3. **Add tests** if applicable
4. **Ensure all pre-commit hooks pass**
5. **Run type checking** with `pyright`
6. **Submit a Pull Request** with a clear description

## Project Structure

mmore/
├── mmore/
│ ├── process/ # Document processing pipeline
│ │ ├── processors/ # Individual file type processors
│ │ └── ...
│ ├── postprocess/ # Post-processing utilities
│ ├── index/ # Indexing and vector DB
│ ├── rag/ # RAG implementation
│ └── type/ # Type definitions and data models
├── docs/ # Documentation
├── examples/ # Example configurations and data
├── tests/ # Test suite
├── .pre-commit-config.yaml
├── pyproject.toml
└── README.md

**Key Modules**

- **`mmore.process`**: Handles extraction from various file formats
- **`mmore.index`**: Manages hybrid dense+sparse indexing with Milvus
- **`mmore.rag`**: RAG system with LangChain integration
- **`mmore.type`**: Core data structures like `MultimodalSample`

## Testing

### Running tests in the terminal

```bash
pytest tests/
```

### Writing tests

- Place tests in the `tests/` directory
- Use descriptive test names
- Cover edge cases and error conditions
- Mock external dependencies when appropriate
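
The guidelines above can be sketched as a small test file. This is only an illustration under stated assumptions: `count_words` and `summarize` are hypothetical functions defined inline, not MMORE APIs, and the example sticks to the standard library so it runs on its own. Test functions named `test_*` are collected by pytest automatically.

```python
from unittest.mock import Mock


def count_words(text: str) -> int:
    """Hypothetical function under test (not part of MMORE)."""
    if not isinstance(text, str):
        raise TypeError("text must be a string")
    return len(text.split())


def summarize(url: str, fetch) -> int:
    """Hypothetical function that depends on an external fetcher passed in by the caller."""
    return count_words(fetch(url))


def test_count_words_returns_zero_for_empty_string():
    # Edge case: empty input.
    assert count_words("") == 0


def test_count_words_ignores_repeated_whitespace():
    # Edge case: irregular spacing and a trailing newline.
    assert count_words("two   words\n") == 2


def test_count_words_rejects_non_string_input():
    # Error condition. With pytest imported, `pytest.raises(TypeError)` is the usual idiom.
    try:
        count_words(None)  # type: ignore
    except TypeError:
        return
    raise AssertionError("expected TypeError")


def test_summarize_uses_injected_fetcher_instead_of_network():
    # The external dependency is replaced by a mock, so no network access happens.
    fake_fetch = Mock(return_value="three short words")
    assert summarize("https://example.invalid/doc", fake_fetch) == 3
    fake_fetch.assert_called_once_with("https://example.invalid/doc")
```

Passing the fetcher in as an argument is one simple way to make the dependency mockable; patching with `unittest.mock.patch` is an alternative when the call site cannot be changed.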

## Pull Request Process

1. **Update documentation** if you're adding new features
2. **Add examples** for new functionality
3. **Ensure all tests pass** and pre-commit hooks succeed
4. **Update the changelog** if applicable
5. **Request review** from maintainers

### PR Checklist

- [ ] Code follows project style guidelines
- [ ] Pre-commit hooks pass (`pre-commit run --all-files`)
- [ ] Type checking passes (`pyright`)
- [ ] Tests added/updated as needed
- [ ] Documentation updated
- [ ] Examples provided for new features
- [ ] Commit messages are clear and descriptive

## Development Tips

### Working with UV

- Use `uv pip` instead of `pip` for all package installations
- The project uses dependency overrides that are handled automatically by `uv`
- See the [UV tutorial](./uv.md) for more details

## Questions?

If you have questions about contributing, feel free to:

- Open a discussion on GitHub
- Reach out to the maintainers
- Check existing issues for similar questions

Thank you for contributing to MMORE! 🎉
17 changes: 1 addition & 16 deletions examples/postprocessor/config.yaml
@@ -1,23 +1,8 @@
pp_modules:
- type: file_namer
- type: chunker
args:
chunking_strategy: sentence
- type: translator
args:
target_language: en
attachment_tag: <attachment>
confidence_threshold: 0.7
constrained_languages:
- fr
- en
- type: metafuse
args:
metadata_keys:
- file_name
content_template: Content from {file_name}
position: beginning

output:
output_path: examples/postprocessor/outputs/merged/
output_path: examples/postprocessor/outputs/merged/results.jsonl
save_each_step: True