Skip to content

Latest commit

 

History

History
336 lines (251 loc) · 11.1 KB

File metadata and controls

336 lines (251 loc) · 11.1 KB

Contributing to Ragtime

Development Setup

1. Install the toolchain

Install proto (version manager):

curl -fsSL https://moonrepo.dev/install/proto.sh | bash

Then restart your terminal (or run the source command shown by the installer).

Install required tools via proto:

proto install moon uv

The installer includes just (task runner) automatically. No additional setup needed!

System Dependencies

For Reflex frontend support, you need unzip. On Ubuntu/Debian:

sudo apt-get install unzip

On macOS:

brew install unzip

2. Clone and setup

git clone https://github.com/etalab-ia/ragtime.git
cd ragtime
just sync  # Installs dependencies and pre-commit hooks

Note: just sync runs uv sync to install dependencies and uv run pre-commit install to set up automatic code quality checks on commit.

Code Quality

Using just

With just installed:

just format       # Format code
just format-check # Check formatting
just lint         # Run linter
just lint-fix     # Run linter with auto-fix
just type-check   # Run type checker
just check        # Run all checks

Using moon run Directly

moon run tools:format       # Format code
moon run tools:format-check # Check formatting
moon run tools:lint         # Run linter
moon run tools:lint-fix     # Run linter with auto-fix
moon run tools:type-check   # Run type checker

Project Structure

ragtime/
├── apps/                    # Applications
│   ├── cli/                 # ragtime CLI tool
│   ├── chainlit-chat/       # Chainlit frontend (golden master)
│   └── reflex-chat/         # Reflex frontend (golden master)
├── packages/                # Shared packages
│   ├── rag-core/            # Core config + schema (ragtime.core)
│   ├── albert-client/       # Albert API SDK (uses `albert` namespace, not ragtime.*)
│   ├── ingestion/           # Document parsing (ragtime.ingestion)
│   ├── pipelines/           # Pipeline orchestration (ragtime.pipelines)
│   ├── retrieval/           # Vector search (ragtime.retrieval)
│   ├── reranking/           # Cross-encoder re-scoring (ragtime.reranking)
│   ├── context/             # Context formatting (ragtime.context)
│   ├── storage/             # Collection management (ragtime.storage)
│   └── ragtime-lib/      # Library bundle (all pipeline packages)
├── docs/                    # User and contributor documentation
│   ├── guides/              # Getting started, setup, pipelines
│   ├── reference/           # Components, config, ragtime.toml
│   └── troubleshooting/     # Common issues and fixes
├── tools/                   # Development tools
│   ├── generate_templates.py # Template generator
│   └── moon.yml             # Code quality tasks
├── .moon/                   # Moon workspace config
│   ├── templates/           # Moon templates (source of truth)
│   └── toolchain.yml        # Python/uv/proto config
├── justfile                 # Developer task runner
├── CONTRIBUTING.md          # Contributing guide (you are here)
└── pyproject.toml           # Workspace config

Templates

Templates live in .moon/templates/ and are automatically bundled into the CLI package at build time via hatch's force-include.

Running Tests

moon run cli:test

Understanding the Pipeline Architecture

Ragtime uses a phase-based pipeline under the ragtime.* namespace. Each RAG pipeline phase is its own package, and ragtime.pipelines orchestrates them all.

Package → namespace mapping:

Package Import namespace Responsibility
packages/rag-core/ ragtime.core Config schema, presets, RAGConfig
packages/ingestion/ ragtime.ingestion Document parsing (local pypdf or Albert API)
packages/storage/ ragtime.storage Collection management (Albert API)
packages/retrieval/ ragtime.retrieval Vector search
packages/reranking/ ragtime.reranking Cross-encoder re-scoring
packages/context/ ragtime.context Format retrieved chunks → LLM context
packages/pipelines/ ragtime.pipelines Orchestrates all phases
packages/ragtime-lib/ Bundles all the above for external projects
packages/albert-client/ albert Low-level Albert API SDK — not under ragtime.*

Note: albert-client is versioned independently (tracks the Albert API OpenAPI spec) and is intentionally kept outside the ragtime.* namespace so it can be used standalone.

Pipeline selection is driven by storage.provider in ragtime.toml:

[storage]
provider = "albert-collections"  # Full Albert RAG (default)
# provider = "local-sqlite"      # Local text extraction (offline)

Chat apps import from the ragtime.pipelines namespace:

from ragtime.pipelines import get_pipeline

pipeline = get_pipeline(config)
await pipeline.process_file(file_bytes, mime_type)
response = await pipeline.process_query(message_history)

There is no context_loader.py or modules.yml — the pipeline factory handles backend selection automatically based on config.

Key files:

  • packages/pipelines/src/ragtime/pipelines/ — Pipeline orchestration
    • __init__.pyget_pipeline(config) factory
    • albert.pyAlbertPipeline: ingestion → storage → retrieval → reranking → context
    • basic.pyBasicPipeline: local text extraction only
  • packages/rag-core/src/ragtime/core/schema.py — All config Pydantic models

When modifying pipeline logic:

  • Changes to orchestration belong in packages/pipelines/
  • Each phase package is self-contained (no inter-phase imports)
  • Only packages/pipelines/ depends on all phase packages

Testing the Generate Dataset Command

The generate-dataset command supports pluggable providers. To test locally:

# Test with Letta provider
export LETTA_API_KEY="test-key"
export DATA_FOUNDRY_AGENT_ID="test-agent"
python -m pytest apps/cli/tests/test_generate_dataset.py

# Test with Albert provider
export OPENAI_API_KEY="test-key"
export OPENAI_BASE_URL="http://localhost:8000"
export OPENAI_MODEL="mistral-7b"
python -m pytest apps/cli/tests/test_generate_dataset.py

Adding a New Data Foundry Provider

To add support for a new provider (e.g., another LLM service):

  1. Create provider class in apps/cli/src/cli/commands/eval/providers/{name}.py:

    from collections.abc import Iterator
    from .schema import GeneratedSample
    
    class MyProvider:
        def __init__(self, **kwargs):
            # Initialize with credentials
            pass
    
        def upload_documents(self, document_paths: list[str]) -> None:
            # Upload docs to your service
            pass
    
        def generate(self, num_samples: int) -> Iterator[GeneratedSample]:
            # Generate samples
            yield sample
    
        def cleanup(self) -> None:
            # Clean up resources
            pass
  2. Update factory in apps/cli/src/cli/commands/eval/providers/__init__.py:

    elif provider_name == "myservice":
        from .myservice import MyProvider
        return MyProvider(**kwargs)
  3. Add CLI option in apps/cli/src/cli/commands/generate_dataset.py:

    • Add validation for provider-specific env vars
    • Route to factory with correct credentials
  4. Add tests in apps/cli/tests/test_generate_dataset.py

All providers must:

  • Implement the DataFoundryProvider protocol
  • Output Ragas-compatible JSON (user_input, retrieved_contexts, reference, _metadata)
  • Support French content
  • Use DocumentPreprocessor for PDF extraction (prevents upload timeouts)
  • Add logging via logger = logging.getLogger(__name__) for debugging
  • Enforce strict JSONL-only output (no preamble or extraneous text)

Testing install.sh from a branch

To test the install script in a clean Docker environment:

# Start a fresh Ubuntu container
docker run -it ubuntu:24.04

# Install curl (required to download the script)
apt-get update && apt-get install -y curl

# Run the installer from a branch
export RAG_FACILE_BRANCH=my-feature-branch
curl -fsSL https://raw.githubusercontent.com/etalab-ia/ragtime/$RAG_FACILE_BRANCH/install.sh | bash
source ~/.bashrc

# Test workspace setup
ragtime setup my-rag-app

The install script will automatically install other prerequisites (git, xz-utils) on Debian/Ubuntu.

Commit Messages

We use Conventional Commits for consistent versioning and changelogs. This enables automated release management via release-please.

Format

<type>[optional scope]: <description>

[optional body]

[optional footer(s)]

Types

Use these prefixes to indicate how commits should affect versioning:

  • feat: - New feature (minor version bump)
  • fix: - Bug fix (patch version bump)
  • feat!: or fix!: - Breaking change (major version bump)
  • docs: - Documentation changes (no release)
  • chore: - Internal changes (no release)
  • refactor: - Code reorganization (no release unless with !)

Examples

Single feature:

git commit -m "feat: add support for PDF annotations"

Bug fix:

git commit -m "fix: resolve memory leak in pdf-context"

Multiple changes in one commit:

git commit -m "feat: add PDF viewer

This adds comprehensive PDF viewing capabilities
to reflex-chat.

fix: resolve rendering bug in pdf-context
Fixes #245"

Breaking change:

git commit -m "feat!: redesign CLI command interface

BREAKING CHANGE: the old 'rf init' command is now 'rf generate init'"

Automated Releases

When commits with conventional prefixes (feat:, fix:, etc.) are merged to main, release-please automatically:

  1. Creates a Release PR with:

    • Version bumps for affected packages
    • Generated changelog entries from commits
    • Updated CHANGELOG.md files
  2. When the Release PR is merged:

    • Versions are locked in pyproject.toml
    • GitHub Release is created with release notes
    • Commit is tagged with version number

No manual steps needed! Just use conventional commits and release-please handles the rest.

CI Checks

The CI pipeline runs:

  • ruff format --check - Code formatting
  • ruff check - Linting
  • ty check - Type checking
  • moon run cli:test - Tests

All checks must pass before merging.

Configuration-Driven Architecture

Ragtime uses a configuration-driven architecture. Most components (apps, packages) do not have hardcoded RAG parameters. Instead, they consume the RAGConfig Pydantic model from packages/rag-core.

When adding new features that require configuration:

  1. Define the schema in packages/rag-core/src/ragtime/core/schema.py.
  2. Update presets in packages/rag-core/presets/ if applicable.
  3. Access the configuration in your code using from ragtime.core import get_config.