CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

RamaLama is a CLI tool for managing and serving AI models using containers. It provides a container-centric approach to AI model management, supporting multiple model registries (Hugging Face, Ollama, OCI registries) and automatic GPU detection with appropriate container image selection.

Build and Development Commands

Setup

make install-requirements    # Install dev dependencies via pip

Testing

# Unit tests (pytest via tox)
make unit-tests              # Run unit tests
make unit-tests-verbose      # Run with full trace output
tox                          # Direct tox invocation

# E2E tests (pytest)
make e2e-tests               # Run with Podman (default)
make e2e-tests-docker        # Run with Docker
make e2e-tests-nocontainer   # Run without container engine

# All tests
make tests                   # Run unit tests and e2e tests, with and without a container engine

Running a single test

# Unit test
tox -- test/unit/test_cli.py::test_function_name -vvv

# E2E test
tox -e e2e -- test/e2e/test_basic.py::test_function_name -vvv

Code Quality

make validate                # Run all validation (codespell, lint, format check, man-check, type check)
make lint                    # Run ruff + shellcheck
make check-format            # Check ruff formatting + import sorting
make format                  # Auto-format with ruff + import sorting
make type-check              # Run mypy type checking
make codespell               # Check spelling

Documentation

make docs                    # Build manpages and docsite

Architecture

Source Structure (ramalama/)

  • cli.py - Main CLI entry point, argparse setup, subcommand dispatch
  • config.py - Configuration constants and loading
  • common.py - Shared utility functions, GPU detection (get_accel())
  • engine.py - Container engine abstraction (Podman/Docker)

Transport System (ramalama/transports/)

Handles pulling/pushing models from different registries:

  • base.py - Base Transport class defining the interface
  • transport_factory.py - New() and TransportFactory for creating transports
  • huggingface.py, ollama.py, oci.py, modelscope.py, rlcr.py - Registry-specific implementations
  • Transports are selected via URL scheme prefixes: huggingface://, ollama://, oci://, etc.
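The scheme-prefix selection above can be sketched as a small dispatch function. This is an illustrative sketch, not RamaLama's actual `transport_factory.py` code; the function name `select_transport` and the fallback behavior are assumptions for demonstration only.

```python
# Hypothetical sketch of URL-scheme transport dispatch.
# select_transport() and its fallback are illustrative, not RamaLama's API.
KNOWN_SCHEMES = ("huggingface", "ollama", "oci", "modelscope", "rlcr")

def select_transport(model_ref: str, default: str = "ollama") -> str:
    """Map a reference like 'huggingface://org/model' to a transport name."""
    for scheme in KNOWN_SCHEMES:
        if model_ref.startswith(scheme + "://"):
            return scheme
    # Bare references (no scheme prefix) fall back to a default transport.
    return default

print(select_transport("oci://quay.io/org/model"))  # oci
print(select_transport("tinyllama"))                # ollama
```

In the real code, the factory returns a Transport instance (a subclass of the base class in base.py) rather than a string, but the prefix-matching idea is the same.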

Model Store (ramalama/model_store/)

Manages local model storage:

  • global_store.py - GlobalModelStore for model management
  • store.py - Low-level storage operations
  • reffile.py - Reference file handling for tracking model origins

Runtime Plugin System (ramalama/plugins/)

Each inference engine is a self-contained Python plugin:

  • interface.py - Abstract base classes: RuntimePlugin and InferenceRuntimePlugin
  • loader.py - Plugin discovery and get_all_runtimes() / get_runtime(); entry point group ramalama.runtimes.v1alpha
  • registry.py - Plugin registration
  • runtimes/inference/common.py - Concrete base classes:
    • BaseInferenceRuntime — registers run and serve, implements handle_subcommand() dispatch
    • ContainerizedInferenceRuntimePlugin — extends the above with container-specific args (--api, --generate)
  • runtimes/inference/llama_cpp.py - LlamaCppPlugin(LlamaCppCommands, ContainerizedInferenceRuntimePlugin) (default); owns all RAG subcommand logic
  • runtimes/inference/llama_cpp_commands.py - LlamaCppCommands mixin providing _cmd_run, _cmd_serve, and other llama.cpp command builders
  • runtimes/inference/vllm.py - VllmPlugin(ContainerizedInferenceRuntimePlugin)
  • runtimes/inference/mlx.py - MlxPlugin(BaseInferenceRuntime) (macOS only; always --nocontainer)

configure_subcommands() in cli.py calls register_subcommands() on only the selected runtime plugin, so --help output is filtered to the active runtime's supported subcommands.

Key Patterns

  • GPU Detection: get_accel() in common.py detects GPU type (CUDA, ROCm, Vulkan, etc.) and selects appropriate container image
  • Container Images: GPU-specific images at quay.io/ramalama/{ramalama,cuda,rocm,intel-gpu,...}
  • Inference Engines: llama.cpp (default), vllm, mlx (macOS only) - each implemented as a runtime plugin under ramalama/plugins/runtimes/inference/

Test Structure

  • test/unit/ - pytest unit tests (fast, no external dependencies)
  • test/e2e/ - pytest end-to-end tests (marked with @pytest.mark.e2e)

Code Style

  • Python 3.10+ required
  • Line length: 120 characters
  • Formatting: ruff format + ruff check (I rules)
  • Type hints encouraged (mypy checked)
  • Commits require DCO sign-off (git commit -s)
  • Commit messages for PRs:
    • Use exactly one sign-off, added via git commit -s only.
    • Do not manually add a "Signed-off-by:" line in the message body.
    • Do not add any trailers (e.g. "Made-with: Cursor") after the sign-off.
    • The author's Signed-off-by must be the last line of the commit message.