This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
RamaLama is a CLI tool for managing and serving AI models using containers. It provides a container-centric approach to AI model management, supporting multiple model registries (Hugging Face, Ollama, OCI registries) and automatic GPU detection with appropriate container image selection.
## Development Commands

### Setup

```bash
make install-requirements   # Install dev dependencies via pip
```

### Testing

```bash
# Unit tests (pytest via tox)
make unit-tests             # Run unit tests
make unit-tests-verbose     # Run with full trace output
tox                         # Direct tox invocation

# E2E tests (pytest)
make e2e-tests              # Run with Podman (default)
make e2e-tests-docker       # Run with Docker
make e2e-tests-nocontainer  # Run without container engine

# All tests
make tests                  # Run unit tests and e2e tests, with and without a container engine
```

### Running a Single Test

```bash
# Unit test
tox -- test/unit/test_cli.py::test_function_name -vvv

# E2E test
tox -e e2e -- test/e2e/test_basic.py::test_function_name -vvv
```

### Linting and Validation

```bash
make validate      # Run all validation (codespell, lint, format check, man-check, type check)
make lint          # Run ruff + shellcheck
make check-format  # Check ruff formatting + import sorting
make format        # Auto-format with ruff + import sorting
make type-check    # Run mypy type checking
make codespell     # Check spelling
```

### Documentation

```bash
make docs          # Build manpages and docsite
```

## Architecture

### Core Modules

- `cli.py` - Main CLI entry point, argparse setup, subcommand dispatch
- `config.py` - Configuration constants and loading
- `common.py` - Shared utility functions, GPU detection (`get_accel()`)
- `engine.py` - Container engine abstraction (Podman/Docker)
### Transports

Handles pulling/pushing models from different registries:

- `base.py` - `BaseTransport` class defining the interface
- `transport_factory.py` - `New()` and `TransportFactory` for creating transports
- `huggingface.py`, `ollama.py`, `oci.py`, `modelscope.py`, `rlcr.py` - Registry-specific implementations
- Transports are selected via URL scheme prefixes: `huggingface://`, `ollama://`, `oci://`, etc.
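The scheme-to-transport lookup can be sketched roughly as follows. This is an illustration only: the class names, the `SCHEMES` table, and the `ollama` fallback are assumptions for the sketch, not RamaLama's actual API.

```python
from urllib.parse import urlparse

# Hypothetical stand-ins for the real transport classes implemented in
# huggingface.py, ollama.py, oci.py, etc.
class HuggingFaceTransport: ...
class OllamaTransport: ...
class OCITransport: ...

# Assumed mapping from URL scheme prefix to transport class.
SCHEMES = {
    "huggingface": HuggingFaceTransport,
    "ollama": OllamaTransport,
    "oci": OCITransport,
}

def new_transport(model_ref: str, default_scheme: str = "ollama"):
    """Pick a transport class from the model reference's URL scheme."""
    scheme = urlparse(model_ref).scheme or default_scheme
    try:
        return SCHEMES[scheme]()
    except KeyError:
        raise ValueError(f"unsupported transport: {scheme}://")

print(type(new_transport("huggingface://org/model")).__name__)
# HuggingFaceTransport
```

A bare reference with no scheme falls through to the default, which is why the factory, not each transport, owns the scheme table.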
### Model Store

Manages local model storage:

- `global_store.py` - `GlobalModelStore` for model management
- `store.py` - Low-level storage operations
- `reffile.py` - Reference file handling for tracking model origins
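A rough sketch of origin tracking under stated assumptions: the `.ref.json` filename and the JSON fields below are invented for illustration, and the real on-disk layout is defined by `reffile.py` and `store.py`.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical reference-file format: records which registry a cached
# model blob came from, keyed by model name.
def write_ref(store_dir: Path, name: str, source: str, digest: str) -> Path:
    """Write a reference file tracking a model's origin."""
    ref_path = store_dir / f"{name}.ref.json"
    ref_path.write_text(json.dumps({"source": source, "digest": digest}))
    return ref_path

def read_ref(ref_path: Path) -> dict:
    """Read a previously written reference file."""
    return json.loads(ref_path.read_text())

with tempfile.TemporaryDirectory() as d:
    ref = write_ref(Path(d), "tinyllama", "ollama://tinyllama", "sha256:abc123")
    print(read_ref(ref)["source"])  # ollama://tinyllama
```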
### Inference Runtime Plugins

Each inference engine is a self-contained Python plugin:

- `interface.py` - Abstract base classes: `RuntimePlugin` and `InferenceRuntimePlugin`
- `loader.py` - Plugin discovery and `get_all_runtimes()`/`get_runtime()`; entry point group `ramalama.runtimes.v1alpha`
- `registry.py` - Plugin registration
- `runtimes/inference/common.py` - Concrete base classes:
  - `BaseInferenceRuntime` - registers `run` and `serve`, implements `handle_subcommand()` dispatch
  - `ContainerizedInferenceRuntimePlugin` - extends the above with container-specific args (`--api`, `--generate`)
- `runtimes/inference/llama_cpp.py` - `LlamaCppPlugin(LlamaCppCommands, ContainerizedInferenceRuntimePlugin)` (default); owns all RAG subcommand logic
- `runtimes/inference/llama_cpp_commands.py` - `LlamaCppCommands` mixin providing `_cmd_run`, `_cmd_serve`, and other llama.cpp command builders
- `runtimes/inference/vllm.py` - `VllmPlugin(ContainerizedInferenceRuntimePlugin)`
- `runtimes/inference/mlx.py` - `MlxPlugin(BaseInferenceRuntime)` (macOS only; always `--nocontainer`)

`configure_subcommands()` in `cli.py` calls `register_subcommands()` on only the selected runtime plugin, so `--help` output is filtered to the active runtime's supported subcommands.
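That registration-and-dispatch flow might be sketched like this. Apart from `register_subcommands()` and `handle_subcommand()`, every name below is a made-up stand-in, not RamaLama's real code:

```python
import argparse

class SketchRuntime:
    """Illustrative stand-in for a runtime plugin base class."""
    name = "sketch"
    subcommands = ("run", "serve")

    def register_subcommands(self, subparsers) -> None:
        # Only this plugin's subcommands are registered, so argparse's
        # --help output stays filtered to the active runtime.
        for sub in self.subcommands:
            sub_parser = subparsers.add_parser(sub)
            sub_parser.set_defaults(runtime=self, subcommand=sub)

    def handle_subcommand(self, args) -> str:
        # Dispatch to a _cmd_<subcommand> builder, mirroring the
        # mixin-provided command-builder naming described above.
        return getattr(self, f"_cmd_{args.subcommand}")(args)

    def _cmd_run(self, args) -> str:
        return f"{self.name} run"

    def _cmd_serve(self, args) -> str:
        return f"{self.name} serve"

parser = argparse.ArgumentParser(prog="ramalama")
SketchRuntime().register_subcommands(parser.add_subparsers())
args = parser.parse_args(["serve"])
print(args.runtime.handle_subcommand(args))  # sketch serve
```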
### Key Concepts

- GPU Detection: `get_accel()` in `common.py` detects GPU type (CUDA, ROCm, Vulkan, etc.) and selects appropriate container image
- Container Images: GPU-specific images at `quay.io/ramalama/{ramalama,cuda,rocm,intel-gpu,...}`
- Inference Engines: llama.cpp (default), vllm, mlx (macOS only) - each implemented as a runtime plugin under `ramalama/plugins/runtimes/inference/`
### Tests

- `test/unit/` - pytest unit tests (fast, no external dependencies)
- `test/e2e/` - pytest end-to-end tests (marked with `@pytest.mark.e2e`)
## Code Style

- Python 3.10+ required
- Line length: 120 characters
- Formatting: ruff format + ruff check (I rules)
- Type hints encouraged (mypy checked)
- Commits require DCO sign-off (`git commit -s`)
- Commit messages for PRs:
  - Use exactly one sign-off, added via `git commit -s` only.
  - Do not manually add a "Signed-off-by:" line in the message body.
  - Do not add any trailers (e.g. "Made-with: Cursor") after the sign-off.
  - The author's `Signed-off-by` must be the last line of the commit message.