| 1 | +# CLAUDE.md |
| 2 | + |
| 3 | +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. |
| 4 | + |
| 5 | +## Build and Development Commands |
| 6 | + |
| 7 | +```bash |
| 8 | +# Build the project |
| 9 | +cargo build |
| 10 | + |
| 11 | +# Build in release mode |
| 12 | +cargo build --release |
| 13 | + |
| 14 | +# Run the server |
| 15 | +cargo run --bin modelexpress-server |
| 16 | + |
| 17 | +# Run tests |
| 18 | +cargo test |
| 19 | + |
| 20 | +# Run integration tests (starts server, runs test client) |
| 21 | +./run_integration_tests.sh |
| 22 | + |
| 23 | +# Run the test client against a specific model |
| 24 | +cargo run --bin test_client -- --test-model "google-t5/t5-small" |
| 25 | + |
| 26 | +# Run clippy (required before submitting code) |
| 27 | +cargo clippy |
| 28 | + |
| 29 | +# Generate sample configuration file |
| 30 | +cargo run --bin config_gen -- --output model-express.yaml |
| 31 | +``` |
| 32 | + |
| 33 | +## Architecture |
| 34 | + |
| 35 | +ModelExpress is a Rust-based model cache management service that accelerates inference by caching HuggingFace models. It can be deployed standalone or as a sidecar alongside inference solutions like NVIDIA Dynamo. |
| 36 | + |
| 37 | +### Workspace Structure |
| 38 | + |
| 39 | +The project is a Rust workspace with three crates: |
| 40 | + |
| 41 | +- **`modelexpress_server`** (`modelexpress-server`): gRPC server providing model services |
| 42 | + - `services.rs`: Implements `HealthService`, `ApiService`, and `ModelService` gRPC services |
| 43 | + - `database.rs`: SQLite-based model status persistence via `ModelDatabase` |
| 44 | + - `cache.rs`: Cache eviction and management |
| 45 | + - Uses global `MODEL_TRACKER` (`LazyLock<ModelDownloadTracker>`) for tracking download state |
| 46 | + |
| 47 | +- **`modelexpress_client`** (`modelexpress-client`): Client library and CLI tool |
| 48 | + - `lib.rs`: Main `Client` struct with gRPC clients for health, API, and model services |
| 49 | + - `bin/cli.rs`: HuggingFace CLI replacement for model downloads |
| 50 | + - Supports automatic fallback to direct download when the server is unavailable |
| 51 | + |
| 52 | +- **`modelexpress_common`** (`modelexpress-common`): Shared code and protobuf definitions |
| 53 | + - `grpc/` module contains generated proto code (health, api, model) |
| 54 | + - `providers/huggingface.rs`: HuggingFace download implementation |
| 55 | + - `download.rs`: Provider-agnostic download orchestration |
| 56 | + - `cache.rs`, `config.rs`, `client_config.rs`: Configuration types |
| 57 | + |
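The client's fallback behaviour described above can be illustrated with a minimal, std-only sketch. Everything here is an assumption for illustration: `DownloadError`, `Source`, and `download_with_fallback` are hypothetical names, the two download paths are passed in as closures, and the real `Client` in `modelexpress_client` may decide when to fall back differently.

```rust
// Hypothetical error type; the real client's error types are not shown in this file.
#[derive(Debug)]
enum DownloadError {
    /// The server could not be reached at all.
    ServerUnavailable(String),
    /// The server (or the direct path) was reached but the download failed.
    Failed(String),
}

/// Which path ended up serving the request.
#[derive(Debug, PartialEq)]
enum Source {
    Server,
    Direct,
}

/// Try the server first; fall back to a direct download only when the
/// server is unavailable. Any other server error is returned unchanged.
fn download_with_fallback<S, D>(
    model: &str,
    via_server: S,
    direct: D,
) -> Result<Source, DownloadError>
where
    S: FnOnce(&str) -> Result<(), DownloadError>,
    D: FnOnce(&str) -> Result<(), DownloadError>,
{
    match via_server(model) {
        Ok(()) => Ok(Source::Server),
        Err(DownloadError::ServerUnavailable(_)) => direct(model).map(|()| Source::Direct),
        Err(other) => Err(other),
    }
}
```

Passing the two paths as closures keeps the decision logic testable without a running server; whether non-connectivity errors should also trigger the fallback is a design choice this sketch leaves closed.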
| 58 | +### gRPC Services |
| 59 | + |
| 60 | +Protocol definitions are in `modelexpress_common/proto/`: |
| 61 | +- `health.proto`: Health check endpoint |
| 62 | +- `api.proto`: Generic request/response API |
| 63 | +- `model.proto`: Model download with streaming status updates |
| 64 | + |
| 65 | +### Key Patterns |
| 66 | + |
| 67 | +- Download status is tracked in a SQLite database, using compare-and-swap updates to handle concurrent requests |
| 68 | +- Streaming gRPC responses for download progress updates via `ModelStatusUpdate` |
| 69 | +- `CacheConfig::discover()` finds cache configuration from environment or config files |
| 70 | +- Configuration layering: CLI args > environment variables > config files > defaults |
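The compare-and-swap pattern in the first bullet can be sketched as follows. This is an in-memory stand-in, not the project's `ModelDatabase`: `StatusStore`, `Status`, `try_claim`, and `compare_and_swap` are hypothetical names, and a `Mutex<HashMap>` replaces SQLite. In a SQL-backed version the same idea would typically be a conditional update that only succeeds when the stored status still matches the expected one.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Hypothetical status values; the real schema is not shown in this file.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Downloading,
    Downloaded,
    Error,
}

/// In-memory stand-in for the status table.
struct StatusStore {
    rows: Mutex<HashMap<String, Status>>,
}

impl StatusStore {
    fn new() -> Self {
        Self { rows: Mutex::new(HashMap::new()) }
    }

    /// Claim a model for download. Returns true only for the first caller;
    /// concurrent callers see false and should wait on the existing download.
    fn try_claim(&self, model: &str) -> bool {
        let mut rows = match self.rows.lock() {
            Ok(guard) => guard,
            Err(poisoned) => poisoned.into_inner(),
        };
        if rows.contains_key(model) {
            false
        } else {
            rows.insert(model.to_string(), Status::Downloading);
            true
        }
    }

    /// Set the status to `new` only if it currently equals `expected`.
    fn compare_and_swap(&self, model: &str, expected: Status, new: Status) -> bool {
        let mut rows = match self.rows.lock() {
            Ok(guard) => guard,
            Err(poisoned) => poisoned.into_inner(),
        };
        match rows.get_mut(model) {
            Some(current) if *current == expected => {
                *current = new;
                true
            }
            _ => false,
        }
    }
}
```

The point of the pattern is that the check and the write happen as one step, so two requests for the same model cannot both start a download.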
| 71 | + |
| 72 | +### Adding CLI Arguments |
| 73 | + |
| 74 | +Client CLI arguments and environment variables are defined in a shared struct to avoid duplication: |
| 75 | + |
| 76 | +1. **`ClientArgs`** in `modelexpress_common/src/client_config.rs`: |
| 77 | + - Single source of truth for shared client arguments (endpoint, timeout, cache settings, etc.) |
| 78 | + - Add new arguments here with `#[arg(long, env = "MODEL_EXPRESS_...")]` |
| 79 | + - Avoid the `-v` short flag (reserved for the CLI's `--verbose`) |
| 80 | + |
| 81 | +2. **`ClientConfig::load()`** in the same file: |
| 82 | + - Apply the new argument to the config struct in the "APPLY CLI ARGUMENT OVERRIDES" section |
| 83 | + |
| 84 | +3. **`Cli`** in `modelexpress_client/src/bin/modules/args.rs`: |
| 85 | + - Embeds `ClientArgs` via `#[command(flatten)]` |
| 86 | + - Only add CLI-specific arguments here (e.g., `--format`, `--verbose`) |
| 87 | + |
| 88 | +4. **Tests**: Add tests in `client_config.rs` for argument parsing and config loading |
| 89 | + |
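The override step in `ClientConfig::load()` follows the layering listed under Key Patterns (CLI args > environment variables > config files > defaults). A minimal std-only sketch of that precedence is shown here; `Layer`, `Resolved`, `resolve`, and both default values are hypothetical and do not reflect the actual `ClientArgs` or `ClientConfig` fields beyond the endpoint and timeout settings mentioned above.

```rust
// Hypothetical defaults, chosen only for illustration.
const DEFAULT_ENDPOINT: &str = "http://localhost:8001";
const DEFAULT_TIMEOUT_SECS: u64 = 30;

/// One configuration source; `None` means "not set in this layer".
#[derive(Debug, Default, Clone)]
struct Layer {
    endpoint: Option<String>,
    timeout_secs: Option<u64>,
}

/// The final configuration after all layers are merged.
#[derive(Debug, PartialEq)]
struct Resolved {
    endpoint: String,
    timeout_secs: u64,
}

/// Merge layers per field: CLI wins over env, env over file, file over defaults.
fn resolve(cli: &Layer, env: &Layer, file: &Layer) -> Resolved {
    Resolved {
        endpoint: cli
            .endpoint
            .clone()
            .or_else(|| env.endpoint.clone())
            .or_else(|| file.endpoint.clone())
            .unwrap_or_else(|| DEFAULT_ENDPOINT.to_string()),
        timeout_secs: cli
            .timeout_secs
            .or(env.timeout_secs)
            .or(file.timeout_secs)
            .unwrap_or(DEFAULT_TIMEOUT_SECS),
    }
}
```

Resolving each field independently means a CLI flag for the timeout does not discard an endpoint that was set only in the environment or the config file.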
| 90 | +## Code Standards |
| 91 | + |
| 92 | +- **No `unwrap()`**: Strictly forbidden except in benchmarks. Use `match`, `?`, or `expect()` (tests only) |
| 93 | +- **All dependencies in root `Cargo.toml`**: Sub-crates use workspace dependencies exclusively |
| 94 | +- **Clippy enforced**: `cargo clippy` must pass with no warnings (multiple lints set to deny) |
| 95 | +- **No emojis in code** |
| 96 | +- **No markdown documentation files for code changes** |
| 97 | + |
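As an illustration of the "no `unwrap()`" rule above, here is a small sketch of the two recommended alternatives, `?` and `match`. The functions `parse_port` and `port_or_default` are hypothetical examples and are not taken from the codebase.

```rust
use std::num::ParseIntError;

/// Propagate the parse error to the caller with `?` instead of `unwrap()`.
fn parse_port(raw: &str) -> Result<u16, ParseIntError> {
    let port = raw.trim().parse::<u16>()?;
    Ok(port)
}

/// Handle both outcomes explicitly with `match`, reporting the error
/// before falling back to a caller-supplied default.
fn port_or_default(raw: &str, default: u16) -> u16 {
    match parse_port(raw) {
        Ok(port) => port,
        Err(err) => {
            eprintln!("invalid port {raw:?}: {err}; using {default}");
            default
        }
    }
}
```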
| 98 | +## AI Agent Instructions |
| 99 | + |
| 100 | +When introducing new patterns, conventions, or architectural decisions that affect how code should be written, update ALL AI agent instruction files: |
| 101 | +- `CLAUDE.md` (Claude Code) |
| 102 | +- `.github/copilot-instructions.md` (GitHub Copilot) |
| 103 | +- `.cursor/rules/rust.mdc` (Cursor) |