| 1 | +# CLAUDE.md |
| 2 | + |
| 3 | +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. |
| 4 | + |
| 5 | +## Repository purpose |
| 6 | + |
| 7 | +This repo ships **one artifact**: `dk-installer.py` — a single-file, stdlib-only Python script that end users download and run to install/upgrade/demo the open-source DataKitchen products (TestGen and DataOps Observability) locally via Docker Compose. On Windows it is also packaged as `dk-installer.exe` via PyInstaller (see `.github/workflows/release_exe.yml`). |
| 8 | + |
| 9 | +The `demo/` directory is a separate deliverable: it is built into the `datakitchen/data-observability-demo` Docker image that `dk-installer.py` pulls at runtime to generate demo data. It is **not** imported by the installer. |
| 10 | + |
| 11 | +## Common commands |
| 12 | + |
| 13 | +```bash |
| 14 | +pip install ".[dev,test]" # install ruff + pytest (project has no runtime deps); quoted so zsh does not glob the brackets |
| 15 | + |
| 16 | +ruff check --show-fixes # lint (CI-enforced) |
| 17 | +ruff format --check --diff # format check (CI-enforced) |
| 18 | +ruff format # apply formatting |
| 19 | + |
| 20 | +pytest # run full test suite |
| 21 | +pytest tests/test_tg_install.py # single file |
| 22 | +pytest tests/test_action.py::test_name # single test |
| 23 | +pytest -m unit # only unit-marked tests |
| 24 | +pytest -m integration # only integration-marked tests |
| 25 | +pytest --cov --cov-report=term-missing # with coverage (matches CI) |
| 26 | + |
| 27 | +python3 dk-installer.py --help # see installer CLI |
| 28 | +python3 dk-installer.py tg install # run an action locally during dev |
| 29 | +``` |
| 30 | + |
| 31 | +Building the Windows `.exe` happens automatically on every push to `main` (`release_exe.yml` → PyInstaller → GitHub Release tagged `latest`). For local builds on Windows, see `docs/build_windows_installer.md`. |
| 32 | + |
| 33 | +## Architecture: how `dk-installer.py` is organized |
| 34 | + |
| 35 | +The installer is a ~2300-line single file intentionally using only the Python stdlib — users run it without installing any packages. Do not introduce third-party runtime dependencies. |
| 36 | + |
| 37 | +Core abstractions (all in `dk-installer.py`): |
| 38 | + |
| 39 | +- **`Installer`** — top-level argparse wrapper. `get_installer_instance()` at the bottom of the file registers the two products (`obs`, `tg`) and their actions. Each product sets compose-file defaults (`compose_file_name`, `compose_project_name`) that flow into actions via argparse `set_defaults`. |
| 40 | +- **`Action`** — base class for one CLI subcommand (e.g., `tg install`). Owns session-scoped concerns: creates a timestamped log folder under `.dk-installer/` (or `%LOCALAPPDATA%/DataKitchenApps/` on Windows), configures logging, zips logs on exit, wraps execution in `AnalyticsWrapper`, enforces `requirements` (list of `Requirement` objects that shell out to check `docker`, `docker compose`, etc.), and provides `run_cmd` / `run_cmd_retries` — always use these rather than raw `subprocess` so output is captured per-command into the session zip. |
| 41 | +- **`MultiStepAction`** — `Action` subclass that declares a `steps: list[type[Step]]`. Each `Step` has `pre_execute` (run for all steps before any executes — validation phase) then `execute` (the actual work). On any step failure, remaining steps are skipped and `on_action_fail` runs in reverse order; on success, `on_action_success` runs in reverse order. **Most install/upgrade actions are `MultiStepAction`s** — when adding a new install phase, write a new `Step` class and add it to the list. |
| 42 | +- **`Step`** — unit of work inside a `MultiStepAction`. Steps share state via `action.ctx` (a dict on the parent action). Raising `SkipStep` from `execute` marks it SKIPPED; raising any other exception marks it FAILED and aborts the action if `required = True`. |
| 43 | +- **`ComposeActionMixin` / `ComposeDeleteAction` / `ComposePullImagesStep` / `ComposeStartStep` / `CreateComposeFileStepBase`** — shared building blocks for both products. `Obs*` and `TestGen*` classes specialize these. |
| 44 | +- **`AnalyticsWrapper`** — sends anonymous Mixpanel events for each action (disabled with `--no-analytics` or `DK_INSTALLER_ANALYTICS=no`). Instance ID is persisted to `.dk-installer/instance.txt`. Don't log PII here. |
| 45 | +- **`Console`** (global `CONSOLE`) — all user-facing output goes through this; don't use bare `print` for user messages (the menu code and `collect_user_input` are the exceptions). |
| 46 | +- **`Menu`** / `show_menu` — only used when the frozen Windows `.exe` is launched with no arguments (double-click). Not part of the CLI flow on Unix. |
| 47 | + |
| 48 | +The action registry in `get_installer_instance()` is the authoritative list of user-facing commands — to add a new command, add an `Action` subclass there. |
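The step lifecycle described above can be sketched roughly as follows. This is a simplified illustration, not the code in `dk-installer.py`: the method signatures, the `results` dict, the status strings, and the `DemoInstall` steps are all assumptions made for the example. Only the ordering (all `pre_execute` calls first, then `execute`, then reverse-order callbacks) and the `ctx` sharing follow the description.

```python
from typing import Any, Dict, List, Type


class SkipStep(Exception):
    """Raised from execute() to mark a step as skipped instead of failed."""


class Step:
    required = True  # a failure in a required step aborts the action

    def pre_execute(self, action: "MultiStepAction") -> None:
        pass

    def execute(self, action: "MultiStepAction") -> None:
        raise NotImplementedError

    def on_action_success(self, action: "MultiStepAction") -> None:
        pass

    def on_action_fail(self, action: "MultiStepAction") -> None:
        pass


class MultiStepAction:
    steps: List[Type[Step]] = []

    def __init__(self) -> None:
        self.ctx: Dict[str, Any] = {}      # state shared between steps
        self.results: Dict[str, str] = {}  # hypothetical status bookkeeping

    def run(self) -> bool:
        instances = [step_cls() for step_cls in self.steps]
        # Validation phase: every pre_execute runs before any execute.
        for step in instances:
            step.pre_execute(self)
        failed = False
        for step in instances:
            name = type(step).__name__
            if failed:
                self.results[name] = "SKIPPED"  # remaining steps do not run
                continue
            try:
                step.execute(self)
                self.results[name] = "OK"
            except SkipStep:
                self.results[name] = "SKIPPED"
            except Exception:
                self.results[name] = "FAILED"
                failed = step.required
        # Callbacks run in reverse order so later steps clean up first.
        for step in reversed(instances):
            if failed:
                step.on_action_fail(self)
            else:
                step.on_action_success(self)
        return not failed


class WriteConfig(Step):  # hypothetical step
    def pre_execute(self, action):
        action.ctx["rollback"] = []

    def execute(self, action):
        action.ctx["config_written"] = True

    def on_action_fail(self, action):
        action.ctx["rollback"].append("WriteConfig")


class StartServices(Step):  # hypothetical step that always fails
    def execute(self, action):
        raise RuntimeError("simulated failure")

    def on_action_fail(self, action):
        action.ctx["rollback"].append("StartServices")


class PrintSummary(Step):  # hypothetical step that never gets to run
    def execute(self, action):
        action.ctx["summary"] = True


class DemoInstall(MultiStepAction):
    steps = [WriteConfig, StartServices, PrintSummary]
```

Running `DemoInstall().run()` in this sketch returns `False`, leaves `PrintSummary` unexecuted, and records the failure callbacks in reverse declaration order.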
| 49 | + |
| 50 | +### Data locations at runtime |
| 51 | + |
| 52 | +- Unix: installer writes the compose file, credentials file, and `demo-config.json` next to `dk-installer.py`; logs go to `./.dk-installer/<action>-<timestamp>.zip`. |
| 53 | +- Windows: data and logs go to `%LOCALAPPDATA%/DataKitchenApps/`. |
| 54 | + |
| 55 | +### Demo container |
| 56 | + |
| 57 | +The `demo/` tree is built into a separate image (`datakitchen/data-observability-demo:latest`) via `demo/deploy/build-image`. `DemoContainerAction` in `dk-installer.py` pulls this image and mounts `demo-config.json` into it. Changes to `demo/*.py` don't affect the installer until that image is rebuilt and pushed. |
| 58 | + |
| 59 | +### Bumping uv |
| 60 | + |
| 61 | +The pip install path bootstraps a known version of `uv` from the astral-sh GitHub release. Two top-level constants govern this: |
| 62 | + |
| 63 | +- `UV_VERSION` — the pinned version (e.g., `"0.11.7"`). |
| 64 | +- `UV_ASSETS` — a `(platform.system(), platform.machine()) → (asset_name, sha256)` map. Six entries: Linux x86_64/aarch64, Darwin x86_64/arm64, Windows AMD64/ARM64. |
| 65 | + |
| 66 | +To bump: |
| 67 | + |
| 68 | +1. Update `UV_VERSION`. |
| 69 | +2. Pull the matching `dist-manifest.json` from `https://github.com/astral-sh/uv/releases/download/<version>/dist-manifest.json` and refresh the SHA256 for each of the 6 assets in `UV_ASSETS`. Each release also publishes a `<asset>.sha256` file you can `curl` directly if you'd rather pin one at a time. |
| 70 | +3. Sanity-check: `pytest tests/test_uv_bootstrap.py`. These tests exercise the bootstrap step's hash verification and the asset-not-supported path. |
| 71 | + |
| 72 | +Do not skip the hash refresh — TLS verification is intentionally relaxed for the GitHub download (corp-proxy support), and the SHA256 pin is the security guarantee. |
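To make the role of the SHA256 pin concrete, here is a minimal sketch of a platform lookup followed by a digest check. The map contents, asset name, and helper names (`UV_ASSETS_EXAMPLE`, `lookup_asset`, `verify_sha256`) are placeholders invented for illustration; the real `UV_ASSETS` entries and the installer's verification code may look different.

```python
import hashlib
import hmac
from typing import Dict, Optional, Tuple

# Placeholder map in the same (system, machine) -> (asset_name, sha256) shape.
# The asset name and digest are made up; they are NOT real uv release values.
UV_ASSETS_EXAMPLE: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("Linux", "x86_64"): (
        "example-asset.tar.gz",
        hashlib.sha256(b"example payload").hexdigest(),
    ),
}


def lookup_asset(
    system: str, machine: str, assets: Dict[Tuple[str, str], Tuple[str, str]]
) -> Optional[Tuple[str, str]]:
    """Return (asset_name, sha256) for the platform, or None if unsupported."""
    return assets.get((system, machine))


def verify_sha256(data: bytes, expected_hex: str) -> bool:
    """Compare the digest of the downloaded bytes against the pinned value."""
    actual = hashlib.sha256(data).hexdigest()
    return hmac.compare_digest(actual, expected_hex.lower())


if __name__ == "__main__":
    import platform

    entry = lookup_asset(platform.system(), platform.machine(), UV_ASSETS_EXAMPLE)
    print("supported platform" if entry else "asset not available for this platform")
```

Because the download itself is not trusted, a stale digest in the map would make every install fail this check, which is why the version and all six hashes need to change together.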
| 73 | + |
| 74 | +## Testing conventions |
| 75 | + |
| 76 | +- `tests/installer.py` is a **symlink to `../dk-installer.py`** — tests import installer internals as `from tests.installer import ...`. Don't replace this with a copy. |
| 77 | +- Heavy use of `unittest.mock.patch` to stub `subprocess` / `start_cmd` / `run_cmd`. The key fixtures live in `tests/conftest.py` — `action_cls` patches class-level attributes on `Action` so tests can instantiate actions without a real session folder, and `args_mock` provides a fully-populated `argparse.Namespace`. |
| 78 | +- Tests are marked `@pytest.mark.unit` or `@pytest.mark.integration`. CI runs everything; use the markers locally to scope a run. |
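As a generic illustration of the stubbing style mentioned above, the sketch below patches `subprocess.run` so that a check for `docker` never touches the real binary. `docker_available` and `check_with_stub` are hypothetical helpers written for this example; they are not functions from `dk-installer.py` or `tests/conftest.py`, and the real tests may patch different targets.

```python
import subprocess
from typing import List, Tuple
from unittest.mock import MagicMock, patch


def docker_available() -> bool:
    """Hypothetical requirement check that shells out to `docker --version`."""
    try:
        result = subprocess.run(["docker", "--version"], capture_output=True, check=False)
    except FileNotFoundError:
        return False
    return result.returncode == 0


def check_with_stub(returncode: int) -> Tuple[bool, List[str]]:
    """Run the check with subprocess.run replaced by a mock.

    Returns the check result and the command the mock was called with.
    """
    with patch("subprocess.run") as run_mock:
        run_mock.return_value = MagicMock(returncode=returncode)
        ok = docker_available()
    return ok, run_mock.call_args[0][0]
```

In a pytest file, the same pattern would normally sit inside a test function carrying the `@pytest.mark.unit` marker.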
| 79 | + |
| 80 | +## Style |
| 81 | + |
| 82 | +- Line length 120, double quotes, ruff-enforced (`pyproject.toml` restricts ruff's `include` to `dk-installer.py` only — the `demo/` and `tests/` trees are deliberately not linted by this project's ruff config). |
| 83 | +- Pre-commit hooks run ruff on commit (`.pre-commit-config.yaml`). Install once with `pre-commit install`. |
| 84 | +- Target Python is 3.9 (CI uses 3.9); avoid 3.10+ syntax like `match` statements or `X | Y` type unions in new code — the file uses `typing.Union` / `typing.Optional` deliberately for this reason. |
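A short sketch of what 3.9-compatible code looks like in practice: `typing.Union` / `typing.Optional` in place of `X | Y`, and an `if`/`elif` chain in place of `match`. The functions `parse_port` and `describe_status` are invented for illustration and do not exist in the installer.

```python
from typing import Optional, Union


def parse_port(value: Union[str, int], default: Optional[int] = None) -> Optional[int]:
    """Convert a CLI value to an int port, falling back to a default."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return default


def describe_status(status: str) -> str:
    """Use if/elif rather than a `match` statement (3.10+ only)."""
    if status == "OK":
        return "step completed"
    elif status == "SKIPPED":
        return "step skipped"
    return "step failed"
```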
| 85 | + |
| 86 | +## CI |
| 87 | + |
| 88 | +`.github/workflows/pull_request.yml` runs ruff + pytest (with coverage comment) on every PR against `main`. `release_exe.yml` publishes the Windows `.exe` on every push to `main` by force-moving the `latest` tag and recreating the release — keep this in mind before merging, since each merge replaces the public download. |