This is a uv workspace monorepo with four components:
client/—dhub-clipackage (open-source CLI, published to PyPI) — import path:dhub.*server/—decision-hub-serverpackage (private backend, deployed on Modal) — import path:decision_hub.*shared/—dhub-corepackage (shared domain models and SKILL.md manifest parsing) — import path:dhub_core.*frontend/— React + TypeScript web UI (bundled into the server at deploy time, not a uv workspace member)
shared/ is the single source of truth for data models (SkillManifest, RuntimeConfig, etc.) and manifest parsing. Both client and server depend on it — never duplicate these definitions.
Backend (Python 3.11+):
- FastAPI for REST API (server)
- Typer + Rich for CLI (client)
- Pydantic for data validation and settings
- Gemini for LLM (search, classification, gauntlet safety analysis)
- Anthropic for LLM (eval judging)
- boto3 for S3 access
- loguru for server logging
Frontend:
- React 19 with TypeScript
- Vite for bundling and dev server
- React Router for routing
Important: Always use uv run to execute Python code, not python directly.
The project has three environments controlled by DHUB_ENV (dev | prod | local). The server defaults to dev for safety; the CLI defaults to prod for end users.
Always work against dev unless explicitly told to use prod. Prefix all CLI, server, and deploy commands with DHUB_ENV=dev:
DHUB_ENV=dev dhub list # CLI against dev
DHUB_ENV=dev modal deploy modal_app.py # deploy dev Modal app (from server/)
DHUB_ENV=dev uv run --package decision-hub-server uvicorn ... # local dev server- Dev:
https://hub-dev.decision.ai, config at~/.dhub/config.dev.json, env fileserver/.env.dev - Prod:
https://hub.decision.ai, config at~/.dhub/config.prod.json, env fileserver/.env.prod - Local:
http://localhost:5173, env fileserver/.env.local, infra viadocker-compose-local.yml(Postgres + MinIO). Requires Docker Desktop. Evals spawn to the deployed dev Modal app. When asked to deploy locally, runmake deploy-localin the background (run_in_background: true).
Working directory caveat: Always run server-package commands from server/. The server's .env.dev / .env.prod files live in server/ and pydantic-settings resolves them relative to the current working directory. Running from the repo root will fail with missing settings errors.
# Correct
cd server && DHUB_ENV=dev uv run --package decision-hub-server python -c "..."
cd server && DHUB_ENV=dev modal deploy modal_app.py
# Wrong — .env.dev not found
DHUB_ENV=dev uv run --package decision-hub-server python -c "..."Client-package commands (uv run --package dhub-cli ...) can run from anywhere.
Common commands are available via make. Read the Makefile to see all targets. Always read the Makefile for available targets and their exact recipes — do not hardcode or memorize commands that the Makefile already provides.
Important: Always run make from the repo root (where the Makefile lives). If your working directory has drifted (e.g. into server/), use make -C /path/to/repo-root <target> to avoid "No rule to make target" errors.
Install pre-commit hooks once after cloning: make install-hooks.
- Frozen dataclasses for immutable data models
- Pure functions over classes — use modules to group related functions
- Single responsibility: Small, single-purpose functions with one clear reason to change
- Clear interfaces: Descriptive names, type hints, explicit signatures — obvious inputs, outputs, and behavior
- Domain/infrastructure separation: Keep business logic independent from frameworks, I/O, databases. UI, persistence, and external services are replaceable adapters around a clean core
- Testing as design: Design for fast, focused unit tests. Pure functions and small units guide architecture
- Readability over cleverness: Straightforward, idiomatic Python over opaque tricks. Follow PEP 8
- YAGNI: No abstractions or features "just in case" — add complexity only for concrete needs
- Continuous refactoring: Ship the simplest thing that works, refactor as requirements evolve. Routine maintenance, not heroic effort
- Don't worship backward compatibility: Don't freeze bad designs to avoid breaking changes. Provide clear migration paths instead of stacking hacks
- DRY: Do not repeat yourself — ensure each piece of logic has a single, clear, authoritative implementation instead of being duplicated across the codebase
- Comments: Explain business logic, assumptions, and choices — not the code verbatim
- Isolate side operations: When a non-critical operation (metadata sync, analytics, notifications) runs alongside a critical path, always (1) wrap it in try/except so failures don't block the main flow, and (2) if it uses a separate DB connection, commit pending writes first so they're visible
- Reset state on context changes (React): When a component holds local state (
useState,useRef) and receives changing inputs (URL params, props), audit which state becomes stale and adduseEffect(() => reset(), [key])or use akey=prop to force remount - SQL query correctness: Every query with
LIMITmust have an explicitORDER BYwith a unique tiebreaker, and every field inSELECTmust appear in the returned response dict - Sanitize subprocess credentials: Never pass secrets (tokens, keys) in subprocess command arguments; catch
TimeoutExpiredandCalledProcessErrorand raise sanitized errors that strip credentials - Use server totals, not local array length: After introducing pagination, never use
.lengthof a local array as a display count — always use the server'stotalfield - Don't advance state on failure: For every state-advancing side effect (advancing cursors, clearing errors, updating timestamps), explicitly handle the failure case — do not advance state when the operation it represents has failed
- Clean up after refactors: After every refactor, search for all references to the old function/argument and remove dead code — no orphaned functions, unused imports, or selected-but-unmapped fields
- Mobile-first frontend: Design and test all UI changes for mobile viewports first (target ~400px, e.g. iPhone 17 Pro). Use CSS Modules media queries at
480pxand768pxbreakpoints. Every new page/component must include responsive styles — never ship desktop-only layouts - Consistent font scale: The site uses CSS custom properties for a fixed typographic scale (base 18px), defined in
frontend/src/index.css:root. Always use these variables — never hardcoderemfont sizes:--text-hero(4rem, hero headline),--text-display(2.5rem, large accent),--text-title(2rem, detail page titles),--text-heading(1.6rem, listing page titles),--text-lg(1.3rem, section headings),--text-body(1.1rem, body text/descriptions),--text-sm(0.9rem, secondary labels/nav),--text-xs(0.8rem, metadata/buttons),--text-xxs(0.7rem, badges/tiny labels). Mobile (480px) steps down one level (e.g.--text-title→--text-heading). Never eyeball new sizes — always use the scale variables - Batch scripts must be incremental, parallelizable, and resumable: Backfills and long-running data operations must (1) save results incrementally (flush to DB in batches as work completes, not all at the end), (2) parallelize the expensive work (ThreadPoolExecutor for S3/LLM, single connection for DB writes), and (3) support
--resumeto skip already-processed items after a crash - Keep the ask pipeline in sync with skill metadata: The
_SKILL_SUMMARY_COLUMNSlist indatabase.pyis the single source of truth for which columns appear in skill list and search queries._row_to_skill_summary()usesrow._mappingto automatically capture all selected columns, so adding a column to_SKILL_SUMMARY_COLUMNSmakes it available in bothfetch_all_skills_for_indexandsearch_skills_hybridwithout manual dict updates. However, to surface a new column to the ask LLM and API consumers, you must also update: (1)SkillIndexEntryinmodels.py, (2)build_index_entry()andserialize_index()indomain/search.py, (3)AskSkillRefinsearch_routes.py+ both enrichment paths (main + fallback), (4) frontendAskSkillRefintypes/api.tsandAskModal.tsx
The server uses loguru (from loguru import logger). The client does not — it uses Rich console output directly. Logging is configured once at startup via setup_logging() in decision_hub.logging. Log level is controlled by LOG_LEVEL in server/.env.dev / .env.prod (default: INFO). All output goes to stderr — no log files. A RequestLoggingMiddleware assigns an 8-char request ID to every HTTP request for correlation.
Use {} placeholders, not f-strings — loguru defers evaluation so arguments are only computed when the level is active:
logger.info("Publishing {}/{} version={}", org_slug, skill_name, version_id)Use logger.opt(exception=True) to attach tracebacks — don't format exceptions into the message string.
Log in API/infra layers, not in domain functions. Domain functions return values or raise — the caller decides what to log.
Include greppable identifiers (org, skill, case name, status code) — not just human prose.
Public endpoints use in-memory per-IP sliding-window rate limiters (see rate_limit.py). Limiters are lazily initialized on app.state from settings. Limits are per-container (not shared across Modal replicas). Defaults and setting names are defined in server/src/decision_hub/settings.py (*_rate_limit / *_rate_window fields). Enforcement functions (_enforce_*_rate_limit) live alongside their routes in server/src/decision_hub/api/registry_routes.py, search_routes.py, and auth_routes.py.
All DB queries have a 30s statement_timeout (set in engine connect_args in database.py). Query parameters on public endpoints have max_length constraints to prevent oversized payloads reaching the DB or LLM APIs.
When adding new public endpoints, always add a rate limiter following the existing pattern: define a _enforce_*_rate_limit dependency, add settings in settings.py, and wire it via dependencies=[Depends(...)].
The project uses ruff for linting and formatting, and mypy for type checking, both configured in the root pyproject.toml. Pre-commit hooks run ruff automatically on every commit (install once with make install-hooks). Mypy runs in CI only (not pre-commit). See make help for lint, format, and typecheck targets.
Use pytest with fixtures in conftest.py. Mock external services (S3, Gemini, Anthropic, Database) in tests. See make help for test targets (all, client, server, frontend).
GitHub Actions runs on every PR to main:
- lint: ruff check + format
- typecheck: mypy type checks
- test-client: client pytest suite
- test-server: server pytest suite
- test-shared: shared pytest suite
- test-frontend: frontend vitest suite
- lint-frontend: TypeScript type check + ESLint
- check-migrations: validates migration filename formats and detects duplicates
- migrate-check: replays all SQL migrations from scratch against a fresh Postgres (CI only)
- schema-drift: detects differences between SQL migrations and SQLAlchemy metadata (CI only)
Migrations are tracked SQL files in server/migrations/. The runner (scripts/run_migrations.py) records each applied file in a schema_migrations table so migrations are never re-applied.
Use the migrate targets from the Makefile. Migrations are also applied automatically during deploys (scripts/deploy.sh).
Naming: Use timestamp-based filenames to avoid collisions between parallel branches. Always generate the timestamp from the actual current time — never invent a round number like 120000:
# Generate the filename prefix:
date +%Y%m%d_%H%M%S
# Example output: 20260212_163047YYYYMMDD_HHMMSS_description.sql
Example: 20260212_163047_add_user_email.sql
Legacy files use 3-digit numeric prefixes (001_ through 011_). Do not add new files with numeric prefixes.
- Never use
metadata.create_all()to apply schema changes. Always write a SQL migration file. - Never edit or delete existing migration files. They represent immutable history. Fix mistakes with a new migration.
- Always update both the SQL migration file and the SQLAlchemy table definition in
database.pywhen changing the schema. CI will catch drift. - Use
IF NOT EXISTS/IF EXISTSin DDL when possible for idempotency (especially forCREATE TABLE,ADD COLUMN). - Keep migration PRs focused. Prefer multiple small PRs over one large one.
- Test locally before pushing: use the migration-related targets from the
Makefileto validate filenames and apply to dev. created_atandupdated_atare managed by PostgreSQL.created_atuses aDEFAULT now()server default on insert.updated_atis set by aBEFORE UPDATEtrigger (set_updated_at()). Never set either in application code. New mutable tables must include both columns and aBEFORE UPDATEtrigger forupdated_at.- Always enable RLS on new tables. Every
CREATE TABLEmigration must includeALTER TABLE <name> ENABLE ROW LEVEL SECURITY;. This blocks Supabase PostgREST (anon/authenticated roles) from querying tables directly — all data access must go through the FastAPI API layer. The backend connects as the table owner, which bypasses RLS automatically.
You may deploy to dev when asked. Always use make deploy-dev — never bare modal deploy (it skips the frontend build).
- Never modify
.env.prodor.env.dev— these contain production configuration. The only exception isMIN_CLI_VERSION, which you may bump when removing or breaking CLI-facing endpoints. - Never modify
.github/workflows/without explicit instructions
- Search for existing implementations before writing new code (
grep,glob, checkshared/src/) - Check open PRs for overlapping work:
gh pr list --state open - Link to a GitHub issue. If no issue exists, create one first.
When monitoring long-running processes (crawlers, deploys, Modal logs), never use sleep inside a tool call. A sleep 900 blocks the entire conversation for 15 minutes and makes the agent appear stuck. Instead:
- Run long commands in background (
run_in_background: true) and check output withtailin a separate non-blocking call - Poll with short timeouts — use quick DB count queries or
pschecks, not embedded sleeps - Let the user drive timing — report current status and let them decide when to check again, rather than silently waiting
- For periodic monitoring, write a small background script that appends status to a file, then read that file on demand
When fixing PR review comments, always complete all three steps:
- Fix the code and push the commit
- Reply to each comment explaining the fix
- Resolve the thread via GraphQL:
gh api graphql -f query='mutation { resolveReviewThread(input: {threadId: "THREAD_ID"}) { thread { isResolved } } }'
To get unresolved thread IDs: gh api graphql -f query='{ repository(owner: "pymc-labs", name: "decision-hub") { pullRequest(number: PR_NUM) { reviewThreads(first: 50) { nodes { id isResolved comments(first: 1) { nodes { body author { login } } } } } } } }' --jq '.data.repository.pullRequest.reviewThreads.nodes[] | select(.isResolved == false) | {id, body: .comments.nodes[0].body[:80]}'
When reviewing code (PRs, refactors, or when explicitly asked), evaluate these four dimensions:
- Architecture — Component boundaries, coupling, data flow, scaling characteristics, security boundaries (auth, data access, API surface).
- Code quality — DRY violations (flag aggressively), error handling gaps and missing edge cases, over- or under-engineering relative to the task, technical debt hotspots.
- Test coverage — Unit/integration/e2e gaps, weak assertions, untested error paths and failure modes, missing edge case coverage.
- Performance — N+1 queries and DB access patterns, memory concerns, caching opportunities, high-complexity code paths.
Decision-making protocol for non-trivial changes: When a review finding or implementation choice has multiple valid approaches, do not assume a direction. Instead: (1) describe the problem concretely with file and line references, (2) present 2-3 options including "do nothing" where reasonable, with effort/risk/maintenance tradeoffs for each, (3) give an opinionated recommendation with rationale, and (4) ask for input before proceeding.
- Patch (
0.5.0→0.5.1): Bug fixes, internal refactors. Nothing new for the user, nothing breaks. - Minor (
0.5.0→0.6.0): New features — new commands, new flags, new output. Old CLI still works with the server. - Major (
0.5.0→1.0.0): Breaking changes — old CLI can't talk to the server anymore (changed URLs, new required fields, removed endpoints). Requires server redeploy.
Use the publish-cli target from the Makefile. Supports BUMP=patch|minor|major and BREAKING=1 to sync servers.
The server enforces a minimum CLI version via MIN_CLI_VERSION in server/.env.dev and server/.env.prod. The .env file is baked into the Modal image at deploy time, so a server redeploy is all that's needed. When BREAKING=1 is passed, the script updates MIN_CLI_VERSION in both env files and redeploys both servers.
Every CLI publish automatically creates a cli/vX.Y.Z git tag, and every prod deploy creates a prod/YYYYMMDD-HHMMSS tag. GitHub Actions generates release notes from merged PRs when these tags are pushed.
Use the deploy targets from the Makefile. The deploy script builds the React frontend (frontend/dist/) and bundles it into the Modal container alongside the server.
Dev deploys are manual via make deploy-dev. The .env.dev file is baked into the Modal image at deploy time, so all secrets live in that single file — no Modal secrets or GitHub Environment needed.
Both dev and prod deploys require the corresponding .env file on the deployer's machine (server/.env.dev or server/.env.prod). Rotating secrets requires updating the .env file and redeploying.
make deploy-prod auto-tags prod/YYYYMMDD-HHMMSS and pushes the tag, which triggers release-notes.yml to create a GitHub Release with auto-generated notes from merged PRs. To see what's in prod: git tag --list 'prod/*' --sort=-version:refname | head -5.
After implementing significant changes, check whether these need updating:
README.md— new/changed CLI commands, API endpoints, features, setup requirements, or architecturebootstrap-skills/dhub-cli/SKILL.mdandbootstrap-skills/dhub-cli/references/command_reference.md— new/changed CLI commands, flags, or behavior
See docs/runbook.md for DB access boilerplate, common queries, republishing skills, GitHub App token verification, crawler usage, tracker monitoring, data maintenance/backfills, Modal debugging, and troubleshooting.