docs: README in every bases/lif/ directory (#922)

bjagg · web-flow · commit 407346cc5b7e · 2026-05-15T19:14:59.000-07:00
## Summary

Adds a short README to each base brick so people navigating the repo on GitHub can tell what each service does without opening `core.py`. Each follows the same shape: purpose, endpoints, auth (where relevant), components composed, and the deploying project.

## Coverage

11 new READMEs:

- `api_graphql`, `advisor_restapi`, `query_cache_restapi`, `example_data_source_rest_api`, `semantic_search_mcp_server`, `orchestrator_restapi`, `translator_restapi`, `identity_mapper_restapi`, `query_planner_restapi`
- `mdr_restapi`: with an endpoint-group table summarizing its 16 endpoint modules under their URL prefixes
- `query_cache_module`: a note that this is an empty stub for a future non-HTTP cache interface

Skipped: `learner_data_export_api` (only on the open #920 branch; its README will land with that PR).

## Why

Reviewer bandwidth constraint means we're chipping away at docs that don't need review. Tier 1 of the README sweep: bases first because they're well-bounded; components batch will follow.

## Test plan

- [x] `uv run pre-commit run cspell --files bases/lif/*/README.md` passes
- [ ] Spot-check that each README's claimed endpoints and component list match `core.py` after merge

🤖 Generated with [Claude Code](https://claude.com/claude-code)
diff --git a/bases/lif/advisor_restapi/README.md b/bases/lif/advisor_restapi/README.md
@@ -0,0 +1,23 @@
+# `advisor_restapi` — Base
+
+FastAPI base for the LIF Advisor: a conversational interface that lets a user query their own learner data through a LangChain/LangGraph agent. Pairs with the Advisor frontend (`frontends/lif_advisor_app/`).
+
+## Endpoints
+- `POST /login` — demo auth against an in-memory user list; returns access + refresh JWTs
+- `POST /refresh-token` — exchange refresh token for a new access token
+- `GET  /initial-message` — greeting message for a freshly-logged-in user
+- `POST /start-conversation` — kicks off the conversation by loading the user's profile via the agent
+- `POST /continue-conversation` — sends a user message, returns the agent's reply
+- `POST /logout` — clears in-memory session state; agent summarizes the interaction first
+- `GET  /health`
+
+## Auth
+HS256 JWTs minted/validated by `lif.auth.core`. Demo-grade: the user list (`users_db`) and password (`LIF_DEMO_USER_PASSWORD`) are hard-coded. This is not the self-serve auth path (see `docs/design/cross-cutting/self-serve-tenant-auth.md`).
+
+## Composes
+- `auth` — JWT helpers
+- `langchain_agent` — `LIFAIAgent` wrapping LangChain/LangGraph + memory
+- `logging`
+
+## Deployed as
+`projects/lif_advisor_api/`
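For readers unfamiliar with the HS256 scheme named in the Auth section of this README, the following is a minimal standard-library sketch of how such a token is signed and checked. It is not the `lif.auth.core` implementation, which this diff does not show: the function names `mint_hs256` and `verify_hs256`, the claim names, and the secret are all hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    # JWTs use URL-safe base64 with the trailing "=" padding removed.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that _b64url stripped before decoding.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: bytes) -> bytes:
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def mint_hs256(claims: dict, secret: bytes) -> str:
    """Hypothetical helper: build header.payload.signature for the given claims."""
    header = {"alg": "HS256", "typ": "JWT"}
    parts = [
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    ]
    signing_input = ".".join(parts)
    return signing_input + "." + _b64url(_sign(signing_input, secret))


def verify_hs256(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Hypothetical helper: return the claims, or raise ValueError if invalid."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        raise ValueError("token must have three dot-separated parts") from None
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = _sign(f"{header_b64}.{payload_b64}", secret)
    # compare_digest avoids leaking timing information about the signature.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("signature mismatch")
    claims = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if "exp" in claims and current >= claims["exp"]:
        raise ValueError("token expired")
    return claims
```

In this sketch an access token and a refresh token would differ only in their claims (for example a shorter `exp` on the access token); how the real service distinguishes them is not stated in the README.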
diff --git a/bases/lif/api_graphql/README.md b/bases/lif/api_graphql/README.md
@@ -0,0 +1,20 @@
+# `api_graphql` — Base
+
+FastAPI + Strawberry GraphQL base that converts an OpenAPI schema (loaded from the MDR at startup) into a GraphQL schema at runtime. The resulting GraphQL types, filters, enums, and root queries are generated dynamically from the OpenAPI JSON — there are no hand-written `.graphql` files for the data model.
+
+## Endpoints
+- `POST /graphql` — Strawberry-managed GraphQL endpoint (queries + mutations)
+- `GET  /graphql` (GraphiQL UI; served unless auth is enabled and the service is running in prod)
+
+## Auth
+API-key authentication via `ApiKeyAuthMiddleware`. Configured by `GRAPHQL_AUTH__API_KEYS` env var (`key1:client1,key2:client2`). When unset, auth is disabled — fine for local dev, never for deployed envs.
+
+## Composes
+- `api_key_auth` — middleware
+- `lif_schema_config` — env-driven config
+- `logging` — logger setup
+- `mdr_client` — fetches OpenAPI schema at startup
+- `openapi_to_graphql` — the actual OpenAPI → GraphQL generator
+
+## Deployed as
+`projects/lif_graphql_api/`
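The `GRAPHQL_AUTH__API_KEYS` format described in the Auth section of this README (`key1:client1,key2:client2`) could be parsed along the following lines. This is a sketch only: `parse_api_keys` is a hypothetical name, and the way `ApiKeyAuthMiddleware` actually handles whitespace or malformed entries is an assumption, not something the README states.

```python
from typing import Dict


def parse_api_keys(raw: str) -> Dict[str, str]:
    """Hypothetical parser: map each API key to its client name.

    An empty string yields an empty mapping, which corresponds to the
    "auth disabled" case described in the README.
    """
    keys: Dict[str, str] = {}
    for entry in raw.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate empty input and trailing commas (assumption)
        key, sep, client = entry.partition(":")
        key, client = key.strip(), client.strip()
        if not sep or not key or not client:
            raise ValueError(f"expected 'key:client', got {entry!r}")
        keys[key] = client
    return keys
```

Raising on a malformed entry is a deliberate choice in this sketch: silently skipping a bad pair could leave a deployed environment with fewer keys than intended, or with auth disabled entirely.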
diff --git a/bases/lif/example_data_source_rest_api/README.md b/bases/lif/example_data_source_rest_api/README.md
@@ -0,0 +1,19 @@
+# `example_data_source_rest_api` — Base
+
+Reference implementation of a non-LIF data source. Exists so the orchestrator's "external adapter" code path has something realistic to integrate with locally; production deployments swap in real SIS/LMS/HR systems via adapter components, not this base.
+
+## Endpoints
+A small set of CRUD-style endpoints over a sample person dataset, gated by `x-key` API-key auth (`require_api_key` dependency). See `core.py` for the current shape — it evolves as new adapter scenarios get demoed.
+
+`GET /health` is exempt from auth.
+
+## Composes
+- `auth` — `verify_token` helper for API-key validation
+- `example_data_source_service` — sample data + business logic
+- `logging`
+
+## Deployed as
+`projects/lif_example_data_source_rest_api/`
+
+## See also
+[`docs/operations/guides/add-data-source.md`](../../../docs/operations/guides/add-data-source.md) walks through using this service end-to-end as the template for adding a custom data source.
diff --git a/bases/lif/identity_mapper_restapi/README.md b/bases/lif/identity_mapper_restapi/README.md
@@ -0,0 +1,26 @@
+# `identity_mapper_restapi` — Base
+
+FastAPI base for the LIF Identity Mapper: stores mappings between a person's identifiers across source systems (e.g., SIS ID ↔ LMS ID ↔ HR ID). Required when a single learner shows up under different identifiers in different systems and the orchestrator needs to know they're the same person.
+
+## Endpoints
+Identity mappings are scoped per `{org_id}/{person_id}`:
+
+- `POST   /organizations/{org_id}/persons/{person_id}/mappings`                  — create a new `IdentityMapping`
+- `GET    /organizations/{org_id}/persons/{person_id}/mappings` (and variants)   — list / fetch mappings
+- `DELETE /organizations/{org_id}/persons/{person_id}/mappings/{mapping_id}`     — delete a mapping (204 on success)
+
+Plus exception handlers translating `DataNotFoundException`, `LIFException`, and validation errors into stable HTTP responses.
+
+## Storage
+Backed by SQL (MariaDB in the reference deployment) via `identity_mapper_storage_sql`. The storage layer is pluggable through the `IdentityMapperStorage` interface; SQLAlchemy is the only implementation today.
+
+## Composes
+- `datatypes` — `IdentityMapping`
+- `exceptions`
+- `identity_mapper_service` — business logic
+- `identity_mapper_storage` — storage interface
+- `identity_mapper_storage_sql` — SQLAlchemy-backed implementation
+- `logging`
+
+## Deployed as
+`projects/lif_identity_mapper_api/` (the API) + `projects/lif_identity_mapper_mariadb/` (the database)
diff --git a/bases/lif/mdr_restapi/README.md b/bases/lif/mdr_restapi/README.md
@@ -0,0 +1,38 @@
+# `mdr_restapi` — Base
+
+FastAPI base for the LIF **Metadata Repository (MDR)**: the control-plane service that holds the LIF schema(s), transformation definitions, value sets, and per-tenant configuration. Most LIF services load their schema and transformation rules from here at startup.
+
+The base is split into many endpoint modules (one per concern) which `core.py` mounts under stable URL prefixes.
+
+## Endpoint groups
+
+| Prefix | Module | What it does |
+|---|---|---|
+| `/datamodels` | `datamodel_endpoints` | LIF data models — Base LIF, Org LIF, target transformation models |
+| `/entities` | `entity_endpoints` | Entity definitions within a data model |
+| `/entity_associations` | `entity_association_endpoints` | Entity-to-entity relationships |
+| `/attributes` | `attribute_endpoints` | Scalar attributes within entities |
+| `/entity_attribute_associations` | `entity_attribute_association_endpoints` | Which attributes belong to which entities |
+| `/inclusions` | `inclusions_endpoints` | Reusable attribute groups (e.g., Contact, Address) |
+| `/value_sets` + `/value_set_values` | `valueset_endpoint`, `value_set_values_endpoint` | Strict + extensible enumerations |
+| `/transformation_groups` | `transformation_endpoint` | JSONata-based source→target transformations |
+| `/value_mappings` | `value_mapping_endpoints` | Code/value crosswalks used during transformation |
+| `/search` | `search_endpoint` | MDR-wide full-text search |
+| `/datamodel_constraints` | `datamodel_constraints_endpoints` | Constraint rules per model |
+| `/import_export` | `import_export_endpoints` | Bulk import/export of MDR content |
+| `/generate_jinja` | `generate_jinja_endpoint` | Template generation for derived schemas |
+| `/tenants` | `tenant_endpoints` | Self-serve tenant lifecycle (#883/#884): provision, workspace listing/selection, invite tokens |
+
+## Auth
+`AuthMiddleware` (from `mdr_auth/core`) supports three principals: API-key (services), Cognito JWT (end users), and legacy HS256 JWT (pre-Cognito callers). The middleware also resolves `request.state.tenant_schema` per request based on Cognito groups + optional workspace-selection cookie — see [`docs/design/cross-cutting/self-serve-tenant-auth.md`](../../../docs/design/cross-cutting/self-serve-tenant-auth.md).
+
+## Composes
+- `datatypes` — common payload shapes
+- `mdr_auth` — auth middleware + JWT/cookie/invite-token helpers
+- `mdr_dto` — wire-format DTOs
+- `mdr_services` — business logic (tenant_service, transformation_service, etc.)
+- `mdr_utils` — config, DB session factory, logger
+
+## Deployed as
+`projects/lif_mdr_api/` (API) + `projects/lif_mdr_database/` (Postgres + Flyway migrations).
+Frontend: `frontends/mdr-frontend/`.
diff --git a/bases/lif/orchestrator_restapi/README.md b/bases/lif/orchestrator_restapi/README.md
@@ -0,0 +1,18 @@
+# `orchestrator_restapi` — Base
+
+FastAPI base for the LIF Orchestrator: receives query plans from the Query Planner, fans out to the configured data-source adapters, then gathers and normalizes the responses. Long-running work is tracked as `OrchestratorJob`s that the caller polls.
+
+## Endpoints
+- `POST /jobs`             — submit an `OrchestratorJobRequest`; returns a job id (`OrchestratorJobRequestResponse`)
+- `GET  /jobs/{job_id}`    — fetch current `OrchestratorJob` state
+- `GET  /health`
+
+`DELETE /jobs/{job_id}` and `GET /jobs/{job_id}/result` are commented out in `core.py`; they were considered and deferred.
+
+## Composes
+- `datatypes` — `OrchestratorJob` request/response shapes
+- `logging`
+- `orchestrator_service` — actual fan-out, adapter dispatch, response merging
+
+## Deployed as
+`projects/lif_orchestrator_api/`
diff --git a/bases/lif/query_cache_module/README.md b/bases/lif/query_cache_module/README.md
@@ -0,0 +1,7 @@
+# `query_cache_module` — Base (stub)
+
+Empty placeholder base. `core.py` has no content; the brick is registered in pyproject and `__init__.py` re-exports `core`, but no application is mounted.
+
+Likely the seed of a non-HTTP (importable-library) interface to query-cache functionality, kept around so a future project can compose `query_cache_module` directly instead of going through `query_cache_restapi`. Until that lands, treat this as scaffolding.
+
+For the actually-deployed HTTP cache, see [`../query_cache_restapi/`](../query_cache_restapi/).
diff --git a/bases/lif/query_cache_restapi/README.md b/bases/lif/query_cache_restapi/README.md
@@ -0,0 +1,19 @@
+# `query_cache_restapi` — Base
+
+FastAPI base for the LIF Query Cache: caches MongoDB-backed LIF records so the GraphQL API doesn't re-orchestrate the same data on every request. One Query Cache instance runs per organization in the reference deployment (`query-cache-org1`, `-org2`, `-org3`).
+
+## Endpoints
+- `POST /query`  — read cached records for a `LIFQuery` filter
+- `POST /update` — mutate a cached record via a `LIFUpdate` payload
+- `POST /add`    — add a new `LIFRecord`
+- `POST /save`   — bulk-save fragments
+- `GET  /`       — sanity ping
+
+## Composes
+- `datatypes` — `LIFQuery`, `LIFRecord`, `LIFFragment`, `LIFUpdate` shapes
+- `exceptions` — common LIF exception types
+- `logging`
+- `query_cache_service` — the actual cache logic (Mongo-backed)
+
+## Deployed as
+`projects/lif_query_cache_api/`
diff --git a/bases/lif/query_planner_restapi/README.md b/bases/lif/query_planner_restapi/README.md
@@ -0,0 +1,25 @@
+# `query_planner_restapi` — Base
+
+FastAPI base for the LIF Query Planner: takes a `LIFQuery` and decides *how* to fulfill it: which data sources to hit, which fragments come from cache vs. fresh orchestration, and how to route the result through any required translations. The GraphQL API delegates to the Query Planner; the planner in turn calls the Query Cache and Orchestrator.
+
+## Endpoints
+- `POST /query`              — synchronous query; returns `List[LIFRecord]`
+- `POST /query_async`        — async variant; returns either records (cache hit) or a `LIFQueryStatusResponse` to poll
+- `GET  /query/{query_id}/status` — poll status of an in-flight async query
+- `POST /update`             — apply a `LIFUpdate`
+- `POST /orchestration/results` — callback endpoint for the Orchestrator to report back when an async job finishes
+- `GET  /`                   — sanity ping
+
+The async polling delay is bounded by `MIN_POLLING_DELAY_SECONDS` (1) and `MAX_POLLING_DELAY_SECONDS` (16), with an overall `MAX_QUERY_TIMEOUT_SECONDS` (60), per the constants in `core.py`.
+
+## Configuration
+The planner reads YAML at startup that describes available information sources (the per-org `information_sources_config.yml` files under `deployments/*/`). One planner instance runs per org in the reference deployment.
+
+## Composes
+- `datatypes` — `LIFQuery`, `LIFRecord`, `LIFUpdate`, planner-side types
+- `exceptions`
+- `logging`
+- `query_planner_service` — `LIFQueryPlannerService` (the actual planning logic)
+
+## Deployed as
+`projects/lif_query_planner_api/`
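The polling constants quoted in this README (1 s minimum delay, 16 s maximum delay, 60 s timeout) suggest a bounded backoff loop. The sketch below shows one way such bounds could combine. The doubling schedule is an assumption, since the README only states the bounds, and `polling_delays` and `poll_until_done` are hypothetical names rather than functions from `core.py`.

```python
import time
from typing import Callable, Iterator, Optional, TypeVar

T = TypeVar("T")

# Values as quoted in the README.
MIN_POLLING_DELAY_SECONDS = 1
MAX_POLLING_DELAY_SECONDS = 16
MAX_QUERY_TIMEOUT_SECONDS = 60


def polling_delays(
    min_delay: int = MIN_POLLING_DELAY_SECONDS,
    max_delay: int = MAX_POLLING_DELAY_SECONDS,
    timeout: int = MAX_QUERY_TIMEOUT_SECONDS,
) -> Iterator[int]:
    """Yield delays that double from min_delay up to max_delay (assumed schedule).

    Stops before the cumulative wait would exceed the timeout.
    """
    elapsed = 0
    delay = min_delay
    while elapsed + delay <= timeout:
        yield delay
        elapsed += delay
        delay = min(delay * 2, max_delay)


def poll_until_done(
    check: Callable[[], Optional[T]],
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call check() until it returns a non-None result or the delays run out."""
    result = check()
    for delay in polling_delays():
        if result is not None:
            return result
        sleep(delay)
        result = check()
    if result is not None:
        return result
    raise TimeoutError("query did not finish within the polling budget")
```

With the quoted constants, `list(polling_delays())` is `[1, 2, 4, 8, 16, 16]`, a total wait of 47 seconds. Passing `sleep` as a parameter keeps the loop testable without real waiting.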
diff --git a/bases/lif/semantic_search_mcp_server/README.md b/bases/lif/semantic_search_mcp_server/README.md
@@ -0,0 +1,24 @@
+# `semantic_search_mcp_server` — Base
+
+FastMCP-based [MCP (Model Context Protocol)](https://modelcontextprotocol.io/) server that exposes LIF data via AI-callable tools. Designed for Claude, Cursor, and other MCP-aware clients to do semantic search over learner data fields without learning the GraphQL schema by hand.
+
+## MCP tools
+- `lif_query`    — semantic search over LIF data fields (translates natural-language fragments into GraphQL queries)
+- `lif_mutation` — update LIF data fields (only registered when the schema includes a mutation model)
+
+## HTTP endpoints (Starlette-mounted)
+- `GET  /health`         — readiness check
+- `GET  /schema/status`  — current schema source (`mdr` or `file`), leaf count, root types, filter models
+- `POST /schema/refresh` — reload schema from MDR (state only — does not re-register MCP tools)
+
+## Schema loading
+At startup, loads the OpenAPI schema from MDR (with optional file fallback per `LIFSchemaConfig`). Uses `SchemaStateManager` to track source + provide thread-safe access. **No silent fallback in production:** if MDR is configured but unreachable, startup fails loudly rather than serving stale schema.
+
+## Composes
+- `lif_schema_config` — config + defaults
+- `logging`
+- `schema_state_manager` — schema lifecycle + refresh
+- `semantic_search_service` — `run_semantic_search`, `run_mutation`
+
+## Deployed as
+`projects/lif_semantic_search_mcp_server/`
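The schema-loading behavior described in this README (track the source, give thread-safe access, and fail loudly rather than fall back silently) can be illustrated with a small holder class. This is a sketch under stated assumptions and not the real `SchemaStateManager`: the class names `SchemaState` and `SchemaHolder`, the loader callables, and the rule that a file fallback is used only when one was explicitly supplied are all hypothetical.

```python
import threading
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class SchemaState:
    source: str  # "mdr" or "file", matching the values reported by /schema/status
    schema: dict


class SchemaHolder:
    """Hypothetical stand-in for a schema state manager."""

    def __init__(
        self,
        load_from_mdr: Callable[[], dict],
        load_from_file: Optional[Callable[[], dict]] = None,
    ) -> None:
        self._lock = threading.Lock()
        self._load_from_mdr = load_from_mdr
        self._load_from_file = load_from_file
        self._state: Optional[SchemaState] = None

    def load(self) -> SchemaState:
        """Load (or reload) the schema and swap it in under the lock."""
        try:
            state = SchemaState("mdr", self._load_from_mdr())
        except Exception:
            if self._load_from_file is None:
                raise  # no fallback configured: fail loudly
            state = SchemaState("file", self._load_from_file())
        with self._lock:
            self._state = state
        return state

    def current(self) -> SchemaState:
        """Return the most recently loaded state."""
        with self._lock:
            if self._state is None:
                raise RuntimeError("schema has not been loaded yet")
            return self._state
```

Because `SchemaState` is frozen and replaced as a whole, readers never observe a half-updated schema; the lock only guards the reference swap. Note that in this sketch, as in the README's description of `POST /schema/refresh`, reloading changes state only and would not re-register any tools built from the old schema.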
diff --git a/bases/lif/translator_restapi/README.md b/bases/lif/translator_restapi/README.md
@@ -0,0 +1,17 @@
+# `translator_restapi` — Base
+
+FastAPI base for the LIF Translator: transforms data from a source schema into a target schema using transformation definitions stored in the MDR. Used by the orchestrator to convert raw source-system payloads into LIF-shaped fragments (and, in the other direction, by the Learner Data Export microservice to project LIF data into external formats).
+
+## Endpoints
+- `POST /translate/source/{source_schema_id}/target/{target_schema_id}` — body is the input payload; response is the translated output
+- `GET  /health`
+
+Plus a set of `@app.exception_handler` registrations that convert internal exceptions (`LIFException`, `ResourceNotFoundException`, `RequestValidationError`, etc.) into proper HTTP status codes with stable error envelopes.
+
+## Composes
+- `exceptions` — common LIF exception types
+- `logging`
+- `translator` — `TranslatorConfig`, `Translator` — the actual transformation engine
+
+## Deployed as
+`projects/lif_translator_api/`
diff --git a/cspell.json b/cspell.json
@@ -191,6 +191,7 @@
         "sqlmodel",
         "sqls",
         "stateu",
+        "Starlette",
         "Streamable",
         "streamlit",
         "subcomponent",