AI usage in Future Horizons ASPT is deliberately narrow: one feature
(/api/chat → AI Assistant), one provider (Anthropic), one file
(server/src/services/ai-assistant.ts). This document pins the policy.
- Provider: Anthropic (
@anthropic-ai/sdk). - Model:
claude-sonnet-4-20250514(pinned asMODELconstant). - Version bumps: open a PR that updates the constant and re-runs the prompt evaluation suite (see "Evaluation" below).
| Limit | Value | Where |
|---|---|---|
max_tokens output |
1024 | ai-assistant.ts MAX_TOKENS |
| Request timeout | 30 s | ai-assistant.ts REQUEST_TIMEOUT_MS (AbortController) |
Input message size |
1–2000 chars | chat.schema.ts sendMessageSchema |
| History included | last 20 msgs | ai-assistant.ts MAX_HISTORY_MESSAGES |
| Per-user rate limit | 20 req/min | chat.router.ts chatLimiter |
| Request body size | 1 MB total | express.json({ limit: '1mb' }) |
Changing any of these is an explicit governance action — not a config value, not an env var. Edit the source and open a PR with justification.
There is one system prompt, defined inline in ai-assistant.ts as
SYSTEM_PROMPT. It instructs the model to:
- Act as a HERM procurement expert.
- Cite HERM capability codes (e.g. BC011, BC086).
- Give balanced, evidence-based vendor assessments.
- Never recommend a specific vendor.
- Keep responses concise and structured.
At call time, the prompt is augmented with a dynamic "Current platform context" block containing the top 21 systems and their HERM coverage %. This is deterministic (no user-controlled content) so it cannot be used as an injection vector by other users.
When adding a second prompt, move SYSTEM_PROMPT into a dedicated
services/ai/prompts.ts and tag each prompt with a stable ID
(assistant.system.v1, assistant.system.v2).
POST /api/chatrequires a valid JWT. Anonymous LLM access was removed during the production-readiness pass; see ARCHITECTURE_NOTES.md for the route auth matrix.- Per-user rate limit of 20 requests / minute throttles both casual abuse and misconfigured scripts.
- All calls log the authenticated
userId— any suspicious pattern (high token use, repetitive sessions) is traceable.
- User messages are persisted to
chatMessagein the database. If the application becomes subject to strict PII rules, add an opt-in redact-on-write step before theprisma.chatMessage.createcall. - Logs redact
authorization,cookie,password,token,apiKey, andsecretfields via pino redact paths. The message content is not redacted in logs by default — review before logging full bodies. - The system prompt and HERM context do not include user data from other tenants, so cross-tenant data leakage via context is not a risk by design.
- Every successful AI call logs
{ model, inputTokens, outputTokens, durationMs }. Aggregate onuserIdorsessionIdto detect runaway cost. - A per-user daily / monthly cap is not implemented; add one when the
platform accepts self-service signups from untrusted users. Recommended:
a Redis counter keyed by
{ userId, YYYY-MM-DD }, checked inchat.router.tsbefore calling the service.
| Condition | Behaviour |
|---|---|
ANTHROPIC_API_KEY unset (dev) |
Service returns a static fallback string. |
ANTHROPIC_API_KEY unset (prod) |
App fails to start — caught at module init. |
| Anthropic API error | Logged with err, durationMs; 500 surfaces to caller. |
| 30 s timeout exceeded | AbortController aborts; error propagates. |
| Rate limit hit | 429 from express-rate-limit with typed body. |
No automated prompt-eval suite is wired up yet. When you add one, store
cases under server/src/services/ai-assistant.eval/ and run them in CI.
The audit trail today is the chatMessage table plus the structured logs.
To produce a per-user audit report, join those by sessionId and
correlate with the login audit log (TBD).
Anthropic announces model retirements in advance. The process:
- Update
MODELto the replacement (e.g. Claude 4.6 → Claude 4.7). - Run the evaluation suite; compare output quality on the canonical cases.
- Deploy to a staging environment for at least 24 h.
- Roll to production. Users see no change in the UI.