Commit b13b5c3

feat: update documentation
1 parent 83f5786 commit b13b5c3

2 files changed

Lines changed: 8 additions & 8 deletions

CLAUDE.md

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ GitHub: `https://github.com/co-cddo/octo-observability-compliance-scraper`
 - **Express server** (`src/server/`) — GOV.UK Frontend UI with SSO auth
 - **pg-boss worker** (`src/worker/`) — PostgreSQL-backed job queue, runs alongside the server in a single process via `src/main.ts`
 - **Scraper** (`src/scraper/`) — Playwright + Bedrock extraction pipeline
-- **Insights** (`src/insights/`) — text-to-SQL chatbot using Bedrock Converse API (Sonnet 4.6)
+- **Insights** (`src/insights/`) — text-to-SQL chatbot using Bedrock Converse API (Claude Haiku 4.5)
 
 The daily cron (2 AM London) enqueues one job per service. Jobs are processed sequentially (one Chromium instance at a time). Manual triggers available via `/trigger` (all services), `/services/:slug/trigger` (single service), and `/services/:slug/trigger/:type` (single service, specific scrape type).
 
README.md

Lines changed: 7 additions & 7 deletions
@@ -28,7 +28,7 @@ Scrapes UK government digital service websites to extract reported compliance da
 - Docker (for local Postgres)
 - AWS credentials with Bedrock access (see [Bedrock setup](#bedrock-setup) below)
 - DSIT Internal Access SSO credentials (for the web UI)
-- [gitleaks](https://github.com/gitleaks/gitleaks) (optional, for pre-commit secret scanning)`brew install gitleaks` on macOS
+- [gitleaks](https://github.com/gitleaks/gitleaks) for pre-commit secret scanning — `brew install gitleaks` on macOS
 
 ## Setup
 
@@ -41,8 +41,8 @@ cp .env.example .env
 # Edit .env with your settings (see Environment variables below)
 
 docker compose up postgres -d
-npm run db:migrate
-npm run db:seed
+pnpm run db:migrate
+pnpm run db:seed
 ```
 
 > **Note:** `.npmrc` sets `ignore-scripts=true` to prevent install-time script execution.
@@ -56,9 +56,9 @@ npm run db:seed
 ## Running locally
 
 ```bash
-npm run dev
-npm run dev:watch
-npm run build && npm start
+pnpm run dev
+pnpm run dev:watch
+pnpm run build && pnpm start
 ```
 
 Opens at [http://localhost:3000](http://localhost:3000)
@@ -88,7 +88,6 @@ GitHub Actions workflows run on every PR and on push to `main`:
 
 - **PR checks:** gitleaks, commitlint, ESLint, unit tests, Playwright e2e, Docker build
 - **Deploy (push to main):** lint + test + e2e, then build and push image to GitHub Container Registry (`ghcr.io/co-cddo/octo-observability-compliance-scraper`)
-- **Release Please:** automatically creates release PRs from conventional commits, bumps `package.json` version and generates a changelog
 
 No GitHub secrets or variables are required — all workflows use the built-in `GITHUB_TOKEN`.
 
@@ -147,6 +146,7 @@ server/app.ts (Express + GOV.UK Frontend)
 | `no_link_found` | No relevant link found in page footer |
 | `scrape_error` | Navigation failed, auth wall, or CAPTCHA detected |
 | `bedrock_error` | Page found but Bedrock call or JSON parsing failed |
+| `no_data_extracted` | Page found but Bedrock returned empty/no structured data |
 
 Results are **append-only** — each run adds a new row. The UI always shows the latest result per service.
 
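The README hunk above notes that results are append-only and that the UI shows the latest result per service. As a rough illustration of that idea only, the sketch below picks the newest row per service from an in-memory list. The `ScrapeResult` shape, its field names, and `latestPerService` are hypothetical and not taken from the repository, which presumably does this in SQL against Postgres.

```typescript
// Hypothetical row shape: field names are illustrative, not taken from the repository.
interface ScrapeResult {
  serviceSlug: string;
  status: string; // e.g. "no_link_found", "scrape_error", "bedrock_error", "no_data_extracted"
  createdAt: Date;
}

// Return the most recent row for each service from an append-only list.
function latestPerService(rows: readonly ScrapeResult[]): Map<string, ScrapeResult> {
  const latest = new Map<string, ScrapeResult>();
  for (const row of rows) {
    const seen = latest.get(row.serviceSlug);
    // Keep the row only if it is newer than the one already recorded for this service.
    if (seen === undefined || row.createdAt.getTime() > seen.createdAt.getTime()) {
      latest.set(row.serviceSlug, row);
    }
  }
  return latest;
}

// Made-up history: two runs for one service, one run for another.
const history: ScrapeResult[] = [
  { serviceSlug: "example-service", status: "bedrock_error", createdAt: new Date("2025-01-01T02:00:00Z") },
  { serviceSlug: "example-service", status: "no_data_extracted", createdAt: new Date("2025-01-02T02:00:00Z") },
  { serviceSlug: "other-service", status: "scrape_error", createdAt: new Date("2025-01-01T02:00:00Z") },
];

const latest = latestPerService(history);
console.log(latest.get("example-service")?.status); // prints "no_data_extracted"
```

Because older rows are never overwritten, the earlier `bedrock_error` row stays in the history; only the view of the current state changes once the newer `no_data_extracted` row is appended.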