Commit 9398c7d

release v0.5.7: agency mappings, schema guard, changelog
1 parent 4763820 commit 9398c7d

3 files changed

Lines changed: 105 additions & 3 deletions

CHANGELOG.md

Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
+# Changelog
+
+## [0.5.7] — Unreleased
+
+### Added
+- **Schema guard assertions** in `harvest.sql`: validates output column names and order against `['source', 'name', 'description', 'status', 'agency', 'tags', 'region', 'url', 'publishdate', 'expirydate']` using `information_schema.columns`, and row count range (10–50,000). Mismatch calls `error()`, failing the job before MySQL mirror.
+- **EngagementHQ agency mappings**: added tag-based rules for `dtmi`, `taxi`, `charter`, `on-demand`, `passenger transport` → Department of Transport; `hvs`, `heavy vehicle` → Main Roads Western Australia. Resolves 4 previously-unmapped projects falling through to "Government of Western Australia".
+
+### Changed
+- Simplified CSV report job (`justfile`, `ci-nightly.yaml`): removed in-cluster `duckdb` execution; dumps CSV directly from MariaDB via `COPY`.
+
+## [0.5.6] — 2026-05-11
+
+### Changed
+- **Renamed chart values**: `mysql` → `db`, `mariadb-credentials` secret → `db-credentials`, `MARIADB_USER`/`MARIADB_PASSWORD` keys → `DB_USER`/`DB_PASSWORD`. The `harvest.sql` template reference changed from `.Values.mysql.table` to `.Values.db.table`.
+- New `db.user` and `db.password` values for external database credentials.
+
+## [0.5.5] — 2026-05-11
+
+### Added
+- **Optional bundled MariaDB**: `mariadb.enabled` flag gates StatefulSet, Service, and NetworkPolicy. When disabled, only the external-database secret keys are rendered (`DB_USER`/`DB_PASSWORD` without `MARIADB_ROOT_PASSWORD`).
+- NetworkPolicy now also gated on `mariadb.enabled`.
+
+## [0.5.4] — 2026-05-11
+
+### Added
+- **Dockerfile**: multi-stage build pinned to `duckdb/duckdb:1.5.2`, pre-installs `httpfs` and `mysql` extensions at build time, runs as non-root (`USER 1000:1000`).
+- **`build-image.yaml` workflow**: builds and pushes container images to GHCR on `main` push (tags `:edge`) and on version tags (`:v0.5.x-duckdb152`).
+- **`just bump-version`**: updates `Chart.yaml` version/appVersion and `Dockerfile` ARG in one command.
+- **`just docker-build` / `docker-build-release`**: multi-arch buildx commands with auto-derived image tags from chart metadata.
+- **`_helpers.tpl`**: `harvest-consultations.harvestImageTag` template computes image tag from chart version + DuckDB short version.
+
+### Changed
+- **Removed `INSTALL httpfs; INSTALL mysql;`** from `harvest.sql`: extensions now pre-installed in the Docker image, so the pipeline skips install at runtime (tightens `autoinstall_known_extensions = false` lock-down).
+- **Removed `HOME=/tmp` env and `duckdb-extensions` emptyDir volume** from cronjob: extensions no longer need writable storage at runtime.
+- **Removed `just run`** (local DuckDB execution); pipeline now runs exclusively in-cluster.
+- **Image tag auto-derived**: `cronjob.yaml` uses `harvestImageTag` helper instead of a hardcoded `.Values.harvest.image.tag`.
+- **`just helm-package`** now accepts a `version` parameter, used by release workflow.
+- **Release workflow**: uses `just helm-package` instead of inline `sed` + `helm package`.
+- **Removed manual `just` install** from CI workflows; `just` is now provided by `mise`.
+
+## [0.5.3] — 2026-05-11
+
+### Changed
+- **Output schema simplified**: dropped `id` and `loaded_at` columns from `consultations_final`. Column order changed to `source, name, description, status, agency, tags, region, url, publishdate, expirydate` (10 columns). `tags` moved after `agency`.
+- **Templated target table**: `configmap.yaml` switched from raw `.Files.Get` to `tpl (.Files.Get …)`, enabling `{{ .Values.mysql.table }}` in `harvest.sql`. The MySQL mirror table is now configurable via `mysql.table` (default: `consultations`).
+- **CI triggers**: `ci-nightly.yaml` now also runs on push to `main` (previously only cron + manual dispatch).
+- **`justfile` cleanup**: removed local `run` target, parameterized `mysql.table`, switched from hardcoded `helmHost` to `mysqlHost`.
+
+## [0.5.0] — 2026-05-09
+
+### Security
+- **9 security hardening fixes**:
+  - Container runs as non-root (`runAsUser: 1000`, `runAsGroup: 1000`)
+  - Read-only root filesystem (`readOnlyRootFilesystem: true`)
+  - Seccomp profile set to `RuntimeDefault`
+  - All capabilities dropped (`drop: ["ALL"]`)
+  - Community extensions locked (`allow_community_extensions = false`)
+  - Auto-install/autoload disabled (`autoinstall_known_extensions = false`, `autoload_known_extensions = false`)
+  - HTTP logging disabled (`enable_http_logging = false`)
+  - Unredacted secrets disabled (`allow_unredacted_secrets = false`)
+  - ETag checks disabled for EngagementHQ pages (`unsafe_disable_etag_checks = true`)
+
+## [0.4.5] — 2026-05-09
+
+### Changed
+- **Complete rewrite**: replaced Python/SQLMesh/uv harvest pipeline with a pure DuckDB SQL pipeline packaged as a Helm chart.
+- HTTP fetch and JWT token extraction done entirely in SQL via DuckDB `httpfs` extension.
+- Added CI/CD workflows for nightly end-to-end tests and Helm chart releases.
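
The tag-based agency rules listed in the 0.5.7 entry can be read as a first-match-wins lookup over case-insensitive substrings. The following is a minimal Python sketch of that behaviour, not the pipeline's own code (which is SQL). The names `RULES`, `FALLBACK`, and `map_agency` are hypothetical; only the tag terms and agency names are taken from the changelog, and the fallback assumes unmapped projects land on "Government of Western Australia" as the entry describes.

```python
from typing import Optional

# Hypothetical rule table: (agency, substrings to look for in the tags string).
# Order matters, because the first matching rule wins, as in a SQL CASE.
RULES = [
    ("Department of Transport",
     ["dtmi", "taxi", "charter", "on-demand", "passenger transport"]),
    ("Main Roads Western Australia",
     ["hvs", "heavy vehicle"]),
]

# Assumed fallback for projects that match no rule.
FALLBACK = "Government of Western Australia"


def map_agency(tags: Optional[str]) -> str:
    """Return the first agency whose terms occur anywhere in `tags`."""
    # Lowercasing both sides approximates ILIKE '%term%'.
    haystack = (tags or "").lower()
    for agency, terms in RULES:
        if any(term in haystack for term in terms):
            return agency
    return FALLBACK


print(map_agency("Taxi, On-Demand"))
print(map_agency("HVS network review"))
print(map_agency("parks"))
```

Because the match is a plain substring test, a short term such as `hvs` matches wherever it occurs inside the tags string, and a project carrying both a transport term and a heavy-vehicle term resolves to whichever rule is listed first.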

chart/Chart.yaml

Lines changed: 1 addition & 1 deletion
@@ -2,5 +2,5 @@ apiVersion: v2
 name: harvest-consultations
 description: DuckDB harvest pipeline for WA government consultation data
 type: application
-version: 0.5.6
+version: 0.5.7
 appVersion: "1.5.2"
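
The changelog's 0.5.4 entry says the image tag is computed from the chart version plus a DuckDB short version, and gives `:v0.5.x-duckdb152` as the release tag shape. A rough sketch of how the two values in this file could combine is below. It assumes the short version is `appVersion` with the dots removed and that the tag carries a `v` prefix; the real logic lives in the `harvestImageTag` Helm helper, which is not shown in this commit, and `harvest_image_tag` is a hypothetical Python stand-in.

```python
def harvest_image_tag(chart_version: str, app_version: str) -> str:
    """Build a tag like 'v0.5.7-duckdb152' from chart metadata (assumed format)."""
    # Assumption: "short version" means the DuckDB version without dots.
    duckdb_short = app_version.replace(".", "")
    return f"v{chart_version}-duckdb{duckdb_short}"


# Values taken from chart/Chart.yaml after this commit.
print(harvest_image_tag("0.5.7", "1.5.2"))  # v0.5.7-duckdb152
```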

chart/harvest.sql

Lines changed: 35 additions & 2 deletions
@@ -194,9 +194,19 @@ SELECT
         THEN 'Department of Primary Industries and Regional Development'
     WHEN url ILIKE '%haveyoursaywa.engagementhq.com%'
         THEN 'Department of Planning, Lands and Heritage'
-    WHEN "parent-id" = '38135' OR tags ILIKE '%dot%'
+    WHEN "parent-id" = '38135'
+        OR tags ILIKE '%dot%'
+        OR tags ILIKE '%dtmi%'
+        OR tags ILIKE '%taxi%'
+        OR tags ILIKE '%charter%'
+        OR tags ILIKE '%on-demand%'
+        OR tags ILIKE '%passenger transport%'
         THEN 'Department of Transport'
-    WHEN "parent-id" = '37726' OR tags ILIKE '%mrwa%' OR tags ILIKE '%main roads%'
+    WHEN "parent-id" = '37726'
+        OR tags ILIKE '%mrwa%'
+        OR tags ILIKE '%main roads%'
+        OR tags ILIKE '%hvs%'
+        OR tags ILIKE '%heavy vehicle%'
         THEN 'Main Roads Western Australia'
     WHEN "parent-id" = '38267' OR tags ILIKE '%metronet%' THEN 'METRONET'
     WHEN "parent-id" = '37724' OR tags ILIKE '%westport%' THEN 'Westport'
@@ -237,6 +247,29 @@ FROM consultations_final
 GROUP BY source, status
 ORDER BY source, status;
 
+-- ============================================================================
+-- 5b. Schema guard: fail the pipeline if output shape or row count regresses
+-- ============================================================================
+SELECT CASE
+    WHEN columns = ['source', 'name', 'description', 'status', 'agency', 'tags', 'region', 'url', 'publishdate', 'expirydate']
+        THEN 'schema: ok'
+    ELSE error('Schema mismatch. Expected 10 columns: source,name,description,status,agency,tags,region,url,publishdate,expirydate. Got: ' || array_to_string(columns, ','))
+END AS schema_check
+FROM (
+    SELECT array_agg(column_name ORDER BY ordinal_position) AS columns
+    FROM information_schema.columns
+    WHERE table_name = 'consultations_final'
+);
+
+SELECT CASE
+    WHEN cnt >= 10 AND cnt <= 50000
+        THEN 'rowcount: ok'
+    ELSE error('Row count out of bounds: ' || cnt || ' (expected 10-50000)')
+END AS rowcount_check
+FROM (
+    SELECT count(*) AS cnt FROM consultations_final
+);
+
 -- ============================================================================
 -- 6. Mirror to MySQL using env vars (MYSQL_HOST, MYSQL_USER, MYSQL_PWD, MYSQL_DATABASE)
 -- Matches old/harvest.py export behaviour: replace the whole output table.
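
The two guard queries in section 5b amount to an exact, order-sensitive comparison of the column list and an inclusive range check on the row count. A small Python sketch of the same two checks follows, which may be handy for reasoning about edge cases outside the database. The function and constant names are hypothetical, and `ValueError` stands in for DuckDB's `error()`; the expected columns, bounds, and messages are copied from the diff.

```python
# Expected output shape, copied from the guard in harvest.sql.
EXPECTED_COLUMNS = [
    "source", "name", "description", "status", "agency",
    "tags", "region", "url", "publishdate", "expirydate",
]
MIN_ROWS, MAX_ROWS = 10, 50000  # inclusive bounds, as in the SQL


def check_schema(columns):
    """Pass only if names AND order match exactly, like the list comparison in SQL."""
    if list(columns) != EXPECTED_COLUMNS:
        raise ValueError(
            "Schema mismatch. Expected 10 columns: "
            + ",".join(EXPECTED_COLUMNS)
            + ". Got: " + ",".join(columns)
        )
    return "schema: ok"


def check_rowcount(cnt):
    """Pass only if MIN_ROWS <= cnt <= MAX_ROWS."""
    if not (MIN_ROWS <= cnt <= MAX_ROWS):
        raise ValueError(f"Row count out of bounds: {cnt} (expected 10-50000)")
    return "rowcount: ok"


print(check_schema(EXPECTED_COLUMNS))
print(check_rowcount(1234))
```

Note that the comparison is order-sensitive: the same ten names in a different order fail the check, which matches the stated intent of guarding against a regression in output shape.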
