- Schema guard assertions in `harvest.sql`: validates output column names and order against `['source', 'name', 'description', 'status', 'agency', 'tags', 'region', 'url', 'publishdate', 'expirydate']` using `information_schema.columns`, and the row count range (10–50,000). A mismatch calls `error()`, failing the job before the MySQL mirror step.
- EngagementHQ agency mappings: added tag-based rules for `dtmi`, `taxi`, `charter`, `on-demand`, `passenger transport` → Department of Transport; `hvs`, `heavy vehicle` → Main Roads Western Australia. Resolves 4 previously-unmapped projects that fell through to "Government of Western Australia".
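The schema guard and the tag-based agency mapping described above, as an added example written for this changelog (it is not the chart's own `harvest.sql`). It builds a constructed `consultations_final` table of twelve sample rows, derives `agency` from the listed tag rules, then checks column names and order through `information_schema.columns` and the row-count range, calling `error()` on mismatch. The sample data, the assumption that tags are stored as a comma-separated string, and the use of `list_has_any` for the rule matching are all this example's choices, not the project's.

```sql
-- Added example: DuckDB schema guard for the harvest output table.
-- Constructed sample data; not the project's harvest.sql.

-- 1. Constructed staging rows. Tags are kept as a comma-separated string (assumption).
CREATE OR REPLACE TABLE staged AS
SELECT
    'engagementhq'                               AS source,
    'Consultation ' || i::VARCHAR                AS name,
    'Constructed sample row'                     AS description,
    'open'                                       AS status,
    CASE WHEN i % 3 = 0 THEN 'taxi,on-demand'
         WHEN i % 3 = 1 THEN 'hvs,freight'
         ELSE 'parks' END                        AS tags,
    'Perth'                                      AS region,
    'https://example.invalid/p/' || i::VARCHAR   AS url,
    DATE '2024-01-01' + i::INTEGER               AS publishdate,
    DATE '2024-02-01' + i::INTEGER               AS expirydate
FROM range(12) t(i);

-- 2. Agency mapping: tag-based rules first, then the state-level fallback.
--    Column order already matches the guarded schema (tags sits after agency).
CREATE OR REPLACE TABLE consultations_final AS
SELECT
    source, name, description, status,
    CASE
        WHEN list_has_any(string_split(tags, ','),
                          ['dtmi', 'taxi', 'charter', 'on-demand', 'passenger transport'])
            THEN 'Department of Transport'
        WHEN list_has_any(string_split(tags, ','), ['hvs', 'heavy vehicle'])
            THEN 'Main Roads Western Australia'
        ELSE 'Government of Western Australia'
    END AS agency,
    tags, region, url, publishdate, expirydate
FROM staged;

-- 3. Schema guard: column names AND order must match exactly, or the job dies here
--    instead of writing a mis-shaped table into the MySQL mirror.
SELECT CASE
    WHEN list(column_name ORDER BY ordinal_position)
         <> ['source', 'name', 'description', 'status', 'agency',
             'tags', 'region', 'url', 'publishdate', 'expirydate']
    THEN error('schema mismatch in consultations_final: '
               || list(column_name ORDER BY ordinal_position)::VARCHAR)
    ELSE 'schema ok'
END AS schema_check
FROM information_schema.columns
WHERE table_name = 'consultations_final';

-- 4. Row-count guard: a scrape yielding too few or too many rows is treated as a failure.
SELECT CASE
    WHEN count(*) NOT BETWEEN 10 AND 50000
    THEN error('row count ' || count(*)::VARCHAR || ' outside 10..50000')
    ELSE 'row count ok: ' || count(*)::VARCHAR
END AS rowcount_check
FROM consultations_final;
```

Running this in a DuckDB shell yields `schema ok` and `row count ok: 12`; dropping a column from step 2, or changing `range(12)` to `range(5)`, makes the corresponding guard raise and abort the script, which is the behaviour the changelog describes for the CronJob.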
- Simplified CSV report job (`justfile`, `ci-nightly.yaml`): removed in-cluster `duckdb` execution; dumps CSV directly from MariaDB via `COPY`.
- Renamed chart values: `mysql` → `db`, `mariadb-credentials` secret → `db-credentials`, `MARIADB_USER`/`MARIADB_PASSWORD` keys → `DB_USER`/`DB_PASSWORD`. The `harvest.sql` template reference changed from `.Values.mysql.table` to `.Values.db.table`.
- New `db.user` and `db.password` values for external database credentials.
- Optional bundled MariaDB: `mariadb.enabled` flag gates StatefulSet, Service, and NetworkPolicy. When disabled, only the external-database secret keys are rendered (`DB_USER`/`DB_PASSWORD` without `MARIADB_ROOT_PASSWORD`).
- NetworkPolicy now also gated on `mariadb.enabled`.
- Dockerfile: multi-stage build pinned to `duckdb/duckdb:1.5.2`, pre-installs the `httpfs` and `mysql` extensions at build time, runs as non-root (`USER 1000:1000`).
- `build-image.yaml` workflow: builds and pushes container images to GHCR on `main` push (tag `:edge`) and on version tags (`:v0.5.x-duckdb152`).
- `just bump-version`: updates `Chart.yaml` version/appVersion and the `Dockerfile` ARG in one command.
- `just docker-build`/`docker-build-release`: multi-arch buildx commands with auto-derived image tags from chart metadata.
- `_helpers.tpl`: `harvest-consultations.harvestImageTag` template computes the image tag from chart version + DuckDB short version.
- Removed `INSTALL httpfs; INSTALL mysql;` from `harvest.sql`: extensions are now pre-installed in the Docker image, so the pipeline skips install at runtime (tightens the `autoinstall_known_extensions = false` lock-down).
- Removed `HOME=/tmp` env and the `duckdb-extensions` emptyDir volume from the cronjob: extensions no longer need writable storage at runtime.
- Removed `just run` (local DuckDB execution); the pipeline now runs exclusively in-cluster.
- Image tag auto-derived: `cronjob.yaml` uses the `harvestImageTag` helper instead of a hardcoded `.Values.harvest.image.tag`.
- `just helm-package` now accepts a `version` parameter, used by the release workflow.
- Release workflow: uses `just helm-package` instead of inline `sed` + `helm package`.
- Removed manual `just` install from CI workflows; `just` is now provided by `mise`.
- Output schema simplified: dropped the `id` and `loaded_at` columns from `consultations_final`. Column order changed to `source, name, description, status, agency, tags, region, url, publishdate, expirydate` (10 columns); `tags` moved after `agency`.
- Templated target table: `configmap.yaml` switched from raw `.Files.Get` to `tpl (.Files.Get …)`, enabling `{{ .Values.mysql.table }}` in `harvest.sql`. The MySQL mirror table is now configurable via `mysql.table` (default: `consultations`).
- CI triggers: `ci-nightly.yaml` now also runs on push to `main` (previously only cron + manual dispatch).
- `justfile` cleanup: removed the local `run` target, parameterized `mysql.table`, switched from hardcoded `helmHost` to `mysqlHost`.
- 9 security hardening fixes:
  - Container runs as non-root (`runAsUser: 1000`, `runAsGroup: 1000`)
  - Read-only root filesystem (`readOnlyRootFilesystem: true`)
  - Seccomp profile set to `RuntimeDefault`
  - All capabilities dropped (`drop: ["ALL"]`)
  - Community extensions locked (`allow_community_extensions = false`)
  - Auto-install/autoload disabled (`autoinstall_known_extensions = false`, `autoload_known_extensions = false`)
  - HTTP logging disabled (`enable_http_logging = false`)
  - Unredacted secrets disabled (`allow_unredacted_secrets = false`)
  - ETag checks disabled for EngagementHQ pages (`unsafe_disable_etag_checks = true`)
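The DuckDB-side settings from the hardening list, combined with the "extensions pre-installed, no `INSTALL` at runtime" change, as an added example of the session prologue a harvest run would execute. This is illustrative code written for this changelog, not the chart's `harvest.sql`. The Pod-level items (non-root user, read-only root filesystem, seccomp, dropped capabilities) live in the Kubernetes securityContext rather than in SQL and are not shown here; the verification queries at the end are this example's own way of checking the result.

```sql
-- Added example: DuckDB session lock-down for the harvest run.
-- Illustrative; not the project's harvest.sql.

-- Extensions must come from the image: nothing is fetched or installed at runtime.
SET autoinstall_known_extensions = false;
SET autoload_known_extensions = false;
SET allow_community_extensions = false;   -- once false, cannot be re-enabled in this process

-- Keep request details out of the log and keep secret values redacted when listed.
SET enable_http_logging = false;
SET allow_unredacted_secrets = false;

-- LOAD, not INSTALL: this only succeeds if the image pre-installed the extension at
-- build time, which is exactly the property the lock-down above is meant to enforce.
LOAD httpfs;
LOAD mysql;

-- httpfs setting, so it is only valid after LOAD httpfs (autoload is off).
-- Required for the EngagementHQ pages, which do not serve stable ETags.
SET unsafe_disable_etag_checks = true;

-- Verify the extensions came from the image rather than a runtime install,
-- failing the job if either is missing.
SELECT CASE
    WHEN count(*) FILTER (WHERE loaded) = 2 THEN 'extensions ok'
    ELSE error('httpfs/mysql not loaded from the image')
END AS extension_check
FROM duckdb_extensions()
WHERE extension_name IN ('httpfs', 'mysql');

-- Verify the lock-down settings are in effect for this session.
SELECT name, value
FROM duckdb_settings()
WHERE name IN ('autoinstall_known_extensions', 'autoload_known_extensions',
               'allow_community_extensions', 'enable_http_logging',
               'allow_unredacted_secrets');
```

The order matters: the `SET` statements that disable installing and autoloading come first, so that the subsequent `LOAD` cannot silently fall back to a download, and the `httpfs`-specific ETag setting comes after `LOAD httpfs` because with autoload disabled DuckDB will not pull the extension in just to register that option.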
- Complete rewrite: replaced the Python/SQLMesh/uv harvest pipeline with a pure DuckDB SQL pipeline packaged as a Helm chart.
- HTTP fetch and JWT token extraction done entirely in SQL via the DuckDB `httpfs` extension.
- Added CI/CD workflows for nightly end-to-end tests and Helm chart releases.