Skip to content

fix(localgov-drupal): stop MariaDB client verifying Aurora TLS cert#186

Merged
chrisns merged 1 commit into
mainfrom
fix/localgov-drupal-rds-tls
Apr 21, 2026
Merged

fix(localgov-drupal): stop MariaDB client verifying Aurora TLS cert#186
chrisns merged 1 commit into
mainfrom
fix/localgov-drupal-rds-tls

Conversation

@chrisns
Copy link
Copy Markdown
Member

@chrisns chrisns commented Apr 21, 2026

Summary

Drupal init was silently failing on every new sandbox deploy with Database not ready for 5 minutes then falling back to an uninstalled supervisord state.

Root cause: the MariaDB CLI in the Alpine base image verifies TLS server certs by default and the Amazon RDS CA isn't in its trust store, so every mysql … -e "SELECT 1" probe failed instantly with ERROR 2026 (HY000): TLS/SSL error: Certificate verification failure. The DB was always up; the probe was broken.

Changes

  • Dockerfile: ship /etc/my.cnf.d/rds-ssl.cnf + /root/.my.cnf with ssl-verify-server-cert=OFF. PHP PDO has its own TLS stack and is unaffected; traffic stays inside the VPC.
  • init-drupal.sh: MAX_DB_RETRIES 60→180 and --connect-timeout=3 on the probe so a single hung TCP attempt can't burn the budget.
  • entrypoint.sh: propagate the init script's exit code through tee and exit the container on failure so ECS restarts the task instead of serving supervisord with no Drupal.
  • database.ts: Aurora Serverless v2 max capacity 2→8 ACU so the 33-module first-boot install doesn't peg the cluster for 9 min. Idle cost unchanged (scales back to min).
  • localgov-drupal-stack.ts: bump ReadSecretFn runtime to NODEJS_22_X.

Test plan

  • Confirmed symptom in sandbox 567119267654: TLS/SSL error: Certificate verification failure from MariaDB CLI
  • Confirmed PDO connects fine (different TLS stack)
  • Manually wrote /root/.my.cnf with ssl-verify-server-cert=OFF in the running task, re-launched init-drupal.sh → DB probe succeeded first try, install completed, site up at CloudFront URL
  • Terminate lease, open fresh sandbox, verify deploy is clean end-to-end with new image + template

The MariaDB CLI in the Alpine base image verifies TLS server certs by
default and does not trust the Amazon RDS CA, so every DB probe in
init-drupal.sh failed instantly with "Certificate verification failure"
and the container silently fell back to an uninstalled state.

- Dockerfile: drop /etc/my.cnf.d/rds-ssl.cnf and /root/.my.cnf disabling
  cert verification for the CLI. PHP PDO has its own TLS stack and is
  unaffected. Traffic stays inside the VPC.
- init-drupal.sh: MAX_DB_RETRIES 60 -> 180 and add --connect-timeout=3
  so a single hung TCP attempt cannot burn the retry budget.
- entrypoint.sh: propagate the init script's exit code out of the tee
  pipe and exit the container on failure so ECS restarts the task
  instead of serving supervisord with no Drupal.
- database.ts: Aurora Serverless v2 max capacity 2 -> 8 ACU so the
  33-module first-boot install does not peg the cluster; idle cost
  unchanged (scales back to min).
- localgov-drupal-stack.ts: bump ReadSecretFn to NODEJS_22_X.
@chrisns chrisns added this pull request to the merge queue Apr 21, 2026
Merged via the queue into main with commit ba5479c Apr 21, 2026
5 checks passed
@chrisns chrisns deleted the fix/localgov-drupal-rds-tls branch April 21, 2026 09:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant