setup: enforce UMask=0002 on php-fpm + kimaki services so coding-agent writes stay group-writable (WP auto-update keeps failing on stamped 0644 files) #125

@chubes4

Symptom

Every 12h, WordPress sends a "Background Update Failed" email like:

```
FAILED: WordPress failed to update to WordPress 7.0-RC2-62287
Updating to WordPress 7.0-RC2-62287
...
Copying the required files...
Could not copy file. twentytwentyfive/package-lock.json
Installation failed. twentytwentyfive/package-lock.json
Error: [copy_failed_copy_dir_themes] Could not copy file.
twentytwentyfive/package-lock.json
```

On extrachill.com (running on the wp-coding-agents harness), this has been a recurring problem, and it isn't the known root-ownership symptom from #93: file ownership was actually correct (`opencode:www-data`, with `opencode` in the `www-data` supplementary group). Yet the auto-update path running as `www-data` couldn't overwrite the file.

Root cause

The harness sets up the cooperative-write convention (opencode in www-data group, `chmod -R g+w` at setup) but does not enforce a group-friendly umask on the long-running services that write into `$SITE_PATH`.

`wp-config.php` declares the standard:

```php
define('FS_CHMOD_DIR', (0775 & ~ umask()));
define('FS_CHMOD_FILE', (0664 & ~ umask()));
```

But `umask()` is read from the running process. On a stock Debian/Ubuntu install, systemd services run with umask `0022`, so:

  • `FS_CHMOD_FILE` becomes `0664 & ~022 = 0644` (group write stripped)
  • `FS_CHMOD_DIR` becomes `0775 & ~022 = 0755` (group write stripped)
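
The masking above can be reproduced with plain shell arithmetic. This is a minimal sketch; `effective_mode` is a hypothetical helper name used only for illustration, not something the harness or WordPress provides.

```shell
# effective_mode REQUESTED UMASK
# Prints REQUESTED & ~UMASK as a four-digit octal mode.
# Both arguments are octal strings; the prepended 0 makes bash parse them as octal.
effective_mode() {
    local requested="$1" mask="$2"
    printf '%04o\n' $(( 0$requested & ~0$mask ))
}

effective_mode 664 022   # FS_CHMOD_FILE under the systemd default umask
effective_mode 775 022   # FS_CHMOD_DIR under the systemd default umask
effective_mode 664 002   # FS_CHMOD_FILE under UMask=0002
effective_mode 775 002   # FS_CHMOD_DIR under UMask=0002
```

The first two calls lose the group-write bit; the last two keep it, which is the whole argument for `UMask=0002`.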

Net effect: every file PHP-FPM (or a coding-agent process) creates lands at mode 0644, silently undoing the `g+w` the harness applied at setup. Once a file is 0644 and owned by `opencode`, the auto-update path (running as `www-data` via `wp-cron.php` over HTTPS) is denied write access (`EACCES`) even though `www-data` is the file's group, because the group write bit is not set.
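
The stamping is easy to reproduce outside PHP. The sketch below creates one file under each umask in a throwaway directory and prints the resulting modes. It assumes GNU coreutils (`stat -c`), as on Debian/Ubuntu; the file and variable names are illustrative only.

```shell
# Create one file under each umask and compare the modes the kernel assigns.
demo_dir="$(mktemp -d)"

# Subshells keep each umask change local to that one command.
( umask 0022; touch "$demo_dir/stamped-by-0022" )
( umask 0002; touch "$demo_dir/stamped-by-0002" )

mode_0022="$(stat -c '%a' "$demo_dir/stamped-by-0022")"
mode_0002="$(stat -c '%a' "$demo_dir/stamped-by-0002")"
echo "umask 0022 -> $mode_0022"
echo "umask 0002 -> $mode_0002"

rm -rf "$demo_dir"
```

Only the file created under `umask 0002` keeps group write, regardless of what `chmod -R g+w` did to the tree beforehand.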

WordPress auto-update is just the canary. Anything that flips between www-data and opencode as the writer (cron, wp-cli from the kimaki agent, plugin-installer, etc.) eventually hits this.

Diagnostic on extrachill.com

```
$ ls -la /var/www/extrachill.com/wp-content/themes/twentytwentyfive/package-lock.json
-rw-r--r-- 1 opencode www-data 54180 May 1 19:05 package-lock.json
^^^ owner can write, group CANNOT, even though group is www-data and we want it to

$ id www-data
uid=33(www-data) gid=33(www-data) groups=33(www-data)
$ id opencode
uid=1000(opencode) gid=1000(opencode) groups=1000(opencode),4(adm),33(www-data)
$ getent group www-data
www-data:x:33:opencode

$ cat /proc/$(pgrep -f "php-fpm: pool" | head -1)/status | grep -i umask
Umask: 0022

$ find /var/www/extrachill.com/wp-content/themes -type f ! -perm -g+w | wc -l
1508
$ find /var/www/extrachill.com/wp-content/plugins -type f ! -perm -g+w | wc -l
534
```

So at least 2042 files (1508 + 534) lack group write in directories that setup explicitly made group-writable.

What's missing in the harness

`lib/infrastructure.sh` does:

```
useradd -m -s /bin/bash -G www-data "$SERVICE_USER"
chmod -R g+w "$SITE_PATH"
chown -R www-data:www-data "$SITE_PATH"
```

…but never:

  1. Sets `UMask=0002` on the php-fpm systemd unit (so files PHP-FPM creates inherit `0664` / `0775`).
  2. Sets `UMask=0002` on the kimaki systemd unit (so any agent-spawned write — wp-cli, MCP, file abilities — also inherits `0664` / `0775`).

Without (1), WP auto-updates and any plugin that writes during a web request silently produce 0644 files.

Without (2), every wp-cli or filesystem write from a coding-agent session does the same.

Proposed fix

For VPS installs (the scope where the harness manages systemd):

  1. `bridges/kimaki.sh` (kimaki systemd unit template): add `UMask=0002` to `[Service]`.

  2. `lib/infrastructure.sh` (PHP-FPM provisioning): drop a `/etc/systemd/system/php${PHP_VERSION}-fpm.service.d/umask.conf` with:

    ```
    [Service]
    UMask=0002
    ```

    Then `systemctl daemon-reload && systemctl restart php${PHP_VERSION}-fpm`.

  3. After applying the drop-in, the harness should run a one-time `find "$SITE_PATH" -type f ! -perm -g+w -exec chmod g+w {} +`, plus the equivalent for directories (`-type d`), to repair files already stamped 0644 by the previous umask-0022 runtime. Otherwise those files still block the next auto-update even after the umask change takes effect.
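
The repair pass in step 3 could look roughly like this. It is a sketch, not harness code: `repair_group_write` is a hypothetical function name, and it assumes the caller is root or owns the files, since `chmod` otherwise fails.

```shell
# repair_group_write SITE_PATH
# Adds the group-write bit to every regular file and directory under SITE_PATH
# that lacks it. find does not follow symlinks by default, and entries that
# already have g+w are filtered out, so a second run changes nothing.
repair_group_write() {
    local site_path="$1"
    [ -d "$site_path" ] || { echo "not a directory: $site_path" >&2; return 1; }
    find "$site_path" -type f ! -perm -g+w -exec chmod g+w {} +
    find "$site_path" -type d ! -perm -g+w -exec chmod g+w {} +
}
```

Called as `repair_group_write "$SITE_PATH"`, it only touches the group-write bit and leaves ownership and the other permission bits alone.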

`upgrade.sh` should idempotently apply the drop-in and the repair pass via `_smart_update_systemd_unit`-style logic so existing installs pick up the fix without re-running setup.
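
One way the idempotent drop-in step might be written is sketched below. `apply_umask_dropin` is a hypothetical name, and this is a stand-in rather than a description of how `_smart_update_systemd_unit` actually works.

```shell
# apply_umask_dropin DROPIN_DIR
# Writes DROPIN_DIR/umask.conf with UMask=0002 unless an identical file exists.
# Returns 0 when the file was written (caller should reload and restart),
# 1 when nothing needed to change.
apply_umask_dropin() {
    local dropin_dir="$1"
    local target="$dropin_dir/umask.conf"
    local desired
    desired="$(printf '[Service]\nUMask=0002')"

    if [ -f "$target" ] && [ "$(cat "$target")" = "$desired" ]; then
        return 1   # already in place
    fi
    mkdir -p "$dropin_dir"
    printf '%s\n' "$desired" > "$target"
    return 0
}

# Usage on a real host (run as root; PHP_VERSION comes from the harness):
#   if apply_umask_dropin "/etc/systemd/system/php${PHP_VERSION}-fpm.service.d"; then
#       systemctl daemon-reload && systemctl restart "php${PHP_VERSION}-fpm"
#   fi
```

Restarting only when the file changed keeps repeated `upgrade.sh` runs from bouncing php-fpm needlessly.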

Why this is harness-level, not wp-config / per-site

  • `wp-config.php` already declares the canonical `0664/0775` constants. The harness owns the runtime where those constants get masked, so the harness owns the umask.
  • Asking site owners to chmod manually after every auto-update is the workaround we keep falling back on, and it's exactly what RULES.md says not to do.

Acceptance criteria

  • Fresh `setup.sh` on a clean VPS produces a php-fpm service with `Umask: 0002` (verifiable via `/proc/$(pgrep php-fpm | head -1)/status`).
  • A file written by PHP-FPM into `$SITE_PATH` lands at mode 0664 with group www-data, group-writable.
  • WordPress auto-update from one nightly to the next succeeds on a site whose theme files were last touched by `opencode`.
  • `upgrade.sh` on an existing install applies the drop-in without re-running full setup, and runs the one-time repair pass.
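
The first two criteria could be checked with small helpers like these. Both function names are hypothetical, and the sketch assumes a kernel that exposes the `Umask` field in `/proc/<pid>/status` plus GNU `stat`.

```shell
# umask_from_status STATUS_FILE
# Prints the Umask field from a /proc/<pid>/status-style file.
umask_from_status() {
    awk '$1 == "Umask:" { print $2 }' "$1"
}

# is_group_writable PATH
# Succeeds when PATH has the group-write bit (octal 0020) set.
is_group_writable() {
    local mode
    mode="$(stat -c '%a' "$1")" || return 2
    [ $(( 0$mode & 0020 )) -ne 0 ]
}

# Usage on a live host:
#   umask_from_status "/proc/$(pgrep php-fpm | head -1)/status"
#   is_group_writable "$SITE_PATH/wp-content/uploads/some-new-file" && echo ok
```

The path in the second usage line is a placeholder for any file PHP-FPM has just written.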
