Bug report: ansible-navigator on macOS misconfigures SSH agent socket and defaults to an unwritable SSH control path for non-root UIDs #2055

@netopsengineer

ISSUE TYPE
  • Bug Report
SUMMARY

On macOS with Docker Desktop, ansible-navigator automatically passes the host's launchd SSH agent socket path into the EE. That path is unusable inside the Docker VM, so the EE cannot reach the agent. Separately, the EE defaults to an SSH control socket directory under /runner/.ansible/cp that is not writable by the runtime user (UID 501), so OpenSSH multiplexing fails unless the container runs as root or the control directory is moved.

ANSIBLE-NAVIGATOR VERSION
ansible-navigator 25.9.0
CONFIGURATION

Relevant working configuration that demonstrates the needed workaround on macOS:

ansible-navigator.yaml

ansible-navigator:
  execution-environment:
    enabled: true
    container-engine: docker
    image: ghcr.io/ansible/community-ansible-dev-tools:latest  # used as my EE base image
    # Use container-options, not volume-mounts, so macOS host validation does not reject the path
    container-options:
      - "--volume=/run/host-services/ssh-auth.sock:/run/host-services/ssh-auth.sock:ro"
      - "--env=SSH_AUTH_SOCK=/run/host-services/ssh-auth.sock"
    # Optional, belt and suspenders for control path dir
    environment-variables:
      set:
        ANSIBLE_SSH_CONTROL_PATH_DIR: "/tmp/ansible/cp"

ansible.cfg

[ssh_connection]
control_path_dir = /tmp/ansible/cp
LOG FILE

Key excerpts that show the failure and the successful probe afterward.

Agent failure with default Navigator auto-mount on macOS

$ ansible-navigator exec -- /bin/sh -lc 'echo $SSH_AUTH_SOCK; ssh-add -l'
/private/tmp/com.apple.launchd.<ID>/Listeners
Error connecting to agent: Operation not supported

Multiplexing failure when not running the EE as root

muxserver_listen: link mux listener /runner/.ansible/cp/<long> => /runner/.ansible/cp/<short>: Bad file descriptor

Successful auth via agent after workarounds

debug1: Offering public key: ... ED25519 ... agent
debug1: Offering public key: ... RSA ... agent
debug1: Server accepts key: ... RSA ... agent
Authenticated to <host> using "publickey".
STEPS TO REPRODUCE
  1. Host: macOS 15.6.1 with Docker Desktop 4.47.0. No custom Navigator options. EE enabled.

  2. Run any ansible-navigator run ... that requires SSH to a host.

  3. Observe that Navigator auto-injects:

    -v /private/tmp/com.apple.launchd.<ID>/:/private/tmp/com.apple.launchd.<ID>/
    -e SSH_AUTH_SOCK=/private/tmp/com.apple.launchd.<ID>/Listeners
  4. Exec into the EE and check the agent:

    ansible-navigator exec -- /bin/sh -lc 'echo $SSH_AUTH_SOCK; ssh-add -l'

    Result: Operation not supported.

  5. Attempt to configure the correct socket via execution-environment.volume-mounts:

    volume-mounts:
      - src: /run/host-services/ssh-auth.sock
        dest: /run/host-services/ssh-auth.sock
        options: ro

    Navigator rejects it with: Source '/run/host-services/ssh-auth.sock' does not exist.

  6. Instead use container-options:

    --volume=/run/host-services/ssh-auth.sock:/run/host-services/ssh-auth.sock:ro
    --env=SSH_AUTH_SOCK=/run/host-services/ssh-auth.sock

    Now ssh-add -l works.

  7. Run an SSH task as non-root inside the EE. Observe:

    muxserver_listen ... Bad file descriptor
  8. Add in ansible.cfg:

    [ssh_connection]
    control_path_dir = /tmp/ansible/cp

    SSH now works without running the container as root.
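
The two checks in steps 4 and 7 can be combined into one small diagnostic. The following is a minimal sketch, meant to be run inside the EE (for example through ansible-navigator exec). The function names and message strings are hypothetical and are not part of ansible-navigator. The launchd prefix is the one shown in step 3.

```python
import os
import stat
import tempfile

# Prefix of the macOS host agent socket shown in step 3 of this report.
LAUNCHD_PREFIX = "/private/tmp/com.apple.launchd."


def check_agent_socket(path):
    """Return a short diagnosis for an SSH_AUTH_SOCK value seen inside the EE."""
    if not path:
        return "SSH_AUTH_SOCK is not set"
    if path.startswith(LAUNCHD_PREFIX):
        # Per the report, this host path cannot be used from inside the Docker VM.
        return "launchd socket path from the macOS host; not usable inside the Docker VM"
    try:
        mode = os.stat(path).st_mode
    except OSError as exc:
        return f"cannot stat socket: {exc.strerror}"
    if not stat.S_ISSOCK(mode):
        return "path exists but is not a socket"
    return "ok"


def check_control_dir(path):
    """Return 'ok' if the current user can create the directory and write a file in it."""
    try:
        os.makedirs(path, mode=0o700, exist_ok=True)
        with tempfile.NamedTemporaryFile(dir=path):
            pass
    except OSError as exc:
        return f"not writable: {exc.strerror}"
    return "ok"


if __name__ == "__main__":
    print("agent:", check_agent_socket(os.environ.get("SSH_AUTH_SOCK", "")))
    print("control dir:", check_control_dir(os.path.expanduser("~/.ansible/cp")))
```

Note that check_control_dir creates the directory if it can, so it should only be pointed at a path that Ansible would create anyway.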

EXPECTED RESULTS
  • On macOS, Navigator should make the agent usable in the EE out of the box.
  • SSH should not fail on the first connection due to an unwritable control socket directory when the EE runs as a non-root host UID.
ACTUAL RESULTS
  • Navigator passes a macOS launchd socket path into the container, but that path is not usable inside the Docker VM. Agent operations fail with Operation not supported.
  • When the EE runs as the host UID 501, OpenSSH multiplexing fails because /runner/.ansible/cp is not writable by that UID, producing muxserver_listen ... Bad file descriptor.
ADDITIONAL INFORMATION

Environment

  • Host OS: macOS 15.6.1
  • Docker Desktop: 4.47.0 (206054)
  • Container engine: docker
  • ansible-navigator: 25.9.0
  • EE base image: ghcr.io/ansible/community-ansible-dev-tools:latest
  • EE built with ansible-builder

Inside the EE

  • HOME=/runner
  • Runtime UID is the host UID on macOS, typically 501
  • /runner is created as root:0 and is writable only by its owner and group. UID 501 is not in group 0.

Relevant lines from the EE Dockerfile produced by ansible-builder:

RUN mkdir -p /runner && chgrp 0 /runner && chmod -R ug+rwx /runner
WORKDIR /runner
USER 1000

Navigator overrides the image user at runtime to the host UID, so the effective UID is 501 on macOS.
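
The permission problem can be reasoned through with the classic POSIX owner/group/other check. The sketch below is illustrative only and ignores ACLs and capabilities. The mode 0o775 is an assumption (a default 0755 directory after chmod ug+rwx), and so is GID 20 for the macOS user. The report itself states only that /runner is root:0 and that UID 501 is not in group 0.

```python
def can_write(uid, gids, owner_uid, owner_gid, mode):
    """Classic POSIX write check for a directory or file (no ACLs, no capabilities)."""
    if uid == 0:
        return True  # root bypasses the permission bits
    if uid == owner_uid:
        return bool(mode & 0o200)  # owner write bit
    if owner_gid in gids:
        return bool(mode & 0o020)  # group write bit
    return bool(mode & 0o002)  # other write bit


# Assumed state of /runner after the Dockerfile lines above: root:0, mode 0o775.
RUNNER = {"owner_uid": 0, "owner_gid": 0, "mode": 0o775}

# Host UID 501 with an assumed primary GID of 20: falls through to "other".
print(can_write(501, {20}, **RUNNER))

# A user that is a member of group 0 gets the group write bit.
print(can_write(1000, {0}, **RUNNER))
```

Under these assumptions UID 501 is evaluated against the "other" bits, which carry no write permission, so ~/.ansible/cp cannot be created under /runner.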

Docs gap

  • The FAQ shows Linux agent sockets like /run/user/1000/... and says Navigator auto-mounts SSH_AUTH_SOCK. On macOS the host socket lives under /private/tmp/com.apple.launchd... and is not consumable inside Docker's VM. Docker Desktop publishes a proxy inside the VM at /run/host-services/ssh-auth.sock. Navigator does not detect or use this by default.
  • execution-environment.volume-mounts validation rejects /run/host-services/ssh-auth.sock on macOS because it does not exist on the host filesystem, even though it is a well-known path inside the Docker VM.
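
One way the validation described in the second bullet could become OS-aware is sketched below. This is not ansible-navigator's actual validation code. The function name, its parameters, and the allowlist are hypothetical, and treating "Darwin plus the docker engine" as a stand-in for "Docker Desktop" is an assumption.

```python
import os
import platform

# Paths that exist inside the Docker Desktop VM but not on the macOS host.
# The single entry is the proxy socket named in this report; the allowlist
# mechanism itself is hypothetical.
DOCKER_DESKTOP_VM_PATHS = frozenset({"/run/host-services/ssh-auth.sock"})


def mount_source_is_acceptable(src, *, system=None, engine="docker", exists=os.path.exists):
    """Accept a volume-mount source if it exists on the host or is a known VM-internal path."""
    system = system or platform.system()
    if exists(src):
        return True
    # Assumption: on macOS, the docker engine means Docker Desktop and its VM.
    return system == "Darwin" and engine == "docker" and src in DOCKER_DESKTOP_VM_PATHS
```

The exists parameter is injectable so the rule can be exercised without touching the real filesystem, for example mount_source_is_acceptable("/run/host-services/ssh-auth.sock", system="Darwin", exists=lambda p: False).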

Workarounds verified

  • Map the Docker Desktop proxy socket via container-options:

    --volume=/run/host-services/ssh-auth.sock:/run/host-services/ssh-auth.sock:ro
    --env=SSH_AUTH_SOCK=/run/host-services/ssh-auth.sock
  • Set a writable SSH control path for the non-root UID:

    [ssh_connection]
    control_path_dir = /tmp/ansible/cp

Proposed fixes

  1. macOS agent handling

    • Detect macOS plus Docker Desktop and auto-map SSH_AUTH_SOCK to /run/host-services/ssh-auth.sock for EEs.
    • Avoid passing /private/tmp/com.apple.launchd... into the container.
  2. Settings validation

    • Relax or make OS-aware the validation of execution-environment.volume-mounts.src so that /run/host-services/ssh-auth.sock is allowed on macOS, or provide an allowlist of VM-internal paths for Docker Desktop.
  3. Writable control path by default for non-root

    • When Navigator launches the EE as a non-root host UID, set ANSIBLE_SSH_CONTROL_PATH_DIR=/tmp/ansible/cp unless the user has explicitly configured a value.
    • Document that on macOS the runtime UID is typically 501, which explains why /runner/.ansible/cp is unwritable by default.
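
Fixes 1 and 3 could be combined roughly as follows. This is a sketch of the proposed behavior, not existing ansible-navigator code. The function name and parameters are hypothetical, and "Darwin plus the docker engine" is again an assumed proxy for Docker Desktop. The non-macOS branch mirrors the directory mount that Navigator is observed to inject in the steps to reproduce.

```python
import os
import platform

DOCKER_DESKTOP_AGENT_SOCK = "/run/host-services/ssh-auth.sock"
FALLBACK_CONTROL_DIR = "/tmp/ansible/cp"


def plan_ssh_settings(host_env, *, system=None, engine="docker", host_uid=0, user_control_dir=None):
    """Return (container_options, env_vars) for SSH agent and control path handling."""
    system = system or platform.system()
    options, env = [], {}

    host_sock = host_env.get("SSH_AUTH_SOCK")
    if host_sock:
        if system == "Darwin" and engine == "docker":
            # Fix 1: use the Docker Desktop proxy socket instead of the launchd path.
            sock = DOCKER_DESKTOP_AGENT_SOCK
            options.append(f"--volume={sock}:{sock}:ro")
            env["SSH_AUTH_SOCK"] = sock
        else:
            # Observed behavior: mount the directory that holds the agent socket.
            sock_dir = os.path.dirname(host_sock)
            options.append(f"--volume={sock_dir}/:{sock_dir}/")
            env["SSH_AUTH_SOCK"] = host_sock

    # Fix 3: give non-root UIDs a writable control path unless the user set one.
    if host_uid != 0 and user_control_dir is None:
        env["ANSIBLE_SSH_CONTROL_PATH_DIR"] = FALLBACK_CONTROL_DIR

    return options, env


if __name__ == "__main__":
    print(plan_ssh_settings(dict(os.environ), host_uid=os.getuid()))
```

An explicitly configured control path (user_control_dir) always wins, which keeps the change backward compatible for users who already set control_path_dir.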

Why this matters

  • Without these changes, macOS users cannot rely on the documented "auto agent setup" and often fail on first SSH because of the control path. The fixes are backward compatible for Linux users. On Linux, the existing auto-mount of /run/user/$UID/... continues to work. On macOS, the Docker Desktop proxy path is required.

Labels

  • bug (researched, reproducible, committed to fix)