| # /etc/resolv.conf is a bind mount inside containers, so the default atomic | ||
| # rename fails with EBUSY. Write in place instead. (The symlink case above | ||
| # has already been replaced with a regular file.) | ||
| unsafe_writes: true |
Resolv repointed before dnsmasq
Medium Severity
On the dnsmasq path, /etc/resolv.conf is switched to nameserver 127.0.0.1 immediately after templating, while Restart dnsmasq only runs at the later flush_handlers in consul.yml. Until then, general DNS can fail if the package did not already leave dnsmasq listening on localhost.
Reviewed by Cursor Bugbot for commit b16da70.
We install dnsmasq from apt, so we can assume it is running and listening from the start.
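If relying on the package's post-install state feels too implicit, the ordering concern could be handled explicitly. The following is only a sketch of that idea: the task names and the template name resolv.conf.j2 are assumptions, not the role's actual tasks.

```yaml
# Sketch only: task names and the template name are hypothetical.
- name: Apply any pending dnsmasq restart before touching resolv.conf
  ansible.builtin.meta: flush_handlers

- name: Wait until something is listening on the local DNS port
  ansible.builtin.wait_for:
    host: 127.0.0.1
    port: 53            # TCP check; dnsmasq listens on both TCP and UDP 53
    timeout: 30

- name: Point resolv.conf at the local dnsmasq
  ansible.builtin.template:
    src: resolv.conf.j2         # hypothetical template name
    dest: /etc/resolv.conf
    mode: "0644"
    unsafe_writes: true         # bind mount inside containers, as in the diff
```

With this ordering, /etc/resolv.conf is only repointed once a listener on 127.0.0.1:53 has been confirmed, at the cost of an extra handler flush in the middle of the role.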
| @@ -0,0 +1,45 @@ | |||
| # !!! MANAGED BY ANSIBLE !!! | |||
It seems to me that we lost the encryption settings from the former role.
In the template:
{% if consul_encrypt_enable | bool %}
"encrypt": "{{ consul_raw_key }}",
"encrypt_verify_incoming": {{ consul_encrypt_verify_incoming | bool | to_json }},
"encrypt_verify_outgoing": {{ consul_encrypt_verify_outgoing | bool | to_json }},
{% endif %}
And in the config in dev/staging/prod:
"encrypt": "REDACTED",
"encrypt_verify_incoming": true,
"encrypt_verify_outgoing": true,
You're right. I fixed this and was able to test it on dev.
The config file now contains all the parts we can see in our legacy instances.
| @@ -0,0 +1,45 @@ | |||
| # !!! MANAGED BY ANSIBLE !!! | |||
| datacenter = "{{ consul_datacenter }}" | |||
Same issue as before for the performance configuration:
"performance": {
"leave_drain_time": "5s",
"raft_multiplier": 1,
"rpc_hold_timeout": "7s"
},
The raft_multiplier value is something we set explicitly, not the Consul default (which would be 5). It may be worth testing before dropping it.
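One way to carry these legacy values over would be to expose them as role defaults. This is a sketch under assumptions: the variable name consul_performance is hypothetical here, and how the consul.hcl template would consume it is left open. The values are the ones quoted from the legacy config.

```yaml
# Hypothetical defaults; the variable name is an assumption,
# the values come from the legacy configuration quoted above.
consul_performance:
  leave_drain_time: "5s"
  raft_multiplier: 1        # set explicitly; Consul's own default is 5
  rpc_hold_timeout: "7s"
```

Keeping the values in a variable would let each environment be tested with and without the override before deciding to drop it.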
| consul_enable_local_script_checks: false | ||
|
|
||
| # Gossip encryption (supply consul_raw_key via vault to enable) | ||
| consul_encrypt_enable: false |
The old default was true. I think we will have a compatibility issue if we keep this new default:
consul_encrypt_enable: "{{ lookup('env', 'CONSUL_ENCRYPT_ENABLE') | default(true, true) }}"
Yes, when switching to this new role, we will need to set consul_encrypt_enable to true in the inventories/playbooks to avoid issues.
It is odd that the default value was true, since it breaks Consul if none of the encryption-related settings are configured.
Now, at least, encryption is off by default and Consul does not break.
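For the migration, the inventory override could look roughly like the following. This is a sketch: vault_consul_raw_key is a hypothetical vaulted variable, and the file location is an assumption.

```yaml
# group_vars sketch; vault_consul_raw_key is a hypothetical vaulted variable.
consul_encrypt_enable: true
consul_raw_key: "{{ vault_consul_raw_key }}"
consul_encrypt_verify_incoming: true
consul_encrypt_verify_outgoing: true
```

A guard task along these lines, which is not part of the role as far as this thread shows, would make the failure mode described above explicit instead of letting Consul break at startup:

```yaml
# Sketch of an early check; the task name and message are illustrative.
- name: Fail early when encryption is enabled without a key
  ansible.builtin.assert:
    that:
      - consul_raw_key is defined
      - consul_raw_key | length > 0
    fail_msg: "consul_encrypt_enable is true but consul_raw_key is not set"
  when: consul_encrypt_enable | bool
```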
Cursor Bugbot has reviewed your changes and found 2 potential issues.
There are 3 total unresolved issues (including 1 from previous review).
Reviewed by Cursor Bugbot for commit 8453f0d.
| # /etc/resolv.conf is a bind mount inside containers, so the default atomic | ||
| # rename fails with EBUSY. Write in place instead. (The symlink case above | ||
| # has already been replaced with a regular file.) | ||
| unsafe_writes: true |
dnsmasq not ensured running
Medium Severity
On the dnsmasq resolver path, the role installs and templates dnsmasq and only restarts it via a handler when the config template changes. Unlike consul, there is no task that ensures dnsmasq is started and enabled on every run, so a stopped service stays down after idempotent applies and Consul DNS on the host can remain broken while the play still succeeds.
Reviewed by Cursor Bugbot for commit 8453f0d.
We install dnsmasq via apt, which ensures the dnsmasq service is initially started and enabled.
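The apt install covers the first run, but not a service that was stopped later. If that case ever matters, a single idempotent task would address it. A minimal sketch, with an illustrative task name:

```yaml
# Sketch only: not part of the role as reviewed.
- name: Ensure dnsmasq is started and enabled
  ansible.builtin.service:
    name: dnsmasq
    state: started
    enabled: true
```

This reports no change when the service is already running and enabled, so it would not affect idempotence checks in Molecule.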
| register: consul_leader | ||
| until: consul_leader.content | length > 2 | ||
| retries: 30 | ||
| delay: 2 |
Leader wait blocks partial bootstrap
Medium Severity
Every Consul node waits until the HTTP leader endpoint returns a non-empty leader, with a bounded retry loop. When bootstrap_expect exceeds the number of servers already running—common if servers are applied serially or in waves—the cluster will not elect a leader in time and the role fails even though later passes would succeed.
Reviewed by Cursor Bugbot for commit 8453f0d.
This will only fail if we bootstrap a cluster (no existing leader) with a bootstrap_expect greater than the number of nodes actually starting.
The only workaround would be to remove the leader election check from the role, since election can happen after node provisioning.
I'd prefer to leave it this way: the role will fail if we bootstrap a cluster with, for example, expect=3 and serial=1, but in all other cases it ensures each node actually joins a working cluster.
When bootstrapping a cluster, we could simply adjust the expect value to make it work.
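For context, the check under discussion can be reconstructed roughly as below. Only register, until, retries, and delay appear in the diff; the module, the URL, the default() guard, and the consul_wait_for_leader toggle are assumptions added for illustration.

```yaml
# Reconstruction sketch; only register/until/retries/delay come from the diff.
- name: Wait for a Consul leader
  ansible.builtin.uri:
    url: "http://127.0.0.1:8500/v1/status/leader"   # assumed local agent address
    return_content: true
  register: consul_leader
  # With no leader the endpoint is expected to return an empty JSON string,
  # which is presumably why the threshold is 2 characters.
  until: (consul_leader.content | default('')) | length > 2
  retries: 30
  delay: 2               # 30 retries x 2 s: roughly one minute in total
  when: consul_wait_for_leader | default(true) | bool   # hypothetical toggle
```

A toggle like this would keep the check on by default while letting a serial bootstrap skip it, as an alternative to adjusting the expect value.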


To facilitate the Debian 13 / Ansible 13 migration, we need to replace our external dependency on the ansible-community/consul role. This new role replaces it and installs a Consul server or agent like the legacy one.
To ease the switch, we kept most of the variable names.
Note
Medium Risk
New role touches cluster formation, systemd, and host DNS/resolv.conf on the dnsmasq path; mistakes could affect service discovery or resolution, though impact is limited to playbooks that adopt this role.
Overview
Introduces wazo.consul, a full Ansible role meant to replace the external ansible-community/consul dependency for Debian 11–13 / Ansible 8–13 migration. Most legacy variable names are preserved for drop-in migration.
Install & cluster: Consul comes from the HashiCorp APT repo (optional version pin / upgrade), with consul.hcl rendered for server, bootstrap, or client, including optional gossip encryption, static or cloud retry_join, and bootstrap_expect computed from server hosts in inventory all when unset (legacy semantics).
Runtime fixes: A systemd drop-in forces Type=simple on consul.service to avoid single-node Type=notify timeouts from the package unit. The role starts Consul and waits on the HTTP API and leader election.
DNS: At runtime it picks systemd-resolved when that service is running (Consul domain routed via resolved.conf.d); otherwise it installs dnsmasq, forwards the Consul domain, and may rewrite /etc/resolv.conf (including unsafe_writes for container bind mounts). There is no consul_dnsmasq_enable toggle.
Quality gate: Adds pre-commit/ansible-lint tooling, a Jenkins pipeline (tox linters + parallel Molecule on ECR systemd images), and testinfra checks keyed off CONSUL_DNS_RESOLVER for both resolver paths.
Reviewed by Cursor Bugbot for commit 8453f0d. Bugbot is set up for automated code reviews on this repo.