Large README update

spacelama · spacelama · commit 7e3e266befaa · 2025-05-23T00:46:52.000+10:00
diff --git a/README.md b/README.md
@@ -1,73 +1,140 @@
-# Ansible Playbook for setting up a proxmox cluster with some debian and (LXC) container VMs, some tasmota IOT devices, and some Openwrt Access Points and Routers with VLAN separation of the SSIDs on each of the radios
+# Ansible Playbook for Proxmox, Debian (LXC) Containers, VMs, Physical nodes, Tasmota IOT, OpenWRT Routers and Access Points
 
-A couple of [Ansible](http://docs.ansible.com/) playbooks which runs a series of configuration steps to set up an SOE based on Debian, in order to provide a solid foundation for subsequent actions.
+## Introduction
+
+### What are we?
+
+We're a couple of [Ansible](http://docs.ansible.com/) playbooks which
+run a series of configuration steps to set up an SOE based on Debian,
+in order to provide a solid foundation for subsequent actions outside
+the control of these playbooks.  We can provide optional ***(partial)
+configuration of a Proxmox cluster, manage Debian (LXC) containers,
+VMs, and physical hardware (including some hardware management),
+configuration of Tasmota IOT devices, and some OpenWRT Access Points
+and Routers configurable with VLAN separation of the SSIDs on each of
+the radios***.
 
 It borrows heavily from the work of: [Luke Harvey](https://github.com/lukeharvey), [Bryan Kennedy](https://plusbryan.com/my-first-5-minutes-on-a-server-or-essential-security-for-linux-servers), [Ryan Eschinger](http://ryaneschinger.com/blog/securing-a-server-with-ansible/),  [Ashley Rich](https://github.com/A5hleyRich/wordpress-ansible), and [Digital Ocean](https://www.digitalocean.com/community/tutorials/initial-server-setup-with-ubuntu-14-04)
 
 It will perform the following:
-* Create a new ansible "super user" with with root privileges and public key authentication on your proxmox and debian machines as well as the containers
+* Set up (not initially deploy) and manage Debian 10 (Buster), 11 (Bullseye), 12 (Bookworm) (and Proxmox 6, 7, 8) physical machines, LXC and VM (Proxmox) instances
+* Create a new Ansible "super user" with root privileges and public key authentication on your Proxmox and Debian machines as well as the containers
 * Set root user credentials, and a primary user credentials, ssh keys, sudoers to root
 * Implement several SSH hardening techniques
 * Configure the timezone and enable time synchronization
 * Modify the hostname and hosts file
 * Install the admin's account with some nice .bashrc settings
 * Install a package baseline appropriate for laptops, desktops, virtualisation hosts, containers, as appropriate
-* Override some of debian's more annoying defaults
-* Manage tasmota settings on your IOT devices
-* Manage openwrt settings on your openwrt devices, including installing VLANs 10,30,40,70 and configuring radios on them
-
-# What can we automate, what point-and-drool do we still need to do?
-For physical boxes, VMs and LXC containers, I take the vanilla debian
+* Override some of Debian's more annoying defaults per my opinionated opinion
+* Manage Tasmota settings on your IOT devices
+* Manage OpenWRT settings on your OpenWRT devices, including installing VLANs 10,30,40,70 and configuring radios on them
+
+## What aren't we?
+
+This playbook does not deploy Proxmox from scratch.  It also doesn't
+require Proxmox - but I find it very helpful.  It can convert an
+existing Proxmox cluster into being managed like any of our other
+Debian SOEs (as far as our SOE interaction is concerned - user
+accounts, keys, SOE configuration, backups, parts of networking;
+however our playbooks don't do Proxmox management on the Proxmox
+cluster).  Likewise, we don't rely on any aspect of that (I do it,
+because I want my helpful tools to also be used on my Proxmox
+cluster).  It doesn't deploy new VMs/CTs via Proxmox yet - you still
+need to deploy them by some other means, or manually.  We also don't
+preclude you doing management outside these playbooks.  And of course,
+not practicing what I preach, I don't have a properly thought out dev
+environment nor have any automated testing infrastructure, because I
+ain't doing anything important here.  It doesn't manage every aspect
+of machines - plenty of my machines can be redeployed from scratch and
+will come up identically because their OS settings are managed via
+these plays, and I have otherwise bind mounted their data into their
+LXC containers, but someone had to populate that data and application
+configuration in the first place.  And the Proxmox cluster, Ceph
+installation and configuration get managed the Proxmox way.  But Ceph
+secrets distribution and mounting on clients?  We can manage that here.
+
+While it can manage OpenWRT installations, it similarly isn't going to
+deploy OpenWRT from scratch (although we can now apply a sysconfig
+update and keep your firmware versions encoded into our config!), and
+you'll want to set up initial networking, switching, wifi radio, etc
+at initial installation time, but we come along and install our
+packages and configs then tweak those UCI settings per our config.
+
+Similar story for Tasmota - very few of my devices have had their
+GPIOs programmed through these playbooks, for example.  But every time
+I shift an AP around, I update the relevant primary/failover SSID of
+all my Tasmota devices.  It wouldn't be much work to arrange firmware
+to be patched like in the OpenWRT playbook either (makes me more
+nervous, but is it any harder to open up a bricked 240v smart plug
+with triangle security screws vs a bricked TP-Link Archer D7 v1 held
+together with fragile plastic clips?).
+
+We are not DRY - many configs have to be specified all over the place.
+Sorry.
+
+## What can we automate, what point-and-drool do we still need to do?
+
+For physical boxes, VMs and LXC containers (both provided through
+Proxmox in my case), I rely on the machine being deployed the
+traditional manual way (haven't yet automated talking to the Proxmox
+API, even though the Proxmox boxes themselves are managed in this
+playbook like any other Debian box).  We take this vanilla Debian
 install where I've just clicked through the install ISO steps in a
-fairly braindead fashion (our playbooks goes and fixes a bunch of
-things asked in install anyway), apply bootstrap.yml first, then
-initial_server_config.yml to bring the configuration up to my SOE and
-upon any further update to my config or after the machine is patched.
-I don't automate the entire playbook being played because I still have
-far too many pets and manual changes that need checking, and the
-playbook is quite slow to run all the way through, so I'm frequently
-using `--diff`, `--check` and `--tags` to limit the scope of changes
-to what I'm currently concerned with.  It would be easy to email
-myself an output of `--check --diff initial_server_config.yml` every
-night, but I get enough email as-is.
-
-For OpenWRT APs and tasmota configurations however, we still rely on
+fairly braindead fashion (I haven't set up fully automated Debian
+installs, preseeding or anything like that; our playbooks then go and
+fix a bunch of things asked in the install anyway), apply bootstrap.yml first,
+then initial_server_setup.yml to bring the configuration up to my SOE
+and upon any further update to my config or after the machine is
+patched.  I don't automate regular runs of the entire playbook
+or set up automated drift detection, because I still have far too many
+pets and manual changes that need checking, and the playbook is quite
+slow to run all the way through, so I'm frequently using `--diff`,
+`--check` and `--tags` to limit the scope of changes to what I'm
+currently concerned with.  It would be easy to email myself an output
+of `--check --diff initial_server_setup.yml` every night, but I get
+enough email as-is.
+
+For OpenWRT APs and Tasmota configurations however, we still rely on
 some manual configuration for a new device from scratch before
 applying these playbooks - but it does make it easier to wholesale
 change all of your SSIDs for example.  For upgrades however, we can
 automate all of the tasks needed to bring your device back into
 compliance with your config without any manual configuration.  I apply
-it routinely after any minor upgrade of the openwrt device, perhaps
+it routinely after any minor upgrade of the OpenWRT device, perhaps
 checking with `--diff` `--check` first if I'm a little nervous.  I do
 test on my VMs first (sometimes I'm even sensible enough to test on
 one of my virtualised APs before I test it on my internet gateway),
 and take a snapshot manually beforehand.  For major upgrades, it
 hasn't broken majorly for me yet, but my planning tends to be a little
-more careful around these events.  For setting up a new openwrt AP, my
+more careful around these events.  For setting up a new OpenWRT AP, my
 manual configuration tends to be limited to setting switch vlan
 tagging information in /etc/config/network, and assigning radio0 and
 radio1 consistently with my other devices, before letting
 uci_config.yml loose on VLAN and network definitions and what SSIDs
 we've assigned to which radios.  This sometimes involves swapping
 radio0 and radio1 pci/hardware devices in /etc/config/wireless.
 
-## Requirements
+# Requirements
 
-* [Ansible](http://docs.ansible.com/ansible/intro_installation.html) installed locally on your machine
-* You probably want to install `ansible-mitogen` (and `python3-mitogen`) on your ansible server too, for my ansible.cfg sets `strategy = mitogen_linear` to greatly accelerate the playbook (it works with that setting disabled if you can't install migoten, but mitogen has never created any detectable problems for me).  I have only tested this from a debian machine (debian 11,12).
+* [Ansible](http://docs.ansible.com/ansible/intro_installation.html) installed locally on your machine to run the playbooks.  Tested on Debian's Ansible from Debian 10 (Buster) to Debian 12 (Bookworm)
+* You probably want to install `ansible-mitogen` (and `python3-mitogen`) on your Ansible server too, for my ansible.cfg sets `strategy = mitogen_linear` to greatly accelerate the playbook (it works with that setting disabled if you can't install mitogen, but mitogen has never created any detectable problems for me).  I have only tested this from a Debian machine (Debian 11,12).
 * Ideally, you'd create a gpg encrypted file in misc/vault-password.gpg, and verify it can be read with: misc/get-vault-pass.sh
-* Openwrt plays rely on [ansible-openwrt](https://github.com/gekmihesg/ansible-openwrt), which is published as a [galaxy collection](https://galaxy.ansible.com/ui/repo/published/nn708/openwrt/).
-* Tasmota plays rely on [ansible-tasmota](https://github.com/tobias-richter/ansible-tasmota), which is available through [ansible galaxy](https://galaxy.ansible.com/ui/standalone/roles/tobias_richter/tasmota/), but which [I've modified](https://github.com/spacelama/ansible-tasmota) to allow for and transparently recovers from the tasmota device spontaneously rebooting after certain configurations are applied.
+* OpenWRT plays rely on [ansible-openwrt](https://github.com/gekmihesg/ansible-openwrt), which is published as a [galaxy collection](https://galaxy.ansible.com/ui/repo/published/nn708/openwrt/).
+* Tasmota plays rely on [ansible-tasmota](https://github.com/tobias-richter/ansible-tasmota), which is available through [ansible galaxy](https://galaxy.ansible.com/ui/standalone/roles/tobias_richter/tasmota/), but which [I've modified](https://github.com/spacelama/ansible-tasmota) to allow for and transparently recover from the Tasmota device spontaneously rebooting after certain configurations are applied.
 
-## Configuration
+# Initial playbook setup and Configuration
 
 Clone the repo
 
 ```
 $ git clone --recurse-submodules https://github.com/spacelama/ansible-initial-server-setup.git
 ```
 
-# Debian Linux LXC and VMs, desktops, servers, Proxmox etc
+## Configure Debian Linux LXC and VMs, desktops, servers, Proxmox etc
+
+Look at all the .gitignore files scattered around for an indication as
+to what secrets you may need.  Populate those files with secrets
+properly encoded with the vault.
 
 Modify the variables in **_vars/main.yml_** according to your needs:
 
@@ -85,7 +152,7 @@ Modify the variables in **_vars/main.yml_** according to your needs:
 
 **ssh_port:** your chosen SSH port
 
-Modify **_hosts.yml_** with your various host settings
+Modify **_hosts.yml_** with your various host settings (and public keys etc).
 
 I put a bunch of vault encoded per-host secrets in
 `host_vars/<hostname>.yml` too, encrypted via `ansible-vault
@@ -96,9 +163,9 @@ further protect their contents (likewise for `files/main.*.password`
 and `files/ap.*` which should have been vault entries in the first
 place).  Host settings I want to track in git are in `hosts.yml`.
 
-## Bootstrapping
+## Bootstrapping your new LXC/VM/Physical node to be Ansible managed
 
-Install the ansible_adm account and the sudo permissions for this account to escalate to root with:
+Install the ansible_adm account and the sudo permissions for this account to escalate to root with:
 
 `$ ansible-playbook bootstrap.yml -u root -k --extra-vars "target=dirac-new" --ask-vault-pass # always run with --check when first starting out!`
 
@@ -116,33 +183,33 @@ Fix up an lxc container:
 
 ## Testing
 
-It's not foolproof, but try `--check` prior to each real ansible.
+It's not foolproof, but try `--check` prior to each real ansible invocation.
 
 `--diff` is extremely handy, but not foolproof when also running --check.
 
 I frequently `--limit` to hosts or away from hosts.
 
 `$ ansible-playbook --ask-vault-pass initial_server_setup.yml --diff --check --limit='!dirac-new,!fs-new,!hass-debian,!mail'`
 
---limit also useful when you get a new openwrt AP or tasmota device:
+`--limit` is also useful when you get a new OpenWRT AP or Tasmota device:
 
 `$ ansible-playbook -v openwrt_maintenance.yml --diff --check`
 
 `$ ansible-playbook tasmota_maintenance.yml --diff --check --limit patiofluro-power,loungefrontlight-power --extra-vars "setpsk=true" --extra-vars "setsyslog=true"`
 
-## Production
+# Production
 
 Then run the playbooks:
 
 `$ ansible-playbook --ask-vault-pass initial_server_setup.yml --diff --limit='!dirac-new,!fs-new,!hass-debian,!mail'`
 
-Likewise for tasmota and openwrt:
+Likewise for Tasmota and OpenWRT:
 
 `$ ansible-playbook -v openwrt_maintenance.yml --diff`
 
 `$ ansible-playbook tasmota_maintenance.yml --diff --limit patiofluro-power,loungefrontlight-power --extra-vars "setpsk=true" --extra-vars "setsyslog=true"`
 
-## Using tags to limit the scope of changes
+# Using tags to limit the scope of changes
 
 initial_server_setup.yml uses tags on each role.
 
@@ -154,19 +221,19 @@ or to stop the webserver role from running while still running everything else:
 
 `$ ansible-playbook --diff initial_server_setup.yml --limit met,webserver,iot --skip-tags webserver --check`
 
-## Ad hoc commands:
+# Ad hoc commands:
 
 Using [ansbak](https://github.com/spacelama/ansbak), you can get an
 analogous result to using `pdsh ... | dshbak -c`, using your same
-ansible inventory as usual (such that this has become my main way of
+Ansible inventory as usual (such that this has become my main way of
 interacting with my fleet instead of pdsh or cssh):
 
 `$ ANSIBLE_FORCE_COLOR=true ansible laptops,desktops,servers,containers -i ~/Ansible/hosts.yml -m shell -a "sudo env bash -c 'dpkg --get-selections | grep munin'" | ansbak.py`
 
 
-# Openwrt routers, wireless APs, VLANs
+# OpenWRT routers, wireless APs, VLANs
 
-`vars/openwrt.yml` contains some settings for all your openwrt devices
+`vars/openwrt.yml` contains some settings for all your OpenWRT devices
 (routers, APs etc), and sets up a bunch of VLANs for your IOT devices,
 windows devices etc, assigned per MAC address (VLAN decided by which
 SSID your device joins - my IOT devices from China only know about my
@@ -187,9 +254,9 @@ firewalls, banip etc), but I've been using this to configure fresh APs
 from scratch.  Have a good backup of your APs before you run this for
 the first time though if you've already set them up in any way.  The
 radio stuff is expected to be quite fragile, and has only received
-most testing on current openwrt 22.03. and 23.05.*
+most testing on current OpenWRT `22.03.*`, `23.05.*` and `24.10.*`.
 
-My inventory is in hosts.yml, and tells us whether the openwrt device
+My inventory is in hosts.yml, and tells us whether the OpenWRT device
 uses DSA switch config or the old definition, via
 `openwrt_dsa_switch_config`.  IP address are decided by
 `inet_addr_suffix` in your inventory to assign
@@ -201,7 +268,7 @@ offload the biggest of non-essential packages from even my smallest
 wavlink wn575a3 with 8MB of flash).  `type` is 'ap' or 'router' and
 decided which packages to install and how to set up DHCP.
 
-Run the playbook to configure all openwrt devices configured in
+Run the playbook to configure all OpenWRT devices configured in
 hosts.yml:
 
 `$ ansible-playbook openwrt_maintenance.yml --diff # --check to verify changes first`
@@ -218,7 +285,8 @@ the package installation and file sync with:
 `$ ansible-playbook openwrt_maintenance.yml --diff --extra-vars "run_uci_config=yes" --extra-vars "run_packages_config=no" --tags openwrt_config`
 
 We can also issue firmware patches, and then run the configuration
-update step again (which assumes run_uci_config=yes):
+update step again (which assumes run_uci_config=yes, which of course
+is more fragile over major version updates):
 
 `$ ansible-playbook openwrt_upgrade.yml --diff`
 
@@ -231,7 +299,7 @@ them, so long as they have firmware_image set.  If unset for a
 particular device, that device never gets patched by this playbook.
 
 Keep in mind this playbook is much more likely to fail than the
-openwrt_maintenance one, because who knows what assumptions we make or
+openwrt_maintenance one, because who knows what assumptions we make or
 your configuration makes that will break in each update (be careful of
 migrations to DSA switches, as always), so keep firm backups (or be
 prepared to run the playbook multiple times until you get it right).
@@ -262,10 +330,10 @@ This will set SSID1 and you might choose fallback SSID2 per host in
 your hosts.yml inventory - here, we set the first one to be your
 primary SSID that you mesh or roam between throughout your host, and
 SSID2 might be the *second* closest AP to where your device normally
-sits.  That way, tasmota will lock onto (and roam via 80211.r) your
+sits.  That way, Tasmota will lock onto (and roam via 802.11r) your
 closest AP on the primary SSID by default, but if that AP continues to
 serve valid wifi connections but loses connectivity to the network
-itself, tasmota's watchdog will notice this loss of packet
+itself, Tasmota's watchdog will notice this loss of packet
 connectivity, and will failover to the second closest AP that is
 hopefully still on a working network.
 
@@ -277,12 +345,12 @@ I set `disable_default_reset_on_power_reset7=1` on devices that
 frequently lose their power, so they don't accidentally get firmware
 reset.
 
-The ansible tasmota provider thinks PSK and syslog change every time
+The Ansible Tasmota provider thinks PSK and syslog change every time
 you try to adjust them, even if actually unchanged, so by default, I
 don't set them.  They only get attempted to be set when you supply
 setpsk=true and setsyslog=true.  But since writing that, I've set a
 lot more parameters, some of which unconditionally overwrite the
-setting even when unchanged, and some of these result in the tasmota
+setting even when unchanged, and some of these result in the Tasmota
 device rebooting every time.  I've had to put a [workaround in
 ansible-tasmota](https://github.com/spacelama/ansible-tasmota) to
 allow the device to recover and continue setting subsequent