@@ -359,9 +359,104 @@ sources for a more streamlined and modern variant):
359359 halt -p
360360------
361361
362+ For more details and a better integrated example, please see
363+ https://github.com/networkupstools/nut/blob/master/scripts/systemd/nutshutdown.in
364+ (a copy tailored for your OS distribution may or may not be included
365+ in NUT packages, if you installed from those).
366+
362367The other solution is to change your BIOS setting to "always power
363368on" instead of "last state", assuming that's possible.
364369
370+ == My operating system has no shutdown scripts, how can I tell the UPS to go down and/or avoid power races?
371+
372+ Modern operating systems offer service management frameworks, like
373+ the Solaris 10/11 and illumos SMF, or Linux systemd. These frameworks
374+ drive the life cycle of the operating system and enforce their opinions
375+ on the shutdown end-game in particular. For example, any "user-land"
376+ processes which remain alive after a certain (configurable or hard-coded)
377+ timeout are killed off so they do not block the shutdown or reboot request.
378+
379+ systemd has a concept of shutdown hooks: the logic above can be placed
380+ as a script into the special directory `/usr/lib/systemd/system-shutdown`,
381+ where it is called after all the service daemons have been killed off.
382+ SMF does not commonly offer such a facility.
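
As an illustration of such a hook, here is a minimal sketch (not the packaged `nutshutdown` script). It assumes that systemd passes the shutdown type (`halt`, `poweroff`, `reboot` or `kexec`) as the first argument, and that `upsmon` and `upsdrvctl` are found in `PATH`. The `UPSMON` and `UPSDRVCTL` variables and the choice to act only on `poweroff` and `halt` are assumptions made for this example.

```shell
#!/bin/sh
# Sketch of a script for /usr/lib/systemd/system-shutdown (assumptions above).
# UPSMON/UPSDRVCTL are hypothetical overrides, handy for testing.
UPSMON="${UPSMON:-upsmon}"
UPSDRVCTL="${UPSDRVCTL:-upsdrvctl}"

shutdown_hook() {
    # Only act when the machine is really going down, not on reboot/kexec.
    case "$1" in
        poweroff|halt) ;;
        *) return 0 ;;
    esac
    # "upsmon -K" succeeds only if the POWERDOWNFLAG file is present,
    # i.e. this shutdown was caused by a power emergency.
    if "$UPSMON" -K >/dev/null 2>&1; then
        "$UPSDRVCTL" shutdown
    fi
}

shutdown_hook "${1:-}"
```

Whether a long sleep and a reboot should follow the `upsdrvctl shutdown` call depends on the power race considerations discussed above.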
383+
384+ Instead, the question is turned around: "What would break in your system
385+ if it were suddenly turned off (not only due to power loss, but also kernel
386+ panic, hardware failure, etc.)?" and the suggested solutions follow from there.
387+
388+ Copy-on-write filesystems like ZFS, extensively used in Solaris/illumos as
389+ well as other operating systems (including some *BSD and Linux distributions),
390+ do not care about sudden reboots (assuming the storage hardware does not lie
391+ about honouring orderly metadata flushes). At worst, an incomplete transaction
392+ would be lost, but the filesystem structure itself remains valid and does not
393+ need any lengthy `fsck` after a reboot.
394+
395+ The same may or may not hold for databases (depending on how yours logs
396+ incoming write transactions), mail systems, etc.
397+
398+ Chances are, with a complete stack of well-engineered software for which
399+ hardware failure is always an option (considered in design), you do not
400+ even need to shut down anything in case of reported power failure and the
401+ UPS going on battery.
402+
403+ In practice, if you are not after maximizing the service uptime but rather want
404+ some peace of mind that your application data is not corrupted, you can script
405+ your `NOTIFYCMD` implementation to stop just those services (containers, VMs...),
406+ maybe `umount` non-transactional file systems (FAT EFI partition, USB storage),
407+ and revive them when/if the UPS becomes "on-line" again while this box is still
408+ alive. If the power did get lost, the box simply boots up later (whether or not
409+ you requested the UPS to power-off or power-cycle, and whether or not it
410+ honoured such a request). Using `upssched` as the notification handler can
411+ probably help implement this logic consistently, so it reacts to different
412+ events. Using service grouping or milestones to consistently start/stop
413+ "fragile" services should also help (in this case the framework ensures the
414+ action only happens once, regardless of how many times your scripts request it).
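
A `NOTIFYCMD` handler along these lines might look like the sketch below. It relies on `upsmon` exporting the `NOTIFYTYPE` environment variable to the handler. The service names, the `FRAGILE_SERVICES` and `DRY_RUN` variables, and the use of `systemctl` are assumptions for illustration; on SMF systems, `svcadm disable -t` and `svcadm enable` would take their place.

```shell
#!/bin/sh
# Hypothetical NOTIFYCMD handler: stop "fragile" services on battery,
# start them again when the UPS is back on line.
# Placeholder service names; replace with your own.
FRAGILE_SERVICES="${FRAGILE_SERVICES:-example-db.service example-vm.service}"
# Assumption: by default only print what would be done.
DRY_RUN="${DRY_RUN:-yes}"

run() {
    if [ "$DRY_RUN" = yes ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

handle_notify() {
    case "$1" in
        ONBATT)
            # Unquoted on purpose: split the list into service names.
            for svc in $FRAGILE_SERVICES; do run systemctl stop "$svc"; done
            ;;
        ONLINE)
            for svc in $FRAGILE_SERVICES; do run systemctl start "$svc"; done
            ;;
        *)
            # Other notification types are ignored in this sketch.
            ;;
    esac
}

handle_notify "${NOTIFYTYPE:-}"
```

For `upsmon` to call such a script, `upsmon.conf` would need a `NOTIFYCMD` line pointing at it and `EXEC` among the `NOTIFYFLAG` values for the events of interest.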
415+
416+ Notably, with this approach you DO NOT tell the OS to actually shut down,
417+ and so you do not suffer any forced kill of user-land processes after timeouts.
418+ You choose which services might be hurt by the outage and should go down,
419+ and which can run as long as they can (or in which mode, e.g. turning your
420+ mail, DNS or LDAP server read-only just in case until the storm passes).
421+ The still-running stub of your OS is the environment which talks to the UPS
422+ and takes care of eventually restarting the services or the whole box, as
423+ you deem fit.
424+
425+
426+ == My operating system uses (immutable) images, how can I tell the UPS to go down and/or avoid power races?
427+
428+ One approach was proposed in a discussion with Lennart Poettering of systemd
429+ fame: operating systems constructed with (immutable) run-time images are
430+ actually layered like onions, with code in another file system (usually an
431+ initrd) picking the run-time image to use and `chroot`'ing into it... and
432+ possibly regaining control when the run-time operating environment exits.
433+ This is somewhat like containers, where the operating system which you interact
434+ with is the inside of the container, and you do not directly see its hypervisor.
435+
436+ The problem is that while NUT services can run as part of the run-time image
437+ and decide to power-off or power-cycle the UPS in case of trouble, there is
438+ no place for the classic NUT approach to do this in the end-game of that
439+ run-time image (calling `upsmon -K` to detect the `POWERDOWNFLAG` file, and
440+ launching a new instance of the driver to talk to the UPS itself) "after the
441+ filesystem was remounted read-only". That life-cycle concept has no such spot
442+ in the state machine of the run-time image, since its storage is to be
443+ completely unmounted.
444+
445+ Instead, the proposed idea was to also include a build of NUT in that
446+ "managing" initrd image, with perhaps a `/run` (or similar) location passed
447+ (bind-mounted?) from the initial environment into the launched run-time system.
448+ The `POWERDOWNFLAG` file can be written into such a location, as can (as part
449+ of `NOTIFYCMD` handling) the most-recent copies of NUT configuration files
450+ which may be managed in the interactively accessible operating environment.
451+
452+ When the run-time image exits and the logic in the initrd image which launched
453+ it regains control, it can check for the existence of the `POWERDOWNFLAG` file
454+ and call *its own* copies of NUT drivers to talk to the UPS, and/or implement
455+ the long sleep and reboot to avoid a power race condition, if needed.
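
The initrd-side check could be sketched roughly as follows. This is only an illustration of the idea, not an existing NUT script: the `/run/nut-shared/killpower` path, the 120-second delay and the `FLAG_FILE`, `RACE_DELAY` and `DRY_RUN` variables are all hypothetical, and the sketch assumes the initrd's own `upsdrvctl` is already configured to reach the UPS.

```shell
#!/bin/sh
# Sketch of logic run in the initrd after the run-time image has exited.
# All paths and values below are placeholders for illustration.
FLAG_FILE="${FLAG_FILE:-/run/nut-shared/killpower}"
RACE_DELAY="${RACE_DELAY:-120}"
# Assumption: by default only print what would be done.
DRY_RUN="${DRY_RUN:-yes}"

run() {
    if [ "$DRY_RUN" = yes ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

after_runtime_exit() {
    if [ ! -f "$FLAG_FILE" ]; then
        echo "no power-down flag, nothing to do"
        return 0
    fi
    # The flag was left by the run-time system: ask the UPS to cut power.
    run upsdrvctl shutdown
    # If wall power returned meanwhile, the UPS may never switch off:
    # wait, then reboot to avoid sitting halted forever (power race).
    run sleep "$RACE_DELAY"
    run reboot
}

after_runtime_exit
```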
456+
457+ See also: https://github.com/networkupstools/nut/issues/2836
458+
459+
365460== My system has an ATX power supply. It will power off just fine, but it doesn't turn back on. What can I do to fix this?
366461
367462This depends on how clueful your motherboard manufacturer is, and