Commit bfd814e

Merge pull request networkupstools#3217 from jimklimov/issue-3055
Follow up on gitlog2version, and update some FAQ docs on race condition and late shutdown
2 parents 756318d + e113526 commit bfd814e

8 files changed, +643 -48 lines changed


NEWS.adoc

Lines changed: 9 additions & 5 deletions
@@ -360,15 +360,19 @@ several `FSD` notifications into one executed action. [PR #3097]
 syntax is not supported everywhere (or the `!` operator generally).
 [#3099, #1660]
 
-- Revised `tools/gitlog2version.sh` helper script with a mode to expand
-NUT SEMVER components into wide zero-padded numbers, so it is easier
-to alphanumerically compare different releases regardless of version
-component lengths in digits (and differences in their amounts).
+- Revised `tools/gitlog2version.sh` helper script (logic added into
+a new `tools/semver-compare.sh`) with a mode to expand NUT SEMVER
+components into wide zero-padded numbers, so it is easier to
+alphanumerically compare different releases regardless of version
+component lengths in digits (and differences in their amounts). You can
+request an inverse operation with `NUT_VERSION_STRIP_LEADING_ZEROES=true`.
 Added tests to cover different shell interpreter platforms (piggy-back
 on the `tests/nut-driver-enumerator-test.sh` script), and made sure
 that outputs of legacy-mode processing (with `NUT_VERSION_DEFAULT`
 string provided by caller or saved in a tarball) are consistent with
-git-mode. [issue #3055, PR #3213]
+git-mode. The new `tools/semver-compare.sh` helper can be used directly
+to expand and strip version strings, sort and compare multiple versions
+as one simple operation. [issue #3055, PRs #3213, #3217]
 
 - The NUT Integration Testing suite (NIT) script, if started as `root`,
 can now consult its run-time situation vs. `BUILTIN_RUN_AS_USER` and

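The expand and strip transformations described in this NEWS entry can be approximated in a few lines of POSIX `sh` and `awk`. This is a minimal sketch for illustration only: the function names `semver_rewrite`, `semver_expand` and `semver_strip` are hypothetical and are not the interface of `tools/gitlog2version.sh` or `tools/semver-compare.sh`. As documented for the real tools, only the leading dot-separated numeric part is rewritten; unlike a `VER5X` query, this sketch keeps any suffix verbatim (as `DESC5X` does).

```shell
#!/bin/sh
# Hypothetical helpers (NOT the tools/semver-compare.sh interface).
# semver_rewrite MODE VERSION [WIDTH]
#   MODE=expand: zero-pad each leading numeric component to WIDTH (default 6)
#   MODE=strip:  drop leading zeroes from each leading numeric component
# Only the leading "N.N.N..." run is rewritten; any suffix is kept verbatim.
semver_rewrite() {
    printf '%s\n' "$2" | awk -v mode="$1" -v w="${3:-6}" '
    {
        # Leave strings that do not start with a digit (e.g. "v2.8.2") alone
        if (!match($0, /^[0-9]+(\.[0-9]+)*/)) { print; next }
        head = substr($0, 1, RLENGTH)
        tail = substr($0, RLENGTH + 1)
        n = split(head, c, ".")
        out = ""
        for (i = 1; i <= n; i++) {
            s = c[i]
            sub(/^0+/, "", s)          # normalize: drop leading zeroes
            if (s == "") { s = "0" }   # but keep a lone zero
            if (mode == "expand") {
                # pad as a string, so there is no numeric (or octal) parsing
                while (length(s) < w) { s = "0" s }
            }
            out = out (i > 1 ? "." : "") s
        }
        print out tail
    }'
}

semver_expand() { semver_rewrite expand "$1" "${2:-6}"; }
semver_strip()  { semver_rewrite strip "$1"; }
```

For example, `semver_expand 3.14.159.2653.59-2712+gdeadbeef` prints `000003.000014.000159.002653.000059-2712+gdeadbeef`, and `semver_strip 3.014.00159.02653.05-002658+gdeadbeef+v01.002.0003` prints `3.14.159.2653.5-002658+gdeadbeef+v01.002.0003`. Padding is done on strings rather than through shell arithmetic, which sidesteps the shell treating zero-padded numbers as octal; strings not starting with a digit (such as `v2.8.2`) are passed through unchanged.
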
docs/FAQ.txt

Lines changed: 95 additions & 0 deletions
@@ -359,9 +359,104 @@ sources for a more streamlined and modern variant):
 halt -p
 ------
 
+For more details and a better integrated example, please see
+https://github.com/networkupstools/nut/blob/master/scripts/systemd/nutshutdown.in
+(a copy tailored for your OS distribution may or may not be included
+in NUT packages, if you installed from those).
+
 The other solution is to change your BIOS setting to "always power
 on" instead of "last state", assuming that's possible.
 
+== My operating system has no shutdown scripts, how can I tell the UPS to go down and/or avoid power races?
+
+Modern operating systems offer service management frameworks, like
+the Solaris 10/11 and illumos SMF, or Linux systemd. These frameworks
+drive the life cycle of the operating system and enforce their opinions
+on the shutdown end-game in particular. For example, any "user-land"
+processes which remain alive after a certain (configurable or hard-coded)
+timeout would be killed off so as not to block the shutdown or reboot request.
+
+While systemd has a concept of shutdown hooks, so the logic above can be
+placed as a script into a special directory `/usr/lib/systemd/system-shutdown`
+and called after all the service daemons have been killed off, the SMF does
+not commonly offer such a facility.
+
+Instead, the question is turned around: "What would break in your system
+if it is suddenly turned off (not only due to power, but also kernel panic,
+hardware failure, etc.)?" and the suggested solutions follow from there.
+
+Copy-on-write filesystems, like ZFS (used extensively in Solaris/illumos as
+well as other operating systems, including some *BSD and Linux distributions),
+do not care about sudden reboots (assuming the storage hardware does not lie
+about honouring orderly metadata flushes). At worst, an incomplete transaction
+would be lost, but the filesystem structure itself remains valid and does not
+need any lengthy `fsck` after a reboot.
+
+It may or may not be similar for databases (depending on how yours log
+incoming write transactions), mail systems, etc.
+
+Chances are, with a complete stack of well-engineered software for which
+hardware failure is always an option (considered in design), you do not
+even need to shut down anything in case of reported power failure and the
+UPS going on battery.
+
+In practice, if you are not after maximizing the service uptime but rather want
+some peace of mind that your application data is not corrupted, you can script
+your `NOTIFYCMD` implementation to stop just those services (containers, VMs...),
+maybe `umount` non-transactional file systems (FAT EFI partition, USB storage),
+and revive them when/if the UPS becomes "on-line" again and this box is still
+alive. Or it just boots up if the power did get lost (whether you requested
+the UPS to power-off or power-cycle or not, whether it honoured such request
+or not). Using `upssched` as the notification handler can probably help
+implement this logic consistently, reacting to different events. Using service
+grouping or milestones as a way to consistently start/stop "fragile" services
+should also help (in this case the framework ensures the action only happens
+once, regardless of how many times your scripts requested it).
+
+Notably, with this approach you DO NOT tell the OS to actually shut down,
+and so avoid any forced kill of user-land processes after any timeouts.
+You choose which services might be hurt by the outage and should go down,
+and which can run as long as they can (or in which mode, e.g. turning your
+mail, DNS or LDAP server read-only just in case until the storm passes).
+The still-running stub of your OS is the environment which talks to the UPS
+and takes care of eventually restarting the services or the whole box, as
+you deem fit.
+
+
+== My operating system uses (immutable) images, how can I tell the UPS to go down and/or avoid power races?
+
+One approach came up in a discussion with Lennart Poettering of systemd
+fame: operating system concepts constructed with (immutable) run-time
+images are actually layered like onions, with code in another file system
+(usually an initrd) picking the run-time image to use and `chroot`'ing into
+it... and possibly regaining control when the run-time operating environment
+exits. Sort of like containers, where the operating system which you interact
+with is the inside of the container, and you do not directly see its hypervisor.
+
+The problem is that while NUT services can run as part of the run-time image
+and decide to power-off or power-cycle the UPS in case of trouble, there is
+no place for the classic NUT approach to do this in the end-game of that run-time
+image (calling `upsmon -K` to detect the `POWERDOWNFLAG` file, and launching
+some new instance of the driver to talk to the UPS itself) "after the filesystem
+was remounted read-only", because that life-cycle concept has no such spot in
+the state machine of the run-time image -- its storage is to be completely
+unmounted.
+
+Instead, the proposed idea was to have a build of NUT included also into that
+"managing" initrd image, with perhaps a `/run` (or similar) location passed
+(bind-mounted?) from the initial environment into the launched run-time system.
+Both the `POWERDOWNFLAG` file and (as part of `NOTIFYCMD` handling) the most
+recent copies of NUT configuration files, which may be managed in the
+interactively accessible operating environment, can be written there.
+
+When the run-time image exits and the logic in the initrd image which launched
+it regains control, it can check for the existence of the `POWERDOWNFLAG` file and
+call *its own* copies of NUT drivers to talk to the UPS, and/or implement the
+long sleep and reboot to avoid a power race condition, if needed.
+
+See also: https://github.com/networkupstools/nut/issues/2836
+
+
 == My system has an ATX power supply. It will power off just fine, but it doesn't turn back on. What can I do to fix this?
 
 This depends on how clueful your motherboard manufacturer is, and
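
The `NOTIFYCMD`-based approach described in the new FAQ entry might look like the minimal sketch below. It assumes systemd (`systemctl`) manages the "fragile" services; on SMF, `svcadm disable -t` and `svcadm enable -t` would be the temporary equivalents. The `FRAGILE_SERVICES` list, its example service names and the `DRY_RUN` switch are hypothetical. Note that upsmon only runs `NOTIFYCMD` for events whose `NOTIFYFLAG` includes `EXEC` (e.g. `NOTIFYFLAG ONBATT SYSLOG+EXEC`), and passes the event type in the `NOTIFYTYPE` environment variable.

```shell
#!/bin/sh
# Sketch of a NOTIFYCMD handler: stop "fragile" services while the UPS is
# on battery and start them again once it is back on line. The OS itself
# is NOT told to shut down. FRAGILE_SERVICES and DRY_RUN are hypothetical.

FRAGILE_SERVICES="${FRAGILE_SERVICES:-example-db example-vm}"

run() {
    # With DRY_RUN=1, only print the command instead of executing it
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# systemd variant; on SMF "svcadm disable -t" / "svcadm enable -t" would be
# the temporary (non-persistent) equivalents
stop_service()  { run systemctl stop "$1"; }
start_service() { run systemctl start "$1"; }

handle_notify() {
    case "$1" in
    ONBATT)
        for svc in $FRAGILE_SERVICES; do stop_service "$svc"; done
        ;;
    ONLINE)
        for svc in $FRAGILE_SERVICES; do start_service "$svc"; done
        ;;
    *)
        # Other event types are ignored by this sketch
        ;;
    esac
}

# upsmon passes the event type via the NOTIFYTYPE environment variable
handle_notify "${NOTIFYTYPE:-}"
```

Stopping services on the very first `ONBATT` is deliberately simple; to ride out short blips, `upssched` timers (as the FAQ suggests) could delay the stop and cancel it on `ONLINE`.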

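The initrd side of the proposed hand-off could be sketched as a shell function that the managing initrd calls once the run-time image has exited. Everything here is an assumption for illustration: the shared `HANDOFF_DIR` path, the expectation that the copied `upsmon.conf` sets `POWERDOWNFLAG` inside it, and the overridable `UPSMON`, `UPSDRVCTL`, `SLEEP` and `REBOOT` variables. It relies on `upsmon -K` (which succeeds only if the power-down flag file written by upsmon is present), `upsdrvctl shutdown`, and the `NUT_CONFPATH` environment variable to point NUT programs at a non-default configuration directory.

```shell
#!/bin/sh
# Sketch of the end-game hook in a "managing" initrd, to run after the
# run-time image has exited. Hypothetical assumptions:
#  - HANDOFF_DIR is the /run-like location shared with the run-time image,
#    holding current NUT configs whose POWERDOWNFLAG also points there;
#  - the initrd carries its own NUT build (upsmon, upsdrvctl);
#  - UPSMON/UPSDRVCTL/SLEEP/REBOOT are overridable, e.g. for dry runs.
HANDOFF_DIR="${HANDOFF_DIR:-/run/nut-handoff}"
POWER_RACE_SLEEP="${POWER_RACE_SLEEP:-120}"
UPSMON="${UPSMON:-upsmon}"
UPSDRVCTL="${UPSDRVCTL:-upsdrvctl}"
SLEEP="${SLEEP:-sleep}"
REBOOT="${REBOOT:-reboot -f}"

initrd_nut_endgame() {
    # "upsmon -K" exits with success only if the POWERDOWNFLAG file exists;
    # NUT_CONFPATH points the NUT programs at the copied configs.
    # Command variables are left unquoted on purpose, to allow "cmd args".
    if NUT_CONFPATH="$HANDOFF_DIR" $UPSMON -K >/dev/null 2>&1; then
        # Ask the UPS to power off (or power-cycle), per the copied ups.conf
        NUT_CONFPATH="$HANDOFF_DIR" $UPSDRVCTL shutdown
        # Still running after the delay? The UPS did not cut power (wall
        # power returned in time, or the request was ignored): reboot
        # rather than hang, to avoid the power race
        $SLEEP "$POWER_RACE_SLEEP"
        $REBOOT
    fi
    # No flag: fall through to the normal initrd power-off/reboot path
}
```

The 120-second default for `POWER_RACE_SLEEP` is an arbitrary placeholder; a suitable value depends on how long your UPS takes to honour the power-off request.
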
docs/nut-versioning.adoc

Lines changed: 38 additions & 3 deletions
@@ -317,9 +317,10 @@ DESC='v2.8.2-2381+g1faa9945d'
 (minimum 6) digits each, allowing for easy scripted
 alphanumeric comparisons of different NUT releases
 regardless of digit count in original components.
-| * VER5X: `000002.000008.000004.000910.000002`
-* DESC5X with also `NUT_VERSION_EXTRA_WIDTH=9`:
-`000000002.000000008.000000004.000000910.000000002-912+g081755da0`
+| * VER5X with also `NUT_VERSION_EXTRA_WIDTH=9`:
+`000000002.000000008.000000004.000000910.000000002`
+* DESC5X with default width (6):
+`000002.000008.000004.000910.000002-912+g081755da0`
 |`SEMVER` | Exactly three leading numeric components | * `2.8.2`
 |`IS_RELEASE` | `true` if `SEMVER`==`VER50`, `false` otherwise
 | * dev: `false`
@@ -379,6 +380,40 @@ DESC='v2.8.2-2381+g1faa9945d'
 |default | Report `DESC50` | * `v2.8.2-2381-g1faa9945d`
 |=========================================================================
 
+[NOTE]
+======
+To facilitate version comparisons of different NUT iterations in scripts,
+the "expanded variants" of the two versions can be requested. This pads each
+semver component (but not other numbers) with zeroes to a specified width (by
+default 6 digits):
+
+----
+:; NUT_VERSION_QUERY=VER5X \
+NUT_VERSION_FORCED=3.14.159.2653.59-2712+gdeadbeef \
+gitlog2version.sh 2>/dev/null
+000003.000014.000159.002653.000059
+----
+
+If you map this into, e.g., a tab-separated list of "expanded" and original
+version strings, you can use the shell `sort` command to find the oldest
+and newest builds, construct an ordered list for downloads, etc.
+
+Keep in mind that you should not use those zero-padded component numbers
+in shell math operations, as they would be considered octal numbers (and
+possibly invalid ones, if they contain digits `8` or `9`). You can revert
+the operation with envvar toggle `NUT_VERSION_STRIP_LEADING_ZEROES=true`
+(note it only affects the leading dot-separated numeric semver part of the
+string, but not any numbers in the suffix):
+
+----
+:; NUT_VERSION_QUERY=DESC5 \
+NUT_VERSION_FORCED=3.014.00159.02653.05-002658+gdeadbeef+v01.002.0003 \
+NUT_VERSION_STRIP_LEADING_ZEROES=true \
+gitlog2version.sh 2>/dev/null
+3.14.159.2653.5-002658+gdeadbeef+v01.002.0003
+----
+======
+
 Variables propagated by configure.ac
 ------------------------------------
 
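
The tab-separated mapping suggested in this note can be sketched as a filter that reads one version per line, prepends a zero-padded sort key, sorts, and drops the key again. `sort_versions` is a hypothetical helper, not part of NUT; it pads to 6 digits by default (an optional first argument changes the width) and runs `sort` under `LC_ALL=C` so that plain byte order, not the current locale's collation rules, decides. Numbers in suffixes are not padded, so compare like-for-like strings (for example, all plain dotted versions).

```shell
#!/bin/sh
# Hypothetical helper, not part of NUT: sort dotted version strings
# (one per line on stdin) from oldest to newest.
# sort_versions [WIDTH]
sort_versions() {
    awk -v w="${1:-6}" '
    /^$/ { next }                     # skip empty lines
    {
        key = $0
        if (match($0, /^[0-9]+(\.[0-9]+)*/)) {
            tail = substr($0, RLENGTH + 1)
            n = split(substr($0, 1, RLENGTH), c, ".")
            key = ""
            for (i = 1; i <= n; i++) {
                s = c[i]
                while (length(s) < w) { s = "0" s }   # zero-pad as a string
                key = key (i > 1 ? "." : "") s
            }
            key = key tail
        }
        # Emit "expanded<TAB>original"
        printf "%s\t%s\n", key, $0
    }' | LC_ALL=C sort | cut -f2-     # byte-order sort, then drop the key
}
```

With input `2.8.10`, `2.8.9`, `2.10.0` and `2.8.2` (one per line) it prints `2.8.2`, `2.8.9`, `2.8.10`, `2.10.0`, whereas a plain `LC_ALL=C sort` on the raw strings would place `2.10.0` first. Piping the result through `head -n 1` or `tail -n 1` then yields the oldest or newest entry.
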
docs/nut.dict

Lines changed: 7 additions & 1 deletion
@@ -1,4 +1,4 @@
-personal_ws-1.1 en 3595 utf-8
+personal_ws-1.1 en 3601 utf-8
 AAC
 AAS
 ABI
@@ -670,6 +670,7 @@ Lansafecable
 Laventhol
 Lce
 Legrand
+Lennart
 Lepple
 Levente
 LibGD
@@ -984,6 +985,7 @@ PiJuice
 PiNUT
 Plesser
 PnP
+Poettering
 Pohle
 PointBre
 Pos
@@ -2120,6 +2122,7 @@ freetype
 frob
 frontends
 fs
+fsck
 fsd
 fsdmode
 fsr
@@ -2282,6 +2285,7 @@ inet
 influenceable
 infos
 infoval
+ing
 inh
 init
 init's
@@ -3306,6 +3310,7 @@ toolset
 topFrame
 topbot
 tport
+transactional
 tripplite
 tripplitesu
 troff
@@ -3357,6 +3362,7 @@ ukUNV
 ul
 ulimit
 ulink
+umount
 un
 uname
 uncomment
