Fix race condition between AppImage daemon and KDE PowerDevil#355
Fix race condition between AppImage daemon and KDE PowerDevil#355LightWayUp wants to merge 5 commits intoprobonopd:masterfrom
Conversation
Update systemd unit file for appimaged service
- Add `Slice=background.slice` to allow systemd to deprioritize appimaged
background service, as AppImage daemon does not provide critical or
time-sensitive services
- New line at the end of appimaged service unit file
- Fix race condition between AppImage daemon and GVfs/KDE Solid UDisks2 volume
monitor
----------------------------------------
AppImage daemon wants to ensure that signals from
`org.gtk.vfs.UDisks2VolumeMonitor` and `org.kde.Solid.Device` would be sent, so
it checks for relevant names on the bus.
For the names to definitely exist on the bus when AppImage daemon checks, it
needs to start later than the services acquiring the names.
Both `gvfs-udisks2-volume-monitor.service` and `plasma-powerdevil.service`
are "static" services (No `[Install]` section, only started if other units
depend on them).
Taking PowerDevil as example, it is depended on by `plasma-workspace.target`.
`plasma-workspace-{x11/wayland}.target` are in turn also static, and only
requested to be started by `startplasma-{x11/wayland}` binaries, which is
executed by SDDM (the display manager) when the user chooses a session and logs
into it.
-----
Previously, appimaged service unit uses `WantedBy=default.target`.
**`default.target` of user units is started when the user service manager is
started.**
(Not to be confused with `default.target` of system units, which is started on
boot, and is often aliased to `multi-user.target` or `graphical.target` system
units.)
The user service manager is normally started at user login. If "user lingering"
is enabled, the user service manager is started as soon as possible after boot.
-----
From the above, we can understand that:
1. Without user lingering:
- System boots, system service manager is started.
- At the login screen, the user chooses a session and enters the password to
log in.
- The user service manager is started, and starts `default.target` and its
dependencies - including `appimaged.service`.
- The binary set as the value of the session's `Exec=` key (for example
`startplasma-wayland`) is executed, and is responsible for setting up and
starting the rest of the session. It requests the proper user service to be
started (`plasma-workspace-wayland.target`).
- The user service manager starts that session-specific service
(`plasma-workspace-wayland.target`) and its dependencies - including
`plasma-powerdevil.service`.
At this point, both AppImage daemon and PowerDevil are starting up.
1.a. If AppImage daemon runs `monitorUdisks()` **before** PowerDevil acquires
the name on the bus, AppImage daemon assumes that volumes can not be monitored,
and sends a notification "Cannot see volumes come and go".
1.b. If AppImage daemon runs `monitorUdisks()` **after** PowerDevil acquires
the name on the bus, everything is fine.
2. With user lingering:
- System boots, system service manager is started.
- The user service manager is started, and starts `default.target` and its
dependencies - including `appimaged.service`.
- AppImage daemon starts up, runs `monitorUdisks()` before PowerDevil is even
started, assumes that volumes can not be monitored, and sends a notification
"Cannot see volumes come and go".
- At the login screen, the user chooses a session and enters the password to
log in.
- The binary set as the value of the session's `Exec=` key (for example
`startplasma-wayland`) is executed, and is responsible for setting up and
starting the rest of the session. It requests the proper user service to be
started (`plasma-workspace-wayland.target`).
- The user service manager starts that session-specific service
(`plasma-workspace-wayland.target`) and its dependencies - including
`plasma-powerdevil.service`.
-----
Without user lingering, a race condition exists between appimaged service and
volume monitoring service.
With user lingering, appimaged service is always started before volume
monitoring service.
To solve the problem, `gvfs-udisks2-volume-monitor.service` and
`plasma-powerdevil.service` are added to `WantedBy=` *in addition to*
`default.target`.
Instead of starting AppImage daemon and checking/waiting for the availability
of volume monitoring services, now AppImage daemon is started *because of*
those services.
As well, they're added to `After=` so that appimaged service is definitely
started after them if they succeed.
`ExecCondition=` is added to skip appimaged service being started, when volume
monitoring services aren't ready yet, but are reasonably predicted to have
started at a near future point in time.
The new steps are:
1. System boots, system service manager is started.
2. Either now or after login, the user service manager is started, and starts
`default.target` and its dependencies - including `appimaged.service`.
3.a. If at least one of `gvfs-udisks2-volume-monitor.service` and
`plasma-powerdevil.service` exists and is enabled, but none are active, skip
starting `appimaged.service` now.
3.b. Else, AppImage daemon starts up, runs `monitorUdisks()`, and either
3.b.(a) enabled && active: succeeds, or
3.b.(b) !enabled: rightfully determines that volumes can not be monitored, at
least in the current implementation of AppImage daemon, and sends the
notification "Cannot see volumes come and go".
4. At the login screen, the user chooses a session and enters the password to
log in.
5. The binary set as the value of the session's `Exec=` key (for example
`startplasma-wayland`) is executed, and is responsible for setting up and
starting the rest of the session. It requests the proper user service to be
started (`plasma-workspace-wayland.target`).
6. The user service manager starts that session-specific service
(`plasma-workspace-wayland.target`) and its dependencies - including
`plasma-powerdevil.service`.
7. The volume monitoring service (`plasma-powerdevil.service`) is set to want
and start before `appimaged.service`, and either starts successfully or fails.
8. AppImage daemon starts up, runs `monitorUdisks()`, and either succeeds, or
rightfully determines that volumes can not be monitored, at least in the
current implementation of AppImage daemon, and sends the notification "Cannot
see volumes come and go".
Use `LAUNCHED_BY_SYSTEMD` environment variable value as a quick hack to check the revision of systemd service unit file, the environment variable value isn't used anywhere else anyway Ideally, existing service unit file content should be compared against the content we would have written, to determine whether to update the file or not. However, its probably better to refactor the rest of the source code together. The user would still be able to alter service unit properties and behaviors via "drop in" configuration, so there should be no reason to modify the service unit file other than an actual update from AppImage daemon itself.
|
Build for testing: |
|
Thanks for the detailed analysis @LightWayUp. Would be good to hear from users on different systems how well this works. |
Unconditionally re-enable service after install, to be sure changes in `[Install]` section are applied
|
Build for testing: |
See probonopd#355 for details Found the root cause of unexpected race condition with PowerDevil service, removed unnecessary `WantedBy=` dependency and `After=` ordering on `gvfs-udisks2-volume-monitor.service` as its D-Bus activated. Code related to UDisks should be reworked, as PowerDevil is unrelated to volume monitoring. However for now this is probably the best solution to the erroneous detection and "Cannot see volumes come and go" notification issue.
|
Build for testing: |
After more investigation, I've found the root cause of the issue, that leads to the discrepancy in behavior between sending D-Bus calls to KDE Plasma's
The solution didn't change much though, most important part was adding a reverse dependency on I have tested the fix across various distributions and desktop environments, and can verify that it works correctly in all except one case, as described in the Edge case section above. It requires a very specific and uncommon setup to be hit, so in my opinion fixing issue #48 outweighs the potential disadvantage. |
This fixes issue 48 "Cannot see volumes come and go on Kubuntu 20.04".
Earlier attempt to fix the issue (#233) implemented a retry-after-timeout, with the idea that the required services are simply "occasionally too slow to come up". With the 0.5s timeout, indeed most of the time the services are ready when AppImage daemon checks for their availability.
However, that doesn't actually fix the issue — a race condition present somewhere, it just masks it.
This change addresses the race condition itself via systemd dependency and ordering directives, no arbitrary sleeps are involved.
Changes
appimaged.servicesystemd unit fileSlice=background.sliceto allow systemd to deprioritize appimaged background service, as AppImage daemon does not provide critical or time-sensitive servicessystemd service setup prerequisite
Use
LAUNCHED_BY_SYSTEMDenvironment variable value as a quick hack to check the revision of the service unit file. This is fine because the environment variable's value isn't being used elsewhere anyway.Tip
Ideally to determine whether to update the existing service unit file, its content should be compared against the content embedded in the executing version of AppImage daemon.
The user would still be able to alter service unit properties and behaviors to their needs via "drop in" configuration file (
appimaged.service.d/local.conf), so there should be no reason to modify the service unit file other than an actual update from AppImage daemon itself.Details
AppImage daemon wants to ensure that signals from
org.gtk.vfs.UDisks2VolumeMonitorandorg.kde.Solid.Deviceinterfaces would be sent, so it checks for relevant names on the bus.The check isn't necessary for AppImage daemon to monitor the messages on D-Bus, it just would like to send a notification informing the user, in the case that such messages are likely never going to be sent (due to the services being absent).
Backgrounds
Volume monitoring
GNOME desktop and derivatives (MATE, Cinnamon) and Xfce use GVfs UDisks 2 volume monitor, a daemon process.
org.gtk.vfs.UDisks2VolumeMonitorgvfs-udisks2-volume-monitor.serviceKDE Plasma desktop uses Solid device integration framework, with UDisks 2 backend, a library/framework to be used by applications.
Because its not an application providing a service, there's nothing to check against on D-Bus. It was assumed that if the PowerDevil service, which also uses Solid, is available on the bus, there would likely be other running KDE components using Solid (KIO, Dolphin, ...), and AppImage daemon would therefore receive signals from interface
org.kde.Solid.Devicejust fine.org.kde.Solid.PowerManagementplasma-powerdevil.servicesystemd and D-Bus services
Both
gvfs-udisks2-volume-monitor.serviceandplasma-powerdevil.serviceare "static" systemd services of typedbus.[Install]section, they're only started if other units depend on them.Type=dbus, such services are "D-Bus activated".D-Bus activation
If a D-Bus activated service isn't running yet, the D-Bus call to it will block,
dbus-daemonwill start the service, then send the message to it, and finally deliver the reply back to the caller.This works fine with GVfs UDisks 2 volume monitor. AppImage daemon sends a D-Bus method call and blocks, GVfs UDisks 2 volume monitor is activated by
dbus-daemon, a reply is sent back, then AppImage daemon continues.Unfortunately with PowerDevil, there are 2 issues, both out of our control.
Type=dbusin the systemd service unit file, PowerDevil is not D-Bus activatable, because there is no corresponding D-Bus service file/usr/share/dbus-1/services/org.kde.Solid.PowerManagement.service.Out of all KDE services, it just happens that PowerDevil and PlasmaShell seem to be the only 2 with this issue.
This can be verified by looking at appimaged service's journal logs, in cases where it starts before PowerDevil:
This leads to unexpected behavior. When AppImage daemon sends a D-Bus method call, PowerDevil does not start, instead the call fails with the reason that
The name org.kde.Solid.PowerManagement was not provided by any .service files.QGuiApplication(Qt application with QML GUI) rather than a QCoreApplication (Qt console application), meaning that starting it requires a X or Wayland display server, which is only started in a graphical user login session.This is normally fine when appimaged service starts at user login, but problematic if "user lingering" is enabled as explained later.
systemd static services
PowerDevil is depended on by
plasma-workspace.target.plasma-workspace-{x11/wayland}.targetare in turn also static, and only requested to be started bystartplasma-{x11/wayland}executable binaries, which is executed by SDDM (the display manager responsible for login screen) when the user chooses a session and logs into it.Combined with being a GUI application, this means
plasma-powerdevil.servicecan only ever be started after login.systemd "user lingering"
systemd user services are managed by systemd user service manager.
Usually this means, the user services are started after login, and stopped before logout/shutdown.
The user services are started at boot, and stopped at shutdown.
This can be useful for long-running user services, for example rootless Docker.
Issue of current
appimaged.serviceunit filePreviously, appimaged service unit uses
WantedBy=default.target, as in,default.targetwants/depends onappimaged.serviceand will cause it to start.default.targetof user units is started when the user service manager is started.Note
Not to be confused with
default.targetof system units, which is started on boot, and is often aliased tomulti-user.targetorgraphical.targetsystem units.From the above, we can understand that the following sequence of events are possible:
1. Without user lingering:
A race condition exists between appimaged service and PowerDevil.
1. i. AppImage daemon runs
monitorUdisks()before PowerDevil acquires the name on the bus. AppImage daemon assumes that volumes can not be monitored, and sends a notification "Cannot see volumes come and go".1. ii. AppImage daemon runs
monitorUdisks()after PowerDevil acquires the name on the bus, everything is fine.2. With user lingering:
The issue is much more apparent. AppImage daemon always runs
monitorUdisks()before PowerDevil is even started, assumes that volumes can not be monitored, and sends a notification "Cannot see volumes come and go".Solution
What doesn't work
Create the D-Bus service file with the following content:
Place the file at
${XDG_DATA_HOME}/dbus-1/services/org.kde.Solid.PowerManagement.service(per user) or/usr/local/share/dbus-1/services/org.kde.Solid.PowerManagement.service(system wide).Now, when AppImage daemon sends a D-Bus method call to PowerDevil, it successfully blocks as expected, waits for PowerDevil to be started by
dbus-daemon, and continues after receiving a reply. No more retries and arbitrary sleeps are needed.What's the problem:
If the system doesn't actually have PowerDevil (for example a system with GNOME desktop environment), this solution just erroneously created a seemingly D-Bus activatable service that will always fail.
AppImage daemon can't reliably add or remove the service file when PowerDevil is installed or uninstalled either. The service file needs to come together with PowerDevil package.
Let appimaged service be started by
default.target, and poll for PowerDevil service status until it is active or a timeout is reached.What's the problem:
This is essentially the current implementation, just with multiple retries, and a longer timeout or an infinite wait.
Proper solution
plasma-powerdevil.serviceandgraphical-session.targetare added toWantedBy=directive in addition todefault.targetdefault.targetis retained, so appimaged service still starts on systems without PowerDevil service installed.graphical-session.targetis used as a fallback, in case PowerDevil is installed on the system, but will not be used for this session.After=directiveTip
Both GVfs UDisks 2 volume monitor and PowerDevil services are started and stopped as part of graphical sessions.
Why not simply use
WantedBy=graphical-session.targetto always start at the right time?Some distributions and environments don't comply with systemd's rules despite "supporting systemd".
Take Linux Mint Cinnamon Edition for example, the cinnamon session executable never starts the passive
graphical-session.targetunit. This can be verified in a running Cinnamon session, the status ofgraphical-session.targetshowsActive: inactive (dead).GVfs UDisks 2 volume monitor has the correct status
Active: active (running), as its already correctly configured by GNOME developers.ExecCondition=condition check directive is added to skip appimaged service being started, when PowerDevil service isn't ready yet, but is reasonably predicted to have started at a near future point in time.Specifically, it'll skip being started as a dependency of
default.targetif all of the following are true:gvfs-udisks2-volume-monitor.servicedoes not exist or is not enabledplasma-powerdevil.servicedoes exist, is not active, and is enabledgraphical-session.targetis not activeNew appimaged service unit file
ExecConditionshell scriptThe new sequence of events is illustrated as follows:
Edge case
The new appimaged service unit would solve the erroneous "Cannot see volumes come and go" notification issue, at the expense of AppImage daemon not being started in a specific edge case.
To encounter this case:
graphical-session.targetunitIf all of the above are true, appimaged service will skip being started by
default.target, but will not be started by anything else later.The fundamental reason is that, we can never predict whether PowerDevil service will be started or not until it is actually started, because it is a static service with no active
WantedBy=, and is only depended on by other parts of a session, a session that user may or may not choose to use.I believe this fix should be incorporated, as
Should one actually encounters the edge case, and would like to still have AppImage daemon service start, they can remove the
ExecCondition=directive via drop-in configuration file.