fix: Reliably start vmware-user-suid-wrapper #670
polarathene wants to merge 1 commit into vmware:master
When the system manages XDG autostart `.desktop` files via `systemd-xdg-autostart-generator`, the generated service for `vmware-user.desktop` frequently times out and fails to initialize. A custom key can be included in the `.desktop` file, which systemd recognizes as a request to skip generating a service config file. An explicit service config can then be enabled that follows the [convention guidelines for XDG DE Integration](https://systemd.io/DESKTOP_ENVIRONMENTS/#xdg-standardization-for-applications). It should be placed in `/usr/lib/systemd/user/`.
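A minimal sketch of that layout, not the PR's actual files. The key used below is `X-systemd-skip`, which `systemd-xdg-autostart-generator(8)` documents for skipping a `.desktop` file. The `WantedBy=` target, the trimmed-down `.desktop` contents, and the `STAGE` staging directory are assumptions for illustration, and `Type=forking` follows the observation in the PR description rather than a confirmed fix.

```shell
#!/bin/sh
# Sketch only: writes both files into a staging directory, not the live system.
set -eu

STAGE="${STAGE:-$(mktemp -d)}"
mkdir -p "$STAGE/etc/xdg/autostart" "$STAGE/usr/lib/systemd/user"

# 1. Opt the .desktop file out of systemd's XDG autostart generator.
#    (Illustrative entry; the real vmware-user.desktop has more keys.)
cat > "$STAGE/etc/xdg/autostart/vmware-user.desktop" <<'EOF'
[Desktop Entry]
Type=Application
Name=VMware User Agent
Exec=/usr/bin/vmware-user-suid-wrapper
X-systemd-skip=true
EOF

# 2. Explicit user unit that takes over from the generated one.
#    WantedBy= is an assumption; adjust to the session's autostart target.
cat > "$STAGE/usr/lib/systemd/user/vmware-user.service" <<'EOF'
[Unit]
Description=VMware User Agent
PartOf=graphical-session.target
After=graphical-session.target

[Service]
Type=forking
ExecStart=/usr/bin/vmware-user-suid-wrapper
Slice=app.slice

[Install]
WantedBy=xdg-desktop-autostart.target
EOF

printf 'Staged files under %s\n' "$STAGE"
```

Once files like these are installed in their real locations, the unit would still need to be enabled for the user session, for example with `systemctl --user enable vmware-user.service`.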
@polarathene, VMware has approved your signed contributor license agreement.
Increasing the VM guest from 2 vCPU to 4 on my system was effective at pushing the host CPU load to 100% (often) and stalling UI responsiveness for brief periods. With this change, reproducing the failure was quite reliable.
Related implicit startup dependency - AT-SPI?
Notes
I browsed the
Service status logs
# custom unit active instead of generated one:
$ systemctl --user status vmware-user
× vmware-user.service - VMware User Agent
Loaded: loaded (/etc/xdg/autostart/vmware-user.desktop; enabled; preset: enabled)
Active: failed (Result: timeout) since Wed 2023-06-07 17:34:46 NZST; 2min 41s ago
Docs: man:systemd-xdg-autostart-generator(8)
Process: 946 ExecStart=/usr/bin/vmware-user-suid-wrapper (code=exited, status=0/SUCCESS)
Main PID: 946 (code=exited, status=0/SUCCESS)
CPU: 437ms
Jun 07 17:34:40 polarathene systemd[600]: Starting VMware User Agent...
Jun 07 17:34:42 polarathene vmtoolsd[947]: gtk_disable_setlocale() must be called before gtk_init()
Jun 07 17:34:46 polarathene systemd[600]: vmware-user.service: start operation timed out. Terminating.
Jun 07 17:34:46 polarathene systemd[600]: vmware-user.service: Failed with result 'timeout'.
Jun 07 17:34:46 polarathene systemd[600]: Failed to start VMware User Agent.
$ systemctl --user status at-spi-dbus-bus.service
● at-spi-dbus-bus.service - Accessibility services bus
Loaded: loaded (/usr/lib/systemd/user/at-spi-dbus-bus.service; static)
Active: active (running) since Wed 2023-06-07 17:34:38 NZST; 3min 13s ago
Main PID: 692 (at-spi-bus-laun)
Tasks: 10 (limit: 4624)
Memory: 3.2M
CPU: 84ms
CGroup: /user.slice/user-1000.slice/user@1000.service/session.slice/at-spi-dbus-bus.service
├─ 692 /usr/lib/at-spi-bus-launcher
├─1037 /usr/bin/dbus-daemon --config-file=/usr/share/defaults/at-spi2/accessibility.conf --nofork --print-address 11 --address=unix:path=/run/user/1000/at-spi/bus_0
└─1050 /usr/lib/at-spi2-registryd --use-gnome-session
Jun 07 17:34:38 polarathene systemd[600]: Starting Accessibility services bus...
Jun 07 17:34:38 polarathene systemd[600]: Started Accessibility services bus.
Jun 07 17:34:42 polarathene at-spi-bus-launcher[1037]: dbus-daemon[1037]: Activating service name='org.a11y.atspi.Registry' requested by ':1.0' (uid=1000 pid=947 comm="/usr/bin/vmtoolsd -n vmusr --blockFd 3")
Jun 07 17:34:42 polarathene at-spi-bus-launcher[1037]: dbus-daemon[1037]: Successfully activated service 'org.a11y.atspi.Registry'
Jun 07 17:34:42 polarathene at-spi-bus-launcher[1050]: SpiRegistry daemon is running with well-known name - org.a11y.atspi.Registry
$ systemctl --user cat at-spi-dbus-bus.service
# /usr/lib/systemd/user/at-spi-dbus-bus.service
[Unit]
Description=Accessibility services bus
PartOf=graphical-session.target
[Service]
Type=dbus
BusName=org.a11y.Bus
ExecStart=/usr/lib/at-spi-bus-launcher
Slice=session.slice
TimeoutStopSec=5
The DBus call starts at the same time the GTK log line appears. We know that the related functionality like DnD and Copy/Paste is handled by
This service is for the
Side-note: Based on the README for the
`journalctl --boot --unit vmware-vmblock-fuse`
This is from a later run that was successful, but the exchange seems the same regardless (at least initially, a failure has a
Desktop XDG autostart files
Notes
KDE Plasma won't have a unit generated for this (otherwise
I assume VMware is using a11y functionality through this service to support its features? If so, perhaps the timing is important, but without this, as the README link documents, the service will be started on demand, which for services like a screen reader is apparently not ideal, and thus probably the same for VMware here.
This service is generated only for KDE Plasma, and we can see that its timestamp is aligned with the
More explicit dependency order on SPI reliably succeeds
Logs + Config
Service status logs
$ systemctl --user status "app-kaccess@autostart.service"
● app-kaccess@autostart.service - Accessibility
Loaded: loaded (/etc/xdg/autostart/kaccess.desktop; generated)
Active: active (running) since Wed 2023-06-07 18:34:52 NZST; 2h 4min ago
Docs: man:systemd-xdg-autostart-generator(8)
Process: 921 ExecCondition=/usr/lib/systemd/systemd-xdg-autostart-condition KDE (code=exited, status=0/SUCCESS)
Main PID: 937 (kaccess)
Tasks: 3 (limit: 4624)
Memory: 19.2M
CPU: 942ms
CGroup: /user.slice/user-1000.slice/user@1000.service/app.slice/app-kaccess@autostart.service
└─937 /usr/bin/kaccess
Jun 07 18:34:52 polarathene systemd[603]: Starting Accessibility...
Jun 07 18:34:52 polarathene systemd[603]: Started Accessibility.
Jun 07 18:34:52 polarathene kaccess[937]: Xlib XKB extension major= 1 minor= 0
Jun 07 18:34:53 polarathene kaccess[937]: X server XKB extension major= 1 minor= 0
# custom unit active instead of generated one:
$ systemctl --user status vmware-user
● vmware-user.service - VMware User Agent
Loaded: loaded (/etc/xdg/autostart/vmware-user.desktop; enabled; preset: enabled)
Active: active (running) since Wed 2023-06-07 18:34:52 NZST; 2h 6min ago
Docs: man:systemd-xdg-autostart-generator(8)
Process: 941 ExecStart=/usr/bin/vmware-user-suid-wrapper (code=exited, status=0/SUCCESS)
Main PID: 941 (code=exited, status=0/SUCCESS)
Tasks: 4 (limit: 4624)
Memory: 36.5M
CPU: 2min 1.350s
CGroup: /user.slice/user-1000.slice/user@1000.service/app.slice/vmware-user.service
└─951 /usr/bin/vmtoolsd -n vmusr --blockFd 3
Jun 07 18:34:52 polarathene systemd[603]: Starting VMware User Agent...
Jun 07 18:34:52 polarathene systemd[603]: Started VMware User Agent.
Jun 07 18:34:53 polarathene vmtoolsd[951]: gtk_disable_setlocale() must be called before gtk_init()
# custom unit above modified to depend on registryd being ready first (via custom spi.service)
# the spi.service then depends on the at-spi-dbus-bus.service being started before it:
$ systemctl --user status at-spi-dbus-bus.service
● at-spi-dbus-bus.service - Accessibility services bus
Loaded: loaded (/usr/lib/systemd/user/at-spi-dbus-bus.service; static)
Active: active (running) since Wed 2023-06-07 18:34:50 NZST; 2h 11min ago
Main PID: 689 (at-spi-bus-laun)
Tasks: 6 (limit: 4624)
Memory: 1.9M
CPU: 368ms
CGroup: /user.slice/user-1000.slice/user@1000.service/session.slice/at-spi-dbus-bus.service
├─689 /usr/lib/at-spi-bus-launcher
└─957 /usr/bin/dbus-daemon --config-file=/usr/share/defaults/at-spi2/accessibility.conf --nofork --print-address 11 --address=unix:path=/run/u>
Jun 07 18:34:50 polarathene systemd[603]: Starting Accessibility services bus...
Jun 07 18:34:50 polarathene systemd[603]: Started Accessibility services bus.
$ systemctl --user status spi.service
● spi.service - VM SPI
Loaded: loaded (/home/polarathene/.config/systemd/user/spi.service; enabled; preset: enabled)
Active: active (running) since Wed 2023-06-07 18:34:52 NZST; 2h 11min ago
Docs: man:systemd-xdg-autostart-generator(8)
Main PID: 935 (at-spi2-registr)
Tasks: 1 (limit: 4624)
Memory: 1.2M
CPU: 1.123s
CGroup: /user.slice/user-1000.slice/user@1000.service/app.slice/spi.service
└─935 /usr/lib/at-spi2-registryd --use-gnome-session
Jun 07 18:34:52 polarathene systemd[603]: Starting VM SPI...
Jun 07 18:34:52 polarathene systemd[603]: Started VM SPI.
Jun 07 18:34:52 polarathene at-spi2-registryd[935]: SpiRegistry daemon is running with well-known name - org.a11y.atspi.Registry
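The author's `spi.service` is not shown in the thread. A hypothetical reconstruction of the ordering described in the log comments above might look like the following, assuming plain `Requires=`/`After=`/`Wants=` dependencies. The description and binary path are taken from the status output; the drop-in name `after-spi.conf`, the `WantedBy=` target, and the `STAGE` directory are made up for this sketch.

```shell
#!/bin/sh
# Sketch only: stages the units in a scratch directory. To try them, copy the
# resulting files into ~/.config/systemd/user/ and reload the user manager.
set -eu

STAGE="${STAGE:-$(mktemp -d)}"
UNIT_DIR="$STAGE/.config/systemd/user"
mkdir -p "$UNIT_DIR/vmware-user.service.d"

# Start the AT-SPI registry only after the accessibility bus unit has started.
cat > "$UNIT_DIR/spi.service" <<'EOF'
[Unit]
Description=VM SPI
Requires=at-spi-dbus-bus.service
After=at-spi-dbus-bus.service

[Service]
ExecStart=/usr/lib/at-spi2-registryd --use-gnome-session
Slice=app.slice

[Install]
WantedBy=graphical-session.target
EOF

# Order the VMware user agent after the registry unit.
cat > "$UNIT_DIR/vmware-user.service.d/after-spi.conf" <<'EOF'
[Unit]
Wants=spi.service
After=spi.service
EOF

printf 'Staged units under %s\n' "$UNIT_DIR"
```

Note that `After=` only orders unit start-up; with the default service type it does not wait for the registry to finish claiming its bus name, so this narrows the race rather than provably removing it.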
Whereas the
This more explicit start order for the AT-SPI
Additional observations
Notes
However I have masked the
Simply giving the VMware unit a long enough delay before it runs its
Same when the
Thanks to this log:
And knowing that the AT-SPI processes are started related to this (
Still... regardless of this systemd "Started" log, the functionality will work until terminated (I've only verified copy/paste), and only appears to be an issue with
Since all of the above is incredibly verbose and involves quite a bit of speculation about the actual cause... TL;DR
Bit more info
My recent upstream systemd report has effectively made this PR redundant. It may not be worth the time required to identify the actual cause in systemd or
However IMO, an explicit systemd service is preferable to generating one from
Summary
Workarounds will not be necessary for guests with a future version of systemd. The issue with the service not being recognized as "Started" will exist, but the timeout termination will be prevented as
Wrapping up investigating this issue. The prior TL;DR comment remains relevant; this comment just provides a little more information about observations of where the actual issue might be occurring.
Notes
I've masked the
open-vm-tools/open-vm-tools/vmware-user-suid-wrapper/main.c Lines 70 to 87 in ed34acd
But this doesn't seem to affect when the service is recognized as "Started", only logging an error that prevents an arg being appended to the
$ systemctl --user status vmware-user
● vmware-user.service - VMware User Agent
Loaded: loaded (/etc/xdg/autostart/vmware-user.desktop; enabled; preset: enabled)
Active: active (running) since Fri 2023-06-09 18:56:27 NZST; 14min ago
Docs: man:systemd-xdg-autostart-generator(8)
Process: 690 ExecStart=/usr/bin/vmware-user-suid-wrapper (code=exited, status=0/SUCCESS)
Main PID: 690 (code=exited, status=0/SUCCESS)
Tasks: 4 (limit: 4624)
Memory: 38.8M
CPU: 1.769s
CGroup: /user.slice/user-1000.slice/user@1000.service/app.slice/vmware-user.service
└─693 /usr/bin/vmtoolsd -n vmusr
Jun 09 18:56:27 polarathene systemd[569]: Starting VMware User Agent...
Jun 09 18:56:27 polarathene systemd[569]: Started VMware User Agent.
Jun 09 18:56:27 polarathene vmware-user-suid-wrapper[693]: vmware-user: could not open /proc/fs/vmblock/dev
Jun 09 18:56:27 polarathene vmtoolsd[693]: gtk_disable_setlocale() must be called before gtk_init()
NOTE: The
The service is recognized as "Started" before the error "could not open
open-vm-tools/open-vm-tools/vmware-user-suid-wrapper/main.c Lines 249 to 269 in ed34acd
open-vm-tools/open-vm-tools/vmware-user-suid-wrapper/main.c Lines 214 to 223 in ed34acd
According to systemd docs for
open-vm-tools/open-vm-tools/vmware-user-suid-wrapper/main.c Lines 285 to 293 in ed34acd
open-vm-tools/open-vm-tools/vmware-user-suid-wrapper/wrapper-linux.c Lines 122 to 145 in ed34acd
Perhaps failure happens earlier, but
open-vm-tools/open-vm-tools/vmware-user-suid-wrapper/main.c Lines 116 to 149 in ed34acd
Had a look at Gnome 44.2 on the same EndeavourOS host and guest (new install for Gnome):
The main reason for checking with Gnome was its better copy/paste support on Wayland. None of the findings appear relevant to that improved integration, however.
Notes
Gnome presently does not seem to use
This error is easy to get if running
@jonathanvmw what would you like to do with this PR?
If you'd rather take no action, I don't mind closing this if you don't see any benefit to
Adapted from findings: #568 (comment)
Related: #669 systemd/systemd#27919
Fixes: #568 #603 #604 #627 #629 #657
UPDATE: Reviewers may want to skip to this comment as a starting point, it'll save you time 👍
UPDATE 2: Should be fixed by upstream systemd change for v254: systemd/systemd#28314
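For anyone deciding whether a guest still needs a workaround, a rough check of the systemd major version against v254 could look like the sketch below. The function name `needs_workaround` is hypothetical, and the check assumes the usual first line of `systemctl --version` output (`systemd <version> (...)`); a distro that backports the fix to an older release would be misreported.

```shell
#!/bin/sh
# Returns 0 (true) when the given `systemctl --version` output reports a
# systemd major version below 254. Output that cannot be parsed returns 1.
needs_workaround() {
    # The first line looks like: "systemd 253 (253.4-1-arch)"
    version=$(printf '%s\n' "$1" | awk 'NR == 1 { print $2 }')
    # Keep only the leading digits, e.g. "254.1" or "254~rc1" becomes "254".
    major=${version%%[!0-9]*}
    [ -n "$major" ] && [ "$major" -lt 254 ]
}

# Only query the live system when systemctl is actually available.
if command -v systemctl >/dev/null 2>&1; then
    if needs_workaround "$(systemctl --version)"; then
        echo "systemd < 254: the autostart start timeout may still apply"
    else
        echo "systemd >= 254 (or unknown): upstream fix is likely present"
    fi
fi
```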
Background
@ravindravmw was involved in the quoted bug report ending the discussion with:
`ExitType=cgroup` (systemd v250, Dec 2021) does not appear to be sufficient, however. Many report that the generated service (for XDG autostart `vmware-user.desktop` via `systemd-xdg-autostart-generator`) frequently encounters the `TimeoutSec=5s`, which still terminates the child process, losing the functionality it provides.
Description
UPDATE: As later commented below, the related systemd bug report I raised will prevent the timeout issue being triggered for anyone with an upcoming systemd release with that fix.
Thus the changes proposed by this PR may not be relevant to review and accept anymore. I tried to investigate what might cause the race condition for `vmware-user-suid-wrapper` to not be recognized as "Started" by systemd. Although I could reliably produce a failure or success, I was unsuccessful at pinpointing the cause.
With the systemd adjustment to their `systemd-xdg-autostart-generator`, the timeout termination will only be applicable to stopping a service, so I doubt it's worth anyone's time to investigate further 🤷♂️
Original Content
The timeout can be extended, but that only delays termination. Whatever the cause is, it happens early, where the service does not detect itself as "Started"; only the initial "Starting" state is logged (see example).
It may be related to the exit status of `vmware-user-suid-wrapper` or some sort of race condition?
Proposed Solution - Explicit unit with opt-out from generated service
A custom key can be included in the `.desktop` file, which systemd recognizes as a request to skip generating a service config file.
An explicit service config can then be enabled that follows the convention guidelines for XDG DE Integration. It should be placed in `/usr/lib/systemd/user/`.
I found a near duplicate of the generated service was less likely to encounter the failure, but still possible. It appears to be due to `Type=exec`; changing to `Type=forking` seems to avoid the problem, and from the `systemctl --user status` output it looks more appropriate.
Alternative Solution - Override the generated unit
With just a blank override added, I was unable to reproduce the failure. If it is a race condition, it seems rather sensitive to however an override affects the scheduling. While I could not reproduce the failure, I would not rely on the side effect of an empty override.
Instead, overriding `Type=exec` with `Type=forking` should be sufficient, or removing the timeout with `TimeoutSec=`.
The location can be one of the supported user unit locations, with an override config dir + file. It is simpler, and does not need to tamper with the `.desktop` file.
Additional Info
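As a rough illustration of the override alternative, the sketch below stages a drop-in that switches the generated unit to `Type=forking`. The drop-in directory name is the one quoted in this description; the `STAGE` directory and the choice of the per-user location `~/.config/systemd/user/` are assumptions, and the override has not been verified beyond what the PR reports.

```shell
#!/bin/sh
# Sketch only: stages the drop-in in a scratch directory instead of $HOME.
set -eu

STAGE="${STAGE:-$(mktemp -d)}"
# The generated unit is named after the .desktop file, with "-" escaped as
# "\x2d", so the directory name must be quoted to keep the backslash intact.
DROPIN_DIR="$STAGE/.config/systemd/user/"'app-vmware\x2duser@autostart.service.d'
mkdir -p "$DROPIN_DIR"

# Override only the service type; everything else stays as generated.
cat > "$DROPIN_DIR/override.conf" <<'EOF'
[Service]
Type=forking
EOF

printf 'Wrote %s\n' "$DROPIN_DIR/override.conf"
```

After copying the directory into `~/.config/systemd/user/`, running `systemctl --user daemon-reload` should make the user manager pick up the drop-in on the next start of the unit.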
Unlike the `.desktop` file, this PR specifies a fixed path to the binary for `ExecStart=`. If preferable, it could mimic the template approach with `.desktop` that is adjusted by `Makefile.am`, or leave this up to packagers that opt in to copy this config to the system?
I have seen that the Arch Linux `open-vm-tools` package provides two `.service` files (like other distros), and copies these in their `PKGBUILD`. This one differs as I've referenced the XDG generated service (conventions for naming and assigning to the `app.slice`, and notably as a user unit). I am not sure if that's the intention, since this 2018 commit message for `main.c` Wayland DnD support seems to imply an expectation of being run as root to correctly function?
If the alternative solution above is preferred, then there is no `ExecStart=` value to be concerned with, but the override dir + file may not be as friendly? (`app-vmware\x2duser@autostart.service.d/override.conf`)
Additionally:
- `Type=forking` seemed to be appropriate due to the `fork()`, where the command completes quickly but continues to run a process in the background. However, while the systemd docs seem to encourage this, they also encourage referencing a `PIDFile` that AFAIK isn't produced?
- `Type=exec` with `ExitType=cgroup` (which the generated service configured) is viable, but the systemd service docs seem to discourage leaning on `ExitType=cgroup` if possible; it's only used in the XDG autostart generator since it cannot infer what would be more appropriate (EDIT: this pairing seems to be related to the bug).
Alternatives considered
This issue is likely specific to the systemd generator being relied on by a distro / DE. The proposed fix seems most appropriate.
The `.desktop` config could instead `Exec=` a very simple shell script, which also seems to work (presumably because the shell script becomes the main process and thus the wrapper exit status doesn't mislead the service to be terminated):
#! /bin/sh
/usr/bin/vmware-user-suid-wrapper
As the `.desktop` unit is over a decade old, I have the impression it's not exactly ideal for this service, especially for modern distros that ship with systemd? I assume that it's intended to delay starting until some additional context is available, such as for the X11/Wayland detection? (this comment seems to imply this importance for other projects)
Semi-related activity
A similar solution was applied to work around a Gnome autostart key in `.desktop` preventing service generation by systemd (while the KDE equivalent does not have that effect):
`.desktop` shared does not have a Gnome key, and would likely have the same issue VMware has here. Another comment highlights a concern with forking behaviour that accidentally triggers the process being killed by systemd.
The upstream systemd bug for the Gnome autostart key ignore behaviour was resolved in April 2023 (unrelated to the VMware issue). It has some comments which highlight that the generator feature can provide some compatibility, but that projects should ideally migrate to proper systemd services:
Refs
systemd XDG autostart + user units:
.desktop: