Fix deadlock #937 by basilevs · Pull Request #939 · eclipse-equinox/p2

basilevs · 2025-08-31T20:01:34Z

The deadlock was caused by working with ServiceTracker (which may activate bundles) while holding a lock.

- Do not hold any locks while working with ServiceTracker.
- Use ServiceTracker capabilities to:
  - compute priorities
  - monitor whole sets of services (instead of just one, computed at an arbitrary moment)
  - manage the lifetime of services (not only factories)
- Decouple handling of OSGi services from explicitly registered ones to avoid cross-lock interaction.
- Hide ServiceTrackerCustomizer as an implementation detail.
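The locking rule above can be illustrated with a dependency-free Java sketch. This is not the PR's code: `Tracker` is a hypothetical stand-in for an OSGi `ServiceTracker` whose `getService()` may run foreign code (for example, bundle activation), and `AgentRegistry` is an illustrative name. The agent's own lock guards only map bookkeeping; the potentially blocking call happens after the lock is released.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical stand-in for an OSGi ServiceTracker: getService() may run foreign
// code (e.g. activate a bundle), which can take framework-internal locks.
final class Tracker {
    private final Supplier<?> lookup;

    Tracker(Supplier<?> lookup) {
        this.lookup = lookup;
    }

    Object getService() {
        return lookup.get();
    }
}

final class AgentRegistry {
    // Guarded by 'this'. Only cheap bookkeeping happens while the lock is held.
    private final Map<String, Tracker> trackers = new LinkedHashMap<>();
    private final Function<String, Supplier<?>> lookups;

    AgentRegistry(Function<String, Supplier<?>> lookups) {
        this.lookups = lookups;
    }

    Object getService(String name) {
        Tracker tracker;
        synchronized (this) {
            // Building the tracker is assumed to be cheap and non-blocking,
            // like constructing (not opening) a ServiceTracker.
            tracker = trackers.computeIfAbsent(name, n -> new Tracker(lookups.apply(n)));
        }
        // The agent lock is released before anything that may activate bundles.
        return tracker.getService();
    }
}
```

A lookup that checks `Thread.holdsLock(registry)` will observe `false`, which is the property that prevents the lock-ordering cycle between the agent and the framework.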

Fixes #937

This is not ready yet, as tests are needed.
@merks is the overall approach acceptable?

github-actions · 2025-08-31T20:04:14Z

Test Results

384 files ±0 384 suites ±0 42m 10s ⏱️ -34s
1 907 tests ±0 1 904 ✅ ±0 3 💤 ±0 0 ❌ ±0
6 721 runs ±0 6 712 ✅ ±0 9 💤 ±0 0 ❌ ±0

Results for commit e065a30. ± Comparison against base commit 93fb948.

♻️ This comment has been updated with latest results.

eclipse-equinox-bot · 2025-08-31T20:09:50Z

This pull request changes some projects for the first time in this development cycle.
Therefore the following files need a version increment:

bundles/org.eclipse.equinox.p2.core/META-INF/MANIFEST.MF
features/org.eclipse.equinox.p2.core.feature/feature.xml
features/org.eclipse.equinox.p2.extras.feature/feature.xml
features/org.eclipse.equinox.p2.rcp.feature/feature.xml
features/org.eclipse.equinox.p2.sdk/feature.xml
features/org.eclipse.equinox.p2.user.ui/feature.xml
features/org.eclipse.equinox.server.p2/feature.xml

An additional commit containing all the necessary changes was pushed to the top of this PR's branch. To obtain these changes (for example if you want to push more changes) either fetch from your fork or apply the git patch.

Git patch

From 73c5115ce94b57f5fc21fbe3dcb786c851d1a80a Mon Sep 17 00:00:00 2001
From: Eclipse Equinox Bot <equinox-bot@eclipse.org>
Date: Sun, 31 Aug 2025 20:09:32 +0000
Subject: [PATCH] Version bump(s) for 4.38 stream


diff --git a/bundles/org.eclipse.equinox.p2.core/META-INF/MANIFEST.MF b/bundles/org.eclipse.equinox.p2.core/META-INF/MANIFEST.MF
index d810d1ccd..4e49543f7 100644
--- a/bundles/org.eclipse.equinox.p2.core/META-INF/MANIFEST.MF
+++ b/bundles/org.eclipse.equinox.p2.core/META-INF/MANIFEST.MF
@@ -2,7 +2,7 @@ Manifest-Version: 1.0
 Bundle-ManifestVersion: 2
 Bundle-Name: %pluginName
 Bundle-SymbolicName: org.eclipse.equinox.p2.core;singleton:=true
-Bundle-Version: 2.13.100.qualifier
+Bundle-Version: 2.13.200.qualifier
 Bundle-Activator: org.eclipse.equinox.internal.p2.core.Activator
 Bundle-Vendor: %providerName
 Bundle-Localization: plugin
@@ -63,7 +63,7 @@ Export-Package: org.eclipse.equinox.internal.p2.core;x-friends:="org.eclipse.equ
    org.eclipse.equinox.p2.updatesite,
    org.eclipse.equinox.p2.director.app,
    org.eclipse.equinox.p2.transport.ecf",
- org.eclipse.equinox.p2.core;version="2.13.100";uses:="org.eclipse.core.runtime",
+ org.eclipse.equinox.p2.core;version="2.13.200";uses:="org.eclipse.core.runtime",
  org.eclipse.equinox.p2.core.spi;version="2.2.0";uses:="org.eclipse.equinox.p2.core"
 Bundle-RequiredExecutionEnvironment: JavaSE-17
 Bundle-ActivationPolicy: lazy
diff --git a/features/org.eclipse.equinox.p2.core.feature/feature.xml b/features/org.eclipse.equinox.p2.core.feature/feature.xml
index 4fd7e5946..b0b616d46 100644
--- a/features/org.eclipse.equinox.p2.core.feature/feature.xml
+++ b/features/org.eclipse.equinox.p2.core.feature/feature.xml
@@ -2,7 +2,7 @@
 <feature
       id="org.eclipse.equinox.p2.core.feature"
       label="%featureName"
-      version="1.7.800.qualifier"
+      version="1.7.900.qualifier"
       provider-name="%providerName"
       license-feature="org.eclipse.license"
       license-feature-version="0.0.0">
diff --git a/features/org.eclipse.equinox.p2.extras.feature/feature.xml b/features/org.eclipse.equinox.p2.extras.feature/feature.xml
index d5adc408b..d69cce961 100644
--- a/features/org.eclipse.equinox.p2.extras.feature/feature.xml
+++ b/features/org.eclipse.equinox.p2.extras.feature/feature.xml
@@ -2,7 +2,7 @@
 <feature
       id="org.eclipse.equinox.p2.extras.feature"
       label="%featureName"
-      version="1.4.2900.qualifier"
+      version="1.4.3000.qualifier"
       provider-name="%providerName"
       license-feature="org.eclipse.license"
       license-feature-version="0.0.0">
diff --git a/features/org.eclipse.equinox.p2.rcp.feature/feature.xml b/features/org.eclipse.equinox.p2.rcp.feature/feature.xml
index 3da8783e2..a7dcf8d68 100644
--- a/features/org.eclipse.equinox.p2.rcp.feature/feature.xml
+++ b/features/org.eclipse.equinox.p2.rcp.feature/feature.xml
@@ -2,7 +2,7 @@
 <feature
       id="org.eclipse.equinox.p2.rcp.feature"
       label="%featureName"
-      version="1.4.2900.qualifier"
+      version="1.4.3000.qualifier"
       provider-name="%providerName"
       license-feature="org.eclipse.license"
       license-feature-version="0.0.0">
diff --git a/features/org.eclipse.equinox.p2.sdk/feature.xml b/features/org.eclipse.equinox.p2.sdk/feature.xml
index 884bcfd27..02f3488f7 100644
--- a/features/org.eclipse.equinox.p2.sdk/feature.xml
+++ b/features/org.eclipse.equinox.p2.sdk/feature.xml
@@ -2,7 +2,7 @@
 <feature
       id="org.eclipse.equinox.p2.sdk"
       label="%featureName"
-      version="3.11.2900.qualifier"
+      version="3.11.3000.qualifier"
       provider-name="%providerName"
       license-feature="org.eclipse.license"
       license-feature-version="0.0.0">
diff --git a/features/org.eclipse.equinox.p2.user.ui/feature.xml b/features/org.eclipse.equinox.p2.user.ui/feature.xml
index ee5457fb2..ba059b135 100644
--- a/features/org.eclipse.equinox.p2.user.ui/feature.xml
+++ b/features/org.eclipse.equinox.p2.user.ui/feature.xml
@@ -2,7 +2,7 @@
 <feature
       id="org.eclipse.equinox.p2.user.ui"
       label="%featureName"
-      version="2.4.2900.qualifier"
+      version="2.4.3000.qualifier"
       provider-name="%providerName"
       license-feature="org.eclipse.license"
       license-feature-version="0.0.0">
diff --git a/features/org.eclipse.equinox.server.p2/feature.xml b/features/org.eclipse.equinox.server.p2/feature.xml
index 2d2bc2e3d..0f2719038 100644
--- a/features/org.eclipse.equinox.server.p2/feature.xml
+++ b/features/org.eclipse.equinox.server.p2/feature.xml
@@ -2,7 +2,7 @@
 <feature
       id="org.eclipse.equinox.server.p2"
       label="%featureName"
-      version="1.12.1800.qualifier"
+      version="1.12.1900.qualifier"
       provider-name="%providerName"
       license-feature="org.eclipse.license"
       license-feature-version="0.0.0">
-- 
2.51.0

Further information is available in Common Build Issues - Missing version increments.

laeubi · 2025-09-01T06:18:56Z

.../org.eclipse.equinox.p2.core/src/org/eclipse/equinox/internal/p2/core/ProvisioningAgent.java

 	private volatile boolean stopped = false;
 	private ServiceRegistration<IProvisioningAgent> reg;
-	private final Map<ServiceReference<IAgentServiceFactory>, ServiceTracker<IAgentServiceFactory, Object>> trackers = Collections
+	private final Map<String, ServiceTracker<IAgentServiceFactory, Object>> trackers = Collections


Today I would use a ConcurrentHashMap here. Also, instead of storing a ServiceTracker object, it would be better to use a dedicated class (that internally holds/manages a ServiceTracker). Then one can use a nice pattern: first compute (get or create) the instance of that class, and then synchronize on the methods of that particular instance. That way the map can work completely lock-free.
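A minimal sketch of the suggested pattern, assuming a hypothetical `TrackerHolder` class (the name and the `Supplier`-based lookup are illustrative, not taken from the PR): `computeIfAbsent` only runs a trivial constructor, and the per-service locking happens on the holder instance afterwards.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical dedicated class that would own one ServiceTracker per service name.
final class TrackerHolder {
    private final Supplier<Object> lookup;
    private Object service; // guarded by 'this'

    TrackerHolder(Supplier<Object> lookup) {
        this.lookup = lookup; // cheap: no lookup happens in the constructor
    }

    // Locking is per holder, so callers for different service names never contend.
    synchronized Object get() {
        if (service == null) {
            service = lookup.get();
        }
        return service;
    }
}

final class HolderRegistry {
    private final Map<String, TrackerHolder> holders = new ConcurrentHashMap<>();
    private final Function<String, Supplier<Object>> lookups;

    HolderRegistry(Function<String, Supplier<Object>> lookups) {
        this.lookups = lookups;
    }

    Object getService(String name) {
        // computeIfAbsent only runs the trivial constructor while the map holds its internal lock.
        TrackerHolder holder = holders.computeIfAbsent(name, n -> new TrackerHolder(lookups.apply(n)));
        return holder.get();
    }
}
```

Note that `get()` still holds the holder's lock while `lookup` runs. As the reply points out, ServiceTracker already provides its own synchronization, so whether this extra layer is needed is debatable.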

While a lock-free map is nice, it is not critical in this case, because long-running operations are already done and managed by ServiceTracker outside of locks. The map only holds an instance of ServiceTracker, the creation of which does not require any synchronization. ServiceTracker also provides the necessary method synchronization, so no additional wrapping is needed. Indeed, ServiceTracker was designed for this exact purpose.

Registration order is important for correct disposal of interdependent services, and LinkedHashMap preserves it, unlike ConcurrentHashMap.
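A small sketch of the ordered-map approach, with hypothetical names: `Collections.synchronizedMap` over a `LinkedHashMap` keeps registration order, and its Javadoc asks callers to synchronize on the map while traversing its views. Copying under that lock yields a snapshot that can then be processed without holding any lock.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class OrderedServices {
    // LinkedHashMap keeps insertion (registration) order; the wrapper adds per-call locking.
    private final Map<String, Object> services = Collections.synchronizedMap(new LinkedHashMap<>());

    void register(String name, Object service) {
        // putIfAbsent keeps the first registration and its position in the order.
        services.putIfAbsent(name, service);
    }

    List<String> namesInRegistrationOrder() {
        // Hold the map's lock while copying, following the synchronizedMap guidance
        // for traversing its views; callers then work on the snapshot lock-free.
        synchronized (services) {
            return new ArrayList<>(services.keySet());
        }
    }
}
```

Registering `c`, `a`, `b` (and `a` again) yields `[c, a, b]`; a `ConcurrentHashMap` gives no such ordering guarantee.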

Also, performance is not a concern here. Moreover, computeIfAbsent() on a ConcurrentHashMap carries the same deadlock risks as Collections.synchronizedMap(); it merely hides some of the conflicts.
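To illustrate the point: `ConcurrentHashMap.computeIfAbsent` runs the mapping function while holding an internal lock, and its Javadoc warns that other threads' updates may block during the computation, so it should be short and simple. A mapping function that activates bundles can therefore still participate in a deadlock. One alternative, sketched here with hypothetical names and not the approach this PR takes, is to create the value outside the map and publish it with `putIfAbsent`, disposing of the instance that loses a race.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;
import java.util.function.Function;

final class CreateThenPublish<K, V> {
    private final Map<K, V> map = new ConcurrentHashMap<>();
    private final Function<K, V> factory; // may run foreign code (e.g. activate a bundle)
    private final Consumer<V> disposer;   // releases an instance that lost a creation race

    CreateThenPublish(Function<K, V> factory, Consumer<V> disposer) {
        this.factory = factory;
        this.disposer = disposer;
    }

    V get(K key) {
        V existing = map.get(key);
        if (existing != null) {
            return existing;
        }
        // No map lock is held while the factory runs.
        V created = factory.apply(key);
        V raced = map.putIfAbsent(key, created);
        if (raced != null) {
            // Another caller published first: keep its instance, dispose ours.
            disposer.accept(created);
            return raced;
        }
        return created;
    }
}
```

The trade-off is occasional duplicate creation in exchange for never calling foreign code under the map's lock; it also tolerates re-entrant lookups, which `computeIfAbsent` may reject.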

laeubi · 2025-09-01T06:21:51Z

.../org.eclipse.equinox.p2.core/src/org/eclipse/equinox/internal/p2/core/ProvisioningAgent.java

+		if (stopped) {
+			return;
 		}
+		agentServices.remove(serviceName, service);


Also for the agentServices I would use ConcurrentHashMap

ConcurrentHashMap does not preserve registration order, which is used to dispose of services. See stop().
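The `agentServices.remove(serviceName, service)` call in the snippet above uses the two-argument `Map.remove(key, value)`, which removes the entry only if the key is still mapped to a value equal to the given one (for services that do not override `equals`, the same instance). A minimal sketch with hypothetical names shows why this matters: a stale unregistration cannot evict a newer registration under the same name.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

final class AgentServices {
    // Ordered and synchronized, in line with the registration-order argument above.
    private final Map<String, Object> services = Collections.synchronizedMap(new LinkedHashMap<>());

    void register(String name, Object service) {
        services.put(name, service);
    }

    // Removes the entry only if 'name' still maps to 'service' (compared with equals()).
    boolean unregister(String name, Object service) {
        return services.remove(name, service);
    }

    Object get(String name) {
        return services.get(name);
    }
}
```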

laeubi · 2025-09-01T06:29:03Z

The test failures seem to be mostly caused by the fact that

ProvisioningAgent.stop

closes the trackers only after the agent has already been marked as stopped. I think it must first close all trackers (and maybe release other things as well) to give the services a chance to shut down properly.

basilevs · 2025-09-01T11:35:41Z

> I think it must first close all trackers (and maybe release other things as well) to give a chance for the services to properly shut down.

This would allow services to be populated/restored during the stop procedure.

Instead, I suggest allowing access to and removal of services while stopped. It makes no sense to deny access to a service that is still present. I've pushed a prototype.
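The proposed stop semantics can be sketched as follows. This is a hypothetical model, not the prototype's code: `StoppableAgent` and its `Function`-based factory are illustrative. After `stop()`, services that are still registered remain reachable and can be removed, but a missing service is never created or recreated.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

final class StoppableAgent {
    private final Map<String, Object> services = Collections.synchronizedMap(new LinkedHashMap<>());
    private final Function<String, Object> factory; // hypothetical service factory lookup
    private volatile boolean stopped;

    StoppableAgent(Function<String, Object> factory) {
        this.factory = factory;
    }

    Object getService(String name) {
        Object existing = services.get(name);
        if (existing != null || stopped) {
            // Still-present services stay reachable during stop; missing ones are never (re)created.
            return existing;
        }
        // Foreign code runs without holding the map lock.
        // Simplification: a losing instance is not disposed here, and stop()
        // racing with creation is not handled.
        Object created = factory.apply(name);
        Object raced = services.putIfAbsent(name, created);
        return raced != null ? raced : created;
    }

    void unregisterService(String name, Object service) {
        // Removal stays allowed after stop so that shutdown can release services.
        services.remove(name, service);
    }

    void stop() {
        stopped = true;
    }
}
```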

laeubi · 2025-09-01T12:27:57Z

> This would allow services to be populated/restored during the stop procedure.

If I can't perform the required action, the stop is not really useful. I also wonder in what cases it will really make sense here, and, given that we did not call stop() before, whether it may even lead to undesired effects.

Overall, as this is a crucial part of P2 and the Eclipse platform, and is even used inside Tycho, I think we would need to extract this into much smaller pieces, each of them only covering a small subset of this PR, to become more confident that it does not break anything and to understand why a certain change is good.

Also, ideally there would be some kind of test case that shows the problem and passes once it is fixed.

basilevs · 2025-09-01T13:06:35Z

@laeubi

> If I can't perform the required action, the stop is not really useful

It is able to stop each service as long as the service's dependencies are still present. To ensure this, my implementation disposes of services in the reverse order of their creation.
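Reverse-order disposal can be sketched with hypothetical names (`StoppableService`, `ReverseOrderDisposer`; the service names in the usage are illustrative). Because a service is registered after the dependencies it looked up, stopping newest-first means each service's dependencies are still present while it stops.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an agent service with a stop hook.
interface StoppableService {
    void stop();
}

final class ReverseOrderDisposer {
    private final Map<String, StoppableService> services = Collections.synchronizedMap(new LinkedHashMap<>());

    // Services are registered after their dependencies, so creation order is dependency order.
    void register(String name, StoppableService service) {
        services.putIfAbsent(name, service);
    }

    boolean isPresent(String name) {
        return services.containsKey(name);
    }

    void stopAll() {
        List<Map.Entry<String, StoppableService>> snapshot;
        synchronized (services) {
            snapshot = new ArrayList<>(services.entrySet());
        }
        Collections.reverse(snapshot);
        // Newest first: each service's dependencies are still registered while it stops.
        for (Map.Entry<String, StoppableService> e : snapshot) {
            e.getValue().stop(); // called without holding the map lock
            services.remove(e.getKey(), e.getValue());
        }
    }
}
```

For example, if a `repository` service is registered after the `transport` it depends on, `stopAll()` stops `repository` first, and `transport` is still present during that call.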

> I also wonder in what cases it will really make sense here, and, given that we did not call stop() before, whether it may even lead to undesired effects.

The case is: a stopped service erroneously accesses a dependency that has already gone away. We cannot allow the dependency to be recreated, because the service would then work with a new instance while wrongly assuming that it is the original.

The current implementation has known defects:

- it leaks unstopped services
- obsolete services continue to be provided when one with a higher ranking is registered
- unstarted services are exposed to consumers

> extract this into much smaller pieces each of them only covering a small subset of this PR

Splitting the PR is hard because ServiceTrackers are misused in the existing implementation (they monitor a single ServiceReference, whose ranking may change, instead of the highest-ranked one). I will reopen #938 and see what can be done.
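The selection rule that `ServiceTracker.getServiceReference()` applies (highest `service.ranking` wins; on a tie, the lowest `service.id`, i.e. the earliest registration) can be modeled in plain Java. `Ref` and `BestRef` are hypothetical types for illustration; the point is that the choice is re-evaluated over the whole tracked set, so a newly registered higher-ranked service replaces an obsolete one.

```java
import java.util.Collection;
import java.util.Comparator;
import java.util.Optional;

// Hypothetical value type mirroring the two properties OSGi uses to order references.
record Ref(long serviceId, int ranking) {}

final class BestRef {
    // Higher ranking wins; on a tie, the lower service id (registered earlier) wins.
    static final Comparator<Ref> PREFERENCE =
            Comparator.comparingInt(Ref::ranking)
                    .thenComparing(Comparator.comparingLong(Ref::serviceId).reversed());

    // Evaluated over the whole tracked set on every call, rather than
    // remembering one reference picked at an arbitrary moment.
    static Optional<Ref> best(Collection<Ref> tracked) {
        return tracked.stream().max(PREFERENCE);
    }
}
```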

Tests are required, but they would take significant effort, so I'm collecting input on the overall approach (thanks for the comments so far).

laeubi · 2025-11-11T09:02:18Z

@basilevs At least we would need some tests to cover the new behavior.

basilevs · 2025-11-11T12:02:59Z

> @basilevs At least we would need some tests to cover the new behavior.

Please merge #938 first.


basilevs · 2025-11-14T06:23:39Z

.../org.eclipse.equinox.p2.core/src/org/eclipse/equinox/internal/p2/core/ProvisioningAgent.java

+										serviceName)); // use old property as fallback
+						return new ServiceTracker<>(context, filter, trackerCustomizer);
+					} catch (InvalidSyntaxException e) {
+						throw new AssertionError(e);


As serviceName is not validated, a filter syntax error may occur even though there is no other error in the agent code. AssertionError should only be thrown for internal errors. Either validate serviceName or throw IllegalArgumentException.
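A sketch of the "validate serviceName" option, under stated assumptions: `AgentFilters` and the property key `example.service.name` are hypothetical (the real code uses its own service-name properties). OSGi filter syntax requires `\`, `*`, `(` and `)` inside a value to be escaped with a backslash, so escaping makes any non-empty name yield a syntactically valid filter and also stops `*` from silently becoming a wildcard. Rejecting an empty name with `IllegalArgumentException` is an additional assumption of this sketch.

```java
final class AgentFilters {
    // Hypothetical property key used only for illustration.
    static final String NAME_PROPERTY = "example.service.name";

    // OSGi filter syntax: '\', '*', '(' and ')' in a value must be escaped with a backslash.
    static String escapeValue(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (char ch : value.toCharArray()) {
            if (ch == '\\' || ch == '*' || ch == '(' || ch == ')') {
                sb.append('\\');
            }
            sb.append(ch);
        }
        return sb.toString();
    }

    // Caller errors surface as IllegalArgumentException rather than AssertionError.
    static String filterFor(String serviceName) {
        if (serviceName == null || serviceName.isEmpty()) {
            throw new IllegalArgumentException("serviceName must not be empty");
        }
        return "(" + NAME_PROPERTY + "=" + escapeValue(serviceName) + ")";
    }
}
```

With escaping in place, an `InvalidSyntaxException` from the framework would indeed indicate an internal error, which is when `AssertionError` is appropriate.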

basilevs · 2025-11-14T06:33:41Z

.../org.eclipse.equinox.p2.core/src/org/eclipse/equinox/internal/p2/core/ProvisioningAgent.java

-			return;
+		@Override
+		public void modifiedService(ServiceReference<IAgentServiceFactory> reference, Object service) {
+			// nothing to do


Explain why service reset is not required here.

basilevs force-pushed the deadlock_937 branch from 249d8c2 to 2d6ac3a August 31, 2025 20:28

laeubi reviewed Sep 1, 2025


basilevs requested a review from laeubi September 1, 2025 11:51

basilevs mentioned this pull request Sep 9, 2025

Do not expose unstarted service, stop started service leak #938

Merged

basilevs added 4 commits November 11, 2025 20:25

Fix deadlock eclipse-equinox#937

0e76faf

The deadlock was caused by working with ServiceTracker (which may activate bundles) while holding a lock Do not hold any locks while working with ServiceTracker Fixes eclipse-equinox#937

Do not leak a reference to removed service eclipse-equinox#937

032ddd0

Re-implement overridden default removedService()

Fix typo eclipse-equinox#937

8fb5d1d

Allow stop procedure to access still running services eclipse-equinox#937

e065a30

basilevs force-pushed the deadlock_937 branch from 812a471 to e065a30 November 11, 2025 16:48

basilevs commented Nov 14, 2025

