Skip to content

Commit 391ef4b

Browse files
shlomitubulSteNicholas
authored andcommitted
[CELEBORN-2273] Fix cache mutation in TagsManager.getTaggedWorkers()
What changes were proposed in this pull request? getTaggedWorkers() obtains a direct reference to the cached Set from getWorkersWithTag()and then calls retainAll() on it to intersect with other tags and available workers. Since retainAll() mutates the Set in-place, this permanently corrupts the cached entry. When multiple applications with different tag combinations share the same master, one app's intersection shrinks the cached Set, causing subsequent lookups by other apps to find fewer or zero workers. Once corrupted to an empty Set, all future slot requests fail with WORKER_EXCLUDED until the cache is refreshed. Why are the changes needed? Does this PR resolve a correctness bug? Yes Does this PR introduce any user-facing change? No How was this patch tested? custom image in my dev env + local test Closes #3615 from shlomitubul/main. Authored-by: ShlomiTubul <shlomi.tubul@placer.ai> Signed-off-by: SteNicholas <programgeek@163.com>
1 parent dca3749 commit 391ef4b

1 file changed

Lines changed: 1 addition & 1 deletion

File tree

  • master/src/main/scala/org/apache/celeborn/service/deploy/master/tags

master/src/main/scala/org/apache/celeborn/service/deploy/master/tags/TagsManager.scala

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -79,7 +79,7 @@ class TagsManager(configService: Option[ConfigService]) extends Logging {
7979
case Some(w) =>
8080
w.retainAll(taggedWorkers)
8181
case _ =>
82-
workersForTags = Some(taggedWorkers)
82+
workersForTags = Some(new util.HashSet[String](taggedWorkers))
8383
}
8484
}
8585

0 commit comments

Comments
 (0)