Commit 26a9201
GCS client test failure flakiness (ray-project#34656)
Why are these changes needed?
The current theory is as follows.
The pubsub io service is created and run inside the GcsServer. That means that if the pubsub io service is accessed after the GcsServer has been destroyed, the process will segfault.
Currently, when we call rpc::DrainAndResetExecutor during teardown, it recreates the Executor thread pool.
During teardown, the following sequence leads to a segfault: DrainAndResetExecutor runs -> the GcsServer's internal pubsub posts a new SendReply to the newly created thread pool -> GcsServer.reset -> the pubsub io service is destroyed -> SendReply is invoked from the newly created thread pool.
NOTE: If you look at the failure, the segfault comes from the pubsub service:
#2 0x7f92034d9129 in ray::rpc::ServerCallImpl<ray::rpc::InternalPubSubGcsServiceHandler, ray::rpc::GcsSubscriberPollRequest, ray::rpc::GcsSubscriberPollReply>::HandleRequestImpl()::'lambda'(ray::Status, std::__1::function<void ()>, std::__1::function<void ()>)::operator()(ray::Status, std::__1::function<void ()>, std::__1::function<void ()>) const::'lambda'()::operator()() const /proc/self/cwd/bazel-out/k8-opt/bin/_virtual_includes/grpc_common_lib/ray/rpc/server_call.h:212:48
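The ordering problem can be replayed deterministically with a toy model. This is only a sketch: `ToyExecutor`, `ReplayBuggyTeardown`, and the `io_service_alive` flag are hypothetical names and not Ray code. A single-threaded queue stands in for the thread pool, and a shared flag stands in for the pubsub io service so that the late access can be observed without triggering a real crash.

```cpp
#include <deque>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the executor: tasks are queued and run later.
class ToyExecutor {
 public:
  void Post(std::function<void()> task) { tasks_.push_back(std::move(task)); }

  // Runs every queued task, including tasks posted while draining.
  void Drain() {
    while (!tasks_.empty()) {
      auto task = std::move(tasks_.front());
      tasks_.pop_front();
      task();
    }
  }

 private:
  std::deque<std::function<void()>> tasks_;
};

// Replays the teardown order from the description and records each step.
std::vector<std::string> ReplayBuggyTeardown() {
  std::vector<std::string> log;
  auto executor = std::make_unique<ToyExecutor>();
  // Stands in for the pubsub io service owned by the server (assumption).
  auto io_service_alive = std::make_shared<bool>(true);

  // 1. Drain and reset: the old pool is emptied and a fresh one is created.
  executor->Drain();
  executor = std::make_unique<ToyExecutor>();
  log.push_back("drain+reset");

  // 2. The still-running server posts a reply to the fresh pool.
  executor->Post([&log, io_service_alive] {
    log.push_back(*io_service_alive ? "reply ran: io service alive"
                                    : "reply ran: io service destroyed");
  });
  log.push_back("post SendReply");

  // 3. The server is reset, and its io service goes away with it.
  *io_service_alive = false;
  log.push_back("server reset");

  // 4. The fresh pool finally runs the reply. In real code this step would
  //    touch freed memory.
  executor->Drain();
  return log;
}
```

Calling `ReplayBuggyTeardown()` returns the four steps in order. The last entry shows the reply running after the io service is gone, which is the access that the theory above says causes the segfault.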
As a fix, I only drain the thread pool during teardown, and reset it only after all operations are fully cleaned up (and only in tests). I don't think a reset is needed for regular process termination in the raylet, GCS, or core workers.
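A minimal sketch of the drain-only approach, under stated assumptions: `DrainableExecutor`, `TeardownResult`, and `ReplayFixedTeardown` are hypothetical names, and the toy executor simply rejects tasks posted after it has been drained. How Ray's real thread pool treats late posts is not stated in this description, so that behavior is an assumption of the model.

```cpp
#include <deque>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical executor that separates draining from resetting.
class DrainableExecutor {
 public:
  // Returns false and drops the task once the executor has been drained
  // (an assumption of this toy model).
  bool Post(std::function<void()> task) {
    if (drained_) return false;
    tasks_.push_back(std::move(task));
    return true;
  }

  // Runs all queued tasks, then refuses new ones.
  void Drain() {
    while (!tasks_.empty()) {
      auto task = std::move(tasks_.front());
      tasks_.pop_front();
      task();
    }
    drained_ = true;
  }

 private:
  std::deque<std::function<void()>> tasks_;
  bool drained_ = false;
};

struct TeardownResult {
  bool late_post_accepted;        // Did the post-drain reply get queued?
  int replies_run_after_destroy;  // Replies that ran with the server gone.
  bool usable_after_reset;        // Does the test-only reset yield a live pool?
};

TeardownResult ReplayFixedTeardown() {
  auto executor = std::make_unique<DrainableExecutor>();
  bool server_alive = true;
  int replies_run_after_destroy = 0;

  // 1. Teardown drains the pool but does not recreate it.
  executor->Drain();

  // 2. A late reply from the server has no fresh pool to land in.
  bool accepted = executor->Post([&] {
    if (!server_alive) ++replies_run_after_destroy;
  });

  // 3. The server (and its io service) is destroyed.
  server_alive = false;

  // 4. Tests only: reset the pool after everything else is cleaned up.
  executor = std::make_unique<DrainableExecutor>();
  bool usable = executor->Post([] {});
  executor->Drain();

  return {accepted, replies_run_after_destroy, usable};
}
```

In this model the late reply is rejected rather than queued, so nothing runs against the destroyed server. The reset in step 4 only matters for tests that start another server in the same process.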
Related issue number
Closes ray-project#34344
Signed-off-by: SangBin Cho <rkooo567@gmail.com>
1 parent 6c35629 commit 26a9201
File tree
7 files changed, +18 -10 lines changed
- src/ray
- core_worker
- gcs
- gcs_client/test
- gcs_server
- test
- raylet
- rpc