Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
39 commits
Select commit Hold shift + click to select a range
5a8d05e
update to protbuf-28.2, absl-20240722, grpc-1.67 and patch for windows
mattip Mar 25, 2025
4a98bda
ignore root user error
mattip Mar 25, 2025
9a823f2
no warning/error on deprecated-declarations
mattip Mar 27, 2025
583a597
patch abseil for gcc bug
mattip Mar 30, 2025
1325338
patch grpc to avoid goaway message on shutdown broadcast
mattip Apr 5, 2025
57c4d61
remove zlib patch via grpc, the newer grpc uses a later zlib that is …
mattip Apr 8, 2025
16d1672
use large machine for debug build
mattip Apr 9, 2025
9aad6b4
remove iwyu from build, revert ignoring cpp warnings
mattip Apr 18, 2025
b2340e6
update rules-proto-grpc from before 1.0 to 4.6.0
mattip Apr 20, 2025
60f7ff3
use protobuf 28.2 for java, python
mattip Apr 21, 2025
92d351b
restore global '-Wno-deprecated-declarations', selectively apply patc…
mattip May 22, 2025
2c85402
patch unconditionally
mattip May 22, 2025
3a9f3a3
update protobuf compile rule update for python
mattip May 22, 2025
3b6ca0a
update java protobuf version, fix 'cannot be null' errors
mattip May 23, 2025
1854875
update python protobuf package to >=4.25.7
mattip May 23, 2025
4576ae2
add missing pubsub_proto python module
mattip May 23, 2025
cefa898
limit cpu use in build
mattip May 24, 2025
4b9a295
add more protobuf to all_py_proto for lint tests
mattip May 24, 2025
599e32a
update python protobuf to >=5.28.0
mattip May 25, 2025
7866870
do not change the base protobuf version (from review)
mattip May 30, 2025
163b842
revert updating java protobuf version
mattip May 31, 2025
9922eea
add fix from PR 53960
mattip Jun 20, 2025
548b065
Revert "add fix from PR 53960"
mattip Jul 3, 2025
c53cbda
fix from grpc maintainers
mattip Jul 3, 2025
74f4fa2
fix merge
mattip Jul 27, 2025
886c369
resync with master
mattip Jul 28, 2025
e447a58
revert unneeded change
mattip Jul 28, 2025
1dc3415
Merge branch 'master' into vs2022
edoakes Jul 29, 2025
20c9409
merge master into branch
mattip Aug 12, 2025
518f441
add blank patch to make merge from master happy
mattip Aug 12, 2025
366f32f
fix patch for newer grpc
mattip Aug 12, 2025
4b20efe
raise thread count in test, maybe reporting is now more correct?
mattip Aug 14, 2025
d63bfdc
merge master into branch
mattip Aug 14, 2025
3e926c4
typo
mattip Aug 14, 2025
5066b4e
remove unneeded change
mattip Aug 18, 2025
80f4beb
Merge branch 'master' into vs2022
mattip Aug 18, 2025
fd36dfb
Revert "remove unneeded change"
mattip Aug 19, 2025
da536f5
merge master into branch
mattip Sep 4, 2025
769875c
merge master into branch
mattip Sep 10, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 9 additions & 7 deletions .bazelrc
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ build:linux --workspace_status_command="bash ./bazel/workspace_status.sh"
build --action_env=RAY_BUILD_ENV

###############################################################################
# On Windows, provide: BAZEL_SH, and BAZEL_LLVM (if using clang-cl)
# On Windows, provide: BAZEL_SH, and BAZEL_LLVM
# On all platforms, provide: PYTHON3_BIN_PATH=python
###############################################################################
build:windows --action_env=PATH
Expand All @@ -19,12 +19,12 @@ build --compilation_mode=opt
# Using C++ 17 on all platforms.
build:linux --host_cxxopt="-std=c++17"
build:macos --host_cxxopt="-std=c++17"
build:clang-cl --host_cxxopt="-std=c++17"
build:clang-cl --host_cxxopt="-std:c++17"
build:msvc-cl --host_cxxopt="/std:c++17"
build:windows --host_cxxopt="/std:c++17"
build:linux --cxxopt="-std=c++17"
build:macos --cxxopt="-std=c++17"
build:clang-cl --cxxopt="-std=c++17"
build:clang-cl --cxxopt="-std:c++17"
build:msvc-cl --cxxopt="/std:c++17"
build:windows --cxxopt="/std:c++17"
# This workaround is needed to prevent Bazel from compiling the same file twice (once PIC and once not).
Expand All @@ -51,10 +51,10 @@ build:windows --enable_runfiles
# TODO(mehrdadn): Revert the "-\\.(asm|S)$" exclusion when this Bazel bug
# for compiling assembly files is fixed on Windows:
# https://github.com/bazelbuild/bazel/issues/8924
# Warnings should be errors
build:linux --per_file_copt="-\\.(asm|S)$@-Werror"
build:macos --per_file_copt="-\\.(asm|S)$@-Werror"
build:clang-cl --per_file_copt="-\\.(asm|S)$@-Werror"
# Warnings should be errors, except for deprecated protobuf CreateMessage use
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we able to fix the code instead? Or if the code is thirty party, can we only ignore when we compile those code?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The deprecated declarations aren't in our code
Can you just update the build for whichever library we need this for instead @mattip . Ex. for msgpack we do this

"-Wno-shadow",
.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you just update the build for whichever library we need this for

It seems to come from protoc which has this admonition

# The protobuf version we use to auto generate python and java code.
# This can be different from the protobuf version that Ray core uses internally.
# Generally this should be a lower version since protobuf guarantees that
# code generated by protoc of version X can be used with protobuf library of version >= X.
# So the version here effectively determines the lower bound of python/java
# protobuf library that Ray supports.
http_archive(
name = "com_google_protobuf_rules_proto_grpc",
strip_prefix = "protobuf-3.19.4",
urls = ["https://github.com/protocolbuffers/protobuf/archive/v3.19.4.tar.gz"],
sha256 = "3bd7828aa5af4b13b99c191e8b1e884ebfa9ad371b0ce264605d347f135d2568",
)

What are the implications of updating the basic protobuf version for compiled messages? I was afraid that would break backward compatibility.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

protobuf guarantees that code generated by the older version of the protobuf can be used with newer version of protobuf library. If we upgrade this, we might need to upgrade the lower bound of our python protobuf dependency.

Can we just ignore that warning when we compile protobuf generated source code? cc @dayshah

Copy link
Contributor

@dayshah dayshah Apr 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ya we should be able to add copts here when we build

https://github.com/ray-project/ray/blob/master/src/ray/protobuf/BUILD

If it doesn't work, I'd say not a blocker to merge this, can just try removing warning later, or try upgrading this in general

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even after I remove iwyu, the java build and python build still want to use the older rules_proto_grpc. The commit a74fef3 comes from 2019. I will try to update only that to latest.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems I need to update both the rules and the protobuf, let's see if it works

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that fails. Maybe this is all coming from bazel 6.5.0, which includes code for protobuf 3.19

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems we would need to update to bazel 8.0 to get away from the protobuf3.19 pinning in the bazel codebase itself. That seems wrong. There must be a way to separate the ray use of bazel from the tool use, in a way that will totally isolate the headers in the ray code from building the tool used to compile proto files.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems we would need to update to bazel 8.0 to get away from the protobuf3.19 pinning in the bazel codebase itself.

Usage in bazel itself and usage in a target project are two different things. In conda-forge specifically, they're kinda related in that the bazel in the build environment will bring along its underlying shared protobuf, but we keep rebuilding bazel for new protobuf versions explicitly for that purpose (and we've kept the bazel v6.x branch current more or less specifically for ray; the only other user is tensorflow)

build:linux --per_file_copt="-\\.(asm|S)$@-Werror,-Wno-deprecated-declarations"
build:macos --per_file_copt="-\\.(asm|S)$@-Werror,-Wno-deprecated-declarations"
build:clang-cl --per_file_copt="-\\.(asm|S)$@-Werror,-Wno-deprecated-declarations"
build:msvc-cl --per_file_copt="-\\.(asm|S)$@-WX"
# Ignore warnings for protobuf generated files and external projects.
build --per_file_copt="\\.pb\\.cc$@-w"
Expand Down Expand Up @@ -238,3 +238,5 @@ try-import %workspace%/.user.bazelrc
build:macos --sandbox_block_path=/usr/local/
# This option controls whether javac checks for missing direct dependencies.
build --experimental_strict_java_deps=off
build:clang-cl --per_file_copt="-external/com_github_redis_hiredis/ssl.c$@-Wno-parenthesis,-Wno-int-conversion"
build --local_cpu_resources=HOST_CPUS*0.75
2 changes: 1 addition & 1 deletion .buildkite/core.rayci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -204,7 +204,7 @@ steps:
tags:
- python
- skip-on-premerge
instance_type: medium
instance_type: large
commands:
- bazel run //ci/ray_ci:test_in_docker -- //python/ray/tests/... core
--install-mask all-ray-libraries
Expand Down
1 change: 1 addition & 0 deletions WORKSPACE
Original file line number Diff line number Diff line change
Expand Up @@ -64,6 +64,7 @@ python_register_toolchains(
name = "python3_9",
python_version = "3.9",
register_toolchains = False,
ignore_root_user_error = True,
)

load("@python3_9//:defs.bzl", python39 = "interpreter")
Expand Down
32 changes: 15 additions & 17 deletions bazel/ray_deps_setup.bzl
Original file line number Diff line number Diff line change
Expand Up @@ -86,20 +86,16 @@ def auto_http_archive(
def ray_deps_setup():
# Explicitly bring in protobuf dependency to work around
# https://github.com/ray-project/ray/issues/14117
# This is copied from grpc's bazel/grpc_deps.bzl
#
# Pinned grpc version: v23.4
http_archive(
name = "com_google_protobuf",
sha256 = "76a33e2136f23971ce46c72fd697cd94dc9f73d56ab23b753c3e16854c90ddfd",
strip_prefix = "protobuf-2c5fa078d8e86e5f4bd34e6f4c9ea9e8d7d4d44a",
sha256 = "b2340aa47faf7ef10a0328190319d3f3bee1b24f426d4ce8f4253b6f27ce16db",
strip_prefix = "protobuf-28.2",
urls = [
"https://github.com/protocolbuffers/protobuf/archive/2c5fa078d8e86e5f4bd34e6f4c9ea9e8d7d4d44a.tar.gz",
"https://github.com/protocolbuffers/protobuf/archive/refs/tags/v28.2.tar.gz",
],
patches = [
"@com_github_grpc_grpc//third_party:protobuf.patch",
"@io_ray//thirdparty/patches:protobuf-windows-const-nan.patch",
],
patch_args = ["-p1"],
)

# NOTE(lingxuan.zlx): 3rd party dependencies could be accessed, so it suggests
Expand Down Expand Up @@ -250,10 +246,10 @@ def ray_deps_setup():
# TODO(owner): Upgrade abseil to latest version after protobuf updated, which requires to upgrade `rules_cc` first.
auto_http_archive(
name = "com_google_absl",
sha256 = "987ce98f02eefbaf930d6e38ab16aa05737234d7afbab2d5c4ea7adbe50c28ed",
strip_prefix = "abseil-cpp-20230802.1",
sha256 = "f50e5ac311a81382da7fa75b97310e4b9006474f9560ac46f54a9967f07d4ae3",
strip_prefix = "abseil-cpp-20240722.0",
urls = [
"https://github.com/abseil/abseil-cpp/archive/refs/tags/20230802.1.tar.gz",
"https://github.com/abseil/abseil-cpp/archive/refs/tags/20240722.0.tar.gz",
],
patches = [
# TODO (israbbani): #55430 Separate the compiler flags and remove this patch
Expand All @@ -278,10 +274,11 @@ def ray_deps_setup():
auto_http_archive(
name = "com_github_grpc_grpc",
# NOTE: If you update this, also update @boringssl's hash.
url = "https://github.com/grpc/grpc/archive/refs/tags/v1.57.1.tar.gz",
sha256 = "0762f809b9de845e6a7c809cabccad6aa4143479fd43b396611fe5a086c0aeeb",
url = "https://github.com/grpc/grpc/archive/refs/tags/v1.67.1.tar.gz",
sha256 = "d74f8e99a433982a12d7899f6773e285c9824e1d9a173ea1d1fb26c9bd089299",
patches = [
"@io_ray//thirdparty/patches:grpc-cython-copts.patch",
"@io_ray//thirdparty/patches:grpc-avoid-goaway-messages.patch",
"@io_ray//thirdparty/patches:grpc-zlib-fdopen.patch",
"@io_ray//thirdparty/patches:grpc-configurable-thread-count.patch",
],
Expand Down Expand Up @@ -323,13 +320,13 @@ def ray_deps_setup():
http_archive(
# This rule is used by @com_github_grpc_grpc, and using a GitHub mirror
# provides a deterministic archive hash for caching. Explanation here:
# https://github.com/grpc/grpc/blob/1ff1feaa83e071d87c07827b0a317ffac673794f/bazel/grpc_deps.bzl#L189
# Ensure this rule matches the rule used by grpc's bazel/grpc_deps.bzl
# https://github.com/grpc/grpc/blob/v1.67.1/bazel/grpc_deps.bzl#L33
name = "boringssl",
sha256 = "0675a4f86ce5e959703425d6f9063eaadf6b61b7f3399e77a154c0e85bad46b1",
strip_prefix = "boringssl-342e805bc1f5dfdd650e3f031686d6c939b095d9",
sha256 = "c70d519e4ee709b7a74410a5e3a937428b8198d793a3d771be3dd2086ae167c8",
strip_prefix = "boringssl-b8b3e6e11166719a8ebfa43c0cde9ad7d57a84f6",
urls = [
"https://github.com/google/boringssl/archive/342e805bc1f5dfdd650e3f031686d6c939b095d9.tar.gz",
"https://github.com/google/boringssl/archive/b8b3e6e11166719a8ebfa43c0cde9ad7d57a84f6.tar.gz",
],
)

Expand All @@ -345,6 +342,7 @@ def ray_deps_setup():
urls = ["https://github.com/protocolbuffers/protobuf/archive/v3.19.4.tar.gz"],
sha256 = "3bd7828aa5af4b13b99c191e8b1e884ebfa9ad371b0ce264605d347f135d2568",
)

auto_http_archive(
name = "rules_proto_grpc",
url = "https://github.com/rules-proto-grpc/rules_proto_grpc/archive/a74fef39c5fe636580083545f76d1eab74f6450d.tar.gz",
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -153,7 +153,7 @@ TEST_F(TaskEventTestWriteExport, TestWriteTaskExportEvents) {
auto task_ids = GenTaskIDs(num_events);
google::protobuf::util::JsonPrintOptions options;
options.preserve_proto_field_names = true;
options.always_print_primitive_fields = true;
options.always_print_fields_with_no_presence = true;

std::vector<SourceTypeVariant> source_types = {
rpc::ExportEvent_SourceType::ExportEvent_SourceType_EXPORT_TASK};
Expand Down
20 changes: 7 additions & 13 deletions src/ray/gcs/gcs_server/tests/gcs_actor_manager_test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -1224,8 +1224,7 @@ TEST_F(GcsActorManagerTest, TestGetAllActorInfoFilters) {
rpc::GetAllActorInfoRequest request;
request.mutable_filters()->set_actor_id(actor->GetActorID().Binary());

auto &reply =
*google::protobuf::Arena::CreateMessage<rpc::GetAllActorInfoReply>(&arena);
auto &reply = *google::protobuf::Arena::Create<rpc::GetAllActorInfoReply>(&arena);
gcs_actor_manager_->HandleGetAllActorInfo(request, &reply, callback);
ASSERT_EQ(reply.actor_table_data().size(), 1);
ASSERT_EQ(reply.total(), 1 + num_other_actors);
Expand All @@ -1237,8 +1236,7 @@ TEST_F(GcsActorManagerTest, TestGetAllActorInfoFilters) {
rpc::GetAllActorInfoRequest request;
request.mutable_filters()->set_job_id(job_id.Binary());

auto &reply =
*google::protobuf::Arena::CreateMessage<rpc::GetAllActorInfoReply>(&arena);
auto &reply = *google::protobuf::Arena::Create<rpc::GetAllActorInfoReply>(&arena);
gcs_actor_manager_->HandleGetAllActorInfo(request, &reply, callback);
ASSERT_EQ(reply.actor_table_data().size(), 1);
ASSERT_EQ(reply.num_filtered(), num_other_actors);
Expand All @@ -1249,8 +1247,7 @@ TEST_F(GcsActorManagerTest, TestGetAllActorInfoFilters) {
rpc::GetAllActorInfoRequest request;
request.mutable_filters()->set_state(rpc::ActorTableData::ALIVE);

auto &reply =
*google::protobuf::Arena::CreateMessage<rpc::GetAllActorInfoReply>(&arena);
auto &reply = *google::protobuf::Arena::Create<rpc::GetAllActorInfoReply>(&arena);
gcs_actor_manager_->HandleGetAllActorInfo(request, &reply, callback);
ASSERT_EQ(reply.actor_table_data().size(), 1);
ASSERT_EQ(reply.num_filtered(), num_other_actors);
Expand All @@ -1262,8 +1259,7 @@ TEST_F(GcsActorManagerTest, TestGetAllActorInfoFilters) {
request.mutable_filters()->set_state(rpc::ActorTableData::ALIVE);
request.mutable_filters()->set_job_id(job_id.Binary());

auto &reply =
*google::protobuf::Arena::CreateMessage<rpc::GetAllActorInfoReply>(&arena);
auto &reply = *google::protobuf::Arena::Create<rpc::GetAllActorInfoReply>(&arena);
gcs_actor_manager_->HandleGetAllActorInfo(request, &reply, callback);
ASSERT_EQ(reply.actor_table_data().size(), 1);
ASSERT_EQ(reply.num_filtered(), num_other_actors);
Expand All @@ -1273,8 +1269,7 @@ TEST_F(GcsActorManagerTest, TestGetAllActorInfoFilters) {
request.mutable_filters()->set_state(rpc::ActorTableData::DEAD);
request.mutable_filters()->set_job_id(job_id.Binary());

auto &reply =
*google::protobuf::Arena::CreateMessage<rpc::GetAllActorInfoReply>(&arena);
auto &reply = *google::protobuf::Arena::Create<rpc::GetAllActorInfoReply>(&arena);
gcs_actor_manager_->HandleGetAllActorInfo(request, &reply, callback);
ASSERT_EQ(reply.num_filtered(), num_other_actors + 1);
ASSERT_EQ(reply.actor_table_data().size(), 0);
Expand All @@ -1297,14 +1292,13 @@ TEST_F(GcsActorManagerTest, TestGetAllActorInfoLimit) {
{
rpc::GetAllActorInfoRequest request;
auto &reply =
*google::protobuf::Arena::CreateMessage<rpc::GetAllActorInfoReply>(&arena);
*google::protobuf::Arena::Create<rpc::GetAllActorInfoReply>(&arena);
auto callback = [](Status, std::function<void()>, std::function<void()>) {};
gcs_actor_manager_->HandleGetAllActorInfo(request, &reply, callback);
ASSERT_EQ(reply.actor_table_data().size(), 3);

request.set_limit(2);
auto &reply_2 =
*google::protobuf::Arena::CreateMessage<rpc::GetAllActorInfoReply>(&arena);
auto &reply_2 = *google::protobuf::Arena::Create<rpc::GetAllActorInfoReply>(&arena);
gcs_actor_manager_->HandleGetAllActorInfo(request, &reply_2, callback);
ASSERT_EQ(reply_2.actor_table_data().size(), 2);
ASSERT_EQ(reply_2.total(), 3);
Expand Down
2 changes: 1 addition & 1 deletion src/ray/rpc/server_call.h
Original file line number Diff line number Diff line change
Expand Up @@ -190,7 +190,7 @@ class ServerCallImpl : public ServerCall {
cluster_id_(cluster_id),
start_time_(0),
record_metrics_(record_metrics) {
reply_ = google::protobuf::Arena::CreateMessage<Reply>(&arena_);
reply_ = google::protobuf::Arena::Create<Reply>(&arena_);
// TODO(Yi Cheng) call_name_ sometimes get corrunpted due to memory issues.
RAY_CHECK(!call_name_.empty()) << "Call name is empty";
if (record_metrics_) {
Expand Down
2 changes: 1 addition & 1 deletion src/ray/util/event.cc
Original file line number Diff line number Diff line change
Expand Up @@ -138,7 +138,7 @@ std::string LogEventReporter::ExportEventToString(const rpc::ExportEvent &export
google::protobuf::util::JsonPrintOptions options;
options.preserve_proto_field_names = true;
// Required so enum with value 0 is not omitted
options.always_print_primitive_fields = true;
options.always_print_fields_with_no_presence = true;
if (export_event.has_task_event_data()) {
RAY_CHECK(google::protobuf::util::MessageToJsonString(
export_event.task_event_data(), &event_data_as_string, options)
Expand Down
19 changes: 19 additions & 0 deletions thirdparty/patches/grpc-avoid-goaway-messages.patch
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
diff --git a/src/core/ext/transport/chttp2/transport/chttp2_transport.cc b/src/core/ext/transport/chttp2/transport/chttp2_transport.cc
index d3c5d994c1..1edc7ae29d 100644
--- src/core/ext/transport/chttp2/transport/chttp2_transport.cc
+++ src/core/ext/transport/chttp2/transport/chttp2_transport.cc
@@ -1176,13 +1176,6 @@ void grpc_chttp2_add_incoming_goaway(grpc_chttp2_transport* t,
GRPC_TRACE_LOG(http, INFO)
<< "transport " << t << " got goaway with last stream id "
<< last_stream_id;
- // We want to log this irrespective of whether http tracing is enabled if we
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for this patch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "Got goaway..." log message below is printed during shutdown. The issue was raised on conda-forge, which uses the grpc/abseil versions in this PR, and exibited itself in the test_worker_stdout test, as I commented in the PR itself. @apmorton also weighed in that the log message was probably always emitted, but something has changed to make it visible. I tried a few other ways to patch this out:

  • using LOG(DEBUG) does not compile, abseil does not have a DEBUG log level
  • removing the log call from BroadcastShutdown, but there was a concern about unintended side effects.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GRPC seems to think this is a notable thing to log about, so it's probably worth looking into the root cause?

If grpc think it's a real issue, we should fix in Ray instead?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or if it's harmless (given it's INFO log), we can keep it and update our test test_worker_stdout instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If grpc think it's a real issue, we should fix in Ray instead?
Or if it's harmless (given it's INFO log), we can keep it and update our test test_worker_stdout instead?

I agree getting ray to not emit the message would be best. As the conda-forge issue shows, allowing the emitted message as expected behaviour (i.e. changing the test) will cause not a small amount of pain for users since the message will appear in their logs.

The call chain is

  • ray calls server_->Shutdown(gpr_now(GPR_CLOCK_REALTIME));
  • this (according to the error log trace) eventually winds up at broadcaster.BroadcastShutdown inside of a call to void Server::CancelAllCalls(), which is an external API function with no arguments.
    So I don't see a way to avoid the message other than patching GRPC. Perhaps we could set the abseil error reporting state to something higher than INFO?

In what circumstances would it be beneficial to users to see that during a shutdown a GOAWAY message was emitted? I would imagine that is just noise to be avoided, but maybe I don't understand.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely not SetGlobalVLogLevel please... you'd be messing with log level for all users of absl process wide, which is not very nice.

SetVLogLevel to turn off grpc logs in ray also seems... odd.

If patching the log line out isn't acceptable,

server_->Shutdown(gpr_now(GPR_CLOCK_REALTIME));
is the root of the problem in ray code that needs to be fixed.

calling Shutdown with a deadline of "now" immediately causes a timeout of the graceful shutdown, which sends out forceful cancellation and logs this message.

There are two options:

  • Calling Shutdown(), which internally uses an infinite timeout (and so always performs a graceful shutdown)
  • Decide on and use a reasonable timeout to wait for pending operations before cancelling them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If patching the log line out isn't acceptable,

I think we should continue with the option in this PR: patching out the log line.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SetVLogLevel to turn off grpc logs in ray also seems... odd.

Actually why is it odd? Inside Ray, we only care about grpc logs that are warning and above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think anyone wants to see the GOAWAY message, it is a direct result of ray's shutdown-immediately policy.

The API for SetVLogLevel takes a file name and a level. Isn't this exactly equivalent to the patch I did?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're ok with just using SetVLogLevel for now. If you can isolate it to a file, that would be great. We'll create follow ups on our end to change this moving forward in the future.

- // received a GOAWAY with a non NO_ERROR code.
- if (goaway_error != GRPC_HTTP2_NO_ERROR) {
- LOG(INFO) << t->peer_string.as_string_view() << ": Got goaway ["
- << goaway_error
- << "] err=" << grpc_core::StatusToString(t->goaway_error);
- }
if (t->is_client) {
cancel_unstarted_streams(t, t->goaway_error, false);
// Cancel all unseen streams

11 changes: 5 additions & 6 deletions thirdparty/patches/grpc-configurable-thread-count.patch
Original file line number Diff line number Diff line change
@@ -1,7 +1,6 @@
diff --git src/core/lib/gpr/linux/cpu.cc b/src/core/lib/gpr/linux/cpu.cc
index 670ca6551c..043021dc4a 100644
--- src/core/lib/gpr/linux/cpu.cc
+++ src/core/lib/gpr/linux/cpu.cc
diff --git src/core/util/linux/cpu.cc src/core/util/linux/cpu.cc
--- src/core/util/linux/cpu.cc
+++ src/core/util/linux/cpu.cc
@@ -24,6 +24,7 @@

#ifdef GPR_CPU_LINUX
Expand All @@ -10,7 +9,7 @@ index 670ca6551c..043021dc4a 100644
#include <errno.h>
#include <sched.h>
#include <string.h>
@@ -49,7 +50,17 @@ static void init_num_cpus() {
@@ -50,7 +51,17 @@ static void init_num_cpus() {
#endif
// This must be signed. sysconf returns -1 when the number cannot be
// determined
Expand All @@ -27,5 +26,5 @@ index 670ca6551c..043021dc4a 100644
+ ncpus = static_cast<int>(sysconf(_SC_NPROCESSORS_CONF));
+ }
if (ncpus < 1) {
gpr_log(GPR_ERROR, "Cannot determine number of CPUs: assuming 1");
LOG(ERROR) << "Cannot determine number of CPUs: assuming 1";
ncpus = 1;
13 changes: 0 additions & 13 deletions thirdparty/patches/grpc-zlib-fdopen.patch
Original file line number Diff line number Diff line change
@@ -1,13 +0,0 @@
diff -u bazel/grpc_deps.bzl
--- bazel/grpc_deps.bzl
+++ bazel/grpc_deps.bzl
@@ -238,6 +238,9 @@
"https://storage.googleapis.com/grpc-bazel-mirror/github.com/madler/zlib/archive/04f42ceca40f73e2978b50e93806c2a18c1281fc.tar.gz",
"https://github.com/madler/zlib/archive/04f42ceca40f73e2978b50e93806c2a18c1281fc.tar.gz",
],
+ patches = [
+ "@io_ray//thirdparty/patches:zlib-fdopen.patch",
+ ]
)

if "com_google_protobuf" not in native.existing_rules():
46 changes: 46 additions & 0 deletions thirdparty/patches/protobuf-windows-const-nan.patch
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
diff -u upb/message/internal/message.c /tmp/message.c
--- upb/message/internal/message.c
+++ upb/message/internal/message.c
@@ -19,6 +19,26 @@

const float kUpb_FltInfinity = INFINITY;
const double kUpb_Infinity = INFINITY;
-const double kUpb_NaN = NAN;
+
+// The latest win32 SDKs have an invalid definition of NAN.
+// https://developercommunity.visualstudio.com/t/NAN-is-no-longer-compile-time-constant-i/10688907
+//
+// Unfortunately, the `0.0 / 0.0` workaround doesn't work in Clang under C23, so
+// try __builtin_nan first, if that exists.
+#ifdef _WIN32
+#ifdef __has_builtin
+#if __has_builtin(__builtin_nan)
+#define UPB_NAN __builtin_nan("0")
+#endif
+#endif
+#ifndef UPB_NAN
+#define UPB_NAN 0.0 / 0.0
+#endif
+#else
+// For !_WIN32, assume math.h works.
+#define UPB_NAN NAN
+#endif
+const double kUpb_NaN = UPB_NAN;
+

bool UPB_PRIVATE(_upb_Message_Realloc)(struct upb_Message* msg, size_t need,

diff --git .bazelrc .bazelrc
--- .bazelrc
+++ .bazelrc
@@ -1,4 +1,9 @@
-build --cxxopt=-std=c++17 --host_cxxopt=-std=c++17
+build --enable_platform_specific_config
+build:linux --cxxopt=-std=c++17 --host_cxxopt=-std=c++17
+build:macos --cxxopt=-std=c++17 --host_cxxopt=-std=c++17
+build:windows --compiler=clang-cl
+build:windows --cxxopt=-Wno-invalid-offsetof
+build:windows --cxxopt=-std:c++17 --host_cxxopt=-std:c++17

build:dbg --compilation_mode=dbg