Use threading.Events to communicate between shutdown and export #4511
export call is occurring, so that shutdown waits for the export call to finish. Use threading.Event() to communicate when shutdown is occurring, so that the sleep in export is interrupted if a shutdown is occurring.
...pentelemetry-exporter-otlp-proto-grpc/src/opentelemetry/exporter/otlp/proto/grpc/exporter.py
metadata=self._headers,
timeout=self._timeout,
try:
    self._export_not_occuring.clear()
The usage of _export_not_occuring looks like a lock to me. Is there a benefit to using an event for it?
Using an event allows the export thread to tell shutdown whether there is a pending RPC. In shutdown we call the event's wait() method, which blocks until the flag is true.
The problem with a lock is that export gives it up, only to immediately reacquire it. When 2 threads ask for a lock, there's no guarantee which one gets it.
If the behavior we want is for shutdown to block on any pending RPC and otherwise proceed, I think an event is best.
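To make the argument above concrete, here is a minimal sketch of the "no RPC pending" event. The names `export_idle`, `export`, and `shutdown` are hypothetical stand-ins and not the exporter's real attributes, and the `rpc` callable replaces the actual gRPC call.

```python
import threading
from typing import Callable

# Hypothetical module-level event; set means "no RPC is pending".
export_idle = threading.Event()
export_idle.set()


def export(rpc: Callable[[], None]) -> None:
    export_idle.clear()  # an RPC is now pending
    try:
        rpc()  # stand-in for the real gRPC call
    finally:
        export_idle.set()  # RPC finished, wake any waiter


def shutdown(timeout_seconds: float) -> bool:
    # Event.wait returns True once the flag is set, or False if the
    # timeout expires first. With no pending RPC it returns at once.
    return export_idle.wait(timeout_seconds)
```

Unlike a lock, the waiter never has to win a race to acquire anything: it only observes the flag, so export can clear and set it repeatedly without handing ownership back and forth.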
Maybe I'm missing something, but if you're doing
    while True:
        if shutdown_occuring.is_set(): return
        event.clear()
        export()
        event.set()
there is no guarantee that the thing waiting for the event will have run and set shutdown_occuring before export() gets called again. I think even switching to a lock doesn't necessarily solve everything. Might need to rethink the approach a little.
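One possible way to close the gap described above, offered only as a sketch and not as what this PR does: make the "check shutdown, then mark export as started" step atomic with a small lock, and have shutdown set its flag under the same lock before waiting. The class `GuardedExporter` and all of its members are hypothetical, and the sketch assumes a single thread calls export.

```python
import threading
from typing import Callable


class GuardedExporter:
    """Hypothetical sketch; assumes only one thread calls export()."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._shutdown = False
        self._idle = threading.Event()
        self._idle.set()  # no RPC pending at start

    def export(self, rpc: Callable[[], None]) -> bool:
        # Check-and-clear happens atomically with respect to shutdown().
        with self._lock:
            if self._shutdown:
                return False
            self._idle.clear()
        try:
            rpc()  # stand-in for the real gRPC call
        finally:
            self._idle.set()
        return True

    def shutdown(self, timeout_seconds: float) -> bool:
        with self._lock:
            self._shutdown = True
        # Either export already cleared _idle (so we wait for it), or it
        # will observe _shutdown and never start another RPC.
        return self._idle.wait(timeout_seconds)
```

The lock is held only around the flag updates, never across the RPC itself, so shutdown is not delayed by lock contention beyond the pending call it is meant to wait for.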
delay,
)
self._shutdown_occuring.wait(delay)
continue
IMO it would be a little clearer to just return here.
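A minimal sketch of the interruptible sleep with an early return, assuming a hypothetical `export_with_retry` helper, an `attempt` callable that reports success, and a caller-supplied list of delays. None of these names come from the exporter itself.

```python
import threading
from typing import Callable, Iterable


def export_with_retry(
    attempt: Callable[[], bool],
    delays: Iterable[float],
    shutdown_event: threading.Event,
) -> bool:
    """Retry `attempt`, sleeping between tries unless shutdown is signalled."""
    for delay in delays:
        if attempt():
            return True
        # Event.wait doubles as an interruptible sleep: it returns True as
        # soon as the event is set, or False after `delay` seconds.
        if shutdown_event.wait(delay):
            return False  # shutdown requested, stop retrying right away
    return False
```

Returning directly on a signalled shutdown avoids relying on a check at the top of the next loop iteration to leave the loop.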
exporter/opentelemetry-exporter-otlp-proto-grpc/tests/test_otlp_exporter_mixin.py
def run(self):
    if self._target is not None:  # type: ignore
        self._return = self._target(*self._args, **self._kwargs)  # type: ignore
Should we include the cleanup from the original run function or is that not a concern here?
 def run(self):
-    if self._target is not None:  # type: ignore
-        self._return = self._target(*self._args, **self._kwargs)  # type: ignore
+    try:
+        if self._target is not None:
+            self._return = self._target(*self._args, **self._kwargs)
+    finally:
+        # Avoid a refcycle if the thread is running a function with
+        # an argument that has a member that points to the thread.
+        del self._target, self._args, self._kwargs
def join(self, *args):  # type: ignore
    threading.Thread.join(self, *args)
nit: Could we avoid type ignore by explicitly passing the expected type?
-def join(self, *args):  # type: ignore
-    threading.Thread.join(self, *args)
+def join(self, timeout: float | None = None) -> Any:
+    threading.Thread.join(self, timeout=timeout)
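Putting the two suggestions together, a test helper along these lines might look as follows. This is a sketch: the class name `ReturnValueThread` is hypothetical, the assumption that `join` hands back the stored value is inferred from the `-> Any` annotation, and `_target`, `_args`, and `_kwargs` are private CPython attributes of `threading.Thread` rather than public API.

```python
import threading
from typing import Any, Optional


class ReturnValueThread(threading.Thread):
    """Thread that remembers the return value of its target."""

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        super().__init__(*args, **kwargs)
        self._return: Any = None

    def run(self) -> None:
        try:
            if self._target is not None:
                self._return = self._target(*self._args, **self._kwargs)
        finally:
            # Same cleanup as threading.Thread.run: drop the references so
            # the thread does not keep its target and arguments alive.
            del self._target, self._args, self._kwargs

    def join(self, timeout: Optional[float] = None) -> Any:
        super().join(timeout)
        return self._return
```

A caller would start the thread and read the result from `join()`, for example `ReturnValueThread(target=exporter.export, args=(batch,))`, where `exporter` and `batch` are placeholders.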
     # value will remain constant.
-    for delay in _create_exp_backoff_generator(max_value=max_value):
-        if delay == max_value or self._shutdown:
+    for delay in [1, 2, 4, 8, 16, 32]:
Should it include 64 as well, like max_value before?
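To see what is at stake in that choice, the sketch below generates the doubling delays up to a cap and sums the worst-case sleep time. `capped_backoff` is a hypothetical helper written for illustration; it is not the library's `_create_exp_backoff_generator`.

```python
from typing import Iterator


def capped_backoff(max_value: int) -> Iterator[int]:
    """Yield 1, 2, 4, ... up to and including max_value, then stop."""
    delay = 1
    while delay <= max_value:
        yield delay
        delay *= 2


# Worst-case total sleep if every attempt fails and every delay is slept:
total_without_64 = sum(capped_backoff(32))  # 1 + 2 + 4 + 8 + 16 + 32
total_with_64 = sum(capped_backoff(64))     # adds one more 64 s sleep
```

Under these assumptions the list ending at 32 sleeps for at most 63 seconds in total, while including 64 raises that to 127 seconds.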
         return self._result.FAILURE

         return self._result.FAILURE

     def shutdown(self, timeout_millis: float = 30_000, **kwargs) -> None:
-        if self._shutdown:
+        if self._shutdown_occuring.is_set():
This already had the same problem, but shutdown() is not thread safe. I guess for this PR we can assume only one thread calls it.
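If that assumption ever needs to be dropped, one common pattern is to make the "already shut down?" check and the flag update a single atomic step. The sketch below uses a hypothetical `OnceShutdown` class, with a counter standing in for the real cleanup work. It is not part of this PR.

```python
import threading


class OnceShutdown:
    """Hypothetical sketch: only the first shutdown() call does the work."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._shutdown_called = False
        self.close_count = 0  # stand-in for closing the channel

    def shutdown(self) -> bool:
        # Checking and setting the flag under one lock removes the
        # check-then-act window between is_set() and set().
        with self._lock:
            if self._shutdown_called:
                return False  # another caller already owns the shutdown
            self._shutdown_called = True
        self.close_count += 1  # reached by exactly one caller
        return True
```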
Description
It seems like the behavior we want for Shutdown() is:
- Shutdown() to interrupt the sleep call in export, so we don't idle only to report Failure.

This PR accomplishes these via threading events:
- An event for export to communicate to shutdown that an RPC is in progress, and for shutdown to wait until it's done or the shutdown timeout finishes.
- An event for shutdown to communicate to export that shutdown is happening, and that it doesn't need to sleep.

We use these 2 events to communicate between the 2 threads. AFAIK there are only 2 threads we need to worry about: one thread where export is repeatedly called, and the main thread where shutdown is called.

Note that this PR also fixes a bug where we were needlessly sleeping for 32 seconds only to report failure, because we would simply break out of the loop in the next iteration. I also did some minor code cleanup in the exporters in this PR.
Type of change
Please delete options that are not relevant.
How Has This Been Tested?
Still need to write tests. Putting this out there now to get early feedback.
Does This PR Require a Contrib Repo Change?
Checklist: