-
Notifications
You must be signed in to change notification settings - Fork 586
Checkable#ProcessCheckResult(): discard🗑️ CR or delay its producers shutdown #10397
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Well, that's indeed necessary.
Interesting that I've said something about a comment I haven't even seen until just now. |
ce0d035
to
5b1c4b5
Compare
@julianbrost Very funny. Fact is, on Thursday you
So I've done that. So far:
Finally, CheckResultProducer::Ptr just has to be passed through. |
a0ad767
to
a06b01a
Compare
ea36975
to
a06b01a
Compare
72a171d
to
4873a34
Compare
@Al2Klimov please don't be mad at me :)! I took the initiative to expand the PR description slightly based on the analysis made together with @jschmidt-icinga, so everyone can follow here without having to browse through the changed files. Though, you can gladly merge all the commits suffixed with the |
I think in principle this solves the problem well, but I agree with @yhabteab, kindly squash it down to a single commit describing the functional change (the PR title is fine), as none of the individual changes make sense without all the others. That would also make it much easier to review on GH. |
4873a34
to
f0de2aa
Compare
Are you sure? I've squashed the low-hanging fruits into 170e7cf – and see how big this one commit became! |
Yes, I'm sure. There's big commits and there's commits with trivial one-line changes in lots of files. You could split off adding the new Classes, maybe adding the final check, but at least all the function signature changes should be in a single commit. What's more, commits should reflect individual consistent states of the code, i.e., each of them, when checked out, should at least still compile and the program should still work. Granted this is less important when on a branch that gets merged, but its still a good idea to maintain some discipline. |
I've had a discussion with @julianbrost and @yhabteab with the following results:
|
Benefits of my version (ordered descending by importance):
... especially when you talk about Go, remember how context works:
This restriction would vanish here if I got your idea correctly:
(Given #10397 (comment), strange that you wanna give up such a restriction.) Let me know whether my concern is legit or I shall refactor anyway.
Before the component is activated? But why? |
That issue is already solved when you take into account that we pass the base class pointer that doesn't have a virtual interface for the
Maybe we're talking past each other a bit. Just to clarify, the intention is not to get rid of passing what's currently the Otherwise, "all affected" is doing some heavy lifting here, when it's only Something like
In addition to what I wrote above, with Also that is about access control, which is an aspect of C++ I don't necessarily care that much about, while in the linked comment I was talking about implementation hiding, which is about not having to know how the sausage is made, both as a compiler including the header, and a programmer looking at a header to learn about how to use a class. I get that currently this is an annoyance in C++ and requires a lot of discipline by the programmer and is not always worth the effort. Modules will improve that, when they are ready to be used some time around 2070.
Because why not. It makes the class simpler, with fewer possible user errors, and for the immediate use case of controlling check result ownership it doesn't matter, because a component that hasn't started yet will not order any checks. |
I'm afraid Object is required:
I like Join() better tbh.
"As per the spec," we may only use #ifndef for include guards. But we use Seriously speaking, it feels pretty cringe to implement a method we know in advance nobody's gonna use AND that the method won't do anything useful. |
45851bc
to
329bdb8
Compare
Right, because we want to store it in a value for Function::Invoke(). I think there is a way to still do this by extending
I agree, use Join().
In fact, I already have a use for it shutting down Edit: Only just realized you already pushed the update. I'll have another look right away. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me now. I really think this is a great addition. 👏
But maybe lets hold back on merging until Monday so @julianbrost can have another quick look.
Just one more quick remark: By accident you seem to have used spaces for indentation on the new files. |
329bdb8
to
a624d03
Compare
ApiListener CheckerComponent ExternalCommandListener LivestatusListener
Namely: Checkable#ProcessCheckResult() ClusterCheckTask::ScriptFunc() ClusterZoneCheckTask::ScriptFunc() DummyCheckTask::ScriptFunc() ExceptionCheckTask::ScriptFunc() IcingaCheckTask::ScriptFunc() IfwApiCheckTask::ScriptFunc() NullCheckTask::ScriptFunc() PluginCheckTask::ScriptFunc() RandomCheckTask::ScriptFunc() SleepCheckTask::ScriptFunc() IdoCheckTask::ScriptFunc() IcingadbCheck::ScriptFunc() CheckCommand#Execute() Checkable#ExecuteCheck() ClusterEvents::ExecuteCheckFromQueue() ExternalCommandProcessor::Process*CheckResult() ExternalCommandCallback ExternalCommandProcessor::Execute() ExternalCommandProcessor::ExecuteFromFile() ExternalCommandProcessor::ProcessFile() LivestatusQuery#ExecuteCommandHelper() LivestatusQuery#Execute()
a624d03
to
36743f3
Compare
There were some (specifically two ;)) minor changes made since your last review, so please have another look again! |
This PR is the first step in the process of making the shutdown order of our components less unpredictable. Specifically, the goal is to ensure that components that produce any kind of events (e.g. notifications) or check results are stopped in a determined order, and they don't cause any post-stop events to be triggered. The problem this PR specifically addresses is that e.g. stopping the
CheckerComponent
does prevent new checks from being scheduled, but it does not wait for already started (so can't be cancelled) checks to finish. This can lead to a situation where the e.g.IcingaDB
component is stopped before such checks complete and causing all the resulted events to be lost, becauseIcingaDB
is not able to process them anymore. How does this PR solve this?It introduces a new
CheckResultProducer
interface that is implemented only by types that actually produce check results, which as of now areCheckerComponent
(for local checks) andApiListener
(for remote checks). Each of such instance maintains a simple counter of the number of ongoing checks results it produces. However, they don't just increment the counter after e.g.Checkable::ExecuteCheck()
is called, but only after the check execution produces a result and enters theCheckable::ProcessCheckResult()
method, and decremented when leaving it. This way, we don't have to worry about the fact that check execution time can be very long and block the shutdown of the component indefinitely (credit goes to @Al2Klimov). Ok, but how does this prevent triggering post-factum events, you ask? Well, theCheckable::ProcessCheckResult
requires now theCheckResultProducer
be passed in as an argument, and is only allowed to actually process that result after locking the producer, otherwise thecr
is discarded.But what does locking the producer mean? It doesn't mean acquiring the
ObjectLock
on that object, but rather acquiring a shared lock1 on it. However, here's the catch: the producer is only shared lockable if itsStop()
method hasn't been already called. Meaning, wheneverCheckerComponent::Stop()
is called, any thread that tries to acquire a shared lock on it inCheckable::ProcessCheckResult()
will ultimately fail and discard thecr
without ever processing it. On the other hand, if the checker was stopped after some threads have successfully acquired a shared lock on it (remember that when a thread acquires a shared lock, the producer automatically increments its counter), it will wait for all of them to finish while preventing any new threads from acquiring the lock until its counter goes back to0
.The only remaining problem that this PR doesn't solve is that
ApiListener::Stop()
implementation doesn't gracefully disconnect all active connections, which means that we might still lose some important check results received from remote endpoints. However, since this is a completely different issue, it will be addressed in a separate PR.Edit (yhabteab) end:
This is the very first and hardest step of the #10179 (comment) plan.
Checkable#ProcessCheckResult() now takes a CheckResultProducer::Ptr:
It tries to shared-lock the CheckResultProducer:
This implies that every component producing checks needs to implement CheckResultProducer:
Finally, the CheckResultProducer::Ptr has "just" to be passed through from the CheckResultProducer itself to Checkable#ProcessCheckResult():
Footnotes
The new
CheckResultProducer
interface satisfies thestd::SharedLockable
requirements. ↩