-
Notifications
You must be signed in to change notification settings - Fork 292
CA-409482: Using computed delay for RRD loop #6458
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
a289be7
to
35dcd13
Compare
LGTM! |
Can we updraft this? |
You mean 'undraft'? I'm making more testing for it. If it works well, I will undraft it. |
ocaml/xcp-rrdd/bin/rrdd/xcp_rrdd.ml
Outdated
@@ -537,17 +537,30 @@ let monitor_write_loop writers = | |||
Xenctrl.with_intf (fun xc -> | |||
while true do | |||
try | |||
let last_loop_start_time = Unix.gettimeofday () in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer if an Mtime.counter
was used, but this needs a single counter, instead of one per iteration, and wait until the next period is reached.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
next_reading
also needs to be updated to use Mtime, it all currently relies on gettimeofday
which is prone to NTP jumps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Counters probably wouldn't work well, because we need to calculate 2 different spans: begin to end, and current to last end, where end is a single point in time. That one should probably be an Mtime.t then, obtained via Mtime_clock.now()
and we can do arithmetic on Mtime.t (converting to spans if needed, the APIs aren't very pleasant to work with in this regard).
I think Mtime.t should represent a monotonic point in time, which is better than gettimeofday, but still "absolute" in some sense (well it'll be relative to machine start or process start, but not relative to other calls in the code)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Counters probably wouldn't work well, because we need to calculate 2 different spans: begin to end, and current to last end, where end is a single point in time. That one should probably be an Mtime.t then
2 spans should be enough: the amount of time between the start and the last end, and the amount of time between the start and the current end. These can be subtracted to know how much time has passed between them
RRD loop is executed each 5 seconds. It delays fixed 5 seconds between each loop. But the loop self also consumes time (The time consuming depends on CPU's count. If there are many CPUs, the time consuming may be hundreds milliseconds). This implementation leads RRD will take an offset after several loops. Then one of RRD data lose and a gap can be observed on XenCenter performance graph. The solution is to use computed delay (timeslice - loop time consuming) instead of fixed delay. Signed-off-by: Bengang Yuan <[email protected]>
35dcd13
to
3bc9fae
Compare
let span = Mtime.span !last_loop_end_time (Mtime_clock.now ()) in | ||
!timeslice -. (Mtime.Span.to_float_ns span /. 1_000_000_000.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let span = Mtime.span !last_loop_end_time (Mtime_clock.now ()) in | |
!timeslice -. (Mtime.Span.to_float_ns span /. 1_000_000_000.) | |
let ( --- ) a b = Mtime.Span.abs_diff a b in | |
let time_in_iteration = (Mtime_clock.count from_loop_start) --- !last_iteration_end in | |
Some (!timeslice --- time_in_iteration) (** XXX Need to check that the time in the iteration is not greater than timeslice! *) | |
) | |
else | |
None |
@@ -24,7 +24,7 @@ let enable_all_dss = ref false | |||
let timeslice : float ref = ref 5. | |||
|
|||
(* Timestamp of the last monitoring loop end. *) | |||
let last_loop_end_time : float ref = ref neg_infinity | |||
let last_loop_end_time : Mtime.t ref = ref Mtime.min_stamp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let last_loop_end_time : Mtime.t ref = ref Mtime.min_stamp | |
let last_iteration_end : Mtime.Span.t ref = ref Mtime.Span.zero |
while true do | ||
try | ||
let last_loop_start_time = Mtime_clock.now () in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while true do | |
try | |
let last_loop_start_time = Mtime_clock.now () in | |
let from_loop_start = Mtime_clock.counter () in | |
while true do | |
try | |
let iteration_start = Mtime_clock.count from_loop_start in |
do_monitor_write xc writers ; | ||
with_lock Rrdd_shared.last_loop_end_time_m (fun _ -> | ||
Rrdd_shared.last_loop_end_time := Unix.gettimeofday () | ||
Rrdd_shared.last_loop_end_time := Mtime_clock.now () |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rrdd_shared.last_loop_end_time := Mtime_clock.now () | |
Rrdd_shared.last_iteration_end := Mtime_clock.count from_loop_start |
let span = | ||
Mtime.span last_loop_start_time !Rrdd_shared.last_loop_end_time | ||
in | ||
let duration_in_second = | ||
Mtime.Span.to_float_ns span /. 1_000_000_000. | ||
in | ||
let delay = !Rrdd_shared.timeslice -. duration_in_second in | ||
(* Using computed delay (timeslice - loop time consuming) instead | ||
of fixed delay*) | ||
if delay > 0.0 then | ||
Thread.delay delay | ||
else | ||
warn | ||
"%s: Monitor write loop took so long time that the delay \ | ||
(%f) is less than 0, so skip the delay" | ||
__FUNCTION__ delay |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let span = | |
Mtime.span last_loop_start_time !Rrdd_shared.last_loop_end_time | |
in | |
let duration_in_second = | |
Mtime.Span.to_float_ns span /. 1_000_000_000. | |
in | |
let delay = !Rrdd_shared.timeslice -. duration_in_second in | |
(* Using computed delay (timeslice - loop time consuming) instead | |
of fixed delay*) | |
if delay > 0.0 then | |
Thread.delay delay | |
else | |
warn | |
"%s: Monitor write loop took so long time that the delay \ | |
(%f) is less than 0, so skip the delay" | |
__FUNCTION__ delay | |
let ( --- ) = Mtime.Span.abs_diff in | |
let iteration = !Rrdd_shared.last_iteration_end --- iteration_start in | |
if Mtime.Span.is_longer ~than:!Rrdd_shared.timeslice iteration then | |
warn | |
"%s: Monitor write iteration took (%a), this is longer than a full \ | |
cycle, skipping the delay" | |
__FUNCTION__ Debug.Pp.span iteration | |
else | |
let delay = !Rrdd_shared.timeslice --- iteration in | |
Thread.delay (Clock.Timer.span_to_s delay) |
RRD loop is executed each 5 seconds. It delays fixed 5 seconds between each loop. But the loop self also consumes time (The time consuming depends on CPU's count. If there are many CPUs, the time consuming may be hundreds milliseconds). This implementation leads RRD will take an offset after several loops. Then one of RRD data lose and a gap can be observed on XenCenter performance graph.
The solution is to use computed delay (timeslice - loop time consuming) instead of fixed delay.