CA-409482: Using computed delay for RRD loop #6458

Open
wants to merge 1 commit into master from private/bengangy/CA-409482

Conversation

BengangY
Contributor

@BengangY BengangY commented May 8, 2025

The RRD loop runs every 5 seconds and currently sleeps a fixed 5 seconds between iterations. However, the loop body itself also takes time (how long depends on the number of CPUs; with many CPUs it can be hundreds of milliseconds), so the loop accumulates an offset after several iterations. Eventually one RRD data point is lost and a gap can be observed in the XenCenter performance graph.

The solution is to use a computed delay (timeslice minus the time the loop consumed) instead of a fixed delay.
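A minimal sketch of the idea, assuming a float timeslice in seconds as in rrdd_shared; monitor_loop and do_work are hypothetical names, not the PR's code:

let rec monitor_loop ~timeslice do_work =
  (* measure the iteration's duration on the monotonic clock *)
  let counter = Mtime_clock.counter () in
  do_work () ;
  let elapsed_s = Mtime.Span.to_float_ns (Mtime_clock.count counter) /. 1e9 in
  (* sleep only for whatever remains of the timeslice *)
  let delay = timeslice -. elapsed_s in
  if delay > 0. then Thread.delay delay ;
  monitor_loop ~timeslice do_work

Note that this sketch still starts a fresh counter per iteration; the review below prefers a single Mtime counter for the whole loop, waiting until the next period is reached.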

@BengangY BengangY force-pushed the private/bengangy/CA-409482 branch from a289be7 to 35dcd13 on May 8, 2025 09:49
@robhoes robhoes requested a review from last-genius May 8, 2025 09:50
@last-genius
Contributor

LGTM!

@lindig
Contributor

lindig commented May 9, 2025

Can we updraft this?

@BengangY
Contributor Author

BengangY commented May 9, 2025

Can we updraft this?

You mean 'undraft'? I'm doing more testing on it. If it works well, I will undraft it.

@BengangY BengangY marked this pull request as ready for review May 12, 2025 02:41
@@ -537,17 +537,30 @@ let monitor_write_loop writers =
Xenctrl.with_intf (fun xc ->
while true do
try
let last_loop_start_time = Unix.gettimeofday () in
Member

@psafont psafont May 12, 2025

I'd prefer if an Mtime.counter were used, but this needs a single counter (instead of one per iteration) and should wait until the next period is reached.

Contributor

next_reading also needs to be updated to use Mtime; it all currently relies on gettimeofday, which is prone to NTP jumps.

Contributor

@edwintorok edwintorok May 13, 2025

Counters probably wouldn't work well, because we need to calculate 2 different spans: begin to end, and current to last end, where end is a single point in time. That one should probably be an Mtime.t then, obtained via Mtime_clock.now() and we can do arithmetic on Mtime.t (converting to spans if needed, the APIs aren't very pleasant to work with in this regard).

I think Mtime.t should represent a monotonic point in time, which is better than gettimeofday, but still "absolute" in some sense (well it'll be relative to machine start or process start, but not relative to other calls in the code)
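A rough sketch of that variant, with illustrative names only (the real change keeps this state in Rrdd_shared): hold absolute monotonic timestamps and derive spans on demand:

let () =
  let last_end : Mtime.t ref = ref (Mtime_clock.now ()) in
  (* one iteration of the loop: *)
  let iteration_start = Mtime_clock.now () in
  (* ... monitoring work ... *)
  let now = Mtime_clock.now () in
  let this_iteration = Mtime.span iteration_start now in (* begin to end *)
  let since_last_end = Mtime.span !last_end now in (* current to last end *)
  last_end := now ;
  ignore (this_iteration, since_last_end)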

Member

Counters probably wouldn't work well, because we need to calculate 2 different spans: begin to end, and current to last end, where end is a single point in time. That one should probably be an Mtime.t then

2 spans should be enough: the amount of time between the start and the last end, and the amount of time between the start and the current end. These can be subtracted to know how much time has passed between them
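Condensed, the single-counter scheme that the suggestions below work towards looks roughly like this (illustrative fragment, not the final code):

let () =
  let from_loop_start = Mtime_clock.counter () in
  let last_iteration_end = ref Mtime.Span.zero in
  (* each iteration: *)
  let iteration_start = Mtime_clock.count from_loop_start in
  (* ... monitoring work ... *)
  last_iteration_end := Mtime_clock.count from_loop_start ;
  (* the iteration's duration is the difference of the two spans *)
  let iteration = Mtime.Span.abs_diff !last_iteration_end iteration_start in
  ignore iteration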

The RRD loop runs every 5 seconds and currently sleeps a fixed 5 seconds
between iterations. However, the loop body itself also takes time (how long
depends on the number of CPUs; with many CPUs it can be hundreds of
milliseconds), so the loop accumulates an offset after several iterations.
Eventually one RRD data point is lost and a gap can be observed in the
XenCenter performance graph.

The solution is to use a computed delay (timeslice minus the time the loop
consumed) instead of a fixed delay.

Signed-off-by: Bengang Yuan <[email protected]>
@BengangY BengangY force-pushed the private/bengangy/CA-409482 branch from 35dcd13 to 3bc9fae on May 14, 2025 08:28
Comment on lines +720 to +721
let span = Mtime.span !last_loop_end_time (Mtime_clock.now ()) in
!timeslice -. (Mtime.Span.to_float_ns span /. 1_000_000_000.)
Member

Suggested change
let span = Mtime.span !last_loop_end_time (Mtime_clock.now ()) in
!timeslice -. (Mtime.Span.to_float_ns span /. 1_000_000_000.)
let ( --- ) a b = Mtime.Span.abs_diff a b in
let time_in_iteration = (Mtime_clock.count from_loop_start) --- !last_iteration_end in
Some (!timeslice --- time_in_iteration) (** XXX Need to check that the time in the iteration is not greater than timeslice! *)
)
else
None

@@ -24,7 +24,7 @@ let enable_all_dss = ref false
let timeslice : float ref = ref 5.

(* Timestamp of the last monitoring loop end. *)
let last_loop_end_time : float ref = ref neg_infinity
let last_loop_end_time : Mtime.t ref = ref Mtime.min_stamp
Member

Suggested change
let last_loop_end_time : Mtime.t ref = ref Mtime.min_stamp
let last_iteration_end : Mtime.Span.t ref = ref Mtime.Span.zero

Comment on lines 538 to +540
while true do
try
let last_loop_start_time = Mtime_clock.now () in
Member

@psafont psafont May 14, 2025

Suggested change
while true do
try
let last_loop_start_time = Mtime_clock.now () in
let from_loop_start = Mtime_clock.counter () in
while true do
try
let iteration_start = Mtime_clock.count from_loop_start in

do_monitor_write xc writers ;
with_lock Rrdd_shared.last_loop_end_time_m (fun _ ->
Rrdd_shared.last_loop_end_time := Unix.gettimeofday ()
Rrdd_shared.last_loop_end_time := Mtime_clock.now ()
Member

Suggested change
Rrdd_shared.last_loop_end_time := Mtime_clock.now ()
Rrdd_shared.last_iteration_end := Mtime_clock.count from_loop_start

Comment on lines +545 to +560
let span =
Mtime.span last_loop_start_time !Rrdd_shared.last_loop_end_time
in
let duration_in_second =
Mtime.Span.to_float_ns span /. 1_000_000_000.
in
let delay = !Rrdd_shared.timeslice -. duration_in_second in
(* Use a computed delay (timeslice - loop duration) instead of
a fixed delay *)
if delay > 0.0 then
Thread.delay delay
else
warn
"%s: Monitor write loop took so long that the computed delay \
(%f) is negative, so the delay is skipped"
__FUNCTION__ delay
Member

@psafont psafont May 14, 2025

Suggested change
let span =
Mtime.span last_loop_start_time !Rrdd_shared.last_loop_end_time
in
let duration_in_second =
Mtime.Span.to_float_ns span /. 1_000_000_000.
in
let delay = !Rrdd_shared.timeslice -. duration_in_second in
(* Use a computed delay (timeslice - loop duration) instead of
a fixed delay *)
if delay > 0.0 then
Thread.delay delay
else
warn
"%s: Monitor write loop took so long that the computed delay \
(%f) is negative, so the delay is skipped"
__FUNCTION__ delay
let ( --- ) = Mtime.Span.abs_diff in
let iteration = !Rrdd_shared.last_iteration_end --- iteration_start in
if Mtime.Span.is_longer ~than:!Rrdd_shared.timeslice iteration then
warn
"%s: Monitor write iteration took (%a), this is longer than a full \
cycle, skipping the delay"
__FUNCTION__ Debug.Pp.span iteration
else
let delay = !Rrdd_shared.timeslice --- iteration in
Thread.delay (Clock.Timer.span_to_s delay)
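(In this suggestion, Clock.Timer.span_to_s converts the remaining Mtime.Span.t into float seconds for Thread.delay. It also implies that Rrdd_shared.timeslice would need to change from the float ref shown in the hunk above to an Mtime.Span.t, so that abs_diff and is_longer apply to it.)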
