Skip to content

i#3995 irregular window tracing: last window bug fix #7448

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

edeiana
Copy link
Contributor

@edeiana edeiana commented Apr 9, 2025

When a workload executed more than DELAY_FOREVER_THRESHOLD
(= 1024^3) instructions after the last window specified in
-trace_instr_intervals_file, the tracing of an unwanted, infinite "trace"
window triggered.

This happened because the last "no trace" window that we create
(not the user), was set to DELAY_FOREVER_THRESHOLD instead
of the larger value UINT64_MAX, which we assume is a large enough
value to stop tracing for the remaining of the execution of the traced
program. This last window exists to stop the tracing after the last,
user-defined window set in -trace_instr_intervals_file (reminder:
if the user wants to trace the end of the execution, the last window
in the file has to be large enough to cover the end of the workload
execution).

Tested manually, as a new binary that executes at least
DELAY_FOREVER_THRESHOLD instructions takes way more than
90 seconds to run a tool like basic_counts, which we normally use
to count windows (by looking at its output).

The reason code_api|tool.drcachesim.irregular-windows-simple
passed is that the test binary traced (simple_app) executes
~100k instructions, which is way less than
DELAY_FOREVER_THRESHOLD instructions, so the last "no trace"
window of DELAY_FOREVER_THRESHOLD was large enough to
not trigger a new, infinite "trace" window.

Adds a check for user-defined instruction intervals with a
duration of 0. They are meaningless, so we ignore them and
print a NOTIFY message letting the user know.

Adds a NOTIFY message when user-defined instruction intervals
overlap (in which case, they are merged).

Issue #3995

When a workload executed more than `DELAY_FOREVER_THRESHOLD`
(= 1024^3) instructions after the last window specified in
`-trace_instr_intervals_file`, the tracing of a new, infinite
"trace" window triggered.

This happened because the last "no trace" window that we create
(not the user), was set to `DELAY_FOREVER_THRESHOLD` instead
of `0`. This last window exists to stop the tracing after the
last, user-defined window set in `-trace_instr_intervals_file`
(reminder: if the user wants to trace the end of the execution,
the last window in the file has to be large enough to cover the
end of the workload execution).

Although a bit counter-intuitive, a "no trace" window of `0`
(in terms of instructions), actually means: don't start a
new tracing window.

Tested manually, as a new binary that executes at least
`DELAY_FOREVER_THRESHOLD` instructions takes way more than 90
seconds to run a tool like `basic_counts`, which we normally
use to count windows (by looking at its output).

The reason `code_api|tool.drcachesim.irregular-windows-simple`
passed is that the test binary traced (`simple_app`) executes
way less than DELAY_FOREVER_THRESHOLD instructions
(~100k instructions), so the last "no trace" window was large
enough to not trigger a new, infinite "trace" window.

Issue #3995
@edeiana edeiana requested a review from abhinav92003 April 9, 2025 07:29
@abhinav92003
Copy link
Contributor

Apologies, I missed this before. Reviewing it now.

Copy link
Contributor

@abhinav92003 abhinav92003 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was a bit confused by some existing windows related logic. Would help to add more comments in code by the window-related functions.

@@ -550,9 +550,9 @@ compute_irregular_trace_windows(std::vector<instr_interval_t> &instr_intervals)
// must have a duration long enough to cover the end of the program.
irregular_window_ptr =
(irregular_window_t *)dr_global_alloc(sizeof(irregular_window_t));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about other uses of DELAY_FOREVER_THRESHOLD? I see instr_count_threshold() still returns that when current_no_trace_for_instrs == 0 (which IIUC will be true after the last window with this PR's change).

Copy link
Contributor Author

@edeiana edeiana Apr 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that instr_count_threshold() is used to differentiate between counting modes.
If it returns > DELAY_EXACT_THRESHOLD (like when current_no_trace_for_instrs == 0, which causes it to return DELAY_FOREVER_THRESHOLD), then the tracer adds to the global instruction count instr_count every DELAY_COUNTDOWN_UNIT.
Otherwise, when instr_count_threshold() returns a small "no_trace" window (like in the case of a "proper", user-defined window), then the tracer adds directly to the global instruction count.

However, what actually blocks the creation of the unwanted window is has_instr_count_threshold_to_enable_tracing(), which returns false when current_no_trace_for_instrs == 0.
Hence we return here: https://github.com/DynamoRIO/dynamorio/blob/master/clients/drcachesim/tracer/instr_counter.cpp#L190, before instr_count_threshold() or the creation of a new tracing window below.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it returns > DELAY_EXACT_THRESHOLD (like when current_no_trace_for_instrs == 0, which causes it to return DELAY_FOREVER_THRESHOLD),

I didn't understand this statement. In your example does it return DELAY_EXACT_THRESHOLD or >DELAY_EXACT_THRESHOLD?

Even if other logic prevents its use, it seems weird to leave DELAY_FOREVER_THRESHOLD in there: it sounds like it still signifies "trace forever" for some cases, which I thought we don't want anymore.

Copy link
Contributor Author

@edeiana edeiana May 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't understand this statement. In your example does it return DELAY_EXACT_THRESHOLD or >DELAY_EXACT_THRESHOLD?

The previous behavior (which caused the bug), would have instr_count_threshold() return DELAY_FOREVER_THRESHOLD (which is > DELAY_EXACT_THRESHOLD) for the last "no trace" window.
So, we'd enter the if statement at https://github.com/DynamoRIO/dynamorio/blob/master/clients/drcachesim/tracer/instr_counter.cpp#L196 and not start a tracing window (like we expect) only until the program executed at least DELAY_FOREVER_THRESHOLD instructions, which is not that much, it's "only" 1024^3 (~1 billion instructions).
So, despite the name DELAY_FOREVER_THRESHOLD, we wouldn't delay tracing forever, we would delay only for ~1 billion instructions and then start tracing again.
I got confused by the name as well, but DELAY_FOREVER_THRESHOLD should have never been used as a value for a "no trace" window in the first place.
Looking at the comments of DELAY_EXACT_THRESHOLD and DELAY_FOREVER_THRESHOLD, these are used for counting modes: local vs global (https://github.com/DynamoRIO/dynamorio/blob/master/clients/drcachesim/tracer/instr_counter.cpp#L77).

Even if other logic prevents its use, it seems weird to leave DELAY_FOREVER_THRESHOLD in there: it sounds like it still signifies "trace forever" for some cases, which I thought we don't want anymore.

Is the suggestion to change the name of this constant because misleading?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got confused by the name as well, but DELAY_FOREVER_THRESHOLD should have never been used as a value for a "no trace" window in the first place.
Looking at the comments of DELAY_EXACT_THRESHOLD and DELAY_FOREVER_THRESHOLD, these are used for counting modes: local vs global (https://github.com/DynamoRIO/dynamorio/blob/master/clients/drcachesim/tracer/instr_counter.cpp#L77).
Is the suggestion to change the name of this constant because misleading?

Indeed the name is confusing. Renaming would help.

irregular_window_ptr->no_trace_for_instrs = DELAY_FOREVER_THRESHOLD;
// This results in has_instr_count_threshold_to_enable_tracing() returning 0, which
// in turns makes hit_instr_count_threshold() return without creating a new window.
irregular_window_ptr->no_trace_for_instrs = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be any simpler to use UINT64_MAX? Or would that make instrumentation more difficult as loading that into a register takes more instructions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could, that's definitely a large enough value.
I don't think loading UINT64_MAX into a register would take more instructions (in general).
Do you mean that part of the instrumentation the tracer adds involves adding instructions to load the value of a "no_trace" window into a register and that might be more instructions than what we would have for smaller values? (I am not sure about this, but it would be DR dependent, so "fixable").

My view is: it's nice to have a value that indicates an infinite "no_trace" window (like 0 is today), rather than a very large value like UINT64_MAX.
However, I do agree that 0 is a bit misleading and can be interpreted in the opposite way as: the "no_trace" window is null, start a new "trace" window immediately.
Also, in the future, we might want to use the window tracing mechanism to shard long traces, so in that case we would like to have a null "no_trace" window to avoid gaps in between.
Maybe we can revisit this when (or if) we decide to use window tracing as additional sharding method?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See instrlist_insert_mov_immed_ptrsz which may use a different number of instructions on aarch64 depending on the size of the imm.

However, I do agree that 0 is a bit misleading and can be interpreted in the opposite way as: the "no_trace" window is null, start a new "trace" window immediately.

I agree that it is misleading. Indeed even today, the very first window may have a start value of zero (signifying immediate start of tracing).

Copy link
Contributor Author

@edeiana edeiana May 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See instrlist_insert_mov_immed_ptrsz which may use a different number of instructions on aarch64 depending on the size of the imm.

Got it, thanks!

I agree that it is misleading.

Using UINT64_MAX. It's a performance penalty, but it's only for one value, one time (last window). Clarity in the code is more beneficial IMHO.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a performance penalty, but it's only for one value, one time (last window). Clarity in the code is more beneficial IMHO.

Oh so that value is not loaded for some repeated inlined comparison in the instrumented block? Ok.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, the value loaded in the register comes from instr_count_threshold(), which depends on what window we're at. Only the last no-trace window will have UINT64_MAX. (xref: https://github.com/DynamoRIO/dynamorio/blob/master/clients/drcachesim/tracer/instr_counter.cpp#L331).

Note that the user can still purposely make a no-trace window large enough to trigger a more-than-one-instruction move to a register, but there is nothing we can do about that.

irregular_window_ptr->no_trace_for_instrs = DELAY_FOREVER_THRESHOLD;
// This results in has_instr_count_threshold_to_enable_tracing() returning 0, which
// in turns makes hit_instr_count_threshold() return without creating a new window.
irregular_window_ptr->no_trace_for_instrs = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See instrlist_insert_mov_immed_ptrsz which may use a different number of instructions on aarch64 depending on the size of the imm.

However, I do agree that 0 is a bit misleading and can be interpreted in the opposite way as: the "no_trace" window is null, start a new "trace" window immediately.

I agree that it is misleading. Indeed even today, the very first window may have a start value of zero (signifying immediate start of tracing).

@@ -550,9 +550,9 @@ compute_irregular_trace_windows(std::vector<instr_interval_t> &instr_intervals)
// must have a duration long enough to cover the end of the program.
irregular_window_ptr =
(irregular_window_t *)dr_global_alloc(sizeof(irregular_window_t));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it returns > DELAY_EXACT_THRESHOLD (like when current_no_trace_for_instrs == 0, which causes it to return DELAY_FOREVER_THRESHOLD),

I didn't understand this statement. In your example does it return DELAY_EXACT_THRESHOLD or >DELAY_EXACT_THRESHOLD?

Even if other logic prevents its use, it seems weird to leave DELAY_FOREVER_THRESHOLD in there: it sounds like it still signifies "trace forever" for some cases, which I thought we don't want anymore.

"Instruction interval starting at %" PRIu64 " has duration of 0. "
"Removing interval.\n",
start);
continue;
Copy link
Contributor Author

@edeiana edeiana May 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main changes from the last review are:

  1. Using UINT64_MAX instead of 0 for the last "no trace" window we add.
  2. Added a NOTIFY message if the user-defined instruction intervals contain intervals of duration == 0 (we skip it).
  3. Added a NOTIFY message when we merge overlapping intervals.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the PR description.

@edeiana edeiana requested a review from abhinav92003 May 2, 2025 22:51
uint64 end = interval.start + interval.duration;
instr_interval_t &last_interval = instr_intervals_merged.back();
uint64 last_end = last_interval.start + last_interval.duration;
if (interval.start <= last_end) {
uint64 max_end = last_end > end ? last_end : end;
last_interval.duration = max_end - last_interval.start;
NOTIFY(0,
"Instruction interval starting at %" PRIu64 " has been merged with "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe make this log more concise by denoting intervals as [start, end].

"Instruction interval starting at %" PRIu64 " has duration of 0. "
"Removing interval.\n",
start);
continue;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the PR description.

irregular_window_ptr->no_trace_for_instrs = DELAY_FOREVER_THRESHOLD;
// This results in has_instr_count_threshold_to_enable_tracing() returning 0, which
// in turns makes hit_instr_count_threshold() return without creating a new window.
irregular_window_ptr->no_trace_for_instrs = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a performance penalty, but it's only for one value, one time (last window). Clarity in the code is more beneficial IMHO.

Oh so that value is not loaded for some repeated inlined comparison in the instrumented block? Ok.

@@ -550,9 +550,9 @@ compute_irregular_trace_windows(std::vector<instr_interval_t> &instr_intervals)
// must have a duration long enough to cover the end of the program.
irregular_window_ptr =
(irregular_window_t *)dr_global_alloc(sizeof(irregular_window_t));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got confused by the name as well, but DELAY_FOREVER_THRESHOLD should have never been used as a value for a "no trace" window in the first place.
Looking at the comments of DELAY_EXACT_THRESHOLD and DELAY_FOREVER_THRESHOLD, these are used for counting modes: local vs global (https://github.com/DynamoRIO/dynamorio/blob/master/clients/drcachesim/tracer/instr_counter.cpp#L77).
Is the suggestion to change the name of this constant because misleading?

Indeed the name is confusing. Renaming would help.

@@ -41,6 +41,7 @@
#include <stddef.h>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before submitting, maybe we should test this on a large app to make sure everything is WAI.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants