-
Notifications
You must be signed in to change notification settings - Fork 579
i#3995 irregular window tracing: last window bug fix #7448
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
When a workload executed more than `DELAY_FOREVER_THRESHOLD` (= 1024^3) instructions after the last window specified in `-trace_instr_intervals_file`, the tracing of a new, infinite "trace" window triggered. This happened because the last "no trace" window that we create (not the user), was set to `DELAY_FOREVER_THRESHOLD` instead of `0`. This last window exists to stop the tracing after the last, user-defined window set in `-trace_instr_intervals_file` (reminder: if the user wants to trace the end of the execution, the last window in the file has to be large enough to cover the end of the workload execution). Although a bit counter-intuitive, a "no trace" window of `0` (in terms of instructions), actually means: don't start a new tracing window. Tested manually, as a new binary that executes at least `DELAY_FOREVER_THRESHOLD` instructions takes way more than 90 seconds to run a tool like `basic_counts`, which we normally use to count windows (by looking at its output). The reason `code_api|tool.drcachesim.irregular-windows-simple` passed is that the test binary traced (`simple_app`) executes way less than DELAY_FOREVER_THRESHOLD instructions (~100k instructions), so the last "no trace" window was large enough to not trigger a new, infinite "trace" window. Issue #3995
Apologies, I missed this before. Reviewing it now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was a bit confused by some existing windows related logic. Would help to add more comments in code by the window-related functions.
@@ -550,9 +550,9 @@ compute_irregular_trace_windows(std::vector<instr_interval_t> &instr_intervals) | |||
// must have a duration long enough to cover the end of the program. | |||
irregular_window_ptr = | |||
(irregular_window_t *)dr_global_alloc(sizeof(irregular_window_t)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about other uses of DELAY_FOREVER_THRESHOLD? I see instr_count_threshold() still returns that when current_no_trace_for_instrs == 0
(which IIUC will be true after the last window with this PR's change).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that instr_count_threshold()
is used to differentiate between counting modes.
If it returns > DELAY_EXACT_THRESHOLD
(like when current_no_trace_for_instrs == 0
, which causes it to return DELAY_FOREVER_THRESHOLD
), then the tracer adds to the global instruction count instr_count
every DELAY_COUNTDOWN_UNIT
.
Otherwise, when instr_count_threshold()
returns a small "no_trace" window (like in the case of a "proper", user-defined window), then the tracer adds directly to the global instruction count.
However, what actually blocks the creation of the unwanted window is has_instr_count_threshold_to_enable_tracing()
, which returns false
when current_no_trace_for_instrs == 0
.
Hence we return here: https://github.com/DynamoRIO/dynamorio/blob/master/clients/drcachesim/tracer/instr_counter.cpp#L190, before instr_count_threshold()
or the creation of a new tracing window below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if it returns > DELAY_EXACT_THRESHOLD (like when current_no_trace_for_instrs == 0, which causes it to return DELAY_FOREVER_THRESHOLD),
I didn't understand this statement. In your example does it return DELAY_EXACT_THRESHOLD
or >DELAY_EXACT_THRESHOLD
?
Even if other logic prevents its use, it seems weird to leave DELAY_FOREVER_THRESHOLD
in there: it sounds like it still signifies "trace forever" for some cases, which I thought we don't want anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't understand this statement. In your example does it return
DELAY_EXACT_THRESHOLD
or>DELAY_EXACT_THRESHOLD
?
The previous behavior (which caused the bug), would have instr_count_threshold()
return DELAY_FOREVER_THRESHOLD
(which is > DELAY_EXACT_THRESHOLD
) for the last "no trace" window.
So, we'd enter the if statement at https://github.com/DynamoRIO/dynamorio/blob/master/clients/drcachesim/tracer/instr_counter.cpp#L196 and not start a tracing window (like we expect) only until the program executed at least DELAY_FOREVER_THRESHOLD
instructions, which is not that much, it's "only" 1024^3 (~1 billion instructions).
So, despite the name DELAY_FOREVER_THRESHOLD
, we wouldn't delay tracing forever, we would delay only for ~1 billion instructions and then start tracing again.
I got confused by the name as well, but DELAY_FOREVER_THRESHOLD
should have never been used as a value for a "no trace" window in the first place.
Looking at the comments of DELAY_EXACT_THRESHOLD
and DELAY_FOREVER_THRESHOLD
, these are used for counting modes: local vs global (https://github.com/DynamoRIO/dynamorio/blob/master/clients/drcachesim/tracer/instr_counter.cpp#L77).
Even if other logic prevents its use, it seems weird to leave DELAY_FOREVER_THRESHOLD in there: it sounds like it still signifies "trace forever" for some cases, which I thought we don't want anymore.
Is the suggestion to change the name of this constant because misleading?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got confused by the name as well, but DELAY_FOREVER_THRESHOLD should have never been used as a value for a "no trace" window in the first place.
Looking at the comments of DELAY_EXACT_THRESHOLD and DELAY_FOREVER_THRESHOLD, these are used for counting modes: local vs global (https://github.com/DynamoRIO/dynamorio/blob/master/clients/drcachesim/tracer/instr_counter.cpp#L77).
Is the suggestion to change the name of this constant because misleading?
Indeed the name is confusing. Renaming would help.
irregular_window_ptr->no_trace_for_instrs = DELAY_FOREVER_THRESHOLD; | ||
// This results in has_instr_count_threshold_to_enable_tracing() returning 0, which | ||
// in turns makes hit_instr_count_threshold() return without creating a new window. | ||
irregular_window_ptr->no_trace_for_instrs = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be any simpler to use UINT64_MAX? Or would that make instrumentation more difficult as loading that into a register takes more instructions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could, that's definitely a large enough value.
I don't think loading UINT64_MAX into a register would take more instructions (in general).
Do you mean that part of the instrumentation the tracer adds involves adding instructions to load the value of a "no_trace" window into a register and that might be more instructions than what we would have for smaller values? (I am not sure about this, but it would be DR dependent, so "fixable").
My view is: it's nice to have a value that indicates an infinite "no_trace" window (like 0 is today), rather than a very large value like UINT64_MAX.
However, I do agree that 0 is a bit misleading and can be interpreted in the opposite way as: the "no_trace" window is null, start a new "trace" window immediately.
Also, in the future, we might want to use the window tracing mechanism to shard long traces, so in that case we would like to have a null "no_trace" window to avoid gaps in between.
Maybe we can revisit this when (or if) we decide to use window tracing as additional sharding method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See instrlist_insert_mov_immed_ptrsz which may use a different number of instructions on aarch64 depending on the size of the imm.
However, I do agree that 0 is a bit misleading and can be interpreted in the opposite way as: the "no_trace" window is null, start a new "trace" window immediately.
I agree that it is misleading. Indeed even today, the very first window may have a start value of zero (signifying immediate start of tracing).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See instrlist_insert_mov_immed_ptrsz which may use a different number of instructions on aarch64 depending on the size of the imm.
Got it, thanks!
I agree that it is misleading.
Using UINT64_MAX
. It's a performance penalty, but it's only for one value, one time (last window). Clarity in the code is more beneficial IMHO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a performance penalty, but it's only for one value, one time (last window). Clarity in the code is more beneficial IMHO.
Oh so that value is not loaded for some repeated inlined comparison in the instrumented block? Ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, the value loaded in the register comes from instr_count_threshold()
, which depends on what window we're at. Only the last no-trace window will have UINT64_MAX. (xref: https://github.com/DynamoRIO/dynamorio/blob/master/clients/drcachesim/tracer/instr_counter.cpp#L331).
Note that the user can still purposely make a no-trace window large enough to trigger a more-than-one-instruction move to a register, but there is nothing we can do about that.
Add NOTIFY messages about changes in the intervals (merged intervals).
irregular_window_ptr->no_trace_for_instrs = DELAY_FOREVER_THRESHOLD; | ||
// This results in has_instr_count_threshold_to_enable_tracing() returning 0, which | ||
// in turns makes hit_instr_count_threshold() return without creating a new window. | ||
irregular_window_ptr->no_trace_for_instrs = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See instrlist_insert_mov_immed_ptrsz which may use a different number of instructions on aarch64 depending on the size of the imm.
However, I do agree that 0 is a bit misleading and can be interpreted in the opposite way as: the "no_trace" window is null, start a new "trace" window immediately.
I agree that it is misleading. Indeed even today, the very first window may have a start value of zero (signifying immediate start of tracing).
@@ -550,9 +550,9 @@ compute_irregular_trace_windows(std::vector<instr_interval_t> &instr_intervals) | |||
// must have a duration long enough to cover the end of the program. | |||
irregular_window_ptr = | |||
(irregular_window_t *)dr_global_alloc(sizeof(irregular_window_t)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if it returns > DELAY_EXACT_THRESHOLD (like when current_no_trace_for_instrs == 0, which causes it to return DELAY_FOREVER_THRESHOLD),
I didn't understand this statement. In your example does it return DELAY_EXACT_THRESHOLD
or >DELAY_EXACT_THRESHOLD
?
Even if other logic prevents its use, it seems weird to leave DELAY_FOREVER_THRESHOLD
in there: it sounds like it still signifies "trace forever" for some cases, which I thought we don't want anymore.
"Instruction interval starting at %" PRIu64 " has duration of 0. " | ||
"Removing interval.\n", | ||
start); | ||
continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main changes from the last review are:
- Using
UINT64_MAX
instead of0
for the last "no trace" window we add. - Added a
NOTIFY
message if the user-defined instruction intervals contain intervals of duration == 0 (we skip it). - Added a
NOTIFY
message when we merge overlapping intervals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update the PR description.
uint64 end = interval.start + interval.duration; | ||
instr_interval_t &last_interval = instr_intervals_merged.back(); | ||
uint64 last_end = last_interval.start + last_interval.duration; | ||
if (interval.start <= last_end) { | ||
uint64 max_end = last_end > end ? last_end : end; | ||
last_interval.duration = max_end - last_interval.start; | ||
NOTIFY(0, | ||
"Instruction interval starting at %" PRIu64 " has been merged with " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe make this log more concise by denoting intervals as [start, end]
.
"Instruction interval starting at %" PRIu64 " has duration of 0. " | ||
"Removing interval.\n", | ||
start); | ||
continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update the PR description.
irregular_window_ptr->no_trace_for_instrs = DELAY_FOREVER_THRESHOLD; | ||
// This results in has_instr_count_threshold_to_enable_tracing() returning 0, which | ||
// in turns makes hit_instr_count_threshold() return without creating a new window. | ||
irregular_window_ptr->no_trace_for_instrs = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a performance penalty, but it's only for one value, one time (last window). Clarity in the code is more beneficial IMHO.
Oh so that value is not loaded for some repeated inlined comparison in the instrumented block? Ok.
@@ -550,9 +550,9 @@ compute_irregular_trace_windows(std::vector<instr_interval_t> &instr_intervals) | |||
// must have a duration long enough to cover the end of the program. | |||
irregular_window_ptr = | |||
(irregular_window_t *)dr_global_alloc(sizeof(irregular_window_t)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got confused by the name as well, but DELAY_FOREVER_THRESHOLD should have never been used as a value for a "no trace" window in the first place.
Looking at the comments of DELAY_EXACT_THRESHOLD and DELAY_FOREVER_THRESHOLD, these are used for counting modes: local vs global (https://github.com/DynamoRIO/dynamorio/blob/master/clients/drcachesim/tracer/instr_counter.cpp#L77).
Is the suggestion to change the name of this constant because misleading?
Indeed the name is confusing. Renaming would help.
@@ -41,6 +41,7 @@ | |||
#include <stddef.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before submitting, maybe we should test this on a large app to make sure everything is WAI.
When a workload executed more than
DELAY_FOREVER_THRESHOLD
(= 1024^3) instructions after the last window specified in
-trace_instr_intervals_file
, the tracing of an unwanted, infinite "trace"window triggered.
This happened because the last "no trace" window that we create
(not the user), was set to
DELAY_FOREVER_THRESHOLD
insteadof the larger value
UINT64_MAX
, which we assume is a large enoughvalue to stop tracing for the remaining of the execution of the traced
program. This last window exists to stop the tracing after the last,
user-defined window set in
-trace_instr_intervals_file
(reminder:if the user wants to trace the end of the execution, the last window
in the file has to be large enough to cover the end of the workload
execution).
Tested manually, as a new binary that executes at least
DELAY_FOREVER_THRESHOLD
instructions takes way more than90 seconds to run a tool like
basic_counts
, which we normally useto count windows (by looking at its output).
The reason
code_api|tool.drcachesim.irregular-windows-simple
passed is that the test binary traced (
simple_app
) executes~100k instructions, which is way less than
DELAY_FOREVER_THRESHOLD
instructions, so the last "no trace"window of
DELAY_FOREVER_THRESHOLD
was large enough tonot trigger a new, infinite "trace" window.
Adds a check for user-defined instruction intervals with a
duration of 0. They are meaningless, so we ignore them and
print a
NOTIFY
message letting the user know.Adds a
NOTIFY
message when user-defined instruction intervalsoverlap (in which case, they are merged).
Issue #3995