-
Notifications
You must be signed in to change notification settings - Fork 10
clean up assert for machine_implc.cc #327
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #327 +/- ##
==========================================
- Coverage 27.15% 27.15% -0.01%
==========================================
Files 190 190
Lines 39174 39178 +4
Branches 14289 14196 -93
==========================================
- Hits 10638 10637 -1
+ Misses 27681 27356 -325
- Partials 855 1185 +330 ☔ View full report in Codecov by Sentry. |
src/realm/compiler_support.h
Outdated
#endif | ||
|
||
// TODO: remove the logger if used in gpu kernels | ||
#define REALM_ASSERT_WITH_ABORT(expr, logger) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we name this differently? Something like REALM_BUG_ON
or something (following the linux kernel naming for something similar).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree with @muraj but I don't like REALM_BUG_on either. Perhaps...REALM_ABORT_ON?
log_machine.info() << "ProcInfo: adding proc-mem affinity " << pma.p << " -> " | ||
<< pma.m << " with bandwidth " << pma.bandwidth << " and latency " | ||
<< pma.latency; | ||
assert(pma.bandwidth > 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we shouldn't be removing these. Assert is perfectly fine to use as documentation and debugging, but it should not be used on code that is expected to run in release (i.e. when asserts are turned off).
A much easier way to manage what we need is to do something like the following instead (this is what CUDA does):
#ifdef NDEBUG
#define REALM_ASSERT(x) if(!(x)) { log_runtime.fatal("%s(%d): failed assert '%s'", __FILE__, __LINE__, #x); abort(); }
#else
#define REALM_ASSERT(x) assert(x)
#endif
Then replace all uses of assert() with REALM_ASSERT(). We can then run through at a later time and clean things up as we go through to ensure the usages don't have side effects, therefore we can get rid of the extra logging stuff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was planing to use something like your proposed REALM_ASSERT
, but then assert can not be optimized even when NDEBUG is enabled. I am wondering how to decide whether an assert
can be safely optimized/ignored in release mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're currently in a situation where we're always building with NDEBUG off. So it's not a regression to do this. We can always optimize later, let's just fix the original issue in a way we can control first, then optimize later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't remove any asserts that are unrelated to assert(0). Can we just focus on getting the assert(0) replaced with aborts. All other asserts need to more carefully reviewed and I don't want to waste time on going over everything in a single PR. That will take longer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not remove any asserts. I just moved it to the upper caller for a better coverage.
Can we just focus on getting the assert(0) replaced with aborts.
The direct reason to clean up the assert is because when the NDEBUG is added (We explicitly removed the NDBUG from the cmake, but it could be added by others), realm could crash, and I have seen it before with the legete conda env on the aarch64. Anyway, I think I will just try Cory's suggestion mentioned above.
src/realm/machine_impl.cc
Outdated
bool MachineNodeInfo::add_processor(Processor p) | ||
{ | ||
assert(node == NodeID(ID(p).proc_owner_node())); | ||
REALM_ASSERT_WITH_ABORT(node == NodeID(ID(p).proc_owner_node()), log_machine); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eddy16112 Can you please undo all the changes here. There are lot of assert(0) across the codebase. I suggest let's handle those first please.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I replaced them with REALM_ASSERT
I proposed four ways to clean up four different types of assert.
e. g.
replace it into
replace it with abort;
e.g.
assert inside the
instead of assert, we can let the function returns an error code and let the caller to handle it.
4. other assert
Replace them with
REALM_ASSERT_WITH_ABORT()
, a macro that print the error and abort.