@eddy16112 eddy16112 commented Oct 4, 2025

I propose four ways to clean up four different types of assert.

  1. The assert contains a function call that must still run when NDEBUG is defined,
    e.g.
assert(ptr->add_process_info(process_info));

Replace it with

if (!ptr->add_process_info(process_info)) {abort with an error message;}
  2. assert(0) or assert(false):
    replace it with abort.
  3. An assert that callers can handle through an error code. This is usually for functions that are exposed to users,
    e.g. the assert inside
MachineImpl::get_local_processors_by_kind(std::set<Processor> &pset, Processor::Kind kind) const

Instead of asserting, we can have the function return an error code and let the caller handle it.
  4. Any other assert:
    replace it with REALM_ASSERT_WITH_ABORT(), a macro that prints the error and aborts.
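
Cases 1, 3, and 4 above can be sketched in a self-contained form. This is an illustration only, not the code in this PR: `Registry`, `add_or_abort`, `get_ids_at_least`, and `SKETCH_ASSERT_WITH_ABORT` are hypothetical names, the "negative `min_id`" failure condition is made up, and stderr stands in for a Realm logger.

```cpp
#include <cstdio>
#include <cstdlib>
#include <set>

// Case 4 (hypothetical macro): the check is always compiled in, regardless of
// NDEBUG. It prints to stderr instead of a Realm logger, then aborts.
#define SKETCH_ASSERT_WITH_ABORT(expr)                                        \
  do {                                                                        \
    if(!(expr)) {                                                             \
      std::fprintf(stderr, "%s(%d): check failed: '%s'\n", __FILE__,          \
                   __LINE__, #expr);                                          \
      std::abort();                                                           \
    }                                                                         \
  } while(0)

// Hypothetical registry used to illustrate cases 1 and 3.
struct Registry {
  std::set<int> ids;

  // Has a side effect (the insert), so the call must not live only inside
  // an assert() that NDEBUG can compile out.
  bool add(int id) { return ids.insert(id).second; }

  // Case 3: report failure to the caller instead of asserting.
  // Returns false and leaves 'out' untouched if 'min_id' is negative.
  bool get_ids_at_least(int min_id, std::set<int> &out) const
  {
    if(min_id < 0)
      return false;
    for(int id : ids)
      if(id >= min_id)
        out.insert(id);
    return true;
  }
};

// Case 1: run the call unconditionally, then act on its result.
inline void add_or_abort(Registry &r, int id)
{
  if(!r.add(id)) {
    std::fprintf(stderr, "failed to add id %d\n", id);
    std::abort();
  }
}
```

The `do { ... } while(0)` wrapper makes the macro behave like a single statement, so it can be used safely inside an unbraced `if`/`else`.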

codecov bot commented Oct 4, 2025

Codecov Report

❌ Patch coverage is 6.52174% with 43 lines in your changes missing coverage. Please review.
✅ Project coverage is 27.15%. Comparing base (ade8156) to head (fbd4005).
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/realm/machine_impl.cc | 2.43% | 28 Missing and 12 partials ⚠️ |
| src/realm/sparsity.inl | 50.00% | 2 Missing ⚠️ |
| src/realm/indexspace.inl | 0.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #327      +/-   ##
==========================================
- Coverage   27.15%   27.15%   -0.01%     
==========================================
  Files         190      190              
  Lines       39174    39178       +4     
  Branches    14289    14196      -93     
==========================================
- Hits        10638    10637       -1     
+ Misses      27681    27356     -325     
- Partials      855     1185     +330     

@eddy16112 eddy16112 requested review from apryakhin and muraj October 7, 2025 20:30
#endif

// TODO: remove the logger if used in gpu kernels
#define REALM_ASSERT_WITH_ABORT(expr, logger) \
Contributor

Can we name this differently? Something like REALM_BUG_ON or something (following the linux kernel naming for something similar).

Contributor

Agree with @muraj, but I don't like REALM_BUG_ON either. Perhaps REALM_ABORT_ON?

log_machine.info() << "ProcInfo: adding proc-mem affinity " << pma.p << " -> "
<< pma.m << " with bandwidth " << pma.bandwidth << " and latency "
<< pma.latency;
assert(pma.bandwidth > 0);
Contributor

@muraj muraj Oct 9, 2025

We shouldn't be removing these. assert is perfectly fine to use for documentation and debugging, but it should not guard code that is expected to run in release builds (i.e. when asserts are turned off).

A much easier way to manage what we need is to do something like the following instead (this is what CUDA does):

#ifdef NDEBUG
#define REALM_ASSERT(x) \
  do { if(!(x)) { log_runtime.fatal("%s(%d): failed assert '%s'", __FILE__, __LINE__, #x); abort(); } } while(0)
#else
#define REALM_ASSERT(x) assert(x)
#endif

Then replace all uses of assert() with REALM_ASSERT(). We can go through the usages later, make sure they have no side effects, and at that point drop the extra logging.
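
A standalone sketch of this NDEBUG-switched macro, under stated assumptions: `SKETCH_REALM_ASSERT`, `bump`, and `checked_bumps` are hypothetical names, and stderr replaces `log_runtime` so the example compiles without Realm. It is meant to show that the checked expression is evaluated exactly once in both build configurations.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

#ifdef NDEBUG
// Release builds: assert() would vanish, so check explicitly and abort.
#define SKETCH_REALM_ASSERT(x)                                                \
  do {                                                                        \
    if(!(x)) {                                                                \
      std::fprintf(stderr, "%s(%d): failed assert '%s'\n", __FILE__,          \
                   __LINE__, #x);                                             \
      std::abort();                                                           \
    }                                                                         \
  } while(0)
#else
// Debug builds: defer to the standard assert().
#define SKETCH_REALM_ASSERT(x) assert(x)
#endif

// Hypothetical helper with a side effect, used to count evaluations.
inline bool bump(int &counter)
{
  ++counter;
  return true;
}

// Returns how many times bump() actually ran.
inline int checked_bumps(bool twice)
{
  int counter = 0;
  // Wrapping the NDEBUG variant in do/while(0) keeps this unbraced if/else
  // well-formed in both configurations.
  if(twice)
    SKETCH_REALM_ASSERT(bump(counter));
  else
    SKETCH_REALM_ASSERT(counter == 0);
  SKETCH_REALM_ASSERT(bump(counter));
  return counter;
}
```

Because the expression is evaluated whether or not NDEBUG is defined, an existing assert with a side effect keeps working after a mechanical rename. The cost is that the check can no longer be optimized away in release builds.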

Contributor Author

I was planning to use something like your proposed REALM_ASSERT, but then the assert cannot be optimized out even when NDEBUG is defined. I am wondering how to decide whether an assert can safely be optimized out or ignored in release mode.

Contributor

We're currently in a situation where we're always building with NDEBUG off, so it's not a regression to do this. We can always optimize later; let's first fix the original issue in a way we can control.

Contributor

Please don't remove any asserts that are unrelated to assert(0). Can we just focus on getting the assert(0) calls replaced with aborts? All other asserts need to be reviewed more carefully, and I don't want to spend time going over everything in a single PR. That will take longer.
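
For illustration, replacing an assert(0) on an unreachable path with an unconditional abort might look like the sketch below. `Kind`, `kind_name`, and `sketch_abort` are hypothetical names, not Realm APIs, and stderr stands in for a Realm logger.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical enum, for illustration only.
enum class Kind { CPU, GPU };

// Hypothetical helper: prints a message and aborts. It does not depend on
// NDEBUG, so it stays effective in release builds.
[[noreturn]] inline void sketch_abort(const char *file, int line, const char *msg)
{
  std::fprintf(stderr, "%s(%d): %s\n", file, line, msg);
  std::abort();
}

inline const char *kind_name(Kind k)
{
  switch(k) {
  case Kind::CPU:
    return "cpu";
  case Kind::GPU:
    return "gpu";
  }
  // With assert(0) here, an NDEBUG build would flow off the end of a
  // non-void function, which is undefined behavior in C++.
  sketch_abort(__FILE__, __LINE__, "unknown Kind");
}
```

Marking the helper `[[noreturn]]` also lets the compiler see that no return statement is needed after the call.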

Contributor Author

I did not remove any asserts. I just moved them up into the caller for better coverage.

> Can we just focus on getting the assert(0) replaced with aborts.

The direct reason for cleaning up the asserts is that when NDEBUG is defined (we explicitly removed NDEBUG from the CMake build, but others could add it back), Realm can crash. I have seen this before with the Legate conda env on aarch64. Anyway, I think I will just try Cory's suggestion mentioned above.
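
A minimal sketch of how defining NDEBUG can change behavior. The names `STRIPPED_ASSERT`, `Table`, `populate_inside_assert`, and `populate_then_check` are hypothetical. `STRIPPED_ASSERT` spells out what the standard `assert` expands to when NDEBUG is defined, so the effect is visible without changing build flags.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// With NDEBUG defined, the standard assert(x) expands to ((void)0), so its
// argument is never evaluated. This hypothetical macro mimics that.
#define STRIPPED_ASSERT(x) ((void)0)

// Hypothetical container standing in for state that is populated at startup.
struct Table {
  std::vector<int> entries;
  bool add(int v)
  {
    entries.push_back(v);
    return true;
  }
};

// Problematic pattern: the only call to add() sits inside the assert.
inline std::size_t populate_inside_assert(Table &t)
{
  STRIPPED_ASSERT(t.add(42)); // add() is never called
  return t.entries.size();
}

// Safer pattern: do the work first, then assert on the saved result.
inline std::size_t populate_then_check(Table &t)
{
  bool ok = t.add(42);
  assert(ok); // may be compiled out, but add() has already run
  (void)ok;   // avoids an unused-variable warning under NDEBUG
  return t.entries.size();
}
```

In the first function the table stays empty, and any later code that assumes the entry exists can fail far from the original assert.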

bool MachineNodeInfo::add_processor(Processor p)
{
- assert(node == NodeID(ID(p).proc_owner_node()));
+ REALM_ASSERT_WITH_ABORT(node == NodeID(ID(p).proc_owner_node()), log_machine);
Contributor

@eddy16112 Can you please undo all the changes here? There are a lot of assert(0) calls across the codebase. I suggest we handle those first.

Contributor Author

I replaced them with REALM_ASSERT.

@eddy16112 eddy16112 requested review from apryakhin and muraj October 14, 2025 22:10