Skip to content

Conversation

eddy16112
Copy link
Contributor

Please see #331 for the bug.

@eddy16112 eddy16112 requested review from apryakhin and muraj October 10, 2025 20:57
@github-actions github-actions bot added the bug Something isn't working label Oct 10, 2025
Copy link

codecov bot commented Oct 10, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 27.15%. Comparing base (ade8156) to head (2e365aa).

Additional details and impacted files
@@           Coverage Diff            @@
##             main     #333    +/-   ##
========================================
  Coverage   27.15%   27.15%            
========================================
  Files         190      190            
  Lines       39174    39174            
  Branches    14289    14221    -68     
========================================
  Hits        10638    10638            
+ Misses      27681    27436   -245     
- Partials      855     1100   +245     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

case BOOTSTRAP_PLUGIN:
status = bootstrap_loader_init(config->plugin_name, NULL, handle);
// for safety, use MPI as the bootstrap mode to not unload the plugin
status = bootstrap_loader_init(config->plugin_name, NULL, handle, BootstrapMode::BOOTSTRAP_MPI);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually what I feared. Realm shouldn't be making any assumptions about the plugin. We should have the plugin itself make a reference to itself such that it won't unload when dlclose is called. You should be able to do that with the following snippet:

/// bootstrap_mpi.cc
int realm_ucp_bootstrap_plugin_init()
{
  // ...
  Dl_info info;
  dladdr(&realm_ucp_bootstrap_plugin_init, &info);
  void *self = dlopen(info.dli_fname, RTLD_NODELETE);
  dlclose(self);
}

This way it's on the plugin itself to acknowledge that it needs to keep itself loaded rather than Realm having to reason if it needs to keep it loaded or not based on some arbitrary context (like the name of the plugin).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a very good suggestion. I did not find a way to let the plugin itself to decide which mode to use. It should be fixed now.

@eddy16112 eddy16112 requested a review from muraj October 14, 2025 19:38
@dongb
Copy link

dongb commented Oct 15, 2025

Hi @muraj @apryakhin could you help to review this PR? We need it to get a build for a smooth DGX Spark support desirably before GTC DTC.
Thanks a lot
-bo

fprintf(stderr, "dlopen failed for %s: %s\n", info.dli_fname, dlerror());
return -1;
}
dlclose(handle);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the intention here to actually close handle?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is the dlclose is just to decrease the reference count of the handle.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but maybe the intention is to dlclose the handle we just opened above, not the one that was passed in? That's what I see in @muraj's example.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants