
Should we support CHPL_INTERCONNECT / CHPL_NETWORK? #25616

Open
@bradcray

Description

Today, we support a CHPL_TARGET_PLATFORM variable that sometimes tells us a lot about the target platform, when it names something specific like an HPE Cray EX or Cray XC system, but tells us little when it's a generic Linux cluster. In the latter case, the user has to set the CHPL_COMM* variables to specify how Chapel should map itself to the interconnect, using values like gasnet or ofi. In this issue, I'm wondering whether we should introduce a CHPL_INTERCONNECT or CHPL_NETWORK variable supporting values like none, slingshot, infiniband, ethernet, efa, unset, etc. This would be a higher-level way to describe the target system, and one that a user is more likely to know (or be able to find out) than the details of how our communication is implemented. From there, we could typically infer reasonable values for the lower-level CHPL_COMM* variables, while still permitting a user to set them explicitly if desired.

For example, I might imagine that setting CHPL_TARGET_PLATFORM=hpe-apollo would cause CHPL_INTERCONNECT to be inferred as infiniband, which would then cause CHPL_COMM to be inferred as gasnet and CHPL_COMM_SUBSTRATE as ibv (and so on). Yet on a Linux cluster that has no more specific platform identifier than linux64, a user could set CHPL_INTERCONNECT=infiniband and get the same lower-level settings. And on an Apollo system, the user could still override the default and set CHPL_COMM=ofi to try the ofi-based implementation.
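To make the proposed layering concrete, here is a minimal sketch of the inference chain (platform, then interconnect, then CHPL_COMM / CHPL_COMM_SUBSTRATE), where any explicitly set variable always wins. It is hypothetical and does not reflect Chapel's actual chplenv scripts: the function name infer_chplenv, the lookup tables, and the linux64 fallback are illustrative assumptions. Only the hpe-apollo to infiniband to gasnet/ibv mapping comes from the example above; the slingshot and ethernet rows are assumed.

```python
from typing import Dict, Optional, Tuple

# Hypothetical platform -> interconnect defaults. hpe-apollo -> infiniband is
# from the issue text; hpe-cray-ex -> slingshot is an illustrative assumption.
PLATFORM_TO_INTERCONNECT: Dict[str, str] = {
    "hpe-apollo": "infiniband",
    "hpe-cray-ex": "slingshot",
}

# Hypothetical interconnect -> (CHPL_COMM, CHPL_COMM_SUBSTRATE) defaults.
# A substrate of None means CHPL_COMM_SUBSTRATE does not apply.
# infiniband -> gasnet/ibv is from the issue text; other rows are assumptions.
INTERCONNECT_TO_COMM: Dict[str, Tuple[str, Optional[str]]] = {
    "none": ("none", None),
    "infiniband": ("gasnet", "ibv"),
    "slingshot": ("ofi", None),
    "ethernet": ("gasnet", "udp"),
}


def infer_chplenv(env: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Fill in unset variables, layering platform -> interconnect -> comm.

    Any variable the user set explicitly in `env` always wins.
    """
    # Assumption: fall back to linux64 when no platform is given.
    platform = env.get("CHPL_TARGET_PLATFORM", "linux64")

    # Level 1: interconnect, from the user or inferred from the platform.
    interconnect = env.get("CHPL_INTERCONNECT") or PLATFORM_TO_INTERCONNECT.get(
        platform, "unset"
    )

    # Level 2: comm layer, from the user or inferred from the interconnect.
    # "unset" (or an unknown value) yields no default, leaving CHPL_COMM to
    # whatever fallback logic exists elsewhere.
    default_comm, default_substrate = INTERCONNECT_TO_COMM.get(
        interconnect, (None, None)
    )
    comm = env.get("CHPL_COMM") or default_comm

    # Level 3: a substrate only applies to gasnet, and the interconnect's
    # default substrate is used only if the comm layer is also the default.
    substrate: Optional[str] = None
    if comm == "gasnet":
        substrate = env.get("CHPL_COMM_SUBSTRATE") or (
            default_substrate if comm == default_comm else None
        )

    return {
        "CHPL_TARGET_PLATFORM": platform,
        "CHPL_INTERCONNECT": interconnect,
        "CHPL_COMM": comm,
        "CHPL_COMM_SUBSTRATE": substrate,
    }
```

Under these assumptions, infer_chplenv({"CHPL_TARGET_PLATFORM": "hpe-apollo"}) and infer_chplenv({"CHPL_TARGET_PLATFORM": "linux64", "CHPL_INTERCONNECT": "infiniband"}) both yield gasnet/ibv, while adding "CHPL_COMM": "ofi" on Apollo keeps ofi and drops the gasnet-only substrate.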

To me, this seems like it would prevent most users from ever having to set CHPL_COMM or its related variables, which feels like a win since that's more about how we implement things than about things a typical user would know, or should need to know.

Activity

bhurwitz33 commented on Jul 23, 2024 (Contributor)

Yes! This definitely resonates with me. I love the idea of introducing a CHPL_INTERCONNECT variable. As Brad says above, this is easy to "know" from a user perspective, because it is easy to look up this info about your HPC system. Plus, if this info can then be used to infer the CHPL_COMM* variables, that would be great too. To be honest, when I first started, I didn't realize "ibv" referred to InfiniBand, so an easier starting point would be great!

e-kayrakli commented on Jul 23, 2024 (Contributor)

The proposal can make building oversubscribed Chapel easy as well, but I can't tell how exactly. In the proposed world, what's the way to build Chapel with oversubscription? CHPL_INTERCONNECT=none && CHPL_COMM=gasnet? A new value for CHPL_INTERCONNECT? A completely new variable?

bradcray commented on Jul 24, 2024 (Member, Author)

@e-kayrakli: Hmm, good question. My first reaction was to require someone wanting oversubscription to use the lower-level variables, thinking they'd somehow be "more expert" and so would deserve the extra work, but thinking about it more, I think that wanting an oversubscribed Chapel for development purposes is pretty common, suggesting it should be similarly friendly. My thought would be to make it a new value like virtual or local, which would result in defaults like CHPL_COMM=gasnet and CHPL_COMM_SUBSTRATE=udp or smp. I feel least excited about making it a new variable; it feels similar to CHPL_GPU=cpu to me, where we also used a special value rather than a new variable.
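As a rough illustration of the "special value rather than a new variable" option, the sketch below treats virtual or local as CHPL_INTERCONNECT values that request oversubscription and map to gasnet with the udp or smp substrate. Everything here is hypothetical: the function name oversubscribed_defaults is invented, and choosing udp (rather than smp) as the default is an assumption made only for illustration.

```python
from typing import Dict, Optional

# Hypothetical special CHPL_INTERCONNECT values meaning "simulate multiple
# locales on one machine"; the comment above floats both names.
OVERSUBSCRIBED_VALUES = {"virtual", "local"}


def oversubscribed_defaults(env: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Return inferred comm settings for an oversubscribed build, or {} if
    CHPL_INTERCONNECT does not request oversubscription."""
    if env.get("CHPL_INTERCONNECT") not in OVERSUBSCRIBED_VALUES:
        return {}

    # Explicit user settings still win; gasnet is only the inferred default.
    comm = env.get("CHPL_COMM", "gasnet")
    if comm != "gasnet":
        # A non-gasnet comm layer has no CHPL_COMM_SUBSTRATE.
        return {"CHPL_COMM": comm, "CHPL_COMM_SUBSTRATE": None}

    # Assumption: default to udp; a user could set smp explicitly instead.
    substrate = env.get("CHPL_COMM_SUBSTRATE", "udp")
    return {"CHPL_COMM": "gasnet", "CHPL_COMM_SUBSTRATE": substrate}
```

This mirrors the CHPL_GPU=cpu precedent: a single familiar variable carries the intent, and the lower-level variables are derived from it unless set explicitly.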

e-kayrakli commented on Jul 24, 2024 (Contributor)

> but thinking about it more, I think that wanting an oversubscribed Chapel for development purposes is pretty common, suggesting it should be similarly friendly.

This strongly resonates with me. I don't view this mode to be a power-user mode. It may be so currently, but this proposal could be an excuse to improve the story there.

The title was changed from "Should we support `CHPL_INTERCONNET` / `CHPL_NETWORK`?" to "Should we support `CHPL_INTERCONNECT` / `CHPL_NETWORK`?" on Jul 26, 2024.

mppf commented on Jul 26, 2024 (Member)

I like the way that this idea would allow us to hide implementation details (whether it's using gasnet or ofi). This would also address a point of user feedback where a user requested the ability to simulate multiple locales on a single system without being aware that gasnet exists at all.

bradcray commented on Jul 26, 2024 (Member, Author)

I've taken the liberty of adding "user issue" here due to both Michael's connection to the previous issue and Bonnie's response.

jabraham17 commented on Mar 14, 2025 (Member)

Noting that in #26921 I have significantly improved our detection for slingshot and infiniband. After that PR is merged, a user running on a generic linux64 infiniband system should get the right settings for CHPL_COMM and friends (the implementation details) by default. It doesn't add a proper new variable like CHPL_NETWORK, but it does change lots of logic to infer things from the network when possible.

I consider this a step in the right direction for this issue; future work could expand it to efa, ethernet, and potentially others (as well as allowing the user to set it, rather than just relying on defaults).


          Should we support `CHPL_INTERCONNECT` / `CHPL_NETWORK`? · Issue #25616 · chapel-lang/chapel