Description
Today, we support a `CHPL_TARGET_PLATFORM` variable that sometimes tells us a lot about the target platform if it's something specific like an HPE Cray EX or Cray XC system, but sometimes tells us little if it's a Linux cluster. In the latter case, the user has to set `CHPL_COMM_*` variables to specify how Chapel should map itself to the interconnect, using values like `gasnet` or `ofi`. In this issue, I'm wondering whether we should introduce a `CHPL_INTERCONNECT` or `CHPL_NETWORK` variable that would support values like `none`, `slingshot`, `infiniband`, `ethernet`, `efa`, `unset`, etc. as a way to say something about the target system that's higher-level and likely more known/knowable to a user than the details of how our communication is implemented. From there, we could then (typically) infer reasonable values for the lower-level `CHPL_COMM*`-related variables (while still permitting a user to set them explicitly, if desired).
For example, I might imagine that setting `CHPL_TARGET_PLATFORM=hpe-apollo` would cause `CHPL_INTERCONNECT` to be inferred to be `infiniband`, which would then cause `CHPL_COMM` to be inferred to be `gasnet` and `CHPL_COMM_SUBSTRATE` to be inferred to be `ibv` (and so on). Yet on a Linux cluster that doesn't have a more specific platform identifier than `linux64`, a user could set `CHPL_INTERCONNECT=infiniband` and get the same lower-level settings. Or on an Apollo system, the user could override the default and set `CHPL_COMM=ofi` if they wanted to try the ofi-based implementation.
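The inference chain described above could be sketched roughly as follows. This is only an illustration in Python, not existing Chapel code: the function `infer_comm_settings` and both lookup tables are hypothetical, `CHPL_INTERCONNECT` is the proposed (not yet existing) variable, and only the `hpe-apollo` → `infiniband` → `gasnet`/`ibv` chain is taken from the text. The fallbacks (`unset`, and `CHPL_COMM=none` when nothing can be inferred) are assumptions made for the sketch.

```python
# Hypothetical lookup tables. Only the hpe-apollo -> infiniband -> gasnet/ibv
# chain comes from the issue text; everything else is an assumption.
PLATFORM_TO_INTERCONNECT = {
    "hpe-apollo": "infiniband",
}

INTERCONNECT_TO_COMM = {
    "none": {"CHPL_COMM": "none"},
    "infiniband": {"CHPL_COMM": "gasnet", "CHPL_COMM_SUBSTRATE": "ibv"},
}


def infer_comm_settings(env):
    """Resolve comm settings from a dict of CHPL_* variables.

    Values the user set explicitly always win over inferred ones.
    """
    platform = env.get("CHPL_TARGET_PLATFORM", "linux64")

    # Step 1: platform -> interconnect, unless the user set it directly.
    interconnect = env.get("CHPL_INTERCONNECT") or PLATFORM_TO_INTERCONNECT.get(
        platform, "unset"
    )

    # Step 2: interconnect -> lower-level defaults.
    defaults = INTERCONNECT_TO_COMM.get(interconnect, {})
    comm = env.get("CHPL_COMM") or defaults.get("CHPL_COMM", "none")

    resolved = {
        "CHPL_TARGET_PLATFORM": platform,
        "CHPL_INTERCONNECT": interconnect,
        "CHPL_COMM": comm,
    }

    # Step 3: the inferred substrate only applies if CHPL_COMM itself was
    # not overridden to something else (e.g. ofi has no gasnet substrate).
    if "CHPL_COMM_SUBSTRATE" in env:
        resolved["CHPL_COMM_SUBSTRATE"] = env["CHPL_COMM_SUBSTRATE"]
    elif comm == defaults.get("CHPL_COMM") and "CHPL_COMM_SUBSTRATE" in defaults:
        resolved["CHPL_COMM_SUBSTRATE"] = defaults["CHPL_COMM_SUBSTRATE"]

    return resolved
```

Under these assumptions, `infer_comm_settings({"CHPL_TARGET_PLATFORM": "hpe-apollo"})` and `infer_comm_settings({"CHPL_INTERCONNECT": "infiniband"})` resolve to the same `CHPL_COMM` and `CHPL_COMM_SUBSTRATE`, while adding `"CHPL_COMM": "ofi"` to the first call keeps the user's choice and drops the gasnet-specific substrate.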
To me, this seems like it would prevent most users from ever having to set `CHPL_COMM` or its related variables, which feels like a win since that's more about how we implement things than about things a typical user would know, or should need to know.
Activity
bhurwitz33 commented on Jul 23, 2024
Yes! This definitely resonates with me. I love the idea of introducing a CHPL_INTERCONNECT variable. As Brad says above, this is easy to "know" from a user perspective, because it is easy to look up this info about your HPC system. Plus, if this info can then be used to infer CHPL_COMM* variables that would be great too. To be honest, when I first started, I didn't realize "ibv" stood for Infiniband, and if there was an easier starting point, that would be great!
e-kayrakli commented on Jul 23, 2024
The proposal can make building oversubscribed Chapel easy as well, but I can't tell how exactly. In the proposed world, what's the way to build Chapel with oversubscription? `CHPL_INTERCONNECT=none && CHPL_COMM=gasnet`? A new value for `CHPL_INTERCONNECT`? A completely new variable?
bradcray commented on Jul 24, 2024
@e-kayrakli: Hmm, good question. My first reaction was to require someone wanting oversubscription to use the lower-level variables, thinking they'd somehow be "more expert" and so deserve the extra work, but thinking about it more, I think that wanting an oversubscribed Chapel for development purposes is pretty common, suggesting it should be similarly friendly. My thought would be to make it a new value like `virtual` or `local`, which would result in defaults like `CHPL_COMM=gasnet` and `CHPL_COMM_SUBSTRATE=udp` or `smp`. I feel least excited about making it a new variable; it feels similar to `CHPL_GPU=cpu` to me, where we also used a special value rather than a new variable.
e-kayrakli commented on Jul 24, 2024
This strongly resonates with me. I don't view this mode to be a power-user mode. It may be so currently, but this proposal could be an excuse to improve the story there.
Title changed from "Should we support `CHPL_INTERCONNET` / `CHPL_NETWORK`?" to "Should we support `CHPL_INTERCONNECT` / `CHPL_NETWORK`?"
mppf commented on Jul 26, 2024
I like the way that this idea would allow us to hide implementation details (whether it's using gasnet or ofi). This would also address a point of user feedback where a user requested the ability to simulate multiple locales on a single system without being aware that gasnet exists at all.
bradcray commented on Jul 26, 2024
I've taken the liberty of adding "user issue" here due to both Michael's connection to the previous issue and Bonnie's response.
jabraham17 commented on Mar 14, 2025
Noting that in #26921 I have significantly improved our detection for slingshot and infiniband. After that PR is merged, a user running on a generic linux64 infiniband system should get the right settings for `CHPL_COMM` and friends (the implementation details) by default. It doesn't add a proper new variable like `CHPL_NETWORK`, but it does change lots of logic to infer things from the network when possible.
I consider this a step in the right direction for this issue, where future work could expand this for efa, ethernet, and potentially others (as well as allowing the user to set it, rather than just relying on defaults).