Skip to content

[clang] [OpenMP] New OpenMP 6.0 self_maps clause - CodeGen #134131

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 4 additions & 1 deletion clang/lib/CodeGen/CGOpenMPRuntime.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -6753,6 +6753,8 @@ class MappableExprsHandler {
Bits |= OpenMPOffloadMappingFlags::OMP_MAP_PRESENT;
if (llvm::is_contained(MapModifiers, OMPC_MAP_MODIFIER_ompx_hold))
Bits |= OpenMPOffloadMappingFlags::OMP_MAP_OMPX_HOLD;
if (llvm::is_contained(MapModifiers, OMPC_MAP_MODIFIER_self))
Bits |= OpenMPOffloadMappingFlags::OMP_MAP_SELF;
if (IsNonContiguous)
Bits |= OpenMPOffloadMappingFlags::OMP_MAP_NON_CONTIG;
return Bits;
Expand Down Expand Up @@ -9820,7 +9822,8 @@ void CGOpenMPRuntime::adjustTargetSpecificDataForLambdas(

void CGOpenMPRuntime::processRequiresDirective(const OMPRequiresDecl *D) {
for (const OMPClause *Clause : D->clauselists()) {
if (Clause->getClauseKind() == OMPC_unified_shared_memory) {
if (Clause->getClauseKind() == OMPC_unified_shared_memory ||
Clause->getClauseKind() == OMPC_self_maps) {
HasRequiresUnifiedSharedMemory = true;
OMPBuilder.Config.setHasRequiresUnifiedSharedMemory(true);
} else if (const auto *AC =
Expand Down
3 changes: 2 additions & 1 deletion clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2245,7 +2245,8 @@ static OffloadArch getOffloadArch(CodeGenModule &CGM) {
/// a restriction for OpenMP requires clause "unified_shared_memory".
void CGOpenMPRuntimeGPU::processRequiresDirective(const OMPRequiresDecl *D) {
for (const OMPClause *Clause : D->clauselists()) {
if (Clause->getClauseKind() == OMPC_unified_shared_memory) {
if (Clause->getClauseKind() == OMPC_unified_shared_memory ||
Clause->getClauseKind() == OMPC_self_maps) {
OffloadArch Arch = getOffloadArch(CGM);
switch (Arch) {
case OffloadArch::SM_20:
Expand Down
27 changes: 27 additions & 0 deletions clang/test/OpenMP/requires_codegen.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,22 @@
// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx64-nvidia-cuda -fopenmp-targets=nvptx64-nvidia-cuda -target-cpu sm_72 -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o %t-out.ll -DREGION_DEVICE_NO_ERR
// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx64-nvidia-cuda -fopenmp-targets=nvptx64-nvidia-cuda -target-cpu sm_75 -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o %t-out.ll -DREGION_DEVICE_NO_ERR

// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=60 -DOMP60 -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host.bc -DREGION_HOST_SELF
// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=60 -DOMP60 -x c++ -triple nvptx64-nvidia-cuda -fopenmp-targets=nvptx64-nvidia-cuda -target-cpu sm_21 -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o %t-out.ll -DREGION_DEVICE_SELF
// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=60 -DOMP60 -x c++ -triple nvptx64-nvidia-cuda -fopenmp-targets=nvptx64-nvidia-cuda -target-cpu sm_30 -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o %t-out.ll -DREGION_DEVICE_SELF
// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=60 -DOMP60 -x c++ -triple nvptx64-nvidia-cuda -fopenmp-targets=nvptx64-nvidia-cuda -target-cpu sm_32 -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o %t-out.ll -DREGION_DEVICE_SELF
// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=60 -DOMP60 -x c++ -triple nvptx64-nvidia-cuda -fopenmp-targets=nvptx64-nvidia-cuda -target-cpu sm_35 -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o %t-out.ll -DREGION_DEVICE_SELF
// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=60 -DOMP60 -x c++ -triple nvptx64-nvidia-cuda -fopenmp-targets=nvptx64-nvidia-cuda -target-cpu sm_37 -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o %t-out.ll -DREGION_DEVICE_SELF
// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=60 -DOMP60 -x c++ -triple nvptx64-nvidia-cuda -fopenmp-targets=nvptx64-nvidia-cuda -target-cpu sm_50 -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o %t-out.ll -DREGION_DEVICE_SELF
// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=60 -DOMP60 -x c++ -triple nvptx64-nvidia-cuda -fopenmp-targets=nvptx64-nvidia-cuda -target-cpu sm_52 -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o %t-out.ll -DREGION_DEVICE_SELF
// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=60 -DOMP60 -x c++ -triple nvptx64-nvidia-cuda -fopenmp-targets=nvptx64-nvidia-cuda -target-cpu sm_53 -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o %t-out.ll -DREGION_DEVICE_SELF
// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=60 -DOMP60 -x c++ -triple nvptx64-nvidia-cuda -fopenmp-targets=nvptx64-nvidia-cuda -target-cpu sm_60 -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o %t-out.ll -DREGION_DEVICE_SELF_NO_ERR
// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=60 -DOMP60 -x c++ -triple nvptx64-nvidia-cuda -fopenmp-targets=nvptx64-nvidia-cuda -target-cpu sm_61 -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o %t-out.ll -DREGION_DEVICE_SELF_NO_ERR
// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=60 -DOMP60 -x c++ -triple nvptx64-nvidia-cuda -fopenmp-targets=nvptx64-nvidia-cuda -target-cpu sm_62 -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o %t-out.ll -DREGION_DEVICE_SELF_NO_ERR
// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=60 -DOMP60 -x c++ -triple nvptx64-nvidia-cuda -fopenmp-targets=nvptx64-nvidia-cuda -target-cpu sm_70 -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o %t-out.ll -DREGION_DEVICE_SELF_NO_ERR
// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=60 -DOMP60 -x c++ -triple nvptx64-nvidia-cuda -fopenmp-targets=nvptx64-nvidia-cuda -target-cpu sm_72 -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o %t-out.ll -DREGION_DEVICE_SELF_NO_ERR
// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=60 -DOMP60 -x c++ -triple nvptx64-nvidia-cuda -fopenmp-targets=nvptx64-nvidia-cuda -target-cpu sm_75 -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o %t-out.ll -DREGION_DEVICE_SELF_NO_ERR

#if defined(REGION_HOST) || defined(REGION_DEVICE_NO_ERR)
// expected-no-diagnostics
#pragma omp requires unified_shared_memory
Expand All @@ -23,3 +39,14 @@
#ifdef REGION_DEVICE
#pragma omp requires unified_shared_memory // expected-error-re {{Target architecture sm_{{20|21|30|32|35|37|50|52|53|60|61|62}} does not support unified addressing}}
#endif

#ifdef OMP60
#if defined(REGION_HOST_SELF) || defined(REGION_DEVICE_SELF_NO_ERR)
// expected-no-diagnostics
#pragma omp requires self_maps
#endif

#ifdef REGION_DEVICE_SELF
#pragma omp requires self_maps // expected-error-re {{Target architecture sm_{{20|21|30|32|35|37|50|52|53}} does not support unified addressing}}
#endif
#endif
25 changes: 25 additions & 0 deletions clang/test/OpenMP/target_data_codegen.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -690,4 +690,29 @@ void test_present_modifier(int arg) {
{++arg;}
}
#endif
///==========================================================================///

// RUN: %clang_cc1 -DCK10 -verify -Wno-vla -fopenmp -fopenmp-version=60 -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple powerpc64le-unknown-unknown -emit-llvm %s -o - | FileCheck %s --check-prefix CK10 --check-prefix CK10-64
// RUN: %clang_cc1 -DCK10 -fopenmp -fopenmp-version=60 -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -std=c++11 -triple powerpc64le-unknown-unknown -emit-pch -o %t %s
// RUN: %clang_cc1 -fopenmp -fopenmp-version=60 -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple powerpc64le-unknown-unknown -std=c++11 -include-pch %t -verify -Wno-vla %s -emit-llvm -o - | FileCheck %s --check-prefix CK10 --check-prefix CK10-64
// RUN: %clang_cc1 -DCK10 -verify -Wno-vla -fopenmp -fopenmp-version=60 -fopenmp-targets=i386-pc-linux-gnu -x c++ -triple i386-unknown-unknown -emit-llvm %s -o - | FileCheck %s --check-prefix CK10 --check-prefix CK10-32
// RUN: %clang_cc1 -DCK10 -fopenmp -fopenmp-version=60 -fopenmp-targets=i386-pc-linux-gnu -x c++ -std=c++11 -triple i386-unknown-unknown -emit-pch -o %t %s
// RUN: %clang_cc1 -fopenmp -fopenmp-version=60 -fopenmp-targets=i386-pc-linux-gnu -x c++ -triple i386-unknown-unknown -std=c++11 -include-pch %t -verify -Wno-vla %s -emit-llvm -o - | FileCheck %s --check-prefix CK10 --check-prefix CK10-32

// RUN: %clang_cc1 -DCK10 -verify -Wno-vla -fopenmp-simd -fopenmp-version=60 -x c++ -triple powerpc64le-unknown-unknown -emit-llvm %s -o - | FileCheck --check-prefix SIMD-ONLY2 %s
// RUN: %clang_cc1 -DCK10 -fopenmp-simd -fopenmp-version=60 -x c++ -std=c++11 -triple powerpc64le-unknown-unknown -emit-pch -o %t %s
// RUN: %clang_cc1 -fopenmp-simd -fopenmp-version=60 -x c++ -triple powerpc64le-unknown-unknown -std=c++11 -include-pch %t -verify -Wno-vla %s -emit-llvm -o - | FileCheck --check-prefix SIMD-ONLY2 %s
// RUN: %clang_cc1 -DCK10 -fopenmp-simd -fopenmp-version=60 -x c++ -std=c++11 -triple i386-unknown-unknown -emit-pch -o %t %s
// RUN: %clang_cc1 -fopenmp-simd -fopenmp-version=60 -x c++ -triple i386-unknown-unknown -std=c++11 -include-pch %t -verify -Wno-vla %s -emit-llvm -o - | FileCheck --check-prefix SIMD-ONLY2 %s
// SIMD-ONLY2-NOT: {{__kmpc|__tgt}}

#ifdef CK10
void test_self_modifier(int arg) {
// SELF=0x4000 | FROM=0x2 | TO=0x1 = 0x4003
// CK10: private unnamed_addr constant [1 x i64] [i64 [[#0x4003]]]
#pragma omp target data map(self, tofrom \
: arg)
{++arg;}
}
#endif
#endif
2 changes: 2 additions & 0 deletions llvm/include/llvm/Frontend/OpenMP/OMPConstants.h
Original file line number Diff line number Diff line change
Expand Up @@ -236,6 +236,8 @@ enum class OpenMPOffloadMappingFlags : uint64_t {
// dynamic.
// This is an OpenMP extension for the sake of OpenACC support.
OMP_MAP_OMPX_HOLD = 0x2000,
/// Self directs mapping without creating a separate device copy.
OMP_MAP_SELF = 0x4000,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this? Most things will be passed by reference (i.e. a pointer), which can be mapped as OMP_MAP_LITERAL. What do we need this extra flag for?

Copy link
Contributor Author

@Ritanya-B-Bharadwaj Ritanya-B-Bharadwaj Apr 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OMP_MAP_SELF ensures that a variable is used directly on the device without creating a new copy. As per my understanding, OMP_MAP_LITERAL maps pointers, but it doesn’t guarantee they point to device-accessible memory. This is especially useful for USM, where we don’t want unnecessary mappings.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoever generates the mappings has to guarantee that they are correct. The OMP_MAP_LITERAL is just a mechanism to pass the pointer[1] without any modifications.

Is the OMP_MAP_SELF a part of some bigger plan? Are you planning to add more reviewers? I'd like to see what others think about this.

[1] It doesn't have to be a pointer, just something that fits in a register.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to the questions. Adding a new code requires broad discussion, because it will require changing the runtime libraries

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I get that OMP_MAP_LITERAL just passes the pointer(or anything that fits into a register as you mentioned) as is, without changing anything. But the problem is, it doesn’t guarantee that the memory it points to is actually accessible on the device. OMP_MAP_SELF explicitly states that the memory is already available on the device and avoids unnecessary mappings. The idea was to avoid unnecessary copies, especially for USM.
If there’s another way to handle this cleanly, I’m open to discussing it. Please let me know.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need OMP_MAP_SELF because the runtime still needs to track a self-mapped variable in the present table, even if no new device copy is created. This is important when a pointer (like firstprivate) refers to that variable — the runtime must handle pointer translations and aliasing correctly. Here is an example:

#include <stdio.h>
#include <assert.h>

#define N 1000
int main() {
  
  int aaa[N];

#pragma omp target data map(self: aaa) // map aaa with self map
  {
    for (int i = 0 ; i < N ; i++) {
      // Update host copy of aaa on host
      aaa[i] = i + 1;
    }
    int *p = aaa;
    #pragma omp target teams distribute parallel for // p is treated as firstprivate
    for (int i = 0 ; i < N ; i++) {
      // Update host copy of aaa on device, since it should be self mapped
      p[i] = p[i] - 1; // without self map of aaa, this may not work
    }
  }
    
  for (int i = 0 ; i < N ; i++) {
    assert( aaa[i] == i );
  }
  printf("PASS\n");
  return 0;
}

/// Signal that the runtime library should use args as an array of
/// descriptor_dim pointers and use args_size as dims. Used when we have
/// non-contiguous list items in target update directive
Expand Down