Skip to content

[clang] [OpenMP] New OpenMP 6.0 self_maps clause - CodeGen #134131

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

Ritanya-B-Bharadwaj
Copy link
Contributor

CodeGen support for self maps in map and requirement clause [Sections 7.9.6 and 10.5.1.6 in OpenMP 6.0 spec].

Patch 1 - #129888

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:codegen IR generation bugs: mangling, exceptions, etc. flang:openmp clang:openmp OpenMP related changes to Clang labels Apr 2, 2025
@llvmbot
Copy link
Member

llvmbot commented Apr 2, 2025

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-codegen

Author: None (Ritanya-B-Bharadwaj)

Changes

CodeGen support for self maps in map and requirement clause [Sections 7.9.6 and 10.5.1.6 in OpenMP 6.0 spec].

Patch 1 - #129888


Full diff: https://github.com/llvm/llvm-project/pull/134131.diff

5 Files Affected:

  • (modified) clang/lib/CodeGen/CGOpenMPRuntime.cpp (+4-1)
  • (modified) clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp (+2-1)
  • (modified) clang/test/OpenMP/requires_codegen.cpp (+27)
  • (modified) clang/test/OpenMP/target_data_codegen.cpp (+25)
  • (modified) llvm/include/llvm/Frontend/OpenMP/OMPConstants.h (+2)
diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
index b706fa3759c0d..cc671796803fd 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -6753,6 +6753,8 @@ class MappableExprsHandler {
       Bits |= OpenMPOffloadMappingFlags::OMP_MAP_PRESENT;
     if (llvm::is_contained(MapModifiers, OMPC_MAP_MODIFIER_ompx_hold))
       Bits |= OpenMPOffloadMappingFlags::OMP_MAP_OMPX_HOLD;
+    if (llvm::is_contained(MapModifiers, OMPC_MAP_MODIFIER_self))
+      Bits |= OpenMPOffloadMappingFlags::OMP_MAP_SELF;
     if (IsNonContiguous)
       Bits |= OpenMPOffloadMappingFlags::OMP_MAP_NON_CONTIG;
     return Bits;
@@ -9820,7 +9822,8 @@ void CGOpenMPRuntime::adjustTargetSpecificDataForLambdas(
 
 void CGOpenMPRuntime::processRequiresDirective(const OMPRequiresDecl *D) {
   for (const OMPClause *Clause : D->clauselists()) {
-    if (Clause->getClauseKind() == OMPC_unified_shared_memory) {
+    if (Clause->getClauseKind() == OMPC_unified_shared_memory ||
+        Clause->getClauseKind() == OMPC_self_maps) {
       HasRequiresUnifiedSharedMemory = true;
       OMPBuilder.Config.setHasRequiresUnifiedSharedMemory(true);
     } else if (const auto *AC =
diff --git a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
index feb2448297542..a0ac7b0bc2980 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
@@ -2245,7 +2245,8 @@ static OffloadArch getOffloadArch(CodeGenModule &CGM) {
 /// a restriction for OpenMP requires clause "unified_shared_memory".
 void CGOpenMPRuntimeGPU::processRequiresDirective(const OMPRequiresDecl *D) {
   for (const OMPClause *Clause : D->clauselists()) {
-    if (Clause->getClauseKind() == OMPC_unified_shared_memory) {
+    if (Clause->getClauseKind() == OMPC_unified_shared_memory ||
+        Clause->getClauseKind() == OMPC_self_maps) {
       OffloadArch Arch = getOffloadArch(CGM);
       switch (Arch) {
       case OffloadArch::SM_20:
diff --git a/clang/test/OpenMP/requires_codegen.cpp b/clang/test/OpenMP/requires_codegen.cpp
index 5a641d0be4deb..b484f6266c045 100644
--- a/clang/test/OpenMP/requires_codegen.cpp
+++ b/clang/test/OpenMP/requires_codegen.cpp
@@ -15,6 +15,22 @@
 // RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx64-nvidia-cuda -fopenmp-targets=nvptx64-nvidia-cuda -target-cpu sm_72 -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o %t-out.ll -DREGION_DEVICE_NO_ERR
 // RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx64-nvidia-cuda -fopenmp-targets=nvptx64-nvidia-cuda -target-cpu sm_75 -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o %t-out.ll -DREGION_DEVICE_NO_ERR
 
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=60 -DOMP60 -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host.bc -DREGION_HOST_SELF
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=60 -DOMP60 -x c++ -triple nvptx64-nvidia-cuda -fopenmp-targets=nvptx64-nvidia-cuda -target-cpu sm_21 -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o %t-out.ll -DREGION_DEVICE_SELF
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=60 -DOMP60 -x c++ -triple nvptx64-nvidia-cuda -fopenmp-targets=nvptx64-nvidia-cuda -target-cpu sm_30 -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o %t-out.ll -DREGION_DEVICE_SELF
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=60 -DOMP60 -x c++ -triple nvptx64-nvidia-cuda -fopenmp-targets=nvptx64-nvidia-cuda -target-cpu sm_32 -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o %t-out.ll -DREGION_DEVICE_SELF
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=60 -DOMP60 -x c++ -triple nvptx64-nvidia-cuda -fopenmp-targets=nvptx64-nvidia-cuda -target-cpu sm_35 -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o %t-out.ll -DREGION_DEVICE_SELF
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=60 -DOMP60 -x c++ -triple nvptx64-nvidia-cuda -fopenmp-targets=nvptx64-nvidia-cuda -target-cpu sm_37 -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o %t-out.ll -DREGION_DEVICE_SELF
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=60 -DOMP60 -x c++ -triple nvptx64-nvidia-cuda -fopenmp-targets=nvptx64-nvidia-cuda -target-cpu sm_50 -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o %t-out.ll -DREGION_DEVICE_SELF
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=60 -DOMP60 -x c++ -triple nvptx64-nvidia-cuda -fopenmp-targets=nvptx64-nvidia-cuda -target-cpu sm_52 -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o %t-out.ll -DREGION_DEVICE_SELF
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=60 -DOMP60 -x c++ -triple nvptx64-nvidia-cuda -fopenmp-targets=nvptx64-nvidia-cuda -target-cpu sm_53 -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o %t-out.ll -DREGION_DEVICE_SELF
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=60 -DOMP60 -x c++ -triple nvptx64-nvidia-cuda -fopenmp-targets=nvptx64-nvidia-cuda -target-cpu sm_60 -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o %t-out.ll -DREGION_DEVICE_SELF_NO_ERR
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=60 -DOMP60 -x c++ -triple nvptx64-nvidia-cuda -fopenmp-targets=nvptx64-nvidia-cuda -target-cpu sm_61 -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o %t-out.ll -DREGION_DEVICE_SELF_NO_ERR
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=60 -DOMP60 -x c++ -triple nvptx64-nvidia-cuda -fopenmp-targets=nvptx64-nvidia-cuda -target-cpu sm_62 -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o %t-out.ll -DREGION_DEVICE_SELF_NO_ERR
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=60 -DOMP60 -x c++ -triple nvptx64-nvidia-cuda -fopenmp-targets=nvptx64-nvidia-cuda -target-cpu sm_70 -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o %t-out.ll -DREGION_DEVICE_SELF_NO_ERR
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=60 -DOMP60 -x c++ -triple nvptx64-nvidia-cuda -fopenmp-targets=nvptx64-nvidia-cuda -target-cpu sm_72 -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o %t-out.ll -DREGION_DEVICE_SELF_NO_ERR
+// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=60 -DOMP60 -x c++ -triple nvptx64-nvidia-cuda -fopenmp-targets=nvptx64-nvidia-cuda -target-cpu sm_75 -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o %t-out.ll -DREGION_DEVICE_SELF_NO_ERR
+
 #if defined(REGION_HOST) || defined(REGION_DEVICE_NO_ERR)
 // expected-no-diagnostics
 #pragma omp requires unified_shared_memory
@@ -23,3 +39,14 @@
 #ifdef REGION_DEVICE
 #pragma omp requires unified_shared_memory // expected-error-re {{Target architecture sm_{{20|21|30|32|35|37|50|52|53|60|61|62}} does not support unified addressing}}
 #endif
+
+#ifdef OMP60
+#if defined(REGION_HOST_SELF) || defined(REGION_DEVICE_SELF_NO_ERR)
+// expected-no-diagnostics
+#pragma omp requires self_maps
+#endif
+
+#ifdef REGION_DEVICE_SELF
+#pragma omp requires self_maps // expected-error-re {{Target architecture sm_{{20|21|30|32|35|37|50|52|53}} does not support unified addressing}}
+#endif
+#endif
diff --git a/clang/test/OpenMP/target_data_codegen.cpp b/clang/test/OpenMP/target_data_codegen.cpp
index 926aa593f2ba1..adf8f3f40dd2d 100644
--- a/clang/test/OpenMP/target_data_codegen.cpp
+++ b/clang/test/OpenMP/target_data_codegen.cpp
@@ -690,4 +690,29 @@ void test_present_modifier(int arg) {
   {++arg;}
 }
 #endif
+///==========================================================================///
+
+// RUN: %clang_cc1 -DCK10 -verify -Wno-vla  -fopenmp -fopenmp-version=60 -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple powerpc64le-unknown-unknown -emit-llvm %s -o - | FileCheck %s --check-prefix CK10 --check-prefix CK10-64
+// RUN: %clang_cc1 -DCK10 -fopenmp -fopenmp-version=60 -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -std=c++11 -triple powerpc64le-unknown-unknown -emit-pch -o %t %s
+// RUN: %clang_cc1 -fopenmp -fopenmp-version=60 -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple powerpc64le-unknown-unknown -std=c++11 -include-pch %t -verify -Wno-vla  %s -emit-llvm -o - | FileCheck %s  --check-prefix CK10 --check-prefix CK10-64
+// RUN: %clang_cc1 -DCK10 -verify -Wno-vla  -fopenmp -fopenmp-version=60 -fopenmp-targets=i386-pc-linux-gnu -x c++ -triple i386-unknown-unknown -emit-llvm %s -o - | FileCheck %s  --check-prefix CK10 --check-prefix CK10-32
+// RUN: %clang_cc1 -DCK10 -fopenmp -fopenmp-version=60 -fopenmp-targets=i386-pc-linux-gnu -x c++ -std=c++11 -triple i386-unknown-unknown -emit-pch -o %t %s
+// RUN: %clang_cc1 -fopenmp -fopenmp-version=60 -fopenmp-targets=i386-pc-linux-gnu -x c++ -triple i386-unknown-unknown -std=c++11 -include-pch %t -verify -Wno-vla  %s -emit-llvm -o - | FileCheck %s  --check-prefix CK10 --check-prefix CK10-32
+
+// RUN: %clang_cc1 -DCK10 -verify -Wno-vla  -fopenmp-simd -fopenmp-version=60 -x c++ -triple powerpc64le-unknown-unknown -emit-llvm %s -o - | FileCheck --check-prefix SIMD-ONLY2 %s
+// RUN: %clang_cc1 -DCK10 -fopenmp-simd -fopenmp-version=60 -x c++ -std=c++11 -triple powerpc64le-unknown-unknown -emit-pch -o %t %s
+// RUN: %clang_cc1 -fopenmp-simd -fopenmp-version=60 -x c++ -triple powerpc64le-unknown-unknown -std=c++11 -include-pch %t -verify -Wno-vla  %s -emit-llvm -o - | FileCheck --check-prefix SIMD-ONLY2 %s
+// RUN: %clang_cc1 -DCK10 -fopenmp-simd -fopenmp-version=60 -x c++ -std=c++11 -triple i386-unknown-unknown -emit-pch -o %t %s
+// RUN: %clang_cc1 -fopenmp-simd -fopenmp-version=60 -x c++ -triple i386-unknown-unknown -std=c++11 -include-pch %t -verify -Wno-vla  %s -emit-llvm -o - | FileCheck --check-prefix SIMD-ONLY2 %s
+// SIMD-ONLY2-NOT: {{__kmpc|__tgt}}
+
+#ifdef CK10
+void test_self_modifier(int arg) {
+// SELF=0x4000 | FROM=0x2 | TO=0x1 = 0x4003
+// CK10: private unnamed_addr constant [1 x i64] [i64 [[#0x4003]]]
+#pragma omp target data map(self, tofrom \
+                            : arg)
+  {++arg;}
+}
+#endif
 #endif
diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPConstants.h b/llvm/include/llvm/Frontend/OpenMP/OMPConstants.h
index 338b56226f204..74abacb404de5 100644
--- a/llvm/include/llvm/Frontend/OpenMP/OMPConstants.h
+++ b/llvm/include/llvm/Frontend/OpenMP/OMPConstants.h
@@ -236,6 +236,8 @@ enum class OpenMPOffloadMappingFlags : uint64_t {
   // dynamic.
   // This is an OpenMP extension for the sake of OpenACC support.
   OMP_MAP_OMPX_HOLD = 0x2000,
+  /// Self directs mapping without creating a separate device copy.
+  OMP_MAP_SELF = 0x4000,
   /// Signal that the runtime library should use args as an array of
   /// descriptor_dim pointers and use args_size as dims. Used when we have
   /// non-contiguous list items in target update directive

@@ -236,6 +236,8 @@ enum class OpenMPOffloadMappingFlags : uint64_t {
// dynamic.
// This is an OpenMP extension for the sake of OpenACC support.
OMP_MAP_OMPX_HOLD = 0x2000,
/// Self directs mapping without creating a separate device copy.
OMP_MAP_SELF = 0x4000,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this? Most things will be passed by reference (i.e. a pointer), which can be mapped as OMP_MAP_LITERAL. What do we need this extra flag for?

Copy link
Contributor Author

@Ritanya-B-Bharadwaj Ritanya-B-Bharadwaj Apr 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OMP_MAP_SELF ensures that a variable is used directly on the device without creating a new copy. As per my understanding, OMP_MAP_LITERAL maps pointers, but it doesn’t guarantee they point to device-accessible memory. This is especially useful for USM, where we don’t want unnecessary mappings.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoever generates the mappings has to guarantee that they are correct. The OMP_MAP_LITERAL is just a mechanism to pass the pointer[1] without any modifications.

Is the OMP_MAP_SELF a part of some bigger plan? Are you planning to add more reviewers? I'd like to see what others think about this.

[1] It doesn't have to be a pointer, just something that fits in a register.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to the questions. Adding a new code requires broad discussion, because it will require changing the runtime libraries

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I get that OMP_MAP_LITERAL just passes the pointer(or anything that fits into a register as you mentioned) as is, without changing anything. But the problem is, it doesn’t guarantee that the memory it points to is actually accessible on the device. OMP_MAP_SELF explicitly states that the memory is already available on the device and avoids unnecessary mappings. The idea was to avoid unnecessary copies, especially for USM.
If there’s another way to handle this cleanly, I’m open to discussing it. Please let me know.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need OMP_MAP_SELF because the runtime still needs to track a self-mapped variable in the present table, even if no new device copy is created. This is important when a pointer (like firstprivate) refers to that variable — the runtime must handle pointer translations and aliasing correctly. Here is an example:

#include <stdio.h>
#include <assert.h>

#define N 1000
int main() {
  
  int aaa[N];

#pragma omp target data map(self: aaa) // map aaa with self map
  {
    for (int i = 0 ; i < N ; i++) {
      // Update host copy of aaa on host
      aaa[i] = i + 1;
    }
    int *p = aaa;
    #pragma omp target teams distribute parallel for // p is treated as firstprivate
    for (int i = 0 ; i < N ; i++) {
      // Update host copy of aaa on device, since it should be self mapped
      p[i] = p[i] - 1; // without self map of aaa, this may not work
    }
  }
    
  for (int i = 0 ; i < N ; i++) {
    assert( aaa[i] == i );
  }
  printf("PASS\n");
  return 0;
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen IR generation bugs: mangling, exceptions, etc. clang:openmp OpenMP related changes to Clang clang Clang issues not falling into any other category flang:openmp
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants