[pull] main from llvm:main #5651
Merged
Add an end-to-end (non-LTO) test verifying that the optimization pipeline is set up correctly for Profile Guided Heap Optimization (PGHO) transforms. Ensure that both PGHO and AllocToken can stack, and the AllocToken pass does not interfere with PGHO and vice versa.
…67543) For HIP, the SPIR-V backend can be optionally activated with the `-use-spirv-backend` flag. This option uses the SPIR-V backend instead of the SPIR-V translator. These changes also ensure that `-use-spirv-backend` does not require external dependencies, such as spirv-as and spirv-link.
This backend support will allow the LoadStoreVectorizer, in certain cases, to fill in gaps when creating load/store vectors and generate LLVM masked loads/stores (https://llvm.org/docs/LangRef.html#llvm-masked-store-intrinsics). To accomplish this, changes are separated into two parts. This first part has the backend lowering and TTI changes, and a follow-up PR will have the LSV generate these intrinsics: #159388.

In this backend change, masked loads get lowered to PTX with `#pragma "used_bytes_mask" [mask];` (https://docs.nvidia.com/cuda/parallel-thread-execution/#pragma-strings-used-bytes-mask), and masked stores get lowered to PTX using the new sink symbol syntax (https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-st).

# TTI Changes

TTI changes are needed because NVPTX only supports masked loads/stores with _constant_ masks. `ScalarizeMaskedMemIntrin.cpp` is adjusted to check that the mask is constant and pass that result into the TTI check. Behavior shouldn't change for non-NVPTX targets, which do not care whether the mask is variable or constant when determining legality, but all TTI files that implement this API need to be updated.

# Masked store lowering implementation details

If the masked stores make it to the NVPTX backend without being scalarized, they are handled by the following:

* `NVPTXISelLowering.cpp` - Sets up a custom operation action and handles it in lowerMSTORE. Similar handling to normal store vectors, except we read the mask and place a sentinel register `$noreg` in each position where the mask reads as false. For example:

```
t10: v8i1 = BUILD_VECTOR Constant:i1<-1>, Constant:i1<0>, Constant:i1<0>, Constant:i1<-1>, Constant:i1<-1>, Constant:i1<0>, Constant:i1<0>, Constant:i1<-1>
t11: ch = masked_store<(store unknown-size into %ir.lsr.iv28, align 32, addrspace 1)> t5:1, t5, t7, undef:i64, t10

->

STV_i32_v8 killed %13:int32regs, $noreg, $noreg, killed %16:int32regs, killed %17:int32regs, $noreg, $noreg, killed %20:int32regs, 0, 0, 1, 8, 0, 32, %4:int64regs, 0, debug-location !18 :: (store unknown-size into %ir.lsr.iv28, align 32, addrspace 1)
```

* `NVPTXInstrInfo.td` - Changes the definition of store vectors to allow for a mix of sink symbols and registers.
* `NVPTXInstPrinter.h/.cpp` - Handles the `$noreg` case by printing "_".

# Masked load lowering implementation details

Masked loads are routed to normal PTX loads, with one difference: a `#pragma "used_bytes_mask"` is emitted before the load instruction (https://docs.nvidia.com/cuda/parallel-thread-execution/#pragma-strings-used-bytes-mask). To accomplish this, a new operand is added to every NVPTXISD load type representing this mask.

* `NVPTXISelLowering.h/.cpp` - Masked loads are converted into normal NVPTXISD loads with a mask operand in two ways: 1) in type legalization through replaceLoadVector, which is the normal path, and 2) through LowerMLOAD, to handle the legal vector types (v2f16/v2bf16/v2i16/v4i8/v2f32) that will not be type legalized. Both share the same convertMLOADToLoadWithUsedBytesMask helper. Both default this operand to UINT32_MAX, representing all bytes on. For the latter, we need a new `NVPTXISD::MLoadV1` type to represent that edge case because we cannot put the used bytes mask operand on a generic LoadSDNode.
* `NVPTXISelDAGToDAG.cpp` - Extracts the used bytes mask from loads and adds it to the created machine instructions.
* `NVPTXInstPrinter.h/.cpp` - Prints the pragma when the used bytes mask isn't all ones.
* `NVPTXForwardParams.cpp`, `NVPTXReplaceImageHandles.cpp` - Update manual indexing of load operands to account for the new operand.
* `NVPTXInstrInfo.td`, `NVPTXIntrinsics.td` - Add the used bytes mask to the MI definitions.
* `NVPTXTagInvariantLoads.cpp` - Ensures that masked loads also get tagged as invariant.

Some generic changes that are needed:

* `LegalizeVectorTypes.cpp` - Ensure flags are preserved when splitting masked loads.
* `SelectionDAGBuilder.cpp` - Preserve `MD_invariant_load` on masked load SDNode creation.
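As a rough illustration of the masked-store semantics described above (a Python sketch of the lane behavior, not the actual backend code): a lane whose mask bit is false leaves memory untouched, which is what printing the sink symbol "_" for `$noreg` achieves in the emitted PTX.

```python
def masked_store(memory, base, values, mask):
    """Store only the lanes whose mask bit is set.

    Lanes with a false mask bit are skipped entirely; in the generated
    PTX the corresponding register position is printed as the sink
    symbol "_" instead of a source register.
    """
    for lane, (value, bit) in enumerate(zip(values, mask)):
        if bit:
            memory[base + lane] = value
    return memory
```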
The test was added by b8ef25a. It failed on at least the following bot, but the failure did not reproduce on my test machines or in pre-commit CI:
- https://lab.llvm.org/buildbot/#/builders/190/builds/31638

This fix hopefully addresses at least the warnings there.
`[[nodiscard]]` should be applied to functions where discarding the return value is most likely a correctness issue. - https://libcxx.llvm.org/CodingGuidelines.html#apply-nodiscard-where-relevant
Previously libcall lowering decisions were made directly in the TargetLowering constructor. Pull these into the subtarget to facilitate turning LibcallLoweringInfo into a separate analysis in the future.
The 'link' clause is like the rest of the global clauses (copyin, create, device_resident), except it only has an entry op (thus no dtor).

This patch also removes a bunch of now-stale TODOs from the tests.
The test was added by b8ef25a. It failed on at least the following bots, but the failure did not reproduce on my test machines or in pre-commit CI:
- https://lab.llvm.org/buildbot/#/builders/190/builds/31643
- https://lab.llvm.org/buildbot/#/builders/65/builds/25949
- https://lab.llvm.org/buildbot/#/builders/154/builds/24417

d69e701 did not fix the failure. Hopefully this will.
This upstreams the code to support reserved placement new calls.
- Takes both implicit and explicit BTIs into account
- Fix related comment in llvm/lib/Target/AArch64/AArch64BranchTargets.cpp
The consumer of zlib in third-party/BUILD.bazel expects zlib-ng from the BCR; if you still load this version from your WORKSPACE / MODULE.bazel, you need to use this name instead.
Tasks completed:
- Pattern match `select(fcmp(dot(p2, p3), 0), p1, -p1)` to `faceforward(p1, p2, p3)`
- Add pattern matching tests to `prelegalizercombiner-select-to-faceforward.mir` and `faceforward.ll`
- Add CL extension error test `llvm/test/CodeGen/SPIRV/opencl/faceforward-error.ll`
- Add CL extension test for no pattern matching in `llvm/test/CodeGen/SPIRV/opencl/faceforward.ll`

Closes #137255.
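For reference, the scalar semantics of the pattern being matched can be sketched in Python (an illustrative model; the real transform operates on GlobalISel nodes, not Python lists):

```python
def dot(a, b):
    # plain dot product of two vectors
    return sum(x * y for x, y in zip(a, b))

def faceforward(n, i, ng):
    # select(fcmp(dot(i, ng) < 0), n, -n): return n if it faces against
    # the incident vector i, otherwise the negated n.
    return list(n) if dot(i, ng) < 0.0 else [-x for x in n]
```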
…#165877) `canEvaluateTruncated` and `canEvaluateSExtd` previously rejected multi-use values to avoid duplication. This was overly conservative: if all users of a multi-use value are part of the transform, we can evaluate it in a different type without duplication. This change tracks visited values and defers decisions on multi-use values until we verify all their users were visited. `EvaluateInDifferentType` now memoizes multi-use values to avoid creating duplicates. Applied to truncation and sext. Zext unchanged due to its dual-return nature.
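The legality argument rests on narrow-type evaluation producing the same truncated result as wide evaluation. For addition under 16-bit truncation, for example, the two orders always agree (a generic arithmetic sketch, unrelated to the actual InstCombine code):

```python
MASK16 = 0xFFFF

def add_then_trunc(a, b):
    # compute in the wide type, then truncate the result
    return (a + b) & MASK16

def trunc_then_add(a, b):
    # evaluate the add directly in the narrow type
    return ((a & MASK16) + (b & MASK16)) & MASK16
```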
The FIXME the comment refers to was removed.
…169460) In #157388, we turned `(fmul (fneg X), Y)` into `(fneg (fmul X, Y))`. However, we forgot to propagate SDNode flags, specifically fast math flags, from the original FMUL to the new one. This hinders some of the subsequent (FMA) DAG combiner patterns that relied on the contraction flag and as a consequence, missed some of the opportunities to generate negation FMA instructions like `fnmadd`. This patch fixes this issue by propagating the flags. --------- Co-authored-by: Craig Topper <[email protected]>
This patch refactors the OpenACC dialect to attach recipe symbols directly to data operations (acc.private, acc.firstprivate, acc.reduction) rather than to compute constructs (acc.parallel, acc.serial, acc.loop).

Motivation: The previous design required compute constructs to carry both the recipe symbol and the variable reference, leading to complexity. Additionally, recipes were required even when they could be generated automatically through MappableType interfaces.

Changes:
- Data operations (acc.private, acc.firstprivate, acc.reduction) now require a 'recipe' attribute referencing their respective recipe operations
- Verifier enforces recipe attribute presence for non-MappableType operands; MappableType operands can generate recipes on demand
- Compute constructs (acc.parallel, acc.serial, acc.loop) no longer carry recipe symbols in their operands
- Updated flang lowering to attach recipes to data operations instead of passing them to compute constructs

Format Migration:

Old format:
```
acc.parallel private(@recipe -> %var : !fir.ref<i32>) { ... }
```

New format:
```
%private = acc.private varPtr(%var : !fir.ref<i32>) recipe(@recipe) -> !fir.ref<i32>
acc.parallel private(%private : !fir.ref<i32>) { ... }
```

Test Updates:
- Updated all CIR and Flang OpenACC tests to new format
- Fixed CHECK lines to verify recipe attributes on data operations
…to-llvm`. (#166204) `convert-vector-to-llvm` pass applies a set of vector transformation patterns that are not included in the standard `convert-to-llvm` pass interface. These additional transformations are required to properly lower MLIR vector operations. Since not all vector ops have direct `llvm` dialect lowering, many of them must first be progressively rewritten into simpler or more canonical vector ops, which are then lowered to `llvm`. Therefore, running `convert-vector-to-llvm` is necessary to ensure a complete and correct lowering of vector operations to the `llvm` dialect.
This PR enhances the CFG builder to properly handle function parameters in lifetime analysis:
1. Added code to include parameters in the initial scope during CFG construction for both `FunctionDecl` and `BlockDecl` types
2. Added a special case to skip reference parameters, as they don't need automatic destruction
3. Fixed several test cases that were previously marked as "FIXME" due to missing parameter lifetime tracking

Previously, Clang's lifetime analysis was not properly tracking the lifetime of function parameters, causing it to miss important use-after-return bugs when parameter values were returned by reference or address. This change ensures that parameters are properly tracked in the CFG, allowing the analyzer to correctly identify when stack memory associated with parameters is returned.

Fixes #169014
…emberExprInitialized` (#169385)
…#168384) Convert `(setcc (and X, 1), 0, eq)` to `XORI (and X, 1), 1`, which saves one instruction.
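The equivalence is easy to check exhaustively over the low bit (a quick sanity sketch, not backend code):

```python
def setcc_eq_form(x):
    # (setcc (and X, 1), 0, eq): 1 if the low bit is clear, else 0
    return 1 if (x & 1) == 0 else 0

def xori_form(x):
    # XORI (and X, 1), 1: flip the low bit after masking it out
    return (x & 1) ^ 1
```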
As discussed in https://discourse.llvm.org/t/clarification-on-how-to-accept-commit-access-requests/88728, clarify reviewer instructions for how to accept commit access requests.
This hasn't been strictly necessary since c897c13. Practically this makes little difference; we still enable IPRA by default which implies this option. By removing this explicit force, -enable-ipra=0 has the expected change in the pass pipeline to remove the DummyCGSCC runs.
This patch updates the NVVM Dialect docs to:
* include information on the type of pointers for the memory spaces.
* include high-level information on mbarrier objects.

Signed-off-by: Durgadoss R <[email protected]>
…164326) This PR adds checks for when emitting weak aliases in `void CodeGenModule::EmitGlobal(GlobalDecl GD)`. Before, for device compilation for OpenMP, HIP, and CUDA, clang would look for the aliasee even if it was never marked for device compilation.

For OpenMP the following case now works (it failed before when compiling for the device, i.e. `clang -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa`):
```
int __Two(void) { return 2; }
int Two(void) __attribute__ ((weak, alias("__Two")));
```

For HIP / CUDA:
```
int __HostFunc(void) { return 42; }
int HostFunc(void) __attribute__ ((weak, alias("__HostFunc")));
```

For HIP (failed before on HIP; CUDA still fails due to an `NVPTX aliasee must not be '.weak'` error):
```
__device__ int __One(void) { return 2; }
__device__ int One(void) __attribute__ ((weak, alias("__One")));
```

Included are CodeGen LIT tests for the above cases, and also cases for weak aliases that currently work in clang.

Fixes #117369
A barrier will pause execution until all threads reach it. If some go to a different barrier then we deadlock. This manifests in that the finalization callback must only be run once.

Fix by ensuring we always go through the same finalization block whether the thread is cancelled or not, and no matter which cancellation point causes the cancellation. The old callback only affected PARALLEL, so it has been moved into the code generating PARALLEL. For this reason, we don't need similar changes for other cancellable constructs.

We need to create the barrier on the shared exit from the outlined function instead of only on the cancelled branch to make sure that threads exiting normally (without cancellation) meet the same barriers as those which were cancelled. For example, previously we might have generated code like

```
...
%ret = call i32 @__kmpc_cancel(...)
%cond = icmp eq i32 %ret, 0
br i1 %cond, label %continue, label %cancel

continue:
// do the rest of the callback, eventually branching to %fini
br label %fini

cancel:
// Populated by the callback:
// unsafe: if any thread makes it to the end without being cancelled
// it won't reach this barrier and then the program will deadlock
%unused = call i32 @__kmpc_cancel_barrier(...)
br label %fini

fini:
// run destructors etc
ret
```

In the new version the barrier is moved into fini. I generate it *after* the destructors because the standard describes the barrier as occurring after the end of the parallel region.

```
...
%ret = call i32 @__kmpc_cancel(...)
%cond = icmp eq i32 %ret, 0
br i1 %cond, label %continue, label %cancel

continue:
// do the rest of the callback, eventually branching to %fini
br label %fini

cancel:
br label %fini

fini:
// run destructors etc
// safe so long as every exit from the function happens via this block:
%unused = call i32 @__kmpc_cancel_barrier(...)
ret
```

To achieve this, the barrier is now generated alongside the finalization code instead of in the callback.
This is the reason for the changes to the unit test. I was unsure whether to keep the incorrect barrier-generation callback only on the cancellation branch in clang with the OMPIRBuilder backend, because that would match clang's ordinary codegen. Right now I have opted to remove it entirely because it is a deadlock waiting to happen.
`dist_schedule` was previously supported in Flang/Clang but was not implemented in MLIR; instead, a user would get a "not yet implemented" error. This patch adds support for lowering the `dist_schedule` clause to LLVM IR when used in an `omp.distribute` or `omp.wsloop` section. Some rework was required to ensure that MLIR/LLVM emits the correct schedule type for the clause, as it uses a different schedule type from other OpenMP directives/clauses in the runtime library. This patch also ensures that when using dist_schedule or a chunked schedule clause, the correct llvm loop parallel accesses details are added.
…ature tables (#168750) The compiler-rt internal feature table is synced with the one in libgcc (common/config/i386/i386-cpuinfo.h). The LLVM internal feature table is refactored to include an ABI_VALUE field, so we won't rely on ordering to keep the values correct. That table is also synced to the one in compiler-rt.
Remove Subtarget uses from ARMAsmPrinter, making use of TargetMachine where applicable and getting the Subtarget from the MF where not. Some of the `if () llvm_unreachable` constructs have been replaced by asserts.
…vels (#169805) SSE2/SSE42/AVX1/AVX2 + x86-64-v4 (AVX512)
…ed on the language version (#169354) This is dead code, since `test/libcxx-03` is only ever executed with `-std=c++03`.
…and fix vector::insert to assign (#157444) This reduces the amount of code we have to maintain a bit. This also simplifies `vector` by using the internal API instead of `#if`s to switch based on language dialect.
…#138928) Committing on behalf of @stellaraccident
…9047) This patch moves the validation logic of delinearization results from DA to Delinearization. Also call it in `printDelinearization` to test its behavior. The motivation is as follows: - Almost the same code exists in `tryDelinearizeFixedSize` and `tryDelinearizeParametricSize`. Consolidating it in Delinearization avoids code duplication. - Currently this validation logic is not well tested. Moving it to Delinearization allows us to write regression tests easily. This patch changes the test outputs and debug messages, but otherwise NFCI.
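Conceptually, the validation checks that each recovered subscript lies within its dimension and that the subscripts recompose to the original linear offset (a schematic Python model; the real code works on SCEV expressions, not integers):

```python
def valid_delinearization(linear, subscripts, sizes):
    """Check a row-major delinearization of `linear` into `subscripts`."""
    # every subscript must be within the bounds of its dimension
    if any(not (0 <= s < d) for s, d in zip(subscripts, sizes)):
        return False
    # recomposing the subscripts must reproduce the linear index
    index = 0
    for s, d in zip(subscripts, sizes):
        index = index * d + s
    return index == linear
```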
The simplified parser incorrectly assumed that if there is a context, there is no return type. Fixed the case where functions have both a context and a return type, for example:
`int foo::bar::func()`
`Type<int> foo::bar::func()`

Also fixed the case where there is no space between the return type and the context:
`std::vector<int>foo::bar()`
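The splitting logic can be sketched as walking backwards over the trailing `::`-joined identifier chain; whatever remains in front is the return type. This is a hypothetical Python model of the idea, not the actual parser:

```python
import re

def split_signature(sig):
    """Split "ret ctx::name(...)" into (return type, context, name).

    Walk backwards from the '(' over identifiers joined by "::"; the
    chain is context::name, and anything left in front (even with no
    separating space, as in "std::vector<int>foo::bar") is the return
    type, since a '>' or space cannot continue the chain.
    """
    head = sig[: sig.index("(")]
    parts, rest = [], head
    while True:
        m = re.search(r"([A-Za-z_]\w*)$", rest)
        if not m:
            break
        parts.append(m.group(1))
        rest = rest[: m.start()]
        if rest.endswith("::"):
            rest = rest[:-2]
        else:
            break
    parts.reverse()
    return rest.strip(), "::".join(parts[:-1]), parts[-1]
```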
…169659) AMD uses the translator to recover LLVM-IR from SPIRV. Currently, the translator doesn't implement the `SPV_KHR_float_controls2` extension (I'm working on it). If this extension is used by the SPIRV module, we cannot translate it back to LLVM-IR. I'm working on the extension, but in the meantime, let's just disable it when the target triple's vendor is `amd`.
This PR adds a dependency to the `BUILD` files overlay silently added by #169670. Signed-off-by: Ingo Müller <[email protected]>
…n in scf-uplift-while-to-for (#165216) When a `scf.if` directly precedes an `scf.condition` in the before region of an `scf.while` and both share the same condition, move the if into the after region of the loop. This helps simplify the control flow to enable uplifting `scf.while` to `scf.for`.
…168403) In the Inliner pass, tailcalls are converted to calls in the inlined BasicBlock. If the tailcall is indirect, the `BR` is converted to `BLR`. These instructions require different BTI landing pads at their targets. As the targets of indirect tailcalls are unknown, inlining such blocks is unsound for BTI: they should be skipped instead.
#168370) Add a standalone pass that rewrites tensor-valued `arith.constant` ops into `tosa.const`, normalizing the TOSA backend contract. Signed-off-by: Vitalii Shutov <[email protected]> Co-authored-by: Shubham <[email protected]>
This PR adds support for including syscall.h from the macOS SDK by explicitly adding the path to the SDK via `xcrun`.
…169829) Reverts #164586 Reverting due to buildbot failure: https://lab.llvm.org/buildbot/#/builders/169/builds/17519
Currently, the following two snippets get treated very differently from each other (https://godbolt.org/z/rYGj9TGz6):

```llvm
define <8 x i8> @foo(<8 x i8> %x, <8 x i8> %y) local_unnamed_addr #0 {
entry:
  %0 = shufflevector <8 x i8> %x, <8 x i8> %y, <8 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11>
  ret <8 x i8> %0
}

define <8 x i8> @bar(<8 x i8> %x, <8 x i8> %y) local_unnamed_addr #0 {
entry:
  %0 = shufflevector <8 x i8> %x, <8 x i8> %y, <8 x i32> <i32 8, i32 0, i32 9, i32 1, i32 10, i32 2, i32 11, i32 3>
  ret <8 x i8> %0
}
```

```
foo:                              // @foo
        zip1    v0.8b, v0.8b, v1.8b
        ret
.LCPI1_0:
        .byte   8                 // 0x8
        .byte   0                 // 0x0
        .byte   9                 // 0x9
        .byte   1                 // 0x1
        .byte   10                // 0xa
        .byte   2                 // 0x2
        .byte   11                // 0xb
        .byte   3                 // 0x3
bar:                              // @bar
        adrp    x8, .LCPI1_0
        mov     v0.d[1], v1.d[0]
        ldr     d1, [x8, :lo12:.LCPI1_0]
        tbl     v0.8b, { v0.16b }, v1.8b
        ret
```

The reason is that `isZIPMask` does not recognise the pattern when the operands are flipped. This PR fixes `isZIPMask` so that both `foo` and `bar` get compiled as expected:

```
foo:                              // @foo
        zip1    v0.8b, v0.8b, v1.8b
        ret
bar:                              // @bar
        zip1    v0.8b, v1.8b, v0.8b
        ret
```

I intend to open a similar follow-up PR for `isTRNMask`, which seems to have the same problem. I noticed this while working on #137447, though the change does not by itself fix that issue.
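The fix can be modelled as also trying the operand-swapped interleave pattern. Below is a simplified Python model of a zip1-style check (not the actual `isZIPMask` code, which also handles the zip2 high-half variant):

```python
def is_zip_mask(mask, num_elts):
    """Return (matched, operands_flipped) for a zip1-style shuffle mask.

    zip1 interleaves the low halves of its two inputs, giving the mask
    [0, n, 1, n+1, ...]. With the operands flipped, the same shuffle is
    [n, 0, n+1, 1, ...], which a check that only tries one operand
    order would reject.
    """
    half = num_elts // 2
    for flipped in (False, True):
        first, second = (num_elts, 0) if flipped else (0, num_elts)
        if all(mask[2 * i] == first + i and mask[2 * i + 1] == second + i
               for i in range(half)):
            return True, flipped
    return False, False
```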
…, null, or object-size error" (#169752) I originally proposed this rewording when trap reasons were introduced in #145967 (comment). This was not adopted because there was a counter-proposal to split the enum; however, that work appears to have stalled (#151243). In the meantime, there has been an additional datapoint that the current wording is confusing to users. Thus, let's reword it now to prevent further confusion.
Use pattern matching to check for intrinsics to slightly simplify code.
…169822) This PR avoids a compiler warning, which turns into an error with `-Werror`, for a variable introduced in #169276 and only used in an assertion (which is, thus, unused if compiled without assertions). Signed-off-by: Ingo Müller <[email protected]> Co-authored-by: Simon Pilgrim <[email protected]>
…terminated by no-return blocks" (#169852) Reverts #167548 As commented at #167548 (comment) this is causing miscompiles in two-stage RISC-V Clang/LLVM builds that result in test failures on the builders.
Created by
pull[bot] (v2.0.0-alpha.4)