melver and others added 28 commits November 25, 2025 17:21
Add an end-to-end (non-LTO) test verifying that the optimization
pipeline is set up correctly for Profile Guided Heap Optimization (PGHO)
transforms. Ensure that both PGHO and AllocToken can stack, and that the
AllocToken pass does not interfere with PGHO and vice versa.
…67543)

For HIP, the SPIR-V backend can optionally be enabled with the -use-spirv-backend flag, which uses the SPIR-V backend instead of the SPIR-V translator. These changes also ensure that -use-spirv-backend does not require external dependencies such as spirv-as and spirv-link.
This backend support will allow the LoadStoreVectorizer, in certain
cases, to fill in gaps when creating load/store vectors and generate
LLVM masked loads/stores
(https://llvm.org/docs/LangRef.html#llvm-masked-store-intrinsics). To
accomplish this, changes are separated into two parts. This first part
has the backend lowering and TTI changes, and a follow up PR will have
the LSV generate these intrinsics:
#159388.

In this backend change, Masked Loads get lowered to PTX with `#pragma
"used_bytes_mask" [mask];`
(https://docs.nvidia.com/cuda/parallel-thread-execution/#pragma-strings-used-bytes-mask).
And Masked Stores get lowered to PTX using the new sink symbol syntax
(https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-st).

# TTI Changes
TTI changes are needed because NVPTX only supports masked loads/stores
with _constant_ masks. `ScalarizeMaskedMemIntrin.cpp` is adjusted to
check that the mask is constant and pass that result into the TTI check.
Behavior shouldn't change for non-NVPTX targets, which do not care
whether the mask is variable or constant when determining legality, but
all TTI files that implement this API need to be updated.
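For intuition, here is a minimal C++ model (not the LLVM implementation) of what a masked store means and why targets without native support must scalarize it into per-lane branches; with a constant mask, those branches fold away, which is the case NVPTX can lower directly:

```cpp
#include <cstdio>

// Conceptual semantics of a masked store: only lanes whose mask bit is
// set are written. Scalarization expands this into per-lane branches.
void maskedStore(int* dst, const int* src, const bool* mask, int lanes) {
  for (int i = 0; i < lanes; ++i)
    if (mask[i])        // with a constant mask, these branches fold away
      dst[i] = src[i];
}

int main() {
  int dst[4] = {0, 0, 0, 0};
  const int src[4] = {1, 2, 3, 4};
  const bool mask[4] = {true, false, false, true};  // like the -1,0,0,-1 mask below
  maskedStore(dst, src, mask, 4);
  printf("%d %d %d %d\n", dst[0], dst[1], dst[2], dst[3]);  // 1 0 0 4
}
```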

# Masked store lowering implementation details
If the masked stores make it to the NVPTX backend without being
scalarized, they are handled by the following:
* `NVPTXISelLowering.cpp` - Sets up a custom operation action and
handles it in lowerMSTORE. Similar handling to normal store vectors,
except we read the mask and place a sentinel register `$noreg` in each
position where the mask reads as false.

For example, 
```
t10: v8i1 = BUILD_VECTOR Constant:i1<-1>, Constant:i1<0>, Constant:i1<0>, Constant:i1<-1>, Constant:i1<-1>, Constant:i1<0>, Constant:i1<0>, Constant:i1<-1>
t11: ch = masked_store<(store unknown-size into %ir.lsr.iv28, align 32, addrspace 1)> t5:1, t5, t7, undef:i64, t10

->

STV_i32_v8 killed %13:int32regs, $noreg, $noreg, killed %16:int32regs, killed %17:int32regs, $noreg, $noreg, killed %20:int32regs, 0, 0, 1, 8, 0, 32, %4:int64regs, 0, debug-location !18 :: (store unknown-size into %ir.lsr.iv28, align 32, addrspace 1);

```

* `NVPTXInstrInfo.td` - Changes the definition of store vectors to allow
for a mix of sink symbols and registers.
* `NVPTXInstPrinter.h/.cpp` - Handles the `$noreg` case by printing "_".

# Masked load lowering implementation details
Masked loads are routed to normal PTX loads, with one difference: a
`#pragma "used_bytes_mask"` is emitted before the load instruction
(https://docs.nvidia.com/cuda/parallel-thread-execution/#pragma-strings-used-bytes-mask).
To accomplish this, a new operand is added to every NVPTXISD Load type
representing this mask.
* `NVPTXISelLowering.h/.cpp` - Masked loads are converted into normal
NVPTXISD loads with a mask operand in two ways. 1) In type legalization
through replaceLoadVector, which is the normal path, and 2) through
LowerMLOAD, to handle the legal vector types
(v2f16/v2bf16/v2i16/v4i8/v2f32) that will not be type legalized. Both
share the same convertMLOADToLoadWithUsedBytesMask helper. Both default
this operand to UINT32_MAX, representing all bytes on. For the latter,
we need a new `NVPTXISD::MLoadV1` type to represent that edge case
because we cannot put the used bytes mask operand on a generic
LoadSDNode.
* `NVPTXISelDAGToDAG.cpp` - Extract the used bytes mask from loads and
add it to the created machine instructions.
* `NVPTXInstPrinter.h/.cpp` - Print the pragma when the used bytes mask
isn't all ones.
* `NVPTXForwardParams.cpp`, `NVPTXReplaceImageHandles.cpp` - Update
manual indexing of load operands to account for new operand.
* `NVPTXInstrInfo.td`, `NVPTXIntrinsics.td` - Add the used bytes mask to
the MI definitions.
* `NVPTXTagInvariantLoads.cpp` - Ensure that masked loads also get
tagged as invariant.

Some generic changes are also needed:
* `LegalizeVectorTypes.cpp` - Ensure flags are preserved when splitting
masked loads.
* `SelectionDAGBuilder.cpp` - Preserve `MD_invariant_load` on masked
load SDNode creation.
The test was added by b8ef25a. It
failed on at least the following bot, but the failure did not reproduce
on my test machines or in pre-commit CI:

- https://lab.llvm.org/buildbot/#/builders/190/builds/31638

This fix hopefully addresses at least the warnings there.
`[[nodiscard]]` should be applied to functions where discarding the
return value is most likely a correctness issue.

- https://libcxx.llvm.org/CodingGuidelines.html#apply-nodiscard-where-relevant
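As a hedged illustration of the guideline (not code from this change), discarding a result like `empty()`'s is almost always a bug, which is exactly where `[[nodiscard]]` pays off:

```cpp
#include <vector>

// Discarding this result is most likely a correctness issue: a caller
// writing `isDone(q);` probably meant to call something with a side effect.
[[nodiscard]] bool isDone(const std::vector<int>& queue) {
  return queue.empty();
}

int main() {
  std::vector<int> q{1, 2, 3};
  isDone(q);                  // with [[nodiscard]]: compiler warning
  return isDone(q) ? 1 : 0;   // fine: the result is used
}
```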
Previously libcall lowering decisions were made directly
in the TargetLowering constructor. Pull these into the subtarget
to facilitate turning LibcallLoweringInfo into a separate analysis
in the future.
… HIP (#167543)" (#169528)

This reverts commit 1a03673.

Reverted due to a failure in hip-spirv-backend-opt.c for
fuchsia-x86_64-linux.
The 'link' clause is like the rest of the global clauses (copyin,
create, device_resident), except it only has an entry op (thus no
dtor).

This patch also removes a bunch of now-stale TODOs from the tests.
The test was added by b8ef25a. It
failed on at least the following bots, but the failure did not reproduce
on my test machines or in pre-commit CI:

- https://lab.llvm.org/buildbot/#/builders/190/builds/31643
- https://lab.llvm.org/buildbot/#/builders/65/builds/25949
- https://lab.llvm.org/buildbot/#/builders/154/builds/24417

d69e701 did not fix the failure.
Hopefully this will.
This upstreams the code to support reserved placement new calls.
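For context, a minimal example of the kind of call this covers; `Widget` is an illustrative type, not from the patch:

```cpp
#include <new>

struct Widget { int id; };

int main() {
  alignas(Widget) unsigned char buf[sizeof(Widget)];
  // A reserved placement new call: constructs into caller-provided
  // storage via the reserved 'operator new(std::size_t, void*)'.
  Widget* w = new (buf) Widget{42};
  int id = w->id;
  w->~Widget();  // no delete: the storage was never heap-allocated
  return id == 42 ? 0 : 1;
}
```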
- Takes both implicit and explicit BTIs into account
- Fixes a related comment in
   llvm/lib/Target/AArch64/AArch64BranchTargets.cpp
The consumer of zlib in third-party/BUILD.bazel expects zlib-ng from the
BCR; if you still load this version from your WORKSPACE / MODULE.bazel,
you need to use this name instead.
Tasks completed:
- Pattern match `select(fcmp(dot(p2, p3), 0), p1, -p1)` to
`faceforward(p1, p2, p3)`
- Add pattern matching tests to
`prelegalizercombiner-select-to-faceforward.mir` and `faceforward.ll`
- Add CL extension error test
`llvm/test/CodeGen/SPIRV/opencl/faceforward-error.ll`
- Add CL extension test for no pattern matching in
`llvm/test/CodeGen/SPIRV/opencl/faceforward.ll`

Closes #137255.
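For reference, a scalar C++ sketch of the `faceforward` semantics being matched, assuming the `fcmp` is an ordered less-than; the helper names are illustrative:

```cpp
#include <array>

using Vec3 = std::array<float, 3>;

float dot(const Vec3& a, const Vec3& b) {
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// faceforward(p1, p2, p3): the select(fcmp(dot(p2, p3), 0), p1, -p1)
// shape the combiner now recognizes.
Vec3 faceforward(const Vec3& p1, const Vec3& p2, const Vec3& p3) {
  if (dot(p2, p3) < 0.0f)
    return p1;
  return {-p1[0], -p1[1], -p1[2]};
}

int main() {
  Vec3 n{0, 0, 1}, i{0, 0, 1}, ng{0, 0, 1};
  return faceforward(n, i, ng)[2] == -1.0f ? 0 : 1;
}
```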
…#165877)

`canEvaluateTruncated` and `canEvaluateSExtd` previously rejected
multi-use values to avoid duplication. This was overly conservative: if
all users of a multi-use value are part of the transform, we can
evaluate it in a different type without duplication.

This change tracks visited values and defers decisions on multi-use
values until we verify all their users were visited.
`EvaluateInDifferentType` now memoizes multi-use values to avoid
creating duplicates.

Applied to truncation and sext. Zext unchanged due to its dual-return
nature.
The FIXME that the comment refers to was removed.
…169460)

In #157388, we turned `(fmul (fneg X), Y)` into `(fneg (fmul X, Y))`.
However, we forgot to propagate SDNode flags, specifically fast-math
flags, from the original FMUL to the new one. This hindered some of the
subsequent (FMA) DAG combiner patterns that rely on the contraction
flag and, as a consequence, missed some opportunities to generate
negated FMA instructions like `fnmadd`.

This patch fixes this issue by propagating the flags.

---------

Co-authored-by: Craig Topper <[email protected]>
This patch refactors the OpenACC dialect to attach recipe symbols
directly to data operations (acc.private, acc.firstprivate,
acc.reduction)
rather than to compute constructs (acc.parallel, acc.serial, acc.loop).

Motivation:
The previous design required compute constructs to carry both the recipe
symbol and the variable reference, leading to complexity. Additionally,
recipes were required even when they could be generated automatically
through MappableType interfaces.

Changes:
- Data operations (acc.private, acc.firstprivate, acc.reduction) now
  require a 'recipe' attribute referencing their respective recipe
  operations
- Verifier enforces recipe attribute presence for non-MappableType
  operands; MappableType operands can generate recipes on demand
- Compute constructs (acc.parallel, acc.serial, acc.loop) no longer
  carry recipe symbols in their operands
- Updated flang lowering to attach recipes to data operations instead
  of passing them to compute constructs

Format Migration:
Old format:
```
  acc.parallel private(@recipe -> %var : !fir.ref<i32>) { ... }
```

New format:
```
  %private = acc.private varPtr(%var : !fir.ref<i32>)
              recipe(@recipe) -> !fir.ref<i32>
  acc.parallel private(%private : !fir.ref<i32>) { ... }
```

Test Updates:
- Updated all CIR and Flang OpenACC tests to new format
- Fixed CHECK lines to verify recipe attributes on data operations
…to-llvm`. (#166204)

The `convert-vector-to-llvm` pass applies a set of vector transformation
patterns that are not included in the standard `convert-to-llvm` pass
interface. These additional transformations are required to properly
lower MLIR vector operations. Since not all vector ops have direct
`llvm` dialect lowering, many of them must first be progressively
rewritten into simpler or more canonical vector ops, which are then
lowered to `llvm`. Therefore, running `convert-vector-to-llvm` is
necessary to ensure a complete and correct lowering of vector operations
to the `llvm` dialect.
This PR enhances the CFG builder to properly handle function parameters
in lifetime analysis:

1. Added code to include parameters in the initial scope during CFG
construction for both `FunctionDecl` and `BlockDecl` types
2. Added a special case to skip reference parameters, as they don't need
automatic destruction
3. Fixed several test cases that were previously marked as "FIXME" due
to missing parameter lifetime tracking

Previously, Clang's lifetime analysis was not properly tracking the
lifetime of function parameters, causing it to miss important
use-after-return bugs when parameter values were returned by reference
or address. This change ensures that parameters are properly tracked in
the CFG, allowing the analyzer to correctly identify when stack memory
associated with parameters is returned.

Fixes #169014
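A minimal example of the class of bug this catches (illustrative, not from the patch):

```cpp
// With parameters tracked in the CFG, the analyzer can see that 'param'
// is a distinct stack object whose lifetime ends at return.
int* escape(int param) {
  return &param;  // use-after-return: returns the address of a parameter
}

// Reference parameters are skipped: 'ref' aliases caller-owned storage,
// so returning its address is fine.
int* ok(int& ref) {
  return &ref;
}

int main() {
  int x = 7;
  return *ok(x) == 7 ? 0 : 1;
}
```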
…#168384)

Convert `(setcc (and X, 1), 0, eq)` to `XORI (and X, 1), 1`; this saves one instruction.
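For intuition, a C-level sketch of the pattern (the lowering claim is the commit's; the function is illustrative):

```cpp
// (setcc (and x, 1), 0, eq) is logically !(x & 1), i.e. (x & 1) ^ 1,
// so an ANDI + XORI pair replaces an ANDI + compare-against-zero sequence.
bool lsbClear(unsigned x) {
  return (x & 1) == 0;
}

int main() {
  return lsbClear(2) && !lsbClear(3) ? 0 : 1;
}
```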
This hasn't been strictly necessary since c897c13.
Practically this makes little difference; we still enable IPRA
by default, which implies this option. By removing this explicit
force, -enable-ipra=0 has the expected effect on the pass pipeline,
removing the DummyCGSCC runs.
durga4github and others added 29 commits November 27, 2025 19:18
This patch updates the NVVM Dialect docs to:
* include information on the type of pointers for the memory spaces.
* include high-level information on mbarrier objects.

Signed-off-by: Durgadoss R <[email protected]>
…164326)

This PR adds checks when emitting weak aliases in `void
CodeGenModule::EmitGlobal(GlobalDecl GD)`. Previously, for OpenMP, HIP,
and CUDA device compilation, clang would look for the aliasee even if it
was never marked for device compilation.

For OpenMP the following case now works:

> Failed before when compiling for device, i.e. with `clang -fopenmp
-fopenmp-targets=amdgcn-amd-amdhsa`
>   ```
>   int __Two(void) { return 2; }
>   int Two(void) __attribute__ ((weak, alias("__Two")));
>   ```

For HIP / Cuda:

> 
> ```
> int __HostFunc(void) { return 42; }
> int HostFunc(void) __attribute__ ((weak, alias("__HostFunc")));
> ```

For HIP:

> Failed before on HIP; on CUDA it fails due to an `NVPTX aliasee must not be
'.weak'` error
> ``` 
> __device__ int __One(void) { return 2; }
> __device__ int One(void) __attribute__ ((weak, alias("__One")));
> ```

Included are CodeGen LIT tests for the above cases, and also for weak
alias cases that currently work in clang.

Fixes #117369
A barrier will pause execution until all threads reach it. If some
threads go to a different barrier, we deadlock. In practice this means
the finalization callback must only be run once. Fix by ensuring we
always go through the same finalization block whether the thread is
cancelled or not, and no matter which cancellation point causes the
cancellation.

The old callback only affected PARALLEL, so it has been moved into the
code generating PARALLEL. For this reason, we don't need similar changes
for other cancellable constructs. We need to create the barrier on the
shared exit from the outlined function instead of only on the cancelled
branch to make sure that threads exiting normally (without cancellation)
meet the same barriers as those which were cancelled. For example,
previously we might have generated code like

```
...
  %ret = call i32 @__kmpc_cancel(...)
  %cond = icmp eq i32 %ret, 0
  br i1 %cond, label %continue, label %cancel

continue:
  // do the rest of the callback, eventually branching to %fini
  br label %fini

cancel:
  // Populated by the callback:
  // unsafe: if any thread makes it to the end without being cancelled
  // it won't reach this barrier and then the program will deadlock
  %unused = call i32 @__kmpc_cancel_barrier(...)
  br label %fini

fini:
  // run destructors etc
  ret
```

In the new version the barrier is moved into fini. I generate it *after*
the destructors because the standard describes the barrier as occurring
after the end of the parallel region.

```
...
  %ret = call i32 @__kmpc_cancel(...)
  %cond = icmp eq i32 %ret, 0
  br i1 %cond, label %continue, label %cancel

continue:
  // do the rest of the callback, eventually branching to %fini
  br label %fini

cancel:
  br label %fini

fini:
  // run destructors etc
  // safe so long as every exit from the function happens via this block:
  %unused = call i32 @__kmpc_cancel_barrier(...)
  ret
```

To achieve this, the barrier is now generated alongside the finalization
code instead of in the callback. This is the reason for the changes to
the unit test.

I'm unsure if I should keep the incorrect barrier generation callback
only on the cancellation branch in clang with the OMPIRBuilder backend
because that would match clang's ordinary codegen. Right now I have
opted to remove it entirely because it is a deadlock waiting to happen.
`dist_schedule` was previously supported in Flang/Clang but was not
implemented in MLIR; instead, a user would get a "not yet implemented"
error. This patch adds support for lowering the `dist_schedule` clause
to LLVM IR when it is used in an `omp.distribute` or `omp.wsloop`
section.

Some rework was required to ensure that MLIR/LLVM emits the correct
schedule type for the clause, as it uses a different schedule type from
other OpenMP directives/clauses in the runtime library.

This patch also ensures that when using dist_schedule or a chunked
schedule clause, the correct `llvm.loop.parallel_accesses` metadata is
added.
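For reference, a minimal sketch of the clause in source form (the loop and chunk size are illustrative, not from the patch):

```cpp
#include <cstdio>

int main() {
  int a[128] = {};
  // dist_schedule distributes chunks of 16 iterations across teams;
  // previously this clause hit a "not yet implemented" error in MLIR.
  #pragma omp target teams distribute dist_schedule(static, 16)
  for (int i = 0; i < 128; ++i)
    a[i] = i;
  printf("%d\n", a[64]);  // 64
  return 0;
}
```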
…ature tables (#168750)

The compiler-rt internal feature table is synced with the one in libgcc
(common/config/i386/i386-cpuinfo.h).

The LLVM internal feature table is refactored to include an ABI_VALUE
field, so we won't be relying on ordering to keep the values correct.
The table is also synced with the one in compiler-rt.
Remove Subtarget uses from ARMAsmPrinter, making use of TargetMachine
where applicable and getting the Subtarget from the MF where not. Some
of the `if () llvm_unreachable` patterns have been replaced by asserts.
…vels (#169805)

SSE2/SSE42/AVX1/AVX2 + x86-64-v4 (AVX512)
…ed on the language version (#169354)

This is dead code, since `test/libcxx-03` is only ever executed with
`-std=c++03`.
…and fix vector::insert to assign (#157444)

This reduces the amount of code we have to maintain a bit.

This also simplifies `vector` by using the internal API instead of
`#if`s to switch based on language dialect.
… PSLL/PSRA/PSRL var intrinsics to be used in constexpr (#169276)

Resolves #169176
…ions. (#166630)

This is building on top of
#149470, also introducing
parallel accumulator PHIs when the reduction is for floating points,
provided we have the reassoc flag. See also
#166353, which aims to
introduce parallel accumulators for reductions with vector instructions.
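A C++ analogue of the idea, as a sketch: with reassociation allowed, one serial FP accumulator can be split into independent partial sums that are combined once at the end.

```cpp
// Serial form: every addition depends on the previous one.
float sumSerial(const float* x, int n) {
  float acc = 0.0f;
  for (int i = 0; i < n; ++i)
    acc += x[i];
  return acc;
}

// Parallel-accumulator form: the two chains are independent, which is
// only a valid rewrite if FP reassociation (reassoc) is permitted.
float sumParallel(const float* x, int n) {
  float acc0 = 0.0f, acc1 = 0.0f;  // the parallel accumulator "PHIs"
  int i = 0;
  for (; i + 1 < n; i += 2) {
    acc0 += x[i];
    acc1 += x[i + 1];
  }
  if (i < n)
    acc0 += x[i];
  return acc0 + acc1;  // combine once at the end
}

int main() {
  float x[5] = {1, 2, 3, 4, 5};
  return sumSerial(x, 5) == 15.0f && sumParallel(x, 5) == 15.0f ? 0 : 1;
}
```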
…9047)

This patch moves the validation logic of delinearization results from DA
to Delinearization. Also call it in `printDelinearization` to test its
behavior. The motivation is as follows:

- Almost the same code exists in `tryDelinearizeFixedSize` and
`tryDelinearizeParametricSize`. Consolidating it in Delinearization
avoids code duplication.
- Currently this validation logic is not well tested. Moving it to
Delinearization allows us to write regression tests easily.

This patch changes the test outputs and debug messages, but otherwise
NFCI.
The simplified parser incorrectly assumes that if there is a context,
there is no return type.

Fixed the case where functions have both a context and a return type.
For example,

`int foo::bar::func()`
`Type<int> foo::bar::func()` 

Also fixed the case where there is no space between the context and the
return type:
`std::vector<int>foo::bar()`
…169659)

AMD uses the translator to recover LLVM-IR from SPIRV.

Currently, the translator doesn't implement the
`SPV_KHR_float_controls2` extension (I'm working on it).
If this extension is used by the SPIRV module, we cannot translate it
back to LLVM-IR.

I'm working on the extension, but in the meantime, let's just disable it
when the target triple's vendor is `amd`.
This PR adds a dependency to the `BUILD` files overlay silently added by
#169670.

Signed-off-by: Ingo Müller <[email protected]>
…n in scf-uplift-while-to-for (#165216)

When a `scf.if` directly precedes an `scf.condition` in the before
region of an `scf.while` and both share the same condition, move the if
into the after region of the loop. This helps simplify the control flow
to enable uplifting `scf.while` to `scf.for`.
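A C++ analogue of the rewrite, as a sketch; the while form mirrors an `scf.while` whose before region contains an `scf.if` guarded by the exit condition:

```cpp
// Before: the if in the "before" region shares the loop's condition.
int sumBefore(int n) {
  int i = 0, acc = 0;
  while (true) {
    bool cond = i < n;
    if (cond)          // scf.if with the same condition...
      acc += i;
    if (!cond)         // ...directly preceding scf.condition
      break;
    ++i;               // "after" region
  }
  return acc;
}

// After: the if is sunk into the loop body, exposing an scf.for shape.
int sumAfter(int n) {
  int acc = 0;
  for (int i = 0; i < n; ++i)
    acc += i;
  return acc;
}

int main() { return sumBefore(10) == sumAfter(10) ? 0 : 1; }
```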
…168403)

In the Inliner pass, tailcalls are converted to calls in the inlined
BasicBlock. If the tailcall is indirect, the `BR` is converted to `BLR`.

These instructions require different BTI landing pads at their targets.

As the targets of indirect tailcalls are unknown, inlining such blocks
is unsound for BTI: they should be skipped instead.
#168370)

Add a standalone pass that rewrites tensor-valued `arith.constant` ops
into `tosa.const`, normalizing the TOSA backend contract.

Signed-off-by: Vitalii Shutov <[email protected]>
Co-authored-by: Shubham <[email protected]>
This PR adds support for including syscall.h from the macOS SDK by explicitly including the path to the SDK via `xcrun`.
Currently, the following two snippets get treated very differently from
each other (https://godbolt.org/z/rYGj9TGz6):
```LLVM
define <8 x i8> @foo(<8 x i8> %x, <8 x i8> %y) local_unnamed_addr #0 {
entry:
  %0 = shufflevector <8 x i8> %x, <8 x i8> %y, <8 x i32>
       <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11>
  ret <8 x i8> %0
}

define <8 x i8> @bar(<8 x i8> %x, <8 x i8> %y) local_unnamed_addr #0 {
entry:
  %0 = shufflevector <8 x i8> %x, <8 x i8> %y, <8 x i32>
       <i32 8, i32 0, i32 9, i32 1, i32 10, i32 2, i32 11, i32 3>
  ret <8 x i8> %0
}
```
```
foo:                                    // @foo
        zip1    v0.8b, v0.8b, v1.8b
        ret
.LCPI1_0:
        .byte   8                               // 0x8
        .byte   0                               // 0x0
        .byte   9                               // 0x9
        .byte   1                               // 0x1
        .byte   10                              // 0xa
        .byte   2                               // 0x2
        .byte   11                              // 0xb
        .byte   3                               // 0x3
bar:                                    // @bar
        adrp    x8, .LCPI1_0
        mov     v0.d[1], v1.d[0]
        ldr     d1, [x8, :lo12:.LCPI1_0]
        tbl     v0.8b, { v0.16b }, v1.8b
        ret
```
The reason is that `isZIPMask` does not recognise the pattern when the
operands are flipped.

This PR fixes `isZIPMask` so that both `foo` and `bar` get compiled as
expected:
```
foo:                                    // @foo
	zip1	v0.8b, v0.8b, v1.8b
	ret
bar:                                    // @bar
	zip1	v0.8b, v1.8b, v0.8b
	ret
```

I intend to open a similar follow-up PR for `isTRNMask`, which seems to
have the same problem.

I noticed this while working on
#137447, though the change
does not by itself fix that issue.
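A self-contained C++ model of the fixed check (illustrative; not the actual `isZIPMask` code): it accepts the interleave pattern in both operand orders.

```cpp
#include <cstdio>

// A ZIP1 mask of width n interleaves lanes 0..n/2-1 of the two sources
// (second-source lanes are numbered n..2n-1 in a shuffle mask).
// Accepting both operand orders covers foo's mask and bar's flipped mask.
bool isZipMask(const int* mask, int n, bool& swapped) {
  for (int swap = 0; swap < 2; ++swap) {
    bool ok = true;
    for (int i = 0; i < n && ok; ++i) {
      int fromFirst = i / 2;       // lane in the first source vector
      int fromSecond = i / 2 + n;  // lane in the second source vector
      int expected = ((i % 2) == swap) ? fromFirst : fromSecond;
      ok = (mask[i] == expected);
    }
    if (ok) {
      swapped = (swap == 1);  // operands flipped, as in bar
      return true;
    }
  }
  return false;
}

int main() {
  const int foo[] = {0, 8, 1, 9, 2, 10, 3, 11};  // zip1 v0, v1
  const int bar[] = {8, 0, 9, 1, 10, 2, 11, 3};  // zip1 v1, v0
  bool s0, s1;
  bool okFoo = isZipMask(foo, 8, s0);
  bool okBar = isZipMask(bar, 8, s1);
  printf("foo:%d(swapped=%d) bar:%d(swapped=%d)\n", okFoo, s0, okBar, s1);
  return okFoo && !s0 && okBar && s1 ? 0 : 1;
}
```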
…, null, or object-size error" (#169752)

I originally proposed this rewording when trap reasons were introduced
in
#145967 (comment).
This was not adopted because there was a counter-proposal to split the
enum; however, that work appears to have stalled
(#151243). In the meantime,
there has been an additional datapoint that the current wording is
confusing to users. Thus, let's reword it now to prevent further
confusion.
Use pattern matching to check for intrinsics, slightly simplifying the code.
…169822)

This PR avoids a compiler warning, which turns into an error with
`-Werror`, for a variable introduced in #169276 and only used in an
assertion (which is, thus, unused if compiled without assertions).

Signed-off-by: Ingo Müller <[email protected]>
Co-authored-by: Simon Pilgrim <[email protected]>
…terminated by no-return blocks" (#169852)

Reverts #167548

As commented at
#167548 (comment)
this is causing miscompiles in two-stage RISC-V Clang/LLVM builds that
result in test failures on the builders.
@pull pull bot merged commit fd19a20 into Ericsson:main Nov 27, 2025