
Isoard.upstream sync #593

Merged
konstantinschwarz merged 757 commits into aie-public from isoard.upstream-sync on Aug 4, 2025

isoard-amd (Collaborator) commented Aug 4, 2025

Merge conflicts in:

  • lld/ELF/Target.h
  • lld/ELF/Target.cpp

Mainly:
  • TargetInfo *getAIETargetInfo(Ctx &)
  • void setAIETargetInfo(Ctx &)

zmodem and others added 30 commits October 7, 2024 11:19
fabs and fneg are similar nodes in that they can always be expanded to
integer ops, but currently they diverge when widened.

If the widened vector fabs is marked as expand (and the corresponding
scalar type is too), LegalizeVectorTypes thinks that it may be turned
into a libcall and so will unroll it to avoid the overhead on the undef
elements.

However unlike the other ops in that list like fsin, fround, flog etc.,
an fabs marked as expand will never be legalized into a libcall. Like
fneg, it can always be expanded into an integer op.

This moves it below unrollExpandedOp to bring it in line with fneg,
which fixes an issue on RISC-V with f16 fabs being unexpectedly
scalarized when there's no zfhmin.
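
The claim that fabs and fneg can always be expanded to integer ops can be illustrated outside of LLVM. The following is a minimal sketch for 32-bit IEEE-754 floats, not the legalizer's code; the helper names (`toBits`, `fromBits`, `fabsViaInt`, `fnegViaInt`) are hypothetical and chosen only for this example.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");

// Reinterpret a float as its raw bit pattern (hypothetical helper).
static std::uint32_t toBits(float x) {
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  return bits;
}

// Reinterpret a raw bit pattern as a float (hypothetical helper).
static float fromBits(std::uint32_t bits) {
  float x;
  std::memcpy(&x, &bits, sizeof x);
  return x;
}

// fabs as an integer op: clear the sign bit with an AND.
float fabsViaInt(float x) { return fromBits(toBits(x) & 0x7FFFFFFFu); }

// fneg as an integer op: flip the sign bit with an XOR.
float fnegViaInt(float x) { return fromBits(toBits(x) ^ 0x80000000u); }
```

Neither function needs a libcall, which is the property the commit message relies on when it treats fabs like fneg.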
Reported-by: Yingwei Zheng <dtcxzyw2333@gmail.com>
Fixes: 02debce ("update_test_checks: improve IR value name
stability (#110940)")
…IMM constant splat."

> As we're after a constant splat value we can avoid all the complexities of trying to recreate the correct constant via getTargetConstantFromNode.

This caused builds to fail with an assertion:
X86ISelLowering.cpp:48569
Assertion `C.getZExtValue() != 0 && C.getZExtValue() != maxUIntN(VT.getScalarSizeInBits())
&& "Both cases that could cause potential overflows should have " "already been handled."

See llvm/llvm-project#111325

This reverts commit 1bc87c9.
…ew padding layout" (#111123)

Relands llvm/llvm-project#108375 which had to be
reverted because it was failing on the Windows buildbot. Trying to
reland this with `msvc::no_unique_address` on Windows.
…tant splat. (REAPPLIED)

As we're after a constant splat value we can avoid all the complexities of trying to recreate the correct constant via getTargetConstantFromNode.
Follow-up to the LLDB std::optional data-formatter test failure caused
by llvm/llvm-project#110355.

Two formats are supported:
1. `__val_` has type `value_type`
2. `__val_`'s type is wrapped in `std::remove_cv_t`
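
The difference between the two formats can be sketched with two stand-in structs. These are hypothetical simplifications, not libc++'s actual `std::optional` storage; only the member's declared type matters here.

```cpp
#include <type_traits>

// Hypothetical stand-in for format 1: the member has type value_type as-is.
template <class T> struct LayoutPlain {
  T val;
};

// Hypothetical stand-in for format 2: the member type is wrapped in
// std::remove_cv_t, so top-level const/volatile is stripped.
template <class T> struct LayoutNoCV {
  std::remove_cv_t<T> val;
};

// True if the stored member keeps a top-level const qualifier.
template <class Layout>
constexpr bool kStoresConst = std::is_const_v<decltype(Layout::val)>;

static_assert(std::is_same_v<decltype(LayoutPlain<const int>::val), const int>);
static_assert(std::is_same_v<decltype(LayoutNoCV<const int>::val), int>);
```

A data formatter that inspects the member's type therefore sees a different type name in each format whenever `value_type` is cv-qualified.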
…r of ifdefs

The current layout *does* have `remove_cv_t`. So change
the ifdefs to reflect that.
…#110988)

Prior to this patch, the LLVMContext was shared across inputs to
llvm-dis.

Consequently, NamedStructTypes was shared across inputs, which impacts
StructType::setName: if a name was reused across inputs, it would get
renamed during construction of the struct type, leading to confusion
that is tricky to diagnose.
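
The effect can be modeled with a toy registry. This is a sketch only: `ToyContext` and `registerStruct` are hypothetical names, not LLVM API, and the numeric-suffix scheme is an assumption made for illustration.

```cpp
#include <set>
#include <string>

// Toy stand-in for a context that owns a table of named struct types.
class ToyContext {
  std::set<std::string> names;
  unsigned nextSuffix = 0;

public:
  // Registers a struct name and returns the name actually used. If the
  // requested name is taken, a numeric suffix is appended until it is unique.
  std::string registerStruct(const std::string &requested) {
    std::string unique = requested;
    while (!names.insert(unique).second)
      unique = requested + "." + std::to_string(nextSuffix++);
    return unique;
  }
};
```

With one shared `ToyContext`, the second input that defines a struct called `S` gets a renamed type. With one `ToyContext` per input, both inputs keep `S`.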
…REAPPLIED)

Followup to 3d862c7 fix - always fold multiply to zero/negation
…10267)

The main purpose of this patch is to centralize the logic for creating
MLIR operation entry blocks and for binding them to the corresponding
symbols. This minimizes the chances of mixing arguments up for
operations having multiple entry block argument-generating clauses and
prevents divergence while binding arguments.

Some changes implemented to this end are:
- Split into two functions the creation of the entry block, and the
binding of its arguments and the corresponding Fortran symbol. This
enabled a significant simplification of the lowering of composite
constructs, where it's no longer necessary to manually ensure the lists
of arguments and symbols refer to the same variables in the same order
and also match the expected order by the `BlockArgOpenMPOpInterface`.
- Removed redundant and error-prone passing of types and locations from
`ClauseProcessor` methods. Instead, these are obtained from the values
in the appropriate clause operands structure. This also simplifies
argument lists of several lowering functions.
- Access block arguments of already created MLIR operations through the
`BlockArgOpenMPOpInterface` instead of directly indexing the argument
list of the operation, which is not scalable as more entry block
argument-generating clauses are added to an operation.
- Simplified the implementation of `genParallelOp` to no longer need to
define different callbacks depending on whether delayed privatization is
enabled.
…ng discrepancies (#111289)

Fix two discrepancies between the cited snippets and the full code.
…g avx512f feature (#111337)

This test passes as-is on non-X86 hosts only because almost no target
implements `isValidFeatureName` (the default implementation
unconditionally returns true). RISC-V does implement it, and like X86
checks that the feature name is one supported by the architecture. This
means the test creates an additional warning on RISC-V due to
`__attribute__((target("avx512f")))`.

The simple solution here is to just explicitly target x86_64-linux-gnu.
This makes the test independent of the one provided by a toolchain clang
is built into, which can cause the output of
-print-multi-flags-experimental to change.
…hat can fold to BEXT/BZHI

With BEXT/BZHI the i64 imm mask will be replaced with an i16/i8 control mask

Fixes #111323
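
Why a small control value can stand in for a wide mask is easier to see with a software model. The sketch below assumes "BEXT" refers to the BMI1 `BEXTR` instruction and models 64-bit operands; `bzhiModel` and `bextrModel` are hypothetical names, not LLVM or intrinsic APIs.

```cpp
#include <cstdint>

// Model of BZHI: keep the low `index` bits of src and zero the rest.
// The index is an 8-bit value; an index of 64 or more leaves src unchanged.
std::uint64_t bzhiModel(std::uint64_t src, std::uint8_t index) {
  if (index >= 64)
    return src;
  return src & ((std::uint64_t{1} << index) - 1);
}

// Model of BEXTR: the 16-bit control holds the start bit in bits 7:0 and
// the field length in bits 15:8.
std::uint64_t bextrModel(std::uint64_t src, std::uint16_t control) {
  unsigned start = control & 0xFFu;
  unsigned length = control >> 8;
  if (start >= 64)
    return 0;
  std::uint64_t shifted = src >> start;
  if (length >= 64)
    return shifted;
  return shifted & ((std::uint64_t{1} << length) - 1);
}
```

For example, an AND with the 40-bit mask `0xFFFFFFFFFF` corresponds to `bzhiModel(x, 40)`, so the operand that has to be materialized is the small value 40 rather than a 64-bit immediate.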
…#109803)

Specifically:
  fabs, fadd, fceil, fdiv, ffloor, fma, fmax, fmaxnm, fmin, fminnm,
  fmul, fnearbyint, fneg, frint, fround, froundeven, fsub, fsqrt &
  ftrunc
Summary:
Make a separate thread to run the server when we launch. This is
required by CUDA, which you can force with `export
CUDA_LAUNCH_BLOCKING=1`. I figured I might as well be consistent and do
it for the AMD implementation as well even though I believe it's not
necessary.
In aea0668, API tests were supposed to use LLVM tools.
However, the path to a utility is constructed incorrectly there: the
utility name should be prefixed with `llvm-`.

This patch fixes that.
Reverts llvm/llvm-project#108939

When `AVX` is available but `-mprefer-vector-width=128` some of the
`mov` instructions turn into the x86 `rep;movsb` instruction leading to
poor performance on "old" architectures (sandybridge, haswell). The
possible solutions are: get rid of the `-mprefer-vector-width` option
or use smaller static copy sizes in
`inline_memcpy_x86_sse2_ge64_sw_prefetching`. Right now a copy size of 3
cache lines (192B) relying exclusively on xmm registers gets turned into
`rep;movsb`.
…T_FN_ATTRS_CONSTEXPR defines. NFC.

We only need one define - so consistently use __DEFAULT_FN_ATTRS like we do in other headers.
…11001)

This is an initial patch to enable constexpr support on the more basic SSE1 intrinsics - such as initialization, arithmetic, logic and fixed shuffles.

The plan is to incrementally extend this for SSE2/AVX etc. - initially for the equivalent basic intrinsics, but we can add support for some of the ia32 builtins as well as the need arises.
…sses (#87003)"

This caused assertion failures:

  llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:7736:
  SDValue getMemsetValue(SDValue, EVT, SelectionDAG &, const SDLoc &):
  Assertion `C->getAPIntValue().getBitWidth() == 8' failed.

See comment on the PR for a reproducer.

> For 0, repstosb and repstosd are the same size, but stosd is only done
> for 0 because the process of multiplying the constant so that it is
> copied across the bytes of the 32-bit number adds extra instructions
> that cause the size to increase. For 0, we do not need to do that at
> all.
>
> For memcpy, the same goes, and as a result the minsize check was moved
> ahead because a jmp to memcpy encoded takes more bytes than repmovsb.

This reverts commit 6de5305.
The previous PR, llvm/llvm-project#110362, caused breakage and was
reverted. This PR relands it with a fix.

My build cmdline:

```
cmake ../llvm \
    -G Ninja \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_PREFIX=install \
    -DCMAKE_C_COMPILER=gcc-9 \
    -DCMAKE_CXX_COMPILER=g++-9 \
    -DCMAKE_CUDA_COMPILER=$(which nvcc) \
    -DLLVM_ENABLE_LLD=OFF \
    -DLLVM_ENABLE_ASSERTIONS=ON \
    -DLLVM_BUILD_EXAMPLES=ON \
    -DCOMPILER_RT_BUILD_LIBFUZZER=OFF \
    -DLLVM_CCACHE_BUILD=ON \
    -DMLIR_ENABLE_BINDINGS_PYTHON=ON \
    -DBUILD_SHARED_LIBS=ON \
    -DLLVM_ENABLE_PROJECTS='llvm;mlir'
```
Here I'm splitting up the existing "if" statement into two.  Mixing
hasDefinition() and insert() in one "if" condition would be extremely
confusing as hasDefinition() doesn't change anything while insert()
does.
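
The readability argument can be shown with a small stand-alone sketch. The original condition is not quoted in the commit message, so the `Decl` type, the `shouldProcess` function, and the order of the two checks below are all assumptions made for illustration.

```cpp
#include <set>
#include <string>

// Hypothetical declaration type: only what the sketch needs.
struct Decl {
  std::string name;
  bool defined;
  bool hasDefinition() const { return defined; }
};

// Returns true only for a defined declaration that has not been seen before.
bool shouldProcess(const Decl &decl, std::set<std::string> &seen) {
  // Pure query: reads state and changes nothing.
  if (!decl.hasDefinition())
    return false;

  // Mutation: insert() records the name and reports whether it was new.
  if (!seen.insert(decl.name).second)
    return false;

  return true;
}
```

Keeping the side-effect-free check and the mutating call in separate statements makes it obvious at a glance which line changes `seen`.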
As with other operations such as trunc and fp converts, it should be
valid to convert bitcast(undef) to undef.
Comment thread clang/test/CodeGen/aie/aie2/aie2-stream-intrinsics.cpp Outdated
Comment thread llvm/test/CodeGen/AIE/GlobalISel/simplify-concat-unmerge-phi.mir Outdated
@isoard-amd isoard-amd force-pushed the isoard.upstream-sync branch 3 times, most recently from 9159cdf to 81bb3c4 Compare August 4, 2025 21:16
@isoard-amd isoard-amd requested a review from philippjh as a code owner August 4, 2025 21:58
@isoard-amd isoard-amd force-pushed the isoard.upstream-sync branch from bf68797 to 81bb3c4 Compare August 4, 2025 22:01
Conflicts:
- lld/ELF/Target.cpp
- lld/ELF/Target.h
- llvm/test/Transforms/InferAddressSpaces/AMDGPU/flat_atomic.ll
@isoard-amd isoard-amd force-pushed the isoard.upstream-sync branch from 81bb3c4 to 4be0dab Compare August 4, 2025 22:50
@isoard-amd isoard-amd force-pushed the isoard.upstream-sync branch from 4be0dab to f56984e Compare August 4, 2025 23:27
@konstantinschwarz konstantinschwarz merged commit 229366d into aie-public Aug 4, 2025
6 of 7 checks passed
@konstantinschwarz konstantinschwarz deleted the isoard.upstream-sync branch August 4, 2025 23:45