Isoard.upstream sync#593
Merged
fabs and fneg are similar nodes in that they can always be expanded to integer ops, but currently they diverge when widened. If the widened vector fabs is marked as expand (and the corresponding scalar type is too), LegalizeVectorTypes thinks that it may be turned into a libcall and so will unroll it to avoid the overhead on the undef elements. However unlike the other ops in that list like fsin, fround, flog etc., an fabs marked as expand will never be legalized into a libcall. Like fneg, it can always be expanded into an integer op. This moves it below unrollExpandedOp to bring it in line with fneg, which fixes an issue on RISC-V with f16 fabs being unexpectedly scalarized when there's no zfhmin.
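To illustrate why fabs and fneg never need a libcall, here is a minimal sketch of both as integer operations on the raw bits. It assumes IEEE-754 binary32 `float` rather than f16 (standard C++ has no portable half type); for f16 the same idea would use the masks `0x7FFF` and `0x8000`. The helper names `toBits`, `fromBits`, `fabsViaInt` and `fnegViaInt` are hypothetical and are not LLVM APIs.

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret a float's bits as a 32-bit integer (memcpy avoids the
// undefined behaviour of a pointer cast).
static std::uint32_t toBits(float F) {
  std::uint32_t U;
  std::memcpy(&U, &F, sizeof U);
  return U;
}

// Reinterpret a 32-bit integer as a float.
static float fromBits(std::uint32_t U) {
  float F;
  std::memcpy(&F, &U, sizeof F);
  return F;
}

// fabs: clear the sign bit with an integer AND.
float fabsViaInt(float X) { return fromBits(toBits(X) & 0x7FFFFFFFu); }

// fneg: flip the sign bit with an integer XOR.
float fnegViaInt(float X) { return fromBits(toBits(X) ^ 0x80000000u); }
```

Neither function branches or calls into a math library, which is the property the commit message relies on.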
Reported-by: Yingwei Zheng <dtcxzyw2333@gmail.com> Fixes: 02debce ("update_test_checks: improve IR value name stability (#110940)")
…IMM constant splat."

> As we're after a constant splat value we can avoid all the complexities of trying to recreate the correct constant via getTargetConstantFromNode.

This caused builds to fail with an assertion:

X86ISelLowering.cpp:48569 Assertion `C.getZExtValue() != 0 && C.getZExtValue() != maxUIntN(VT.getScalarSizeInBits()) && "Both cases that could cause potential overflows should have " "already been handled."

See llvm/llvm-project#111325

This reverts commit 1bc87c9.
…ew padding layout" (#111123) Relands llvm/llvm-project#108375 which had to be reverted because it was failing on the Windows buildbot. Trying to reland this with `msvc::no_unique_address` on Windows.
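As a rough illustration of the attribute choice mentioned above, the sketch below selects `[[msvc::no_unique_address]]` when `_MSC_VER` is defined and the standard `[[no_unique_address]]` otherwise. The macro `MY_NO_UNIQUE_ADDRESS` and the types `Empty` and `Holder` are hypothetical; this is not the macro or layout used by the actual patch, and the standard attribute assumes a C++20 compiler.

```cpp
#include <cstddef>

// Pick the attribute spelling by toolchain (an assumption for this sketch;
// the real patch may use a different feature check).
#if defined(_MSC_VER)
#  define MY_NO_UNIQUE_ADDRESS [[msvc::no_unique_address]]
#else
#  define MY_NO_UNIQUE_ADDRESS [[no_unique_address]]
#endif

struct Empty {};

// With the attribute honoured, the empty member may share storage with
// `Value`, so Holder can be as small as a single int.
struct Holder {
  MY_NO_UNIQUE_ADDRESS Empty Tag;
  int Value;
};

// Exposes the resulting size so different toolchains can be compared.
constexpr std::size_t holderSize() { return sizeof(Holder); }
```

The exact value of `holderSize()` is ABI-dependent, so the sketch deliberately does not assert a specific size.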
…tant splat. (REAPPLIED) As we're after a constant splat value we can avoid all the complexities of trying to recreate the correct constant via getTargetConstantFromNode.
Follow-up to the LLDB std::optional data-formatter test failure caused by llvm/llvm-project#110355. Two formats are supported:
1. `__val_` has type `value_type`
2. `__val_`'s type is wrapped in `std::remove_cv_t`
…r of ifdefs The current layout *does* have `remove_cv_t`. So change the ifdefs to reflect that.
…#110988) Prior to this patch, the LLVMContext was shared across inputs to llvm-dis. Consequently, NamedStructTypes was shared across inputs, which impacts StructType::setName: if a name was reused across inputs, it would get renamed during construction of the struct type, leading to tricky-to-diagnose confusion.
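The renaming effect can be pictured with a toy model. The sketch below is not the LLVM API: `ToyContext` and `claimName` are hypothetical stand-ins for a context that owns a table of named struct types and appends a suffix when a name is already taken.

```cpp
#include <string>
#include <unordered_map>

// Toy stand-in for a context that owns the table of named struct types.
class ToyContext {
  std::unordered_map<std::string, unsigned> Used;

public:
  // Returns the name actually assigned. A name that is already taken in
  // this context gets a numeric suffix. (Simplification: a suffixed name
  // is not itself checked against later literal uses of that name.)
  std::string claimName(const std::string &Name) {
    unsigned &Count = Used[Name];
    std::string Result =
        Count == 0 ? Name : Name + "." + std::to_string(Count);
    ++Count;
    return Result;
  }
};
```

With one shared `ToyContext`, the second input that declares `struct.S` ends up with `struct.S.1`; with one context per input, both keep `struct.S`, which mirrors the behaviour the patch is after.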
…REAPPLIED) Followup to 3d862c7 fix - always fold multiply to zero/negation
…10267) The main purpose of this patch is to centralize the logic for creating MLIR operation entry blocks and for binding them to the corresponding symbols. This minimizes the chances of mixing arguments up for operations having multiple entry block argument-generating clauses and prevents divergence while binding arguments.

Some changes implemented to this end are:
- Split into two functions the creation of the entry block, and the binding of its arguments and the corresponding Fortran symbol. This enabled a significant simplification of the lowering of composite constructs, where it's no longer necessary to manually ensure the lists of arguments and symbols refer to the same variables in the same order and also match the expected order by the `BlockArgOpenMPOpInterface`.
- Removed redundant and error-prone passing of types and locations from `ClauseProcessor` methods. Instead, these are obtained from the values in the appropriate clause operands structure. This also simplifies argument lists of several lowering functions.
- Access block arguments of already created MLIR operations through the `BlockArgOpenMPOpInterface` instead of directly indexing the argument list of the operation, which is not scalable as more entry block argument-generating clauses are added to an operation.
- Simplified the implementation of `genParallelOp` to no longer need to define different callbacks depending on whether delayed privatization is enabled.
…ng discrepancies (#111289) Fix two discrepancies between the cited snippets and the full code.
…g avx512f feature (#111337)
This test passes as-is on non-X86 hosts only because almost no target
implements `isValidFeatureName` (the default implementation
unconditionally returns true). RISC-V does implement it, and like X86
checks that the feature name is one supported by the architecture. This
means the test creates an additional warning on RISC-V due to
`__attribute__((target("avx512f")))`.
The simple solution here is to just explicitly target x86_64-linux-gnu.
This makes the test independent of the one provided by a toolchain clang is built into, which can cause the output of -print-multi-flags-experimental to change.
…hat can fold to BEXT/BZHI With BEXT/BZHI the i64 imm mask will be replaced with an i16/i8 control mask. Fixes #111323
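To show why a small control value can stand in for a 64-bit immediate mask, here is a portable software sketch of the two x86 BMI operations (BEXTR and BZHI). The function names `bzhi64` and `bextr64` are hypothetical; they model the instructions' documented behaviour on 64-bit operands and are not compiler intrinsics.

```cpp
#include <cstdint>

// BZHI: keep the low `Index` bits of Src. Only an 8-bit index is needed.
// An index of 64 or more leaves Src unchanged.
std::uint64_t bzhi64(std::uint64_t Src, std::uint8_t Index) {
  if (Index >= 64)
    return Src;
  return Src & ((std::uint64_t{1} << Index) - 1);
}

// BEXTR: 16-bit control with the start bit in bits 7:0 and the field
// length in bits 15:8. Extracts `Len` bits beginning at bit `Start`.
std::uint64_t bextr64(std::uint64_t Src, std::uint16_t Control) {
  unsigned Start = Control & 0xFF;
  unsigned Len = Control >> 8;
  if (Start >= 64 || Len == 0)
    return 0;
  std::uint64_t Shifted = Src >> Start;
  if (Len >= 64)
    return Shifted;
  return Shifted & ((std::uint64_t{1} << Len) - 1);
}
```

For example, an AND with the 40-bit mask `0xFFFFFFFFFF` corresponds to `bzhi64(x, 40)`, where the only constant involved is the 8-bit value 40.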
…#109803) Specifically: fabs, fadd, fceil, fdiv, ffloor, fma, fmax, fmaxnm, fmin, fminnm, fmul, fnearbyint, fneg, frint, fround, froundeven, fsub, fsqrt & ftrunc
Summary: Make a separate thread to run the server when we launch. This is required by CUDA, which you can force with `export CUDA_LAUNCH_BLOCKING=1`. I figured I might as well be consistent and do it for the AMD implementation as well even though I believe it's not necessary.
This reverts commit 3c83102.
In aea0668, API tests were supposed to use LLVM tools. However, the path to a utility is constructed incorrectly there: the utility name should be prefixed with `llvm-`. This patch fixes that.
Reverts llvm/llvm-project#108939 When `AVX` is available but `-mprefer-vector-width=128` is set, some of the `mov` instructions turn into the x86 `rep;movsb` instruction, leading to poor performance on "old" architectures (sandybridge, haswell). The possible solutions are: get rid of the `-mprefer-vector-width` option, or use smaller static copy sizes in `inline_memcpy_x86_sse2_ge64_sw_prefetching`. Right now a copy size of 3 cache lines (192B) relying exclusively on xmm registers gets turned into `rep;movsb`.
…T_FN_ATTRS_CONSTEXPR defines. NFC. We only need one define - so consistently use __DEFAULT_FN_ATTRS like we do in other headers.
…11001) This is an initial patch to enable constexpr support on the more basic SSE1 intrinsics, such as initialization, arithmetic, logic and fixed shuffles. The plan is to incrementally extend this for SSE2/AVX etc., initially for the equivalent basic intrinsics, but we can add support for some of the ia32 builtins as well as the need arises.
…sses (#87003)"

This caused assertion failures:

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:7736: SDValue getMemsetValue(SDValue, EVT, SelectionDAG &, const SDLoc &): Assertion `C->getAPIntValue().getBitWidth() == 8' failed.

See comment on the PR for a reproducer.

> repstosb and repstosd are the same size, but stosd is only done for 0
> because the process of multiplying the constant so that it is copied
> across the bytes of the 32-bit number adds extra instructions that cause
> the size to increase. For 0, repstosb and repstosd are the same size,
> but stosd is only done for 0 because the process of multiplying the
> constant so that it is copied across the bytes of the 32-bit number adds
> extra instructions that cause the size to increase. For 0, we do not
> need to do that at all.
>
> For memcpy, the same goes, and as a result the minsize check was moved
> ahead because a jmp to memcpy encoded takes more bytes than repmovsb.

This reverts commit 6de5305.
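The "multiplying the constant so that it is copied across the bytes of the 32-bit number" step in the quoted message can be sketched as follows. `splatByte` is a hypothetical helper for illustration, not the code in `getMemsetValue`.

```cpp
#include <cstdint>

// Replicate one byte into all four bytes of a 32-bit word by multiplying
// with 0x01010101. No carries occur because each partial product lands in
// its own byte lane.
std::uint32_t splatByte(std::uint8_t B) {
  return std::uint32_t{B} * 0x01010101u;
}
```

For a fill value of 0 the product is simply 0, so no multiply is needed, which matches the quoted reasoning for treating 0 specially.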
Previous llvm/llvm-project#110362 (reverted) caused breakage. Here is the PR with fix. My build cmdline:
```
cmake ../llvm \
  -G Ninja \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_INSTALL_PREFIX=install \
  -DCMAKE_C_COMPILER=gcc-9 \
  -DCMAKE_CXX_COMPILER=g++-9 \
  -DCMAKE_CUDA_COMPILER=$(which nvcc) \
  -DLLVM_ENABLE_LLD=OFF \
  -DLLVM_ENABLE_ASSERTIONS=ON \
  -DLLVM_BUILD_EXAMPLES=ON \
  -DCOMPILER_RT_BUILD_LIBFUZZER=OFF \
  -DLLVM_CCACHE_BUILD=ON \
  -DMLIR_ENABLE_BINDINGS_PYTHON=ON \
  -DBUILD_SHARED_LIBS=ON \
  -DLLVM_ENABLE_PROJECTS='llvm;mlir'
```
Here I'm splitting up the existing "if" statement into two. Mixing hasDefinition() and insert() in one "if" condition would be extremely confusing as hasDefinition() doesn't change anything while insert() does.
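A minimal sketch of the kind of split described, under the assumption that `insert()` records something in a set. `Decl`, `visitOnce` and the set of names are hypothetical and do not reproduce the actual patch; the combined form being avoided would look like `if (D.hasDefinition() && Seen.insert(D.Name).second)`.

```cpp
#include <set>
#include <string>

// Hypothetical declaration type with a side-effect-free query.
struct Decl {
  std::string Name;
  bool HasDef;
  bool hasDefinition() const { return HasDef; }
};

// Returns true only the first time a declaration with a definition is seen.
bool visitOnce(const Decl &D, std::set<std::string> &Seen) {
  // Pure query: changes nothing.
  if (!D.hasDefinition())
    return false;
  // Mutation: records the name, and reports whether it was new.
  if (!Seen.insert(D.Name).second)
    return false;
  return true;
}
```

Keeping the query and the mutation in separate statements makes it obvious which check has a side effect and in what order they run.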
As with other operations such as trunc and fp converts, it should be valid to convert bitcast(undef) to undef.
9159cdf to
81bb3c4
bf68797 to
81bb3c4
Conflicts:
- lld/ELF/Target.cpp
- lld/ELF/Target.h
- llvm/test/Transforms/InferAddressSpaces/AMDGPU/flat_atomic.ll
81bb3c4 to
4be0dab
added 7 commits
August 4, 2025 17:24
4be0dab to
f56984e
Merge conflicts in:
Mainly:
TargetInfo *getAIETargetInfo(Ctx &)→void setAIETargetInfo(Ctx &)