LLVM and SPIRV-LLVM-Translator pulldown (WW15 2026)#21723
…123) Use the generic switch rather than encoding the version number it currently corresponds to.
… for risc-v (#110690)
The code generated for calls with FPCC-eligible structs as arguments
doesn't consider the bitfield, which results in a store crossing the
boundary of the memory allocated using alloca.
For example, for the code:
```
struct __attribute__((packed, aligned(1))) S {
const float f0;
unsigned f1 : 1;
};
unsigned func(struct S arg)
{
return arg.f1;
}
```
The generated IR is:
```
define dso_local signext i32 @func(
float [[TMP0:%.*]], i32 [[TMP1:%.*]]) #[[ATTR0:[0-9]+]] {
[[ENTRY:.*:]]
[[ARG:%.*]] = alloca [[STRUCT_S:%.*]], align 1
[[TMP2:%.*]] = getelementptr inbounds nuw { float, i32 }, ptr [[ARG]], i32 0, i32 0
store float [[TMP0]], ptr [[TMP2]], align 1
[[TMP3:%.*]] = getelementptr inbounds nuw { float, i32 }, ptr [[ARG]], i32 0, i32 1
store i32 [[TMP1]], ptr [[TMP3]], align 1
[[F1:%.*]] = getelementptr inbounds nuw [[STRUCT_S]], ptr [[ARG]], i32 0, i32 1
[[BF_LOAD:%.*]] = load i8, ptr [[F1]], align 1
[[BF_CLEAR:%.*]] = and i8 [[BF_LOAD]], 1
[[BF_CAST:%.*]] = zext i8 [[BF_CLEAR]] to i32
ret i32 [[BF_CAST]]
```
Here, `store i32 [[TMP1]], ptr [[TMP3]], align 1` crosses the boundary
of the allocated memory: it writes 4 bytes at byte offset 4, but the packed
`struct S` is only 5 bytes long. After optimization (EarlyCSEPass), the
remaining IR is:
```
define dso_local noundef signext i32 @func(
float [[TMP0:%.*]], i32 [[TMP1:%.*]]) local_unnamed_addr #[[ATTR0:[0-9]+]] {
[[ENTRY:.*:]]
ret i32 0
```
The patch trims the second member of the struct, taking the bitfield
width into account when choosing the appropriate integer type; the test
shows the results of this patch.
Note that the bug appears only when the `f` extension is enabled, as
FPCC eligibility requires it.
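A quick host-side check of the layout shows why the `i32` store overflows and why trimming fixes it. This is a sketch that assumes a GCC/Clang-compatible compiler and 4-byte `float` and `unsigned` (as on RISC-V); the names `kCoercedSize`, `kTrimmedSize`, and `storeOverflows` are illustrative only.

```cpp
#include <cassert>
#include <cstddef>

// Same layout as the struct in the commit message (GCC/Clang `packed` attribute).
// `const` is dropped so the type is default-constructible; it does not change the layout.
struct __attribute__((packed, aligned(1))) S {
  float f0;
  unsigned f1 : 1;
};

// Bytes written when the coerced { float, i32 } argument is stored to memory.
constexpr std::size_t kCoercedSize = sizeof(float) + sizeof(unsigned);

// Bytes written after trimming the second member to an 8-bit integer.
constexpr std::size_t kTrimmedSize = sizeof(float) + sizeof(unsigned char);

// True when storing `storedBytes` into an object of type S writes past its end.
constexpr bool storeOverflows(std::size_t storedBytes) {
  return storedBytes > sizeof(S);
}
```

With `sizeof(S) == 5`, the untrimmed store needs 8 bytes while the trimmed one needs exactly 5.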
Co-authored-by: muhammad.kamran4 <muhammad.kamran@esperantotech.com>
…697) Device libs have a fast sqrt macro implemented this way.
Add tests targeting assembly printing and miscellaneous CodeGen areas with low coverage:
- asm-printer-cpool.ll: HexagonAsmPrinter exercising constant pool entry emission.
- asm-operand-modifiers.ll: Inline asm operand modifier printing paths (lo/hi/mem).
- target-objfile-sdata.ll, split-double-volatile.ll, reg-info-types.ll: Miscellaneous CodeGen coverage for HexagonTargetObjectFile small data classification, HexagonSplitDouble volatile load handling, and HexagonRegisterInfo register class queries.
- constext-store-imm.ll: HexagonConstExtenders store-immediate optimization paths.
This removes dyn_cast invocations where the argument is already of the target type (including through subtyping). This was created by adding a static assert in dyn_cast and letting an LLM iterate until the code base compiled. I then went through each example and cleaned it up. This does not commit the static assert in dyn_cast, because it would prevent a lot of uses in templated code. To prevent backsliding, we should instead add an LLVM-aware version of https://clang.llvm.org/extra/clang-tidy/checks/readability/redundant-casting.html (or expand the existing one).
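The static-assert trick can be sketched with a type trait: a cast is redundant when the source type already is the target type or derives from it. This is a standalone illustration, not LLVM's code; `is_redundant_cast_v`, `checked_dyn_cast`, and the toy hierarchy are hypothetical, and the sketch uses C++ RTTI (`dynamic_cast`) where LLVM's real `dyn_cast` uses its own `classof`-based RTTI. It needs C++17.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical trait: dyn_cast<To>(From *) is redundant when From already is To
// or derives from it, because the cast can never fail on a non-null pointer.
template <typename To, typename From>
inline constexpr bool is_redundant_cast_v =
    std::is_same_v<To, From> || std::is_base_of_v<To, From>;

// Toy hierarchy standing in for LLVM's Value/Instruction classes.
struct Value { virtual ~Value() = default; };
struct Instruction : Value {};
struct LoadInst : Instruction {};

// Cast helper carrying the kind of static_assert described above: any
// redundant use fails to compile, which is how the call sites were found.
template <typename To, typename From>
To *checked_dyn_cast(From *p) {
  static_assert(!is_redundant_cast_v<To, From>,
                "redundant dyn_cast: the argument already has the target type");
  return dynamic_cast<To *>(p);
}
```

As the commit notes, such an assert is too strict for templated code, where `From` may legitimately equal `To` for some instantiations.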
CONFLICT (content): Merge conflict in llvm/lib/IR/DiagnosticInfo.cpp
The test used to look correct, but it was not. A WeakVH does not follow the replacement; it simply becomes null once the replaced value is erased. So VarIndex became null and a zero index was used instead, while the test checks still passed. Only a WeakTrackingVH is updated to point to the new value. Change the test slightly so that using a zero index is detected as wrong.
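The difference between the two handle kinds can be modelled with a toy value that is replaced and then erased. This is a hypothetical sketch, not LLVM's `ValueHandleBase` implementation; `ToyValue`, `ToyHandle`, `replaceAndErase`, and `attach` are made-up names.

```cpp
#include <cassert>
#include <vector>

// Toy model (not LLVM's implementation) of the two handle kinds discussed above.
struct ToyValue;

struct ToyHandle {
  ToyValue *ptr = nullptr;
  bool tracking = false;  // true: follows replacement (like WeakTrackingVH)
                          // false: only nulls out on deletion (like WeakVH)
};

struct ToyValue {
  std::vector<ToyHandle *> handles;  // handles currently pointing at this value

  // Replace this value with `replacement`, then erase it.
  void replaceAndErase(ToyValue *replacement) {
    for (ToyHandle *h : handles) {
      if (h->tracking) {
        h->ptr = replacement;  // tracking handle moves to the new value
        replacement->handles.push_back(h);
      } else {
        h->ptr = nullptr;  // weak handle stayed on the old value, which is now gone
      }
    }
    handles.clear();
  }
};

// Point `h` at `v` and register it with `v`.
void attach(ToyHandle &h, ToyValue &v, bool tracking) {
  h.ptr = &v;
  h.tracking = tracking;
  v.handles.push_back(&h);
}
```

A test that reads through the non-tracking handle silently sees null, which is how the zero index slipped through.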
Previously, it generated extra single quotes around the outer
braces (i.e., `'{'` `6442:\220,1\22` `'}'`). The SPIR-V backend does not
expect that; it expects `{6442:\220,1\22}`.
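In the expected form, `\22` is how textual LLVM IR prints a `"` byte inside a string constant, so `{6442:\220,1\22}` is the string `{6442:"0,1"}`. The sketch below builds that string without quotes around the braces and escapes it the way the IR printer does (printable bytes other than `\` and `"` kept, everything else as `\` plus two uppercase hex digits). `formatDecorationAnnotation` and `escapeLikeIR` are hypothetical helpers, not the translator's functions.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: builds {<decorationId>:"<literals>"} with no extra quotes
// around the outer braces.
std::string formatDecorationAnnotation(unsigned decorationId,
                                       const std::string &literals) {
  return "{" + std::to_string(decorationId) + ":\"" + literals + "\"}";
}

// Escape like textual LLVM IR string constants: printable bytes other than
// '\\' and '"' stay as-is; all others become '\' plus two uppercase hex digits.
std::string escapeLikeIR(const std::string &s) {
  static const char *hex = "0123456789ABCDEF";
  std::string out;
  for (unsigned char c : s) {
    if (std::isprint(c) && c != '\\' && c != '"') {
      out += static_cast<char>(c);
    } else {
      out += '\\';
      out += hex[c >> 4];
      out += hex[c & 0xF];
    }
  }
  return out;
}
```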
… device (#189140) [Driver][HIP] Fix bundled -S emitting bitcode instead of assembly for device PR #188262 added support for bundling HIP -S output under the new offload driver, but the device backend still entered the bitcode-emitting path in ConstructPhaseAction. The condition at the Backend phase checked for the new offload driver and directed device code to emit TY_LLVM_BC, without excluding the -S case. This caused the device section in the bundled .s to contain LLVM bitcode instead of textual AMDGPU assembly. This broke the HIP UT CheckCodeObjAttr test which greps copyKernel.s for "uniform_work_group_size" — a string that only appears in textual assembly, not in bitcode. Fix by excluding -S (without -emit-llvm) from the new-driver bitcode path, so the device backend falls through to emit TY_PP_Asm (textual assembly). Also add a missing lit test check that the device backend produces assembler output for the bundled -S case. Fixes: LCOMPILER-553
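The fixed condition can be reduced to three flags. This is a hypothetical sketch: the real `ConstructPhaseAction` inspects many more inputs, and `takesNewDriverBitcodePath` is an illustrative name. It returns true when the device backend is sent down the new-driver bitcode path (`TY_LLVM_BC`).

```cpp
#include <cassert>

// Hypothetical reduction of the Backend-phase check for HIP device code.
// dashS models -S; emitLLVM models -emit-llvm.
bool takesNewDriverBitcodePath(bool newOffloadDriver, bool dashS, bool emitLLVM) {
  // Before the fix this was effectively just `newOffloadDriver`.
  const bool wantsTextualAsm = dashS && !emitLLVM;  // -S without -emit-llvm
  return newOffloadDriver && !wantsTextualAsm;
}
```

With `-S` and no `-emit-llvm`, the function now returns false, so device code falls through to textual assembly.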
…aries (#189044) We only did this for local variables but were missing it for globals.
…ardOperands API to BranchOpInterface (#187864) To simplify the output of the reduction-tree pass, this PR introduces eraseRedundantBlocksInRegion. For regions containing multiple execution paths, this function selects the shortest 'interesting' path. Additionally, this PR adds the getSuccessorForwardOperands API to BranchOpInterface, which extracts the forwarded operands for the path chosen from multiple alternatives, enabling the creation of a cf.br operation for the redirected jump.
…tions (#189113) Fixes llvm/llvm-project#187716.
…ssorForwardOperands API to BranchOpInterface" (#189150) Reverts llvm/llvm-project#187864, because it is causing some buildbot failures. See https://lab.llvm.org/buildbot/#/builders/138/builds/27662 and https://lab.llvm.org/buildbot/#/builders/169/builds/21376/steps/11/logs/stdio for memory leak issues.
…on index (#188508) When a dynamic index of -1 (the kPoisonIndex sentinel) was folded into the static position of a vector.insert op, foldDenseElementsAttrDestInsertOp would proceed to call calculateInsertPosition, which returned -1. The subsequent iterator arithmetic (allValues.begin() + (-1)) was undefined behaviour, causing an assertion in DenseElementsAttr::get. Fix by bailing out early in foldDenseElementsAttrDestInsertOp when any static position equals kPoisonIndex, consistent with how InsertChainFullyInitialized already guards this case. Fixes #188404 Assisted-by: Claude Code
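The guard can be illustrated with a simplified fold that writes a scalar into a row-major constant. This is a hypothetical sketch, not `foldDenseElementsAttrDestInsertOp`: `foldScalarInsert` is a made-up name, only full-rank scalar inserts are modelled, and in-range positions are assumed apart from the `-1` sentinel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Sentinel for a poison dynamic index (-1), as described above.
constexpr int64_t kPoisonIndex = -1;

// Hypothetical, simplified fold: write `scalar` into a row-major constant of the
// given `shape` at `position`, or return std::nullopt when folding is skipped.
std::optional<std::vector<int64_t>>
foldScalarInsert(std::vector<int64_t> values, const std::vector<int64_t> &shape,
                 const std::vector<int64_t> &position, int64_t scalar) {
  // Bail out early: a poison position has no valid linear offset, and computing
  // one would produce a negative offset and out-of-bounds iterator arithmetic.
  for (int64_t p : position)
    if (p == kPoisonIndex)
      return std::nullopt;

  int64_t offset = 0;
  for (std::size_t i = 0; i < position.size(); ++i)
    offset = offset * shape[i] + position[i];
  values[static_cast<std::size_t>(offset)] = scalar;
  return values;
}
```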
…nt (#189163) When invoking `-test-bytecode-roundtrip=test-dialect-version=X.Y` on a module that contains no test dialect operations, the reader type callback in `runTest0` called `reader.getDialectVersion<test::TestDialect>()` and then immediately asserted that it succeeded. However, if the test dialect was never referenced in the bytecode (because no test dialect types appear in the module), the dialect's version information is not stored in the bytecode, so `getDialectVersion` legitimately returns failure. When the test dialect version is unavailable in the bytecode being read, the module contains no test dialect types, so no "funky"-group overrides are needed and the callback can safely skip by returning `success()`. A regression test is added with a module that has no test dialect ops, exercising the `test-dialect-version=2.0` path that previously crashed. Fixes #128321 Fixes #128325 Assisted-by: Claude Code
… (#188064)
This PR adds two new field specifiers (`operand` and `attribute`) and
extends the existing one (`result`):
- `default_factory` parameter is added for `result` and `attribute` to
specify default value via a lambda/function
- `kw_only` parameter is added for all these three specifiers, to make a
field a keyword-only parameter (without giving a default value).
```python
def result(
*,
infer_type: bool = False,
default_factory: Optional[Callable[[], Any]] = None,
kw_only: bool = False,
) -> Any: ...
def operand(
*,
kw_only: bool = False,
) -> Any: ...
def attribute(
*,
default_factory: Optional[Callable[[], Any]] = None,
kw_only: bool = False,
) -> Any: ...
```
Examples of how to use them:
```python
class OperandSpecifierOp(TestFieldSpecifiers.Operation, name="operand_specifier"):
a: Operand[IntegerType[32]] = operand()
b: Optional[Operand[IntegerType[32]]] = None
c: Operand[IntegerType[32]] = operand(kw_only=True)
class ResultSpecifierOp(TestFieldSpecifiers.Operation, name="result_specifier"):
a: Result[IntegerType[32]] = result()
b: Result[IntegerType[16]] = result(infer_type=True)
c: Result[IntegerType] = result(
default_factory=lambda: IntegerType.get_signless(8)
)
d: Sequence[Result[IntegerType]] = result(default_factory=list)
e: Result[IntegerType[32]] = result(kw_only=True)
class AttributeSpecifierOp(
TestFieldSpecifiers.Operation, name="attribute_specifier"
):
a: IntegerAttr = attribute()
b: IntegerAttr = attribute(
default_factory=lambda: IntegerAttr.get(IntegerType.get_signless(32), 42)
)
c: StringAttr["a"] | StringAttr["b"] = attribute(
default_factory=lambda: StringAttr.get("a")
)
d: IntegerAttr = attribute(kw_only=True)
```
---------
Co-authored-by: Rolf Morel <rolfmorel@gmail.com>
Summary: These were renamed and the aliases removed; fix running the tests.
Signed-off-by: Shikhar Soni <shikharish05@gmail.com>
…89128) This fixes #186684. Also fix (not) breaking variables declared on the same line as the closing brace, and adapt Whitesmiths to these changes.
…efs (#188860) Fixes #188695
…ng and tests (#184365) Closes #181654
… broadcast from sg-to-wi (#185960) This PR adds distribution patterns for vector.step, vector.shape_cast & vector.broadcast in the new sg-to-wi pass
…. (#188721) If a load and a store have different address spaces, we cannot create a runtime check. Instead, always copy the data to an alloca matching the store address space. Fixes llvm/llvm-project#185236. PR: llvm/llvm-project#188721
This fixes b6e4d27. Co-authored-by: Google Bazel Bot <google-bazel-bot@google.com>
…t & mask ops in sg to wi pass (#187392) This PR adds patterns for the following vector ops in the new sg-to-wi pass: Transpose, BitCast, CreateMask, and ConstantMask.
…6 (#189468) Fixes: LCOMPILER-1673
…ol-conversion (#189149) Fixes llvm/llvm-project#176889.
…(#189279) This patch introduces an amdgpu wrapper for `rocdl.global.load.async.to.lds.bN` intrinsics, which were introduced in gfx1250. Assisted-by: Claude --------- Signed-off-by: Eric Feng <Eric.Feng@amd.com>
…e.delinearize_index (#188369) Allow `affine.delinearize_index` and `affine.linearize_index` to operate on `vector<...x index>` types in addition to scalar indices. --------- Signed-off-by: Keshav Vinayak Jha <keshavvinayakjha@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
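The index arithmetic itself is unchanged by allowing vectors: a natural reading is that each lane is linearized or delinearized independently with the same basis. The sketch below shows row-major semantics under stated assumptions (full basis, non-negative indices); `linearize`, `delinearize`, and `delinearizeLanes` are hypothetical helpers, not MLIR APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Row-major linearization: ((i0 * b1 + i1) * b2 + i2) ...
int64_t linearize(const std::vector<int64_t> &idx, const std::vector<int64_t> &basis) {
  int64_t linear = 0;
  for (std::size_t i = 0; i < idx.size(); ++i)
    linear = linear * basis[i] + idx[i];
  return linear;
}

// Inverse: peel off the innermost dimension first. Assumes a non-empty basis.
std::vector<int64_t> delinearize(int64_t linear, const std::vector<int64_t> &basis) {
  std::vector<int64_t> idx(basis.size());
  for (std::size_t i = basis.size() - 1; i > 0; --i) {
    idx[i] = linear % basis[i];
    linear /= basis[i];
  }
  idx[0] = linear;  // outermost result is not reduced modulo basis[0]
  return idx;
}

// Vector-of-index form: one result vector per dimension, computed lane by lane.
std::vector<std::vector<int64_t>>
delinearizeLanes(const std::vector<int64_t> &lanes, const std::vector<int64_t> &basis) {
  std::vector<std::vector<int64_t>> results(basis.size(),
                                            std::vector<int64_t>(lanes.size()));
  for (std::size_t l = 0; l < lanes.size(); ++l) {
    std::vector<int64_t> idx = delinearize(lanes[l], basis);
    for (std::size_t d = 0; d < basis.size(); ++d)
      results[d][l] = idx[d];
  }
  return results;
}
```

For example, with basis `(2, 4)` the lanes `[0, 5, 7]` split into `[0, 1, 1]` and `[0, 1, 3]`.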
This implements handling of cleanup scopes in cases where a flag is needed to indicate whether or not the cleanup is active. This happens in cases where a cleanup is no longer required, but it isn't at the top of the cleanup stack so it can't be popped. A temporary variable is used to set the cleanup to an inactive state when it is no longer needed. Assisted-by: Cursor / claude-4.6-opus-high (implementation) Assisted-by: Cursor / gpt-5.3-codex (tests)
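The idea of an active flag can be modelled outside the compiler as a cleanup stack whose entries are disabled rather than popped. This is a hypothetical sketch of the concept, not the CIR implementation; `CleanupStackSketch` and its members are illustrative names, and the `active` field stands in for the temporary flag variable.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of conditional cleanups: every entry carries an "active"
// flag that is checked when the scope unwinds.
class CleanupStackSketch {
public:
  // Push a cleanup and return a handle that can later deactivate it.
  std::size_t push(std::function<void()> action) {
    entries.push_back(Entry{std::move(action), true});
    return entries.size() - 1;
  }

  // A cleanup that is no longer needed but is not on top of the stack cannot
  // be popped, so it is marked inactive instead.
  void deactivate(std::size_t handle) { entries[handle].active = false; }

  // Unwind innermost-first, running only cleanups whose flag is still set.
  void unwindAll() {
    while (!entries.empty()) {
      if (entries.back().active)
        entries.back().action();
      entries.pop_back();
    }
  }

private:
  struct Entry {
    std::function<void()> action;
    bool active;
  };
  std::vector<Entry> entries;
};
```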
…sts (#3660) Round-tripping for the corresponding CHECK-LLVM lines already works for some tests, so they can be enabled. Original commit: KhronosGroup/SPIRV-LLVM-Translator@3f5257681447f4c
Update after llvm-project commit 8e1e371 ("[IR][NFC] Mark BranchInst as deprecated (#187314)", 2026-03-19). Original commit: KhronosGroup/SPIRV-LLVM-Translator@6b5f17f12b4be00
After llvm-project commit cf92512 ("[DebugInfo] Add Verifier check for local imports in CU's imports field (#187118)", 2026-03-19), DebugInfo got lost for these tests. Ensure the metadata follows the expected format. Original commit: KhronosGroup/SPIRV-LLVM-Translator@9691713f67ce02c
The tests started to fail with "Unable to meet SPIR-V requirements for this target" after upstream commit llvm/llvm-project@85049fc357ac ("[HLSL][SPIRV] Add support for -g to generate NonSemantic Debug Info (#187051)", 2026-03-25). Original commit: KhronosGroup/SPIRV-LLVM-Translator@40ce6c71d8d5b56
Replace manual save/set/restore of `SPIRVUseTextFormat` with `llvm::SaveAndRestore` to guarantee restoration on all exit paths, including the early return on write error. Fixes Coverity CID 546125. Resolves KhronosGroup/SPIRV-LLVM-Translator#3414 Original commit: KhronosGroup/SPIRV-LLVM-Translator@01ee67ccc9a2c61
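The RAII pattern behind the fix can be sketched with a minimal stand-in for `llvm::SaveAndRestore` (the real class lives in `llvm/Support/SaveAndRestore.h`). `SaveAndRestoreSketch`, `UseTextFormat`, and `writeModuleSketch` are hypothetical names used to keep the example self-contained.

```cpp
#include <cassert>
#include <utility>

// Minimal stand-in for llvm::SaveAndRestore: saves the current value, installs
// a new one, and restores the saved value in the destructor, so every exit
// path (including early returns) restores it.
template <typename T> class SaveAndRestoreSketch {
public:
  SaveAndRestoreSketch(T &target, T newValue) : ref(target), saved(target) {
    ref = std::move(newValue);
  }
  ~SaveAndRestoreSketch() { ref = std::move(saved); }
  SaveAndRestoreSketch(const SaveAndRestoreSketch &) = delete;
  SaveAndRestoreSketch &operator=(const SaveAndRestoreSketch &) = delete;

private:
  T &ref;
  T saved;
};

// Hypothetical global standing in for SPIRVUseTextFormat.
bool UseTextFormat = false;

// Hypothetical writer that temporarily switches to text output and returns early on error.
bool writeModuleSketch(bool writeFails) {
  SaveAndRestoreSketch<bool> guard(UseTextFormat, true);
  if (writeFails)
    return false;  // early return: the guard's destructor still restores UseTextFormat
  return true;
}
```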
Move annotation strings created from UserSemantic decorations to the constant address space. Even though these strings should disappear before instruction selection, we ought to avoid globals in the private addrspace. Also set the source file and auxiliary data arguments to `null` instead of poison/undef, which seems to be more common in LLVM. Original commit: KhronosGroup/SPIRV-LLVM-Translator@8f16307ff9dbe9e
A recent version of SPIRV-Tools found several issues with the test, such as `DebugTypeFunction` having the wrong return type operand and `DebugTypeBasic` missing the flags operand. Original commit: KhronosGroup/SPIRV-LLVM-Translator@bf469923a25d484
) A malformed SPIR-V binary can contain an instruction WordCount below the instruction's minimum, causing wraparound in `resize(WordCount - FixedWC)` and a ~17 GB allocation that can result in `std::bad_alloc` when VA space is limited (32-bit systems, ulimit) or process hang on memory access. Fix by rejecting the malformed input early. AI-assisted: Claude Sonnet 4.6 (commercial SaaS) Original commit: KhronosGroup/SPIRV-LLVM-Translator@5adf335eedd8ba0
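The wraparound and the early rejection can be sketched in isolation, assuming 32-bit words. `readVariableOperands` is a hypothetical helper, not the translator's decoder; `fixedWC` is the instruction's minimum word count and `wordCount` comes straight from the untrusted binary.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical decoder step, assuming 32-bit words.
bool readVariableOperands(uint32_t wordCount, uint32_t fixedWC,
                          std::vector<uint32_t> &operands) {
  // Reject malformed input first: if wordCount < fixedWC, the unsigned
  // subtraction below would wrap to nearly 2^32 and resize() would request ~17 GB.
  if (wordCount < fixedWC)
    return false;
  operands.resize(wordCount - fixedWC);
  return true;
}
```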
As in title, problem exposed during `sanitize_overflow` enablement in triton compiler: intel/intel-xpu-backend-for-triton#6533 Original commit: KhronosGroup/SPIRV-LLVM-Translator@b2410000b1ff3c9
Conflicts: clang/test/lit.site.cfg.py.in libclc/clc/lib/amdgpu/workitem/clc_get_local_id.cl libclc/libspirv/lib/amdgcn-amdhsa/SOURCES
zizmor found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.
Replace deprecated BranchInst::Create calls with UncondBrInst::Create and CondBrInst::Create throughout SYCLNativeCPUUtils. This addresses the LLVM deprecation of the unified BranchInst API in favor of separate unconditional and conditional branch instruction classes. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Fix the test naming to use target_name instead of ARG_ARCH_SUFFIX, matching upstream LLVM commit 90e5a1e, which fixed name conflicts when multiple libraries use the same target triple. Changes:
- Remove unused REMANGLE parameter from cmake_parse_arguments
- Remove unnecessary ARG_ARCH_SUFFIX computation
- Use ${target_name} for unique test names (upstream approach)
- Use ${builtins_file} instead of undefined ${libclc_builtins_lib}
- Use ${LIBCLC_SOURCE_DIR} for WORKING_DIRECTORY (upstream approach)
This ensures native_cpu builds without CMake test naming conflicts while staying aligned with community conventions.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
[libclc] Fix native cpu build
@wenju-he Please review and follow up on libclc native cpu support after the libclc refactoring. Thanks!
This reverts commit 7a3eee9. Missing symbols in native_cpu check-libclc are implemented in libdevice, not libclc: MemoryBarrier/ControlBarrier/BuiltInWorkgroupSize/BuiltInLocalInvocationId
native_cpu is not tested in the sycl branch, so skip it in the llvmspirv_pulldown branch as well.
Reverted in 54aa435. The missing symbols are implemented in libdevice, e.g. llvm/libdevice/nativecpu_utils.cpp (line 46 at c62d1d4). I have skipped native_cpu check-libclc in d93a810, which aligns with the sycl branch.
LGTM. I just added a minor formatting change in 6e622e5 to align with https://github.com/intel-restricted/applications.compilers.llvm-project/blob/ef83a191161833ae6a631d2a64630a88003e7ac0/libclc/CMakeLists.txt#L597-L601
LLVM: llvm/llvm-project@7a3b7f1
SPIRV-LLVM-Translator: KhronosGroup/SPIRV-LLVM-Translator@b241000