forked from llvm/llvm-project
-
Notifications
You must be signed in to change notification settings - Fork 2
Implement a simple e2e test for PFP. #2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
qinkunbao
wants to merge
5
commits into
pcc:pfp
Choose a base branch
from
qinkunbao:pfp
base: pfp
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
pcc
reviewed
Apr 4, 2025
pcc
reviewed
Apr 5, 2025
pcc
reviewed
Apr 5, 2025
pcc
reviewed
Apr 5, 2025
pcc
reviewed
Apr 5, 2025
pcc
reviewed
Apr 8, 2025
pcc
pushed a commit
that referenced
this pull request
Apr 9, 2025
…#134130) This should fix failures caused by llvm#133967 Attn: @sarnex Thanks Signed-off-by: Arvind Sudarsanam <[email protected]>
pcc
pushed a commit
that referenced
this pull request
Apr 9, 2025
…d A520 (llvm#132246) Inefficient SVE codegen occurs on at least two in-order cores, those being Cortex-A510 and Cortex-A520. For example a simple vector add ``` void foo(float a, float b, float dst, unsigned n) { for (unsigned i = 0; i < n; ++i) dst[i] = a[i] + b[i]; } ``` Vectorizes the inner loop into the following interleaved sequence of instructions. ``` add x12, x1, x10 ld1b { z0.b }, p0/z, [x1, x10] add x13, x2, x10 ld1b { z1.b }, p0/z, [x2, x10] ldr z2, [x12, #1, mul vl] ldr z3, [x13, #1, mul vl] dech x11 add x12, x0, x10 fadd z0.s, z1.s, z0.s fadd z1.s, z3.s, z2.s st1b { z0.b }, p0, [x0, x10] addvl x10, x10, #2 str z1, [x12, #1, mul vl] ``` By adjusting the target features to prefer fixed over scalable if the cost is equal we get the following vectorized loop. ``` ldp q0, q3, [x11, #-16] subs x13, x13, llvm#8 ldp q1, q2, [x10, #-16] add x10, x10, llvm#32 add x11, x11, llvm#32 fadd v0.4s, v1.4s, v0.4s fadd v1.4s, v2.4s, v3.4s stp q0, q1, [x12, #-16] add x12, x12, llvm#32 ``` Which is more efficient.
pcc
pushed a commit
that referenced
this pull request
Apr 9, 2025
… A510/A520 (llvm#134606) Recommit. This work was done by llvm#132246 but failed buildbots due to the test introduced needing updates Inefficient SVE codegen occurs on at least two in-order cores, those being Cortex-A510 and Cortex-A520. For example a simple vector add ``` void foo(float a, float b, float dst, unsigned n) { for (unsigned i = 0; i < n; ++i) dst[i] = a[i] + b[i]; } ``` Vectorizes the inner loop into the following interleaved sequence of instructions. ``` add x12, x1, x10 ld1b { z0.b }, p0/z, [x1, x10] add x13, x2, x10 ld1b { z1.b }, p0/z, [x2, x10] ldr z2, [x12, #1, mul vl] ldr z3, [x13, #1, mul vl] dech x11 add x12, x0, x10 fadd z0.s, z1.s, z0.s fadd z1.s, z3.s, z2.s st1b { z0.b }, p0, [x0, x10] addvl x10, x10, #2 str z1, [x12, #1, mul vl] ``` By adjusting the target features to prefer fixed over scalable if the cost is equal we get the following vectorized loop. ``` ldp q0, q3, [x11, #-16] subs x13, x13, llvm#8 ldp q1, q2, [x10, #-16] add x10, x10, llvm#32 add x11, x11, llvm#32 fadd v0.4s, v1.4s, v0.4s fadd v1.4s, v2.4s, v3.4s stp q0, q1, [x12, #-16] add x12, x12, llvm#32 ``` Which is more efficient.
pcc
pushed a commit
that referenced
this pull request
Apr 26, 2025
…ctor-bits=128." (llvm#134997) Reverts llvm#134068 Caused a stage 2 build failure: https://lab.llvm.org/buildbot/#/builders/41/builds/6016 ``` FAILED: lib/Support/CMakeFiles/LLVMSupport.dir/Caching.cpp.o /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage1.install/bin/clang++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/lib/Support -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/lib/Support -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/include -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/include -mcpu=neoverse-512tvb -mllvm -scalable-vectorization=preferred -mllvm -treat-scalable-fixed-error-as-warning=false -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wno-comment -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -Werror=global-constructors -O3 -DNDEBUG -std=c++17 -UNDEBUG -fno-exceptions -funwind-tables -fno-rtti -MD -MT lib/Support/CMakeFiles/LLVMSupport.dir/Caching.cpp.o -MF lib/Support/CMakeFiles/LLVMSupport.dir/Caching.cpp.o.d -o lib/Support/CMakeFiles/LLVMSupport.dir/Caching.cpp.o -c /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/lib/Support/Caching.cpp Opcode has unknown scale! UNREACHABLE executed at ../llvm/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp:4530! PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script. Stack dump: 0. Program arguments: /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage1.install/bin/clang++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/lib/Support -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/lib/Support -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/include -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/include -mcpu=neoverse-512tvb -mllvm -scalable-vectorization=preferred -mllvm -treat-scalable-fixed-error-as-warning=false -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wno-comment -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -Werror=global-constructors -O3 -DNDEBUG -std=c++17 -UNDEBUG -fno-exceptions -funwind-tables -fno-rtti -MD -MT lib/Support/CMakeFiles/LLVMSupport.dir/Caching.cpp.o -MF lib/Support/CMakeFiles/LLVMSupport.dir/Caching.cpp.o.d -o lib/Support/CMakeFiles/LLVMSupport.dir/Caching.cpp.o -c /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/lib/Support/Caching.cpp 1. <eof> parser at end of file 2. Code generation 3. Running pass 'Function Pass Manager' on module '/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/lib/Support/Caching.cpp'. 4. Running pass 'AArch64 load / store optimization pass' on function '@"_ZNSt17_Function_handlerIFN4llvm8ExpectedISt8functionIFNS1_ISt10unique_ptrINS0_16CachedFileStreamESt14default_deleteIS4_EEEEjRKNS0_5TwineEEEEEjNS0_9StringRefESB_EZNS0_10localCacheESB_SB_SB_S2_IFvjSB_S3_INS0_12MemoryBufferES5_ISH_EEEEE3$_0E9_M_invokeERKSt9_Any_dataOjOSF_SB_"' #0 0x0000b6eae9b67bf0 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage1.install/bin/clang+++0x81c7bf0) #1 0x0000b6eae9b65aec llvm::sys::RunSignalHandlers() (/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage1.install/bin/clang+++0x81c5aec) #2 0x0000b6eae9acd5f4 CrashRecoverySignalHandler(int) CrashRecoveryContext.cpp:0:0 llvm#3 0x0000f16c1aff28f8 (linux-vdso.so.1+0x8f8) llvm#4 0x0000f16c1aacf1f0 __pthread_kill_implementation ./nptl/pthread_kill.c:44:76 llvm#5 0x0000f16c1aa8a67c gsignal ./signal/../sysdeps/posix/raise.c:27:6 llvm#6 0x0000f16c1aa77130 abort ./stdlib/abort.c:81:7 llvm#7 0x0000b6eae9ad6628 (/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage1.install/bin/clang+++0x8136628) llvm#8 0x0000b6eae72e95a8 (/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage1.install/bin/clang+++0x59495a8) llvm#9 0x0000b6eae74ca9a8 (anonymous namespace)::AArch64LoadStoreOpt::findMatchingInsn(llvm::MachineInstrBundleIterator<llvm::MachineInstr, false>, (anonymous namespace)::LdStPairFlags&, unsigned int, bool) AArch64LoadStoreOptimizer.cpp:0:0 llvm#10 0x0000b6eae74c85a8 (anonymous namespace)::AArch64LoadStoreOpt::tryToPairLdStInst(llvm::MachineInstrBundleIterator<llvm::MachineInstr, false>&) AArch64LoadStoreOptimizer.cpp:0:0 llvm#11 0x0000b6eae74c624c (anonymous namespace)::AArch64LoadStoreOpt::optimizeBlock(llvm::MachineBasicBlock&, bool) AArch64LoadStoreOptimizer.cpp:0:0 llvm#12 0x0000b6eae74c429c (anonymous namespace)::AArch64LoadStoreOpt::runOnMachineFunction(llvm::MachineFunction&) AArch64LoadStoreOptimizer.cpp:0:0 ```
pcc
pushed a commit
that referenced
this pull request
Apr 26, 2025
…vailable (llvm#135343) When a frame is inlined, LLDB will display its name in backtraces as follows: ``` * thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.3 * frame #0: 0x0000000100000398 a.out`func() [inlined] baz(x=10) at inline.cpp:1:42 frame #1: 0x0000000100000398 a.out`func() [inlined] bar() at inline.cpp:2:37 frame #2: 0x0000000100000398 a.out`func() at inline.cpp:4:15 frame llvm#3: 0x00000001000003c0 a.out`main at inline.cpp:7:5 frame llvm#4: 0x000000026eb29ab8 dyld`start + 6812 ``` The longer the names get the more confusing this gets because the first function name that appears is the parent frame. My assumption (which may need some more surveying) is that for the majority of cases we only care about the actual frame name (not the parent). So this patch removes all the special logic that prints the parent frame. Another quirk of the current format is that the inlined frame name does not abide by the `${function.name-XXX}` format variables. We always just print the raw demangled name. With this patch, we would format the inlined frame name according to the `frame-format` setting (see the test-cases). If we really want to have the `parentFrame [inlined] inlinedFrame` format, we could expose it through a new `frame-format` variable (e..g., `${function.inlined-at-name}` and let the user decide where to place things.
pcc
pushed a commit
that referenced
this pull request
Apr 26, 2025
The linter messed up the order of includes, which is necessary as is.
qinkunbao
pushed a commit
to qinkunbao/llvm-project
that referenced
this pull request
May 13, 2025
The mcmodel=tiny memory model is only valid on ARM targets. While trying this on X86 compiler throws an internal error along with stack dump. llvm#125641 This patch resolves the issue. Reduced test case: ``` #include <stdio.h> int main( void ) { printf( "Hello, World!\n" ); return 0; } ``` ``` 0. Program arguments: /opt/compiler-explorer/clang-trunk/bin/clang++ -gdwarf-4 -g -o /app/output.s -fno-verbose-asm -S --gcc-toolchain=/opt/compiler-explorer/gcc-snapshot -fcolor-diagnostics -fno-crash-diagnostics -mcmodel=tiny <source> 1. <eof> parser at end of file #0 0x0000000003b10218 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x3b10218) pcc#1 0x0000000003b0e35c llvm::sys::CleanupOnSignal(unsigned long) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x3b0e35c) pcc#2 0x0000000003a5dbc3 llvm::CrashRecoveryContext::HandleExit(int) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x3a5dbc3) llvm#3 0x0000000003b05cfe llvm::sys::Process::Exit(int, bool) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x3b05cfe) llvm#4 0x0000000000d4e3eb LLVMErrorHandler(void*, char const*, bool) cc1_main.cpp:0:0 llvm#5 0x0000000003a67c93 llvm::report_fatal_error(llvm::Twine const&, bool) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x3a67c93) llvm#6 0x0000000003a67df8 (/opt/compiler-explorer/clang-trunk/bin/clang+++0x3a67df8) llvm#7 0x0000000002549148 llvm::X86TargetMachine::X86TargetMachine(llvm::Target const&, llvm::Triple const&, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, std::optional<llvm::Reloc::Model>, std::optional<llvm::CodeModel::Model>, llvm::CodeGenOptLevel, bool) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x2549148) llvm#8 0x00000000025491fc llvm::RegisterTargetMachine<llvm::X86TargetMachine>::Allocator(llvm::Target const&, llvm::Triple const&, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, std::optional<llvm::Reloc::Model>, std::optional<llvm::CodeModel::Model>, llvm::CodeGenOptLevel, bool) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x25491fc) llvm#9 0x0000000003db74cc clang::emitBackendOutput(clang::CompilerInstance&, clang::CodeGenOptions&, llvm::StringRef, llvm::Module*, clang::BackendAction, llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>, clang::BackendConsumer*) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x3db74cc) llvm#10 0x0000000004460d95 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x4460d95) llvm#11 0x00000000060005ec clang::ParseAST(clang::Sema&, bool, bool) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x60005ec) llvm#12 0x00000000044614b5 clang::CodeGenAction::ExecuteAction() (/opt/compiler-explorer/clang-trunk/bin/clang+++0x44614b5) llvm#13 0x0000000004737121 clang::FrontendAction::Execute() (/opt/compiler-explorer/clang-trunk/bin/clang+++0x4737121) llvm#14 0x00000000046b777b clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x46b777b) llvm#15 0x00000000048229e3 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x48229e3) llvm#16 0x0000000000d50621 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/opt/compiler-explorer/clang-trunk/bin/clang+++0xd50621) llvm#17 0x0000000000d48e2d ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&) driver.cpp:0:0 llvm#18 0x00000000044acc99 void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const::'lambda'()>(long) Job.cpp:0:0 llvm#19 0x0000000003a5dac3 llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x3a5dac3) llvm#20 0x00000000044aceb9 clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const (.part.0) Job.cpp:0:0 llvm#21 0x00000000044710dd clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&, bool) const (/opt/compiler-explorer/clang-trunk/bin/clang+++0x44710dd) llvm#22 0x0000000004472071 clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&, bool) const (/opt/compiler-explorer/clang-trunk/bin/clang+++0x4472071) llvm#23 0x000000000447c3fc clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x447c3fc) llvm#24 0x0000000000d4d2b1 clang_main(int, char**, llvm::ToolContext const&) (/opt/compiler-explorer/clang-trunk/bin/clang+++0xd4d2b1) llvm#25 0x0000000000c12464 main (/opt/compiler-explorer/clang-trunk/bin/clang+++0xc12464) llvm#26 0x00007ae43b029d90 (/lib/x86_64-linux-gnu/libc.so.6+0x29d90) llvm#27 0x00007ae43b029e40 __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x29e40) llvm#28 0x0000000000d488c5 _start (/opt/compiler-explorer/clang-trunk/bin/clang+++0xd488c5) ``` --------- Co-authored-by: Shashwathi N <[email protected]>
qinkunbao
pushed a commit
to qinkunbao/llvm-project
that referenced
this pull request
May 13, 2025
… `getForwardSlice` matchers (llvm#115670) Improve mlir-query tool by implementing `getBackwardSlice` and `getForwardSlice` matchers. As an addition `SetQuery` also needed to be added to enable custom configuration for each query. e.g: `inclusive`, `omitUsesFromAbove`, `omitBlockArguments`. Note: backwardSlice and forwardSlice algoritms are the same as the ones in `mlir/lib/Analysis/SliceAnalysis.cpp` Example of current matcher. The query was made to the file: `mlir/test/mlir-query/complex-test.mlir` ```mlir ./mlir-query /home/dbudii/personal/llvm-project/mlir/test/mlir-query/complex-test.mlir -c "match getDefinitions(hasOpName(\"arith.add f\"),2)" Match pcc#1: /home/dbudii/personal/llvm-project/mlir/test/mlir-query/complex-test.mlir:5:8: %0 = linalg.generic {indexing_maps = [#map, #map], iterator_types = ["parallel", "parallel"]} ins(%arg0 : tensor<5x5xf32>) outs(%arg1 : tensor<5x5xf32>) { ^ /home/dbudii/personal/llvm-project/mlir/test/mlir-query/complex-test.mlir:7:10: note: "root" binds here %2 = arith.addf %in, %in : f32 ^ Match pcc#2: /home/dbudii/personal/llvm-project/mlir/test/mlir-query/complex-test.mlir:10:16: %collapsed = tensor.collapse_shape %0 [[0, 1]] : tensor<5x5xf32> into tensor<25xf32> ^ /home/dbudii/personal/llvm-project/mlir/test/mlir-query/complex-test.mlir:13:11: %c2 = arith.constant 2 : index ^ /home/dbudii/personal/llvm-project/mlir/test/mlir-query/complex-test.mlir:14:18: %extracted = tensor.extract %collapsed[%c2] : tensor<25xf32> ^ /home/dbudii/personal/llvm-project/mlir/test/mlir-query/complex-test.mlir:15:10: note: "root" binds here %2 = arith.addf %extracted, %extracted : f32 ^ 2 matches. ```
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Tested on both X86_64 and arm64.