Vectorisation sprint #654
Closed
Changes from 72 commits
107 commits
2a8d17c
codegen: Implement SIMD vectorisation
tj-sun fbc6e4a
add omp simd vectorization mode
tj-sun 5ae780d
add openmp flag and bypass workaround flag
tj-sun ba693dc
DROP BEFORE MERGE: test with correct loopy branch
wence- 4ec0769
Turn off tree vectorize for certain gcc compilers. We might not need t…
sv2518 f9e60fd
Add simd compiler flags.
sv2518 00e073d
Remove time configuration.
sv2518 1cf7698
Default SIMD width.
sv2518 3e66946
Generate CVec Target with batch size information and move typedef into…
sv2518 1238ce8
Move zero declaration to loopy code base to be more robust in naming …
sv2518 1d54777
Added conditionals when to vectorise:
sv2518 b369213
Drop omp vectorisation.
sv2518 1c6346e
Add -march=native everywhere.
sv2518 856b6aa
Silence warnings.
sv2518 5e52ce1
Change vector tag.
sv2518 537c14c
Give more control over vectorisation to PyOP2.
sv2518 9317654
Naming adaption.
sv2518 6723b6a
Realize ilp first.
sv2518 38ebc8a
Jenkins.
sv2518 32b2910
Merge branch 'master' into vectorisation-restructure-checks
sv2518 944c6cf
DBM: run against new loopy branch
sv2518 3a1eb24
Lint
sv2518 681e315
More adaptations to new PyOP2
sv2518 48d6142
More adaptations to new PyOP2
sv2518 792c8f0
DBM take the correct branch
sv2518 2469870
Adapt to new PyOP2 and vectorisation
sv2518 4bbcde5
Adapt to new PyOP2 and vectorisation
sv2518 a5c0455
Fix return wrapper with kernel not kernel
sv2518 c374031
We do need to inline bc Implementing transforms that apply cleanly ac…
sv2518 e7d31eb
First split then tag because loopy does not support retagging of iname…
sv2518 56a8dde
tag_array_axes requires us to specify the tags for each dimension of …
sv2518 d1171b3
Fix
sv2518 0641c75
fix
sv2518 644842e
improve comments
sv2518 9e58b22
tag only non-constant arrays with vec axes
kaushikcfd 3f133fd
Only vectorise when local kernel is a loopy thing.
sv2518 dcd0b69
shift iel-loop to have lbound of 0
kaushikcfd 907fe58
Fix import
sv2518 ca2aaaf
Debug: try with newer python version
sv2518 0440f66
Debug: try with newer python version
sv2518 4bcb592
change target before inlining
kaushikcfd d42e7e8
ignore loopy vectorization fallback warnings
kaushikcfd 7e37e02
Revert "Debug: try with newer python version"
sv2518 b541dbd
Make complex check tighter
sv2518 caa567a
extend the set of variables that cannot be vectorized
kaushikcfd c3a96fa
Attempt to fix Slate by inlining of all subkernels
sv2518 dc996de
Add comment
sv2518 fa343e1
placate flake8
kaushikcfd aa7bc0c
blas callables: do not accept vectorized dtypes
kaushikcfd 8302d52
allow inverse.c::inverse() to take in vector dtypes
kaushikcfd a767fe2
Merge remote-tracking branch 'origin/master' into vectorisation-sprint
kaushikcfd 85de156
do not invoke the vectorization pass if one of the arguments is a Mix…
kaushikcfd 30f8ecb
makes freeing logic accurate
kaushikcfd 0d5023d
rewrite solve to accept strided inputs
kaushikcfd d25545b
blas-helpers: corrects the freeing logic
kaushikcfd 0ade829
Don't vectorise the kernel which generates the coordinates for the ex…
sv2518 a4bab8e
PyOP2 compilation: add a pathway to compile with gcc on Mac.
sv2518 175eb14
do not vectorize the entire kernel if some instructions are surrounded…
kaushikcfd 8256bd2
loop being split starts from '0' => do not peel at the head
kaushikcfd 6585dbb
Merge branch 'vectorisation-sprint' of github.com:OP2/PyOP2 into vect…
sv2518 4c0ca6e
Add comment
sv2518 e744092
Fix complex check?
sv2518 5fc4264
Fix complex check?
sv2518 31f0c39
Fix complex check?
sv2518 7e8a86a
Fix complex check?
sv2518 63f1e52
clarifies vectorization strategy
kaushikcfd 8b19370
Updates to transform strategy
kaushikcfd 7a2cbd6
Time configuration is not used anywhere and add doc
sv2518 69d4921
Move conditional
sv2518 43960e6
sun2020study -> cross-element
sv2518 b4c9926
Make default_simd_width more readable
sv2518 c603f3f
cleanup
sv2518 1cee3d7
Lint
sv2518 a671b6c
corrects the condition to not vectorize temps passed to BLAS calls
kaushikcfd 4aa86e1
Add vectorisation config to cache keys
sv2518 60b4b3e
Tests: add a vectorisation test
sv2518 1b3c29e
Cleanup
sv2518 0a54a34
Cleanup
sv2518 9b23200
Use reconfigure not init for changing the vectorisation strategy in t…
sv2518 acb9c89
Cleanup
sv2518 49e2779
Test: improve the vectorisation test.
sv2518 e5fe4d2
Put vectorisation strategy only in cache key of the global kernel.
sv2518 0eff9d6
lint
sv2518 22ce06e
Fix docs
sv2518 bdefbfa
Fix config error
sv2518 2a459e5
Fix config error
sv2518 56c65da
Don't add py-cpuinfo
ca5c51b
Add nbytes property
connorjward dc5f3bc
Drop unused args
sv2518 ac36708
Time->extra_info
sv2518 89c9dec
Merge branch 'vectorisation-sprint' into connorjward/add-nbytes
sv2518 e2af4c7
Merge pull request #666 from OP2/connorjward/add-nbytes
sv2518 4de6f06
Merge branch 'vectorisation-sprint' into JDBetteridge/vectorisation-s…
sv2518 2840f28
Merge pull request #665 from OP2/JDBetteridge/vectorisation-sprint
sv2518 89feb72
Fix bandwidth calculation
0857145
Add simd compiler flag also to LinuxGNU compiler
662241e
Add vectorisation flag to linux clang compiler too
203223c
account for changes in loopy's vectorization syntax
kaushikcfd fae323f
run CI with py3.8
kaushikcfd 030cae5
Fallback for stopping criterium
sv2518 ece0e62
Fallback for stopping criterium
sv2518 934e147
Reduce inames to untag
sv2518 bd95ba3
Reduce inames to untag
sv2518 fd6650d
Fallback for stopping criterium
sv2518 f69755d
unroll (not vectorize) loops surrounding CInstructions
kaushikcfd e72f316
get rid of noop insns
kaushikcfd 09bf629
Fix merge leftovers for vectorisation in chapter 3
sv2518