-
Notifications
You must be signed in to change notification settings - Fork 9
Refactor vars, make compute classes consistent and fix some variable names #138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
|
This PR modifies the following files which are ignored by .lint-ignore: Please consider removing the corresponding patterns from .lint-ignore so that these files can be linted. |
Cpp-Linter Report
|
|
This PR modifies the following files which are ignored by .lint-ignore: Please consider removing the corresponding patterns from .lint-ignore so that these files can be linted. |
2 similar comments
|
This PR modifies the following files which are ignored by .lint-ignore: Please consider removing the corresponding patterns from .lint-ignore so that these files can be linted. |
|
This PR modifies the following files which are ignored by .lint-ignore: Please consider removing the corresponding patterns from .lint-ignore so that these files can be linted. |
|
This PR modifies the following files which are ignored by .lint-ignore: Please consider removing the corresponding patterns from .lint-ignore so that these files can be linted. |
23b358a to
4aa4c8f
Compare
|
This PR modifies the following files which are ignored by .lint-ignore: Please consider removing the corresponding patterns from .lint-ignore so that these files can be linted. |
|
I tested the speedup of
|
|
I tried to see what was causing the slowdown on GPUs. I don't think I got a lot of deep insight, but the following is a summary of what I found:
My next steps are to tidy up the enhancement/refactor_vars_speedup branch commit history, fix the matter classes and the tests, merge in @mirenradia 's CUDA fixes, and then merge that back here. At that point I think we should be ready to review it. The upshot of this is that the storing of vars in the |
|
I can answer most of these now:
@julianakwan is working on this. It will mean that tensors are access with
In the end we don't need this distinction as we store direct to the cell data, so all CCZ4Vars objects are const.
It is not optimal, see above.
Due to the changes in the Tensor objects, this inconsistency should resolve itself.
I have suppressed it. I think that we really do want a reference as a data member and not a pointer. We definitely don't want a copy as that's the whole point of the wrapper.
I will check this again once we are done with the changes.
Yes, or at least remove the branching as it slows things down a lot. But this is for another PR. |
|
I'm trying to benchmark the performance of this branch on Frontier and it annoying fails at start up when AMReX is trying to allocate memory to the Here is the output: Here is Backtrace.0: Not very helpful is it? |
|
@julianakwan, note that @KAClough has set |
Yes I can confirm that this was the problem! There should really be a :face-palm: reaction as I should have checked |
|
I have some benchmarks to report. This was done on Frontier with 1 MPI rank to 1 GCD and each result has been averaged over 3 runs:
This is comparable to the performance that I had measured previously on Frontier with the |
Do you mean the |
I'm sorry for this, it is me who shouldn't have committed it! It is a setting I have to use on the Cosma AMD GPU for reasons I have not understood. I will remove it when I next update the branch. |
Yes, I will correct the post. |
No problem! As I understand it, the Cosma8 MI300Xs are on a shared node in some kind of free-for-all Wild West configuration, so you probably had to manually request enough memory. |
|
I have some bad news to report. I tested this branch @ 72ffeba vs
I think I would accept a small degradation in performance on PVCs since GRTeclyn is more likely to be run on Nvidia and AMD GPUs but I think this is probably too big to accept. I will try and investigate. |
| compute(int i, int j, int k, const amrex::Array4<amrex::Real> &rhs, | ||
| const amrex::Array4<amrex::Real const> &state) const; | ||
| operator()(int ix, int iy, int iz, | ||
| const amrex::Array4<amrex::Real> &rhs_state, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we rename this back to rhs? I don't think rhs_state makes any sense since it's either the state (i.e. the variables on the grid) or the RHS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change should also be made in CCZ4RHSWithMatter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see now that you've used rhs for the amrex::CellData<...> you get from this amrex::Array4<...>. Maybe it makes sense to make that one rhs_cell_data for consistency with state_cell_data and then you can make the Array4<...> object just rhs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to leave it as it is, as having rhs_cell_data all through the rhs code is more ugly. I don't see what is wrong with rhs_state being the rhs of the state, and I am sure people will figure it out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It just seems quite inconsistent and confusing to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're actually already using rhs_cell_data in some places!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I know
These functions are only going to be called in device kernels so no point using AMREX_GPU_HOST_DEVICE.
This function is only going to be called in device kernels so no point using AMREX_GPU_HOST_DEVICE.
This function is only going to be called in device kernels so no point using AMREX_GPU_HOST_DEVICE.
397d9ca to
64ca462
Compare
|
This PR modifies the following files which are ignored by .lint-ignore: Please consider removing the corresponding patterns from .lint-ignore so that these files can be linted. |
64ca462 to
f51c902
Compare
|
This PR modifies the following files which are ignored by .lint-ignore: Please consider removing the corresponding patterns from .lint-ignore so that these files can be linted. |
f51c902 to
6ea2616
Compare
|
This PR modifies the following files which are ignored by .lint-ignore: Please consider removing the corresponding patterns from .lint-ignore so that these files can be linted. |


This addresses issue #111.
The main goal is to remove all of the vars objects and replace them with a wrapper that directly accesses the right grid data. This involves removing all the enum mapping things, and also rethinking how the derivatives work.
I am also trying to unify the way in which we use the Parallel4 loop, so that for users the examples are more easily copyable. For this reason I try to have it always call an operator() function of the class and to always pass in the full state and derive the cell data within the class (rather than passing in the cell data directly, which would then mean that you can't calculate derivatives for example). So it also addresses #8 and #128 (much to Miren's annoyance).
NB I have not tried to address flattening out the Tensor array here, but once this is done I think it should be relatively simple to implement.
Some questions/comments I had for discussion:
Do we need to have a ConstCCZ4Vars and a CCZ4Vars class? I couldn't work out how to do it but probably there is a way.
Is the method of storing the vars optimal? I end up creating a Tensor object, and then storing it, whereas I think I should be able to update the value directly. For some reason this didn't work. On the other hand, what I have now is very readable.
We now have () brackets for the vars grid object access and [] brackets for Tensor objects like the derivatives. I am kind of ok with this because they are conceptually different and maybe this makes a useful distinction. On the other hand if they were all one or the other there might be less chance of making mistakes.
There is a warning about
const amrex::CellData<amrex::Real> &cell_data;references as data members that I don't know whether to suppress.The regression test fails but the differences are small, should we worry?
Should we remove the option of BSSN?