[GPU] Set minimum memory of count for reduce mean mode of scatter_elements_update, fix typos and remove space #30491
src/plugins/intel_gpu/src/kernel_selector/cl_kernels/scatter_elements_update_ref.cl
__local int count_v[1];
__local int count_k[COUNT_LIMIT/64];
__local int count_v[COUNT_LIMIT/64];
count_length = COUNT_LIMIT/64;
What about implementing it on the jitter side? I think that would be clearer.
The fix is now implemented on the jitter side, as you suggested.
Thanks for your comment.
53c2aa7 to dd8ef87
@@ -201,6 +201,7 @@ KernelsData ScatterElementsUpdateKernelRef::GetKernelsData(const Params& params)
if (i == 1) {
    cldnn_jit.AddConstant(MakeJitConstant("IS_SECOND_ITER", "true"));
    cldnn_jit.AddConstant(MakeJitConstant("COUNT_LIMIT", params.engineInfo.maxLocalMemSize));
    cldnn_jit.AddConstant(MakeJitConstant("COUNT_MINIMUM", params.engineInfo.maxLocalMemSize/64));
- What does COUNT_LENGTH == 0 mean? In that case I guess the total work-item size is 0 and no code will be executed, right?
- What about just setting COUNT_LENGTH as follows? Then you don't need to introduce an additional variable.
  COUNT_LENGTH = dispatchData.gws[0] * dispatchData.gws[1] * dispatchData.gws[2] if dispatchData.gws[0] * dispatchData.gws[1] * dispatchData.gws[2] != 0 else COUNT_MINIMUM
I applied your suggestion, which avoids introducing an additional variable. Thanks for your comments.
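The fallback agreed on in this thread can be sketched in standalone C++. This is only an illustration under stated assumptions: `choose_count_length` is a hypothetical helper (the PR computes the value inline when adding the `COUNT_LENGTH` JIT constant), and the `gws` and `max_local_mem_size` parameters stand in for `dispatchData.gws` and `params.engineInfo.maxLocalMemSize`.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper for illustration only; it is not part of the plugin.
// gws stands in for dispatchData.gws, max_local_mem_size for
// params.engineInfo.maxLocalMemSize.
std::size_t choose_count_length(const std::array<std::size_t, 3>& gws,
                                std::size_t max_local_mem_size) {
    const std::size_t total = gws[0] * gws[1] * gws[2];
    // When the product is zero, fall back to the minimum discussed in the
    // thread (maxLocalMemSize / 64) so the count arrays never get size 0.
    return total != 0 ? total : max_local_mem_size / 64;
}
```

For example, a global work size of 2 x 3 x 4 yields 24, while a zero-sized dimension with 64 KiB of local memory yields 1024.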
dd8ef87 to 09c57b5
cldnn_jit.AddConstant(MakeJitConstant("COUNT_LENGTH", dispatchData.gws[0] * dispatchData.gws[1] * dispatchData.gws[2]));
cldnn_jit.AddConstant(MakeJitConstant("COUNT_LENGTH", dispatchData.gws[0] * dispatchData.gws[1] * dispatchData.gws[2] != 0 ?
    dispatchData.gws[0] * dispatchData.gws[1] * dispatchData.gws[2] :
    params.engineInfo.maxLocalMemSize/64));
With a dynamic shape, is SLM always used? Why? I think it should use global memory if we cannot ensure that the size fits in local memory.
08ff7c1 to fe00c91
@@ -199,9 +199,15 @@ KernelsData ScatterElementsUpdateKernelRef::GetKernelsData(const Params& params)
auto entry_point = GetEntryPoint(kernelName, newParams.layerID, params, i);

if (i == 1) {
    auto count_limit = params.engineInfo.maxLocalMemSize;
    auto count_length = dispatchData.gws[0] * dispatchData.gws[1] * dispatchData.gws[2];
    auto min_count_length = count_limit / 64;
- What is this 64 for? If we need a hard-coded number, please use a named constant that conveys its purpose, e.g., const int var_name_for_purpose = 64;
- Also, limiting the ref kernel to the number of elements that fit in local memory is not acceptable. E.g., on desktop iGPUs with a local memory size of 64K, 64K/64 => 1024 elements is not acceptable.
- So we should be able to handle cases that do not fit in local memory.
- I.e., we should have two versions: 1) use SLM, 2) do not use SLM. Please check the softmax implementation.
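The arithmetic behind the 1024-element figure, together with the named-constant suggestion, can be illustrated with a short standalone C++ sketch. The names `kCountLengthDivisor` and `min_count_length` are hypothetical and chosen only for illustration; the thread does not state what the divisor 64 represents.

```cpp
#include <cstddef>

// Hypothetical name for the hard-coded 64 from the diff; its actual purpose
// is not stated in the thread.
constexpr std::size_t kCountLengthDivisor = 64;

// Mirrors `count_limit / 64` from the diff, with the magic number named.
constexpr std::size_t min_count_length(std::size_t max_local_mem_size) {
    return max_local_mem_size / kCountLengthDivisor;
}

// With 64 KiB of local memory the resulting limit is only 1024 elements.
static_assert(min_count_length(64 * 1024) == 1024,
              "64 KiB / 64 should give 1024 elements");
```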
The PR was updated as follows:
- Because the count values only become known during execution, the allocatable local memory size has to be provided up front.
- The maximum allocatable local memory size turned out to be about maxLocalMemSize / 8, not maxLocalMemSize. (This was checked using cliloader.)
- Because count has both k and v arrays, each one can use maxLocalMemSize / 8 / 2.
- If the required memory exceeds the allocatable memory, it is allocated in global memory; otherwise, it is allocated in local memory.
- Because __global could not be used in the function call, function-like macros were used instead.
- [DRAFT] [GPU] Set internal memory of count for reduce mean #30930 uses an internal buffer for the dynamic case.
Thanks for your comments.
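The placement rule described in the comment above can be sketched in standalone C++. This is an illustration under stated assumptions, not the PR's code: `CountStorage`, `per_array_budget`, and `choose_count_storage` are hypothetical names, the divisor 8 is the empirical value reported in the comment rather than a documented constant, and the required size and the budget are assumed to be expressed in the same unit.

```cpp
#include <cstddef>

// Hypothetical enum naming the two placements discussed in the thread.
enum class CountStorage { Local, Global };

// Budget for each of count_k and count_v:
// maxLocalMemSize / 8 (allocatable size reported in the comment) / 2 (two arrays).
constexpr std::size_t per_array_budget(std::size_t max_local_mem_size) {
    return max_local_mem_size / 8 / 2;
}

// Global memory if the request exceeds the budget, local memory otherwise.
constexpr CountStorage choose_count_storage(std::size_t required,
                                            std::size_t max_local_mem_size) {
    return required > per_array_budget(max_local_mem_size)
               ? CountStorage::Global
               : CountStorage::Local;
}
```

With 64 KiB of local memory the per-array budget is 4096, so a request of 4096 stays in local memory and a request of 4097 goes to global memory.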
What I requested was to use SLM when available (i.e., count_length is less than the limit) and to use global memory when not available (i.e., count_length is larger than the limit). Currently, with a dynamic shape, global memory is always used.
COUNT_LENGTH is used as the array size for count_k and count_v, so I thought COUNT_LENGTH should be fixed before the run and not change while running. And since COUNT_LENGTH == COUNT_LIMIT for a dynamic shape, __local is used with its maximum size. What do you think?
fe00c91 to eab2f42
…ve redundant spaces
…ments_update and move the fix to jitter-side
…x get_count argument logic
…ding on its requested size
cldnn_jit.AddConstant(MakeJitConstant("COUNT_LIMIT", params.engineInfo.maxLocalMemSize));
cldnn_jit.AddConstant(MakeJitConstant("COUNT_LENGTH", newParams.inputs[1].LogicalSize()));
cldnn_jit.AddConstant(MakeJitConstant("COUNT_LIMIT", maxAllocatableMemSize));
cldnn_jit.AddConstant(MakeJitConstant("COUNT_LENGTH", newParams.inputs[1].LogicalSize() != 0 ?
Why is count_length always the max SLM size in the dynamic case?
We should define the actual size from the shape, using something like INPUT_SIZE_XXX.....
Since count_length is used as the array size of count, it has to be fixed before the run, so it is set to the max SLM size.
I don't think I understood how INPUT_SIZE_XXX... would be used; could you explain more?