-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[compiler-rt][AArch64] Provide basic implementations of SME memcpy/memmove in case of strictly aligned memory access #138250
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…mmove in case of strictly aligned memory access The existing implementations, written in assembly, make use of unaligned accesses for performance reasons. They are not compatible with strict aligned configurations, i.e. with `-mno-unaligned-access`. If the functions are used in this scenario, an exception is raised due to unaligned memory accesses. This patch reintroduces vanilla implementations for these functions to be used in strictly aligned configurations. The actual code is largely based on the code from llvm#77496
|
||
#ifndef __ARM_FEATURE_UNALIGNED | ||
|
||
static void *memcpy_fwd(void *dest, const void *src, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it still worth adding the __arm_sc_
prefix to these functions to keep all functions in the same namespace? Just worried about potential clashes with other target implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These internal functions are now marked as static
to avoid clashes. But for clarity I could still add a prefix. Please let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK yeah I see. If it's ok with you, I do think it's better to have the same naming scheme for all the functions and start with __arm_sc_
const unsigned char *srcp = (const unsigned char *)src; | ||
|
||
if ((srcp > (destp + n)) || (destp > (srcp + n))) | ||
return __arm_sc_memcpy(dest, src, n); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume you're calling __arm_sc_memcpy
because it has the __restrict
attributes on the pointers and you're hoping the compiler will take advantage of that during compilation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. From some experimentation, the compiler does take advantage of it: https://godbolt.org/z/oj4b97KK9
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
|
||
#ifndef __ARM_FEATURE_UNALIGNED | ||
|
||
static void *memcpy_fwd(void *dest, const void *src, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK yeah I see. If it's ok with you, I do think it's better to have the same naming scheme for all the functions and start with __arm_sc_
@@ -22,3 +23,46 @@ const void *__arm_sc_memchr(const void *src, int c, | |||
|
|||
return NULL; | |||
} | |||
|
|||
#ifndef __ARM_FEATURE_UNALIGNED |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid creating many different variants and various ifdefs, does it make sense to choose the implementation (either sme-libc-mem-routines.S
(optimised) or sme-libc-routines.c
(basic)) in CMakeLists.txt ? That way we can also let cmake emit a warning that the unoptimised SME routines are chosen, because of the constraints given by the compilation flags to compiler-rt.
The existing implementations, written in assembly, make use of unaligned accesses for performance reasons. They are not compatible with strict aligned configurations, i.e. with
-mno-unaligned-access
.If the functions are used in this scenario, an exception is raised due to unaligned memory accesses.
This patch reintroduces vanilla implementations for these functions to be used in strictly aligned configurations. The actual code is largely based on the code from #77496