Skip to content

[compiler-rt][AArch64] Provide basic implementations of SME memcpy/memmove in case of strictly aligned memory access #138250

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

vhscampos
Copy link
Member

The existing implementations, written in assembly, make use of unaligned accesses for performance reasons. They are not compatible with strict aligned configurations, i.e. with -mno-unaligned-access.

If the functions are used in this scenario, an exception is raised due to unaligned memory accesses.

This patch reintroduces vanilla implementations for these functions to be used in strictly aligned configurations. The actual code is largely based on the code from #77496

…mmove in case of strictly aligned memory access

The existing implementations, written in assembly, make use of unaligned
accesses for performance reasons. They are not compatible with strict
aligned configurations, i.e. with `-mno-unaligned-access`.

If the functions are used in this scenario, an exception is raised due
to unaligned memory accesses.

This patch reintroduces vanilla implementations for these functions to
be used in strictly aligned configurations. The actual code is largely
based on the code from llvm#77496

#ifndef __ARM_FEATURE_UNALIGNED

static void *memcpy_fwd(void *dest, const void *src,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it still worth adding the __arm_sc_ prefix to these functions to keep all functions in the same namespace? Just worried about potential clashes with other target implementations.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These internal functions are now marked as static to avoid clashes. But for clarity I could still add a prefix. Please let me know what you think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK yeah I see. If it's ok with you, I do think it's better to have the same naming scheme for all the functions and start with __arm_sc_

const unsigned char *srcp = (const unsigned char *)src;

if ((srcp > (destp + n)) || (destp > (srcp + n)))
return __arm_sc_memcpy(dest, src, n);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you're calling __arm_sc_memcpy because it has the __restrict attributes on the pointers and you're hoping the compiler will take advantage of that during compilation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. From some experimentation, the compiler does take advantage of it: https://godbolt.org/z/oj4b97KK9

Copy link
Contributor

@david-arm david-arm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!


#ifndef __ARM_FEATURE_UNALIGNED

static void *memcpy_fwd(void *dest, const void *src,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK yeah I see. If it's ok with you, I do think it's better to have the same naming scheme for all the functions and start with __arm_sc_

@@ -22,3 +23,46 @@ const void *__arm_sc_memchr(const void *src, int c,

return NULL;
}

#ifndef __ARM_FEATURE_UNALIGNED
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid creating many different variants and various ifdefs, does it make sense to choose the implementation (either sme-libc-mem-routines.S (optimised) or sme-libc-routines.c (basic)) in CMakeLists.txt ? That way we can also let cmake emit a warning that the unoptimised SME routines are chosen, because of the constraints given by the compilation flags to compiler-rt.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants