Replace custom bf16_to_fp32 with Arm NEON vcvtah_f32_bf16 #5181

Merged
martin-frbg merged 1 commit into OpenMathLib:develop on Mar 13, 2025

Conversation

@taoye9 (Contributor) commented Mar 13, 2025

This PR replaces the hand-written helper that casts bf16 to fp32 with the standard Arm NEON intrinsic vcvtah_f32_bf16 in the arm64 sbgemv_n kernel added in the previous PR, #5160.

It may also slightly improve performance, because each cast now compiles to one assembly instruction (SHL) instead of two (UMOV, UBFIZ).
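
For context on why a single shift is enough: a bf16 value has the same bit layout as the upper 16 bits of an IEEE 754 binary32 value, so widening it to fp32 is a left shift by 16 followed by a reinterpretation. The sketch below is only an illustration, not the code from the kernel. The helper names (`bf16_bits_to_fp32_portable`, `bf16_bits_to_fp32`, `bf16_self_check`) and the `HAVE_NEON_BF16` macro are hypothetical, and the feature-macro guard is an assumption about how one might select the intrinsic path.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: use the NEON path only when the compiler advertises BF16
 * support (e.g. when building with -march=armv8.2-a+bf16). */
#if defined(__ARM_NEON) && defined(__ARM_FEATURE_BF16_VECTOR_ARITHMETIC)
#include <arm_neon.h>
#define HAVE_NEON_BF16 1
#else
#define HAVE_NEON_BF16 0
#endif

/* Portable widening: place the 16 bf16 bits in the upper half of a
 * 32-bit word, then reinterpret that word as a float. */
static float bf16_bits_to_fp32_portable(uint16_t bits) {
    uint32_t wide = (uint32_t)bits << 16;
    float out;
    memcpy(&out, &wide, sizeof out);
    return out;
}

/* Same conversion, using the standard intrinsic when it is available. */
static float bf16_bits_to_fp32(uint16_t bits) {
#if HAVE_NEON_BF16
    bfloat16_t b;
    memcpy(&b, &bits, sizeof b);   /* reinterpret raw bits as bfloat16_t */
    return vcvtah_f32_bf16(b);
#else
    return bf16_bits_to_fp32_portable(bits);
#endif
}

/* Small self-check: both paths must agree on a few known encodings. */
static void bf16_self_check(void) {
    const uint16_t samples[] = { 0x0000, 0x3F80, 0xC020, 0x4049 };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; ++i) {
        assert(bf16_bits_to_fp32(samples[i]) ==
               bf16_bits_to_fp32_portable(samples[i]));
    }
    assert(bf16_bits_to_fp32(0x3F80) == 1.0f);   /* 0x3F800000 */
    assert(bf16_bits_to_fp32(0xC020) == -2.5f);  /* 0xC0200000 */
}
```

Calling `bf16_self_check()` from a test harness exercises whichever path was compiled in. On targets without BF16 support, only the portable shift is used.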

@martin-frbg added this to the 0.3.30 milestone Mar 13, 2025
@martin-frbg merged commit 2f77855 into OpenMathLib:develop Mar 13, 2025
84 of 86 checks passed

2 participants