Support float16 quantization and other quantization performance improvements #41

Merged
1yefuwang1 merged 5 commits into main from quantization on Feb 10, 2026
@1yefuwang1
Owner

No description provided.

1yefuwang1 and others added 4 commits February 8, 2026 20:13
…unroll

Add benchmark suites for QuantizeF32ToF16, QuantizeF32ToBF16, F16ToF32,
and BF16ToF32. Unroll the HalfFloatToF32 main loop to process 2*NF
elements per iteration, improving instruction-level parallelism and
yielding a 1.6-2x speedup on F16/BF16 to F32 dequantization.

Co-Authored-By: Claude <noreply@anthropic.com>
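
To illustrate the unrolling described in the commit above, here is a minimal scalar sketch of a BF16 to F32 dequantization loop that handles two elements per iteration. It is not the code from this PR: the actual change unrolls a SIMD loop to 2*NF elements per iteration, and the names `Bf16BitsToF32` and `Bf16ToF32Unrolled` are hypothetical. The sketch only shows the loop shape (an unrolled main loop with independent conversions, plus a tail loop).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: widen one bfloat16 bit pattern to float32.
// A BF16 value is the upper 16 bits of an IEEE-754 binary32 value,
// so widening is a 16-bit left shift followed by a bit copy.
inline float Bf16BitsToF32(std::uint16_t bits) {
  const std::uint32_t wide = static_cast<std::uint32_t>(bits) << 16;
  float out;
  std::memcpy(&out, &wide, sizeof(out));
  return out;
}

// Hypothetical scalar stand-in for an unrolled dequantization loop.
void Bf16ToF32Unrolled(const std::uint16_t* in, float* out, std::size_t n) {
  assert((in != nullptr && out != nullptr) || n == 0);
  std::size_t i = 0;
  // Main loop: two independent conversions per iteration, which gives
  // the CPU more work to overlap than a one-element loop.
  for (; i + 2 <= n; i += 2) {
    const float a = Bf16BitsToF32(in[i]);
    const float b = Bf16BitsToF32(in[i + 1]);
    out[i] = a;
    out[i + 1] = b;
  }
  // Tail loop: at most one leftover element.
  for (; i < n; ++i) {
    out[i] = Bf16BitsToF32(in[i]);
  }
}
```

In a SIMD version each scalar conversion would become a vector load, widen, and store, with the same main-loop and tail structure.
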
Replace hn::Transform with a manually 4x-unrolled multiply loop in F32
NormalizeImpl for ~1.3x speedup at dim >= 512. Fix the BF16 InnerProduct
benchmark, which was incorrectly measuring the F32 overload. Add
ClobberMemory to the normalize benchmarks.

Co-Authored-By: Claude <noreply@anthropic.com>
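
As a rough picture of the normalization change above, the following scalar sketch scales a vector to unit L2 norm with a manually 4x-unrolled multiply loop. It is an assumption-laden stand-in, not the PR's `NormalizeImpl`: the real code operates on SIMD vectors, and the function name `NormalizeUnrolled4` and the choice to leave an all-zero vector unchanged are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical scalar sketch: normalize v to unit L2 norm in place.
void NormalizeUnrolled4(float* v, std::size_t n) {
  assert(v != nullptr || n == 0);

  // First pass: sum of squares.
  float sum_sq = 0.0f;
  for (std::size_t i = 0; i < n; ++i) {
    sum_sq += v[i] * v[i];
  }
  if (sum_sq == 0.0f) {
    return;  // Assumption: an all-zero vector is left as-is.
  }
  const float inv_norm = 1.0f / std::sqrt(sum_sq);

  // Second pass: multiply by 1/norm, four elements per iteration.
  std::size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    v[i] *= inv_norm;
    v[i + 1] *= inv_norm;
    v[i + 2] *= inv_norm;
    v[i + 3] *= inv_norm;
  }
  // Tail loop: up to three leftover elements.
  for (; i < n; ++i) {
    v[i] *= inv_norm;
  }
}
```
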
* Initial plan

* Add float16 quantization support: SIMD ops, distance spaces, vector types, quantization, virtual table support, tests, and benchmarks

Co-authored-by: 1yefuwang1 <18023393+1yefuwang1@users.noreply.github.com>

* Fix test bug: use correct loop variable j instead of i in Normalize_F32ToF16 test

Co-authored-by: 1yefuwang1 <18023393+1yefuwang1@users.noreply.github.com>

* Final: float16 quantization support complete

Co-authored-by: 1yefuwang1 <18023393+1yefuwang1@users.noreply.github.com>

* Remove CodeQL artifact from tracking and add to gitignore

Co-authored-by: 1yefuwang1 <18023393+1yefuwang1@users.noreply.github.com>

* Remove CodeQL artifact symlink from repository

Co-authored-by: 1yefuwang1 <18023393+1yefuwang1@users.noreply.github.com>

* Guard F16-specific SIMD ops with #if !HWY_HAVE_FLOAT16 to use native ops when available

Co-authored-by: 1yefuwang1 <18023393+1yefuwang1@users.noreply.github.com>

* Guard F16-specific SIMD ops with #if !HWY_HAVE_FLOAT16 to use native ops when available

Co-authored-by: 1yefuwang1 <18023393+1yefuwang1@users.noreply.github.com>

* Remove CodeQL artifact from tracking

Co-authored-by: 1yefuwang1 <18023393+1yefuwang1@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: 1yefuwang1 <18023393+1yefuwang1@users.noreply.github.com>
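
The `#if !HWY_HAVE_FLOAT16` guard mentioned in the commits above can be pictured with the sketch below. `HWY_HAVE_FLOAT16` is a Highway macro; here it is given a fallback definition of 0 only so the snippet compiles without Highway, which is an assumption for illustration. The function `F16OpsPath` is hypothetical and merely reports which branch was compiled; the PR's guarded code consists of F16-specific SIMD ops instead.

```cpp
#include <cassert>
#include <cstring>

// Assumption: in a real build this macro comes from the Highway headers.
// The fallback below exists only to keep this sketch self-contained.
#ifndef HWY_HAVE_FLOAT16
#define HWY_HAVE_FLOAT16 0
#endif

#if !HWY_HAVE_FLOAT16
// No native float16 arithmetic: custom F16-specific ops would be
// compiled here.
inline const char* F16OpsPath() { return "emulated"; }
#else
// Native float16 arithmetic available: the library's own ops are used
// and the custom ones are compiled out.
inline const char* F16OpsPath() { return "native"; }
#endif
```
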
@1yefuwang1
Owner Author

@copilot inspect the CI failure and try to fix it

Contributor

Copilot AI commented Feb 10, 2026

@1yefuwang1 I've opened a new pull request, #42, to work on those changes. Once the pull request is ready, I'll request review from you.

* Initial plan

* Add Float16 vector type handling in query executor

Co-authored-by: 1yefuwang1 <18023393+1yefuwang1@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: 1yefuwang1 <18023393+1yefuwang1@users.noreply.github.com>
1yefuwang1 merged commit 16a01af into main on Feb 10, 2026
8 of 12 checks passed
2 participants