- Added optional const GroupScaleParam* group_scale argument to
IUal::reorder(), UalDlp::reorder() and UalRef::reorder() for the
sym_quant APIs. The argument defaults to nullptr, keeping existing callers
source-compatible.
- UalDlp::reorder() now selects the sym_quant reorder APIs
aocl_get_reorder_buf_size_s8s8s32os32_sym_quant() and
aocl_reorder_s8s8s32os32_sym_quant() when given s8 input, s32
accumulation, f32/bf16 output and a non-null group_scale, normalizing
group_size == 0 to the full K dimension since the sym_quant APIs
require a strictly positive group size.
- Fixed a leak of the A pack buffer in the GEMV (m=1) path of the
s8s8_sym_quant kernel.
- Hardened group pre-op validation in
dlp_gemm_translate_to_group_postops_list() to also reject scale-factor
and zero-point arrays shorter than the required length, i.e.,
m * ceil(k / group_size) for the A matrix and n * ceil(k / group_size)
for the B matrix.
- Fixed the reference implementation in RefUalPlan::execute() skipping
group-scale de-quantization when post-ops are present: these cases were
taking the integer GEMM's needsF32Intermediate path. isS8S8GroupScale is
now computed earlier and excluded from needsF32Intermediate, letting these
cases fall through to the sym_quant reference, which de-quantizes and then
applies post-ops via applyPostOps().
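
For illustration only, the group-size normalization and the minimum-length check described above can be sketched as follows. This is not the code from this change: the helper names (effective_group_size, num_groups, required_group_entries, group_array_long_enough) are hypothetical, and applying the group_size == 0 normalization inside the length check is an assumption made to keep the sketch self-contained.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of the library): treat group_size == 0 as
// "one group spanning the full K dimension", since a strictly positive
// group size is required downstream.
inline std::int64_t effective_group_size(std::int64_t k, std::int64_t group_size) {
    return group_size == 0 ? k : group_size;
}

// Hypothetical helper: number of groups along K, i.e. ceil(k / group_size).
// Returns 0 for degenerate inputs (k <= 0 or a negative group size).
inline std::int64_t num_groups(std::int64_t k, std::int64_t group_size) {
    const std::int64_t gs = effective_group_size(k, group_size);
    if (k <= 0 || gs <= 0) {
        return 0;
    }
    return (k + gs - 1) / gs;  // integer ceiling division
}

// Minimum number of scale-factor (or zero-point) entries:
// rows * ceil(k / group_size), with rows = m for A and rows = n for B.
inline std::size_t required_group_entries(std::int64_t rows, std::int64_t k,
                                          std::int64_t group_size) {
    if (rows <= 0) {
        return 0;
    }
    return static_cast<std::size_t>(rows) *
           static_cast<std::size_t>(num_groups(k, group_size));
}

// Assumed shape of the check: an array shorter than the required length
// is rejected.
inline bool group_array_long_enough(std::size_t actual_len, std::int64_t rows,
                                    std::int64_t k, std::int64_t group_size) {
    return actual_len >= required_group_entries(rows, k, group_size);
}
```

With m = 4, k = 100 and group_size = 32, the sketch requires 4 * ceil(100 / 32) = 16 entries for the A matrix, so a 15-entry scale-factor array would be rejected.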
AMD-Internal: [CPUPL-8537]
Signed-off-by: Arnav Sharma <Arnav.Sharma@amd.com>