[Feature]: AutoDeploy: padding for fp8 linear kernel when size % 16 != 0 #8811

@lucaslie

Description

🚀 The feature, motivation and pitch

Our current fp8 linear kernel cannot handle GEMM sizes that are not a multiple of 16. Such shapes should be padded up to the next multiple of 16 so they can still use the fp8 path; a sketch of this idea follows below.
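A minimal sketch of the padding idea, under the assumption that the fp8 GEMM requires the K and N dimensions to be multiples of 16. `_fp8_gemm_stub` and `padded_fp8_linear` are hypothetical names used only for illustration, not the actual TRT-LLM kernel entry points; in a real fp8 path the padding would be applied to the high-precision tensors before quantization.

```python
import torch
import torch.nn.functional as F

ALIGN = 16  # assumed alignment requirement of the fp8 GEMM kernel


def _fp8_gemm_stub(x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    # Placeholder for the aligned fp8 kernel; plain matmul here so the sketch runs.
    return x @ w.t()


def padded_fp8_linear(x: torch.Tensor, w: torch.Tensor, bias=None) -> torch.Tensor:
    """Zero-pad K (and N) up to a multiple of 16, run the kernel, slice the result.

    x: [..., K] activations, w: [N, K] weights.
    """
    k, n = x.shape[-1], w.shape[0]
    pad_k, pad_n = (-k) % ALIGN, (-n) % ALIGN
    if pad_k:
        x = F.pad(x, (0, pad_k))        # pad last dim (K) of activations with zeros
        w = F.pad(w, (0, pad_k))        # pad last dim (K) of weights with zeros
    if pad_n:
        w = F.pad(w, (0, 0, 0, pad_n))  # pad first dim (N) of weights with zeros
    out = _fp8_gemm_stub(x, w)
    if pad_n:
        out = out[..., :n]              # drop the padded output columns
    return out if bias is None else out + bias


# Quick check against an unpadded reference on sizes that are not multiples of 16.
x = torch.randn(4, 22)   # K = 22
w = torch.randn(30, 22)  # N = 30
assert torch.allclose(padded_fp8_linear(x, w), x @ w.t(), atol=1e-5)
```

Zero-padding along K leaves the dot products unchanged, and the extra N columns are simply discarded, so the numerics should match the unpadded GEMM.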

Alternatives

No response

Additional context

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.

Metadata

Assignees

Labels

  • AutoDeploy: <NV> AutoDeploy Backend
  • Customized kernels: <NV> Specialized/modified CUDA kernels in TRTLLM for LLM ops, beyond standard TRT. Dev & perf.

Projects

Status

In progress

Milestone

No milestone
