
[Upstream Update] FlashInfer v0.5.3 → v0.6.4 #59

@github-actions


Summary

FlashInfer v0.6.4 has been released with new features and performance improvements. The current pin is v0.5.3 in docker/Dockerfile.cuda line 64.

Key Changes

New Features

  1. MXFP8 GEMM Support - New mm_mxfp8 kernel for MXFP8 (microscaling FP8) matrix multiplication
  2. TRTLLM-Gen Skip-Softmax Kernels - For both prefill and decode with MLA support
  3. SM120 (Blackwell) Improvements:
    • FP4 GEMM tile configs with streamK scheduler
    • Fixed fp8_blockscale_gemm in AOT jit-cache
  4. Sampling CUDA Graph Fix - Resolves issues with CUDA graph mode
  5. Hopper CI Support - Added Hopper architecture to continuous integration

Performance Improvements

  • Cache cudaGetDeviceProperties in gdn_prefill to avoid per-call overhead
  • Parallel testing support in unit test script
  • Improved GDN decode with cuteDSL kernel
  • FP4 one-shot launch config stability fix in allreduce fusion

Bug Fixes

  • Fix for W4A8 autotune crash in cutlass_fused_moe profiler
  • Fix enum/int type mismatch in sampling
  • InternVL and Qwen3N test case additions
  • Video bucketing and warmup fixes

Files Affected

Per docs/upstream-versions.md:

  • docker/Dockerfile.cuda line 64 (FLASHINFER tag)
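Before editing, it can help to confirm what the Dockerfile actually pins. The helper below is a minimal, hypothetical sketch: `current_flashinfer_pin` is not part of this repository, and it assumes the pin is written as an `ARG FLASHINFER_VERSION=<tag>` line like the one proposed under Recommended Actions.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): print the FlashInfer
# tag pinned in a Dockerfile. Assumes the pin is written as
# "ARG FLASHINFER_VERSION=<tag>"; adjust the pattern if the name differs.
current_flashinfer_pin() {
  # On the first matching line, strip the "ARG FLASHINFER_VERSION=" prefix,
  # print the remaining tag, and stop reading the file.
  # Prints nothing if no such line exists.
  sed -n '/^ARG FLASHINFER_VERSION=/{s/^ARG FLASHINFER_VERSION=//;p;q;}' "$1"
}
```

For example, `current_flashinfer_pin docker/Dockerfile.cuda` should print `v0.5.3` if the file uses that ARG name, and `sed -n '64p' docker/Dockerfile.cuda` shows the raw contents of line 64 either way.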

Recommended Actions

  1. Review compatibility with vLLM commit d7de043d55d1dd629554467e23874097e1c48993
  2. Check for FlashInfer API usage in the vLLM integration (case-insensitive, so the FLASHINFER_VERSION ARG in the Dockerfile also matches):
    grep -rni "flashinfer" docker/ --include="*.py" --include="Dockerfile*"
  3. Update Dockerfile:
    # Line 64 in docker/Dockerfile.cuda
    ARG FLASHINFER_VERSION=v0.6.4
  4. Test container build with new FlashInfer version
  5. Run E2E tests focusing on attention kernels and MoE operations
  6. Validate Blackwell/Hopper GPU support if using SM120/SM90 hardware
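Step 3 can be scripted so that the edit fails loudly instead of silently missing the line. This is a hedged sketch: `bump_flashinfer_pin` is a hypothetical helper, and it assumes docker/Dockerfile.cuda contains exactly one `ARG FLASHINFER_VERSION=` line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite the FlashInfer pin in a Dockerfile in place.
# Assumes exactly one "ARG FLASHINFER_VERSION=<tag>" line, as in step 3.
bump_flashinfer_pin() {
  local dockerfile="$1" new_version="$2" count tmp

  if [ ! -f "$dockerfile" ]; then
    echo "no such file: $dockerfile" >&2
    return 1
  fi

  # Only accept plain vMAJOR.MINOR.PATCH tags such as v0.6.4.
  if ! printf '%s\n' "$new_version" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+$'; then
    echo "unexpected version format: $new_version" >&2
    return 1
  fi

  # Refuse to edit unless exactly one pin line exists, so the change can
  # neither silently miss the pin nor rewrite several lines.
  count="$(grep -c '^ARG FLASHINFER_VERSION=' "$dockerfile")"
  if [ "$count" -ne 1 ]; then
    echo "expected 1 ARG FLASHINFER_VERSION line in $dockerfile, found $count" >&2
    return 1
  fi

  # Write the edited copy to a temp file, then copy it back over the
  # original so the Dockerfile keeps its existing permissions.
  tmp="$(mktemp)" || return 1
  if ! sed "s/^ARG FLASHINFER_VERSION=.*/ARG FLASHINFER_VERSION=${new_version}/" \
      "$dockerfile" > "$tmp"; then
    rm -f "$tmp"
    return 1
  fi
  cat "$tmp" > "$dockerfile"
  rm -f "$tmp"
}
```

A typical run would be `bump_flashinfer_pin docker/Dockerfile.cuda v0.6.4`, followed by `git diff docker/Dockerfile.cuda` to confirm that only the pin line changed before starting the container build in step 4.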

Impact

MEDIUM - Minor-version update with new features; no breaking changes were detected. The new kernels and optimizations may improve performance.
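The "minor-version update" label can be derived mechanically from the two tags. The sketch below is hypothetical (the Upstream Dependency Monitor's own logic is not shown in this issue) and assumes plain `vMAJOR.MINOR.PATCH` tags without pre-release suffixes.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: classify the bump between two vMAJOR.MINOR.PATCH tags
# as major, minor, patch, or none. Pre-release suffixes are not handled.

# version_field VERSION N: print the Nth dot-separated component of VERSION.
version_field() {
  printf '%s\n' "$1" | cut -d. -f"$2"
}

classify_bump() {
  # Drop a leading "v" so "v0.5.3" and "0.5.3" compare the same way.
  local old="${1#v}" new="${2#v}"

  if [ "$(version_field "$old" 1)" != "$(version_field "$new" 1)" ]; then
    echo major
  elif [ "$(version_field "$old" 2)" != "$(version_field "$new" 2)" ]; then
    echo minor
  elif [ "$(version_field "$old" 3)" != "$(version_field "$new" 3)" ]; then
    echo patch
  else
    echo none
  fi
}
```

Here `classify_bump v0.5.3 v0.6.4` prints `minor`. Because both tags are pre-1.0, semantic versioning makes no API-stability promise across 0.x minor releases, so a `minor` result still warrants the build and E2E checks listed under Recommended Actions.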

Upstream References


> Generated by Upstream Dependency Monitor

