[Upstream Update] FlashInfer v0.5.3 → v0.6.4

## Summary

FlashInfer **v0.6.4** has been released with new features and performance improvements. The current pin is **v0.5.3** in `docker/Dockerfile.cuda` line 64.

## Key Changes

### New Features

1. **MXFP8 GEMM Support** - New `mm_mxfp8` kernel for mixed-precision inference
2. **TRTLLM-Gen Skip-Softmax Kernels** - For both prefill and decode with MLA support
3. **SM120 (Blackwell) Improvements**:
   - FP4 GEMM tile configs with streamK scheduler
   - Fixed `fp8_blockscale_gemm` in AOT jit-cache
4. **Sampling CUDA Graph Fix** - Resolves issues with CUDA graph mode
5. **Hopper CI Support** - Added Hopper architecture to continuous integration

### Performance Improvements

- Cache `cudaGetDeviceProperties` in `gdn_prefill` to avoid per-call overhead
- Parallel testing support in unit test script
- Improved GDN decode with cuteDSL kernel
- FP4 one-shot launch config stability fix in allreduce fusion

### Bug Fixes

- Fix for W4A8 autotune crash in `cutlass_fused_moe` profiler
- Fix enum/int type mismatch in sampling
- InternVL and Qwen3N test case additions
- Video bucketing and warmup fixes

## Files Affected

Per `docs/upstream-versions.md`:
- **docker/Dockerfile.cuda** line 64 (`FLASHINFER` tag)
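
Hard-coded line numbers drift as the Dockerfile changes, so it may help to locate the pin by pattern before editing. The following is a minimal sketch, assuming the pin is declared as `ARG FLASHINFER_VERSION=<tag>` (the form shown under Recommended Actions); `find_flashinfer_pin` is a hypothetical helper, not part of any existing tooling.

```python
import re

# Assumption: the pin is declared as `ARG FLASHINFER_VERSION=<tag>`.
# Adjust the name if the Dockerfile uses a different ARG.
PIN_RE = re.compile(r"^[ \t]*ARG[ \t]+FLASHINFER_VERSION=(\S+)[ \t]*$")


def find_flashinfer_pin(dockerfile_text):
    """Return (line_number, tag) for the first FlashInfer pin, or None."""
    for lineno, line in enumerate(dockerfile_text.splitlines(), start=1):
        match = PIN_RE.match(line)
        if match:
            return lineno, match.group(1)
    return None


if __name__ == "__main__":
    # Hypothetical usage against the real file:
    with open("docker/Dockerfile.cuda", encoding="utf-8") as fh:
        print(find_flashinfer_pin(fh.read()))
```

If the reported line number is no longer 64, `docs/upstream-versions.md` likely needs the same correction.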

## Recommended Actions

1. **Review compatibility** with vLLM commit d7de043d55d1dd629554467e23874097e1c48993
2. **Check for FlashInfer API usage** in vLLM integration:
   ````bash
   grep -rin "flashinfer" docker/ --include="*.py" --include="Dockerfile*"
   ````
3. **Update Dockerfile**:
   ````dockerfile
   # Line 64 in docker/Dockerfile.cuda
   ARG FLASHINFER_VERSION=v0.6.4
   ````
4. **Test container build** with new FlashInfer version
5. **Run E2E tests** focusing on attention kernels and MoE operations
6. **Validate Blackwell/Hopper GPU support** if using SM120/SM90 hardware
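
Step 3 can also be scripted rather than edited by hand. This is a sketch only, assuming the `ARG FLASHINFER_VERSION=<tag>` form shown in step 3; `bump_flashinfer_pin` is a hypothetical helper, and it rewrites text in memory so the result can be reviewed before anything is written to disk.

```python
import re

# Assumption: the pin is declared as `ARG FLASHINFER_VERSION=<tag>`.
_PIN_RE = re.compile(r"^([ \t]*ARG[ \t]+FLASHINFER_VERSION=)\S+", flags=re.MULTILINE)


def bump_flashinfer_pin(dockerfile_text, new_tag):
    """Return (updated_text, replacements) with every pin set to new_tag."""
    # A function replacement avoids treating new_tag as a regex template.
    return _PIN_RE.subn(lambda m: m.group(1) + new_tag, dockerfile_text)


if __name__ == "__main__":
    path = "docker/Dockerfile.cuda"
    with open(path, encoding="utf-8") as fh:
        original = fh.read()
    updated, count = bump_flashinfer_pin(original, "v0.6.4")
    if count == 0:
        raise SystemExit("no FLASHINFER_VERSION pin found; nothing changed")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(updated)
    print(f"updated {count} pin(s) in {path}")
```

Failing when no pin is found guards against a silent no-op if the ARG has been renamed.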

## Impact

**MEDIUM** - Minor version update with new features but no breaking changes detected. New kernels and optimizations may improve performance.
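
The "minor version" classification can be reproduced mechanically from the two tags. The sketch below assumes tags of the form `vMAJOR.MINOR.PATCH`; `parse_tag` and `bump_kind` are hypothetical names. Note that under Semantic Versioning a `0.y.z` release may change its API in any version, and whether FlashInfer follows that convention is an assumption here, so a "minor" result does not by itself rule out breaking changes.

```python
def parse_tag(tag):
    """Turn a tag such as 'v0.6.4' into a tuple of ints, here (0, 6, 4)."""
    parts = tag.lstrip("v").split(".")
    if len(parts) != 3:
        raise ValueError(f"expected vMAJOR.MINOR.PATCH, got {tag!r}")
    return tuple(int(part) for part in parts)


def bump_kind(old_tag, new_tag):
    """Classify the difference between two tags by the first changed field."""
    old, new = parse_tag(old_tag), parse_tag(new_tag)
    for name, old_part, new_part in zip(("major", "minor", "patch"), old, new):
        if old_part != new_part:
            return name
    return "none"


if __name__ == "__main__":
    print(bump_kind("v0.5.3", "v0.6.4"))
```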

## Upstream References

- Release: https://github.com/flashinfer-ai/flashinfer/releases/tag/v0.6.4
- Full Changelog: https://github.com/flashinfer-ai/flashinfer/compare/v0.5.3...v0.6.4

---

> Generated by [Upstream Dependency Monitor](https://github.com/llm-d/llm-d-infra/actions/runs/22427079845)



