Dev fused nms #10643
base: master
Conversation
Pull Request Overview
Adds support for an NPU-compatible SortedNMS operator by extending both the Python-level API and underlying C++ implementations to accept sorted scores and input indices.
- Python wrapper (`nms_op`) now branches for NPU to gather and pass `sorted_scores` and `score_inds`.
- Core op registration and functor logic updated to include optional `scores` and `input_indices`, with a new fused operator for NPU.
- Data type inference and API signature modified for NPU support in `nms_op.cpp` and `functional_api.yaml`.
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
| File | Description | 
|---|---|
| python/oneflow/nn/modules/nms.py | Branch NPU path to gather sorted scores and pass indices | 
| oneflow/user/ops/nms_op.cpp | InferNmsDataType now sets different dtype for NPU | 
| oneflow/ir/include/OneFlow/OneFlowUserOps.td | Added optional scores and input_indices inputs | 
| oneflow/core/functional/impl/nn_functor.cpp | Built fused_op_ and dispatch based on device and inputs | 
| oneflow/core/functional/impl/array_functor.cpp | Route argwhere through CPU for NPU | 
| oneflow/core/functional/functional_api.yaml | Updated nms signature with new optional tensor args | 
Comments suppressed due to low confidence (1)
oneflow/user/ops/nms_op.cpp:34
- The CPU path sets output dtype to Int8, but NMS usually returns index tensors; Int32 or Int64 may be more appropriate.
 
```cpp
ctx->SetOutputDType("out", 0, DataType::kInt8);
```
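As a sketch of how the device-branched dtype inference described in the file table could look while addressing this comment, assuming the routine is the `InferNmsDataType` mentioned above and that the device type is queryable from the `InferContext` (both are assumptions, not confirmed by this diff):

```cpp
// Sketch only: device-aware output dtype for the nms op.
// ctx->device_type() is assumed to be available on user_op::InferContext.
Maybe<void> InferNmsDataType(user_op::InferContext* ctx) {
  if (ctx->device_type() == DeviceType::kNPU) {
    // Hypothetical: the fused NPU SortedNMS path returns keep indices,
    // so an integer index dtype is used here.
    ctx->SetOutputDType("out", 0, DataType::kInt32);
  } else {
    // Behavior from the snippet above; the reviewer suggests kInt32 or
    // kInt64 may be more appropriate than kInt8.
    ctx->SetOutputDType("out", 0, DataType::kInt8);
  }
  return Maybe<void>::Ok();
}
```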
```cpp
return OpInterpUtil::Dispatch<Tensor>(*op_, {x}, attrs);
DeviceType device_type = JUST(x->device())->enum_type();
if (device_type == DeviceType::kNPU) {
  if (scores) {
```
    
      
    
Copilot AI commented on Jun 1, 2025:
The fused NMS path checks only scores; it should verify both scores and input_indices are provided to avoid passing a null optional downstream.
```diff
- if (scores) {
+ if (scores && input_indices) {
```
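Put in context, a sketch of how the functor's dispatch could apply that guard, reusing only the calls visible in the diff above (the `fused_op_` member comes from the file table; the input order of the fused op is an assumption):

```cpp
// Sketch only: take the fused NPU SortedNMS path only when both optional
// inputs are present; otherwise fall back to the plain nms dispatch.
DeviceType device_type = JUST(x->device())->enum_type();
if (device_type == DeviceType::kNPU && scores && input_indices) {
  // Input order {x, scores, input_indices} is assumed from the op definition.
  return OpInterpUtil::Dispatch<Tensor>(*fused_op_, {x, JUST(scores), JUST(input_indices)}, attrs);
}
return OpInterpUtil::Dispatch<Tensor>(*op_, {x}, attrs);
```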
Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.
```diff
  - name: "nms"
-   signature: "Tensor (Tensor x, Float iou_threshold, Int32 keep_n=-1) => Nms"
+   signature: "Tensor (Tensor x, Tensor scores=None, Tensor input_indices=None, Float iou_threshold, Int32 keep_n=-1) => Nms"
```
Could you check whether it is necessary/convenient to export a separate API/functor for the NPU nms?
This PR is mainly to support the NPU-compatible SortedNMS operator, so sorted_score and input_indices have been added.
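For reference, a sketch of the C++ functor interface implied by the updated functional_api.yaml entry; the class name and parameter spelling are assumptions based on OneFlow's usual functional conventions, not taken from this diff:

```cpp
// Hypothetical functor interface matching:
//   "Tensor (Tensor x, Tensor scores=None, Tensor input_indices=None,
//            Float iou_threshold, Int32 keep_n=-1) => Nms"
class NmsFunctor {
 public:
  Maybe<Tensor> operator()(const std::shared_ptr<Tensor>& x,
                           const Optional<Tensor>& scores,         // defaults to None in Python
                           const Optional<Tensor>& input_indices,  // defaults to None in Python
                           const float iou_threshold,
                           const int32_t keep_n) const;
};
```

Whether this stays on the shared `Nms` entry or moves to a dedicated NPU functor is exactly the question raised in the comment above.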