fix (nonmaxsuppression): fix output when use_dyn_output is false (make it ONNX compliant) - fixes #4679 (#4773)
Conversation
Pull request overview
Fixes ONNX NonMaxSuppression output correctness in MIGraphX when use_dyn_output is disabled (previously returned a fixed-size buffer padded with valid [0,0,0] triplets), exposes use_dyn_output in Python parsing, and updates/adds reference tests to match ONNX spec behavior.
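A minimal Python illustration (with made-up index values, not the PR's actual test data) of why the padding was harmful: `[0, 0, 0]` is itself a legitimate selection triplet, so zero padding is indistinguishable from repeated picks of box 0.

```python
from collections import Counter

# Correctly trimmed output: two selections, each [batch, class, box].
trimmed = [(0, 0, 2), (0, 0, 4)]
# Buggy fixed-size output: padded with the *valid* index [0, 0, 0].
padded = trimmed + [(0, 0, 0)] * 4

counts = Counter(box for (_, _, box) in padded)
assert counts[0] == 4                     # box 0 looks selected four times
assert counts.most_common(1)[0][0] == 0   # Top-1 box is spuriously box 0
```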
Changes:
- Make `nonmaxsuppression` always return only the selected indices (trim to `num_selected`) and adjust shape inference accordingly.
- Expose `use_dyn_output` in the Python `parse_onnx()` binding.
- Update existing NMS reference tests to remove padding and add targeted tests for static/dynamic input + `use_dyn_output` combinations.
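The trimming in the first bullet can be sketched in Python (a simplified stand-in, not the actual C++ implementation):

```python
# The op fills a maximum-size buffer, counts how many boxes were actually
# selected, and returns only the first num_selected rows instead of the
# zero-padded remainder.
def trim_selected(buffer, num_selected):
    return buffer[:num_selected]          # final shape: [num_selected, 3]

max_size_buffer = [[0, 0, 2], [0, 0, 4], [0, 0, 0], [0, 0, 0]]
assert trim_selected(max_size_buffer, 2) == [[0, 0, 2], [0, 0, 4]]
```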
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `src/include/migraphx/op/nonmaxsuppression.hpp` | Updates shape inference and runtime output shaping to avoid padded outputs and improve spec compliance. |
| `src/py/migraphx_py.cpp` | Adds `use_dyn_output` argument to Python `parse_onnx()` and wires it to `onnx_options`. |
| `test/ref/nonmaxsuppression.cpp` | Adjusts gold outputs to match trimmed ONNX behavior and adds new coverage for reported scenarios. |
Comments suppressed due to low confidence (1)
src/py/migraphx_py.cpp:674
- use_dyn_output is now exposed on parse_onnx(), but parse_onnx_buffer() still can’t set onnx_options.use_dyn_output, so Python users parsing from an in-memory buffer remain unable to control this behavior. Consider adding the same use_dyn_output argument (defaulting to false) to parse_onnx_buffer() and wiring it into the options for API parity.
```cpp
    py::arg("max_loop_iterations") = 10,
    py::arg("limit_max_iterations") = std::numeric_limits<uint16_t>::max(),
    py::arg("use_dyn_output") = false); // https://github.com/ROCm/AMDMIGraphX/issues/4679
m.def(
    "parse_onnx_buffer",
    [](const std::string& onnx_buffer,
```
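The parity gap flagged above can be sketched with pure-Python stand-ins (hypothetical functions, not the real `migraphx` module): `parse_onnx()` gains a `use_dyn_output` keyword, but `parse_onnx_buffer()` still has no way to set it.

```python
# Hypothetical stand-ins for the two bindings: buffer-based callers are
# stuck with the default because no use_dyn_output keyword is exposed.
def parse_onnx(filename, use_dyn_output=False):
    return {"source": filename, "use_dyn_output": use_dyn_output}

def parse_onnx_buffer(onnx_buffer):  # no use_dyn_output keyword
    return {"source": "<buffer>", "use_dyn_output": False}

assert parse_onnx("model.onnx", use_dyn_output=True)["use_dyn_output"]
assert parse_onnx_buffer(b"\x08\x01")["use_dyn_output"] is False
```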
```cpp
// actual selected boxes. Previously returned fixed max_num_boxes which caused zero padding bug.
// https://github.com/ROCm/AMDMIGraphX/issues/4679
std::vector<shape::dynamic_dimension> out_lens = {};
out_lens.push_back({0, max_num_boxes});
out_lens.push_back({3, 3});
return {shape::int64_type, out_lens};
```
With this change, compute_shape() returns the same dynamic output shape in both the use_dyn_output and !use_dyn_output branches, and compute() also reshapes unconditionally. That makes the use_dyn_output flag behavior misleading (it now mainly controls whether dynamic inputs are allowed). Consider simplifying the branching / documenting this explicitly (or deprecating the flag) so callers don’t assume it controls whether the output shape is dynamic.
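The reviewer's observation can be modeled in Python (a hypothetical stand-in, not MIGraphX code): both `use_dyn_output` branches now yield the same dynamic output shape `{0..max_num_boxes} x 3`, so the flag effectively only gates whether dynamic inputs are accepted.

```python
# Model of the revised compute_shape() behavior described above.
def compute_shape(max_num_boxes, inputs_dynamic, use_dyn_output):
    if inputs_dynamic and not use_dyn_output:
        raise ValueError("dynamic inputs require use_dyn_output=True")
    # Dynamic dims as (min, max) pairs: [{0, max_num_boxes}, {3, 3}].
    return [(0, max_num_boxes), (3, 3)]

# The flag no longer changes the output shape for static inputs:
assert compute_shape(6, False, False) == compute_shape(6, False, True)
```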
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
with default=false for API parity with parse_onnx()
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
```cpp
// actual selected boxes. Previously returned fixed max_num_boxes which caused zero
// padding bug. https://github.com/ROCm/AMDMIGraphX/issues/4679
std::vector<shape::dynamic_dimension> out_lens = {};
out_lens.push_back({0, max_num_boxes});
out_lens.push_back({3, 3});
return {shape::int64_type, out_lens};
```
The output max length used for shape inference/allocation is derived from max_num_boxes = num_batches * num_boxes, but compute_nms() iterates over num_classes and can select up to num_batches * num_classes * num_boxes indices (or * min(num_boxes, max_output_boxes_per_class)), so the current {0, max_num_boxes} upper bound can under-allocate result and make std::copy(selected_indices..., output.begin()) write past the end when num_classes > 1. Consider including scores' max num_classes in the bound (e.g., boxes_max_lens[0] * inputs.at(1).max_lens()[1] * boxes_max_lens[1]) so output_shape.max_lens() is always a safe upper bound.
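The arithmetic behind this comment can be checked directly: `compute_nms()` can select up to `num_batches * num_classes * num_boxes` indices (tightened further by `min(num_boxes, max_output_boxes_per_class)` when that value is known), while the patch bounds the output by `num_batches * num_boxes` only.

```python
# Sketch of the reviewer's point about the allocation upper bound.
def patched_bound(num_batches, num_classes, num_boxes):
    return num_batches * num_boxes           # ignores num_classes

def safe_bound(num_batches, num_classes, num_boxes):
    return num_batches * num_classes * num_boxes

# With 2 classes, the patched bound under-allocates by a factor of 2:
assert patched_bound(1, 2, 4) == 4
assert safe_bound(1, 2, 4) == 8
```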
```cpp
// make buffer of maximum size
shape max_output_shape = {output_shape.type(), output_shape.max_lens()};
argument result{max_output_shape};
std::size_t num_selected = 0;

std::size_t max_output_boxes_per_class =
    (args.size() > 2) ? (args.at(2).at<std::size_t>()) : 0;
if(max_output_boxes_per_class == 0)
{
    return result.reshape({output_shape.type(), {num_selected, 3}});
}
```
compute() always allocates result with output_shape.max_lens() before reading max_output_boxes_per_class; when max_output_boxes_per_class == 0 this returns an empty {0,3} view but still pays the cost of allocating the max-size buffer. Consider checking max_output_boxes_per_class first and returning an empty argument directly (or moving the max-buffer allocation after the early-return) to avoid unnecessary allocations.
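The suggested reordering can be sketched in Python for brevity (names mirror the C++ snippet; the logic is illustrative, not the actual implementation): read `max_output_boxes_per_class` before allocating, so the max-size buffer is never built for the zero-selection case.

```python
# Check the early-return condition first, then allocate only when needed.
def compute(max_output_boxes_per_class, max_rows):
    if max_output_boxes_per_class == 0:
        return []                          # empty {0, 3} output, no buffer
    buffer = [[0, 0, 0]] * max_rows        # max-size buffer allocated lazily
    # ... NMS selection would fill and then trim `buffer` ...
    return buffer

assert compute(0, 10_000) == []            # zero case skips the allocation
```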
```cpp
    std::unordered_map<std::string, std::vector<migraphx::shape::dynamic_dimension>>(),
    py::arg("skip_unknown_operators") = false,
    py::arg("print_program_on_error") = false,
    py::arg("external_data_path") = "",
    py::arg("use_dyn_output") = false);
```
In the parse_onnx_buffer binding, the first argument is the ONNX buffer (onnx_buffer in the lambda), but the exposed keyword name is still py::arg("filename") (see line 695). Since this function signature is being updated, consider renaming the argument keyword to something like onnx_buffer (or otherwise documenting why it differs) so Python call sites using keywords aren’t misled.
Codecov Report
❌ Patch coverage is
Additional details and impacted files

```
@@ Coverage Diff @@
##        develop    #4773   +/-   ##
===========================================
+ Coverage   92.32%   92.45%   +0.12%
===========================================
  Files         583      583
  Lines       29334    29297      -37
===========================================
+ Hits        27082    27084       +2
+ Misses       2252     2213      -39
```
Motivation
Fixes #4679
The `nonmaxsuppression` operator was producing incorrect outputs when `use_dyn_output=False`. The output was padded with `[0, 0, 0]` zero triplets instead of being trimmed to only the actually selected boxes.

In reality, `[0, 0, 0]` is a valid box index (batch 0, class 0, box 0), making it appear as if box 0 was selected multiple times. This silently corrupted downstream operations like Top-K filtering, breaking ONNX spec compliance.

Additionally:
- `use_dyn_output` was never exposed in the Python bindings, meaning all Python users were always hitting the affected code path with no way to opt out.
- `use_dyn_output=True` failed with static input shapes with the error: `parse: PARSE_EXPAND: dynamic input tensor with fixed dims input not supported`.

I tested the changes on a small test script.
Before fix:
After fix:
Technical Details
Root Cause
In `compute()`, the output was only trimmed to `num_selected` boxes when `use_dyn_output=True`. The `False` path returned the full pre-allocated buffer, zero-filled to `max_num_boxes` rows.

Changes
`src/include/migraphx/op/nonmaxsuppression.hpp`
- Removed the `std::fill` zero padding in `compute_nms()` - no longer needed since the output is always trimmed.
- Updated `compute()` to always trim the output to `num_selected` boxes regardless of the `use_dyn_output` flag value; both `True` and `False` now produce correct, ONNX-spec-compliant output.
- Updated the `compute_shape()` else branch to return a dynamic output shape for static inputs, so `use_dyn_output=True` works correctly with static input shapes without the PARSE_EXPAND error.

`src/py/migraphx_py.cpp`
- Added a `use_dyn_output` parameter to the `parse_onnx()` Python binding so Python users can now explicitly control this option (defaults to `false` for backward compatibility).

`test/ref/nonmaxsuppression.cpp` (test suite passes)
- Removed 0 padding from expected outputs: `nms_test`, `nms_not_center_test`, `nms_transpose1_test`, `nms_transpose2_test`.
- New tests:
  - `nms_static_input_dyn_out_false_test`: main bug fix validation
  - `nms_static_input_dyn_out_true_test`: static shapes + `use_dyn_output=True` works
  - `nms_dyn_out_false_and_true_same_result_test`: both flag values produce identical correct output
  - `nms_dynamic_input_dyn_out_false_throws_test`: dynamic inputs with `use_dyn_output=False` correctly throw an error

Files Changed
- `src/include/migraphx/op/nonmaxsuppression.hpp`
- `src/py/migraphx_py.cpp` (`use_dyn_output` in Python)
- `test/ref/nonmaxsuppression.cpp`

Changelog Category
have been resolved.