
Commit 2659df3

review: address Copilot review on PR #21 (16 comments)
Four themes, all surfaced by Copilot's code review on the v1.1 PR.

1. Timing-budget hygiene: move allocations out of run_fn (9)

opencv_runner.h documents that setup_fn is the only place a benchmark may allocate (matching how OpenVX graphs pre-allocate at vxCreateImage / vxCreateTensor time). Several benches were violating that contract and timing per-iteration cv::Mat::create, std::vector::reserve, and cv::HOGDescriptor construction:

* cv_multiscale.cpp: GaussianPyramid_ORB, LaplacianPyramid_S16, LaplacianReconstruct, LaplacianReconstruct_S16. Per-level Mats are now preallocated in shared state, and residuals are reused.
* cv_extraction.cpp: HOGCells (HOGDescriptor in state), HOGFeatures (HOGDescriptor + reserved descriptors vector), HoughLinesP (reserved std::vector<Vec4i>), NonMaxSuppression (preallocated keep_mask, cv::compare in place).
* cv_pipeline_vision.cpp::SobelMagnitudePhase and cv_pipeline_feature.cpp::ThresholdedEdge: Sobel writes directly to CV_32F, so the in-loop S16 to F32 convertTo is skipped; phase/magf/magu8 scratch is preallocated in shared state.
* cv_feature.cpp::OpticalFlowPyrLK: next_pts/status/err are now reserved to DEFAULT_OPTFLOW_POINTS in setup_fn, so the first per-iteration push_back does not realloc.

2. Memory ceiling for HOGFeatures (2)

cv::HOGDescriptor::compute slides a 64x64 window across the full image, so descriptor storage grows O(w*h). At 4K that's ~800 MB on the OpenCV side and ~420 MB of int16 on the OpenVX side, enough to OOM CI runners and to let allocator pressure dominate the kernel cost. Both binaries now cap the effective HOG dims at 1024x768 (the classic HOG-pedestrian-detect resolution). Capping the window count doesn't change the comparison answer, because the per-window cost is what is being measured.

3. Correctness: TensorMatMul bias actually zero (1)

The bias tensor was created with vxCreateTensor and described as "zero-filled" but never explicitly initialised. OpenVX does not guarantee that fresh tensor memory is zeroed (impls may return uninitialised pages for perf), so on a strict impl the bias was effectively garbage and would perturb every matmul result. Fix: an explicit vxCopyTensorPatch with a std::vector<int16_t>(M*M, 0) in setup_fn. Also corrected the surrounding comment from "M^2 fp16" to "M^2 int16" to match the actual VX_TYPE_INT16 storage.

4. Tidy: log-dedup tail flush + script robustness (3)

* The BenchmarkContext destructor calls resetLogDedup(), so the trailing "(previous message repeated N more times)" line is always surfaced, even when the last benchmark of a run ends in a suppressing-duplicates state.
* compare_three_way.sh --skip-amd no longer breaks the OpenCV run. Previously the script ran opencv-mark from $BUILD_AMD/opencv-mark/opencv-mark even when --skip-amd skipped that build entirely. Now, when AMD is skipped, opencv-mark is built inside the rustVX tree (via -DOPENVX_MARK_BUILD_OPENCV=ON) and run from whichever build dir actually has it.
* compare_three_way.sh now honours CARGO_TARGET_DIR when resolving the rustVX library path. This mirrors the resolution logic already in build_rustvx.sh so the two stay in lockstep.

Verified locally on macOS / OpenCV 4.13:

* openvx-mark --category multiscale @ AMD MIVisionX: 9 pass; the AMD-side S16 Laplacian rows still skip with the expected one-shot status=-14 log (verifies the destructor flush works).
* opencv-mark --category multiscale,extraction: 15 pass, no OOM on HOGFeatures at FHD.
* opencv-mark --category pipeline_vision,pipeline_feature,tensor: 15 pass; SobelMagnitudePhase/ThresholdedEdge measured cleanly with no in-loop allocations.

CHANGELOG [Unreleased] block updated with the full per-fix rationale.

Co-authored-by: Cursor <cursoragent@cursor.com>
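The setup/run contract behind theme 1 can be illustrated without OpenCV. This is a minimal sketch, not the opencv-mark runner: `BenchCase`, `LinesState`, `makeCase` and `timePerIterUs` are hypothetical stand-ins for `OpenCVBenchmarkCase` and its shared state, assuming only that `setup_fn` runs once before the clock starts and `run_fn` runs once per timed iteration. It shows that a buffer reserved in setup keeps its capacity across `clear()` and refill, so the timed loop never reallocates.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <memory>
#include <vector>

// Hypothetical stand-in for a benchmark case: setup_fn may allocate,
// run_fn must only do kernel work on buffers prepared by setup_fn.
struct BenchCase {
    std::function<bool(std::size_t)> setup_fn;
    std::function<void()> run_fn;
};

// Per-case shared state captured by both lambdas, mirroring the
// std::make_shared<...State>() pattern used throughout this commit.
struct LinesState {
    std::vector<int> lines;             // output buffer reused every iteration
    std::size_t capacity_at_setup = 0;  // capacity only changes on reallocation
    std::size_t reallocations = 0;
};

BenchCase makeCase(std::shared_ptr<LinesState> state, int items_per_run) {
    BenchCase bc;
    bc.setup_fn = [state](std::size_t max_items) {
        state->lines.clear();
        state->lines.reserve(max_items);  // the only allocation, outside the timed loop
        state->capacity_at_setup = state->lines.capacity();
        return true;
    };
    bc.run_fn = [state, items_per_run]() {
        state->lines.clear();  // clear() keeps the existing capacity
        for (int i = 0; i < items_per_run; ++i) state->lines.push_back(i);
        if (state->lines.capacity() != state->capacity_at_setup) {
            ++state->reallocations;  // would mean run_fn allocated
        }
    };
    return bc;
}

// Runs setup_fn once, then times only the run_fn iterations.
double timePerIterUs(BenchCase& bc, std::size_t max_items, int iters) {
    if (!bc.setup_fn(max_items)) return -1.0;
    const auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) bc.run_fn();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(t1 - t0).count() / iters;
}
```

If `setup_fn` skipped the `reserve()`, the first timed iteration would grow the vector (and the check would count a reallocation); for code that constructs its vectors inside `run_fn`, as the old HOGFeatures and HoughLinesP bodies did, that cost recurs on every iteration.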
1 parent e1b4298 commit 2659df3

10 files changed

Lines changed: 541 additions & 124 deletions
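The log-dedup fix in theme 4 is an RAII flush: whatever state the deduplicator is in when the context is destroyed, the pending repeat count gets written out. The sketch below is hypothetical; `LogDedup`, `log()` and `reset()` are illustrative names, not the `BenchmarkContext` / `resetLogDedup()` implementation, and the sink is a plain vector of strings so the behaviour is easy to check.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical duplicate-suppressing log sink modelling the behaviour
// described for resetLogDedup(); not the openvx-mark API.
class LogDedup {
public:
    explicit LogDedup(std::vector<std::string>& sink) : sink_(sink) {}
    ~LogDedup() { reset(); }  // RAII: always surface a pending tail count

    void log(const std::string& msg) {
        if (has_last_ && msg == last_) {  // duplicate: suppress and count it
            ++suppressed_;
            return;
        }
        reset();  // new message: flush any pending repeat count first
        sink_.push_back(msg);
        last_ = msg;
        has_last_ = true;
    }

    // Emit "(previous message repeated N more times)" if anything was suppressed.
    void reset() {
        if (suppressed_ > 0) {
            sink_.push_back("(previous message repeated " +
                            std::to_string(suppressed_) + " more times)");
        }
        suppressed_ = 0;
        last_.clear();
        has_last_ = false;
    }

private:
    std::vector<std::string>& sink_;
    std::string last_;
    bool has_last_ = false;
    int suppressed_ = 0;
};
```

Without the call in the destructor, a run whose final messages were all duplicates would end with `suppressed_ > 0` and the count would be silently dropped.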


CHANGELOG.md

Lines changed: 105 additions & 0 deletions
@@ -6,6 +6,111 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 
 ## [Unreleased]
 
+### Fixed — PR #21 Copilot review pass
+
+Addresses 16 review comments grouped into four themes:
+
+#### Timing-budget hygiene — no allocations inside `run_fn` (9 fixes)
+
+The opencv-mark runner contract (`include/opencv_runner.h`) requires
+`setup_fn` to allocate all buffers and `run_fn` to do kernel work only,
+so OpenCV timings are comparable to the OpenVX graphs that pre-allocate
+via `vxCreateImage` / `vxCreateTensor` at graph-construct time. Several
+benchmarks were violating that contract — each iteration was paying for
+`cv::Mat::create` / `std::vector::reserve` / `cv::HOGDescriptor`
+construction that should have happened once in `setup_fn`. Per-impl
+timings are now comparable to within timer noise.
+
+- **`GaussianPyramid_ORB`** (`cv_multiscale.cpp`): per-level
+  `blurred` / `downsampled` Mats now preallocated in shared state.
+- **`LaplacianPyramid_S16`** (`cv_multiscale.cpp`): per-level
+  `down` / `up` / `diff` Mats preallocated.
+- **`LaplacianReconstruct`** + **`LaplacianReconstruct_S16`**
+  (`cv_multiscale.cpp`): per-level `up` Mat + a shared
+  `zero_residual` (sized to the largest level) preallocated.
+- **`HOGCells`** (`cv_extraction.cpp`): `cv::HOGDescriptor` instance
+  captured in shared state, constructed once in `setup_fn`.
+- **`HOGFeatures`** (`cv_extraction.cpp`): `cv::HOGDescriptor` AND
+  `std::vector<float> descriptors` captured in shared state.
+  `descriptors` is reserved in `setup_fn` to its final length so
+  `compute()`'s internal `resize()` stays inside the reservation.
+- **`HoughLinesP`** (`cv_extraction.cpp`): `std::vector<cv::Vec4i>
+  lines` captured in shared state and reserved to 4096.
+- **`NonMaxSuppression`** (`cv_extraction.cpp`): `keep_mask` Mat
+  preallocated; per-iter `(input >= input_extra)` Mat expression
+  replaced with in-place `cv::compare(..., CMP_GE)`.
+- **`SobelMagnitudePhase`** (`cv_pipeline_vision.cpp`): drive
+  `cv::Sobel` directly into `CV_32F` so the in-loop S16→F32
+  `convertTo` allocations go away; `phase` scratch preallocated.
+- **`ThresholdedEdge`** (`cv_pipeline_feature.cpp`): same shape as
+  `SobelMagnitudePhase` — Sobel direct to `CV_32F`, plus a
+  preallocated `magf` (F32 magnitude) and `magu8` (U8 saturated)
+  in shared state.
+- **`OpticalFlowPyrLK`** (`cv_feature.cpp`): per-iteration output
+  vectors (`next_pts`, `status`, `err`) are now `reserve()`d to
+  `DEFAULT_OPTFLOW_POINTS` in `setup_fn`. They were already
+  cleared per iteration; `reserve()` ensures the first per-iter
+  `push_back` doesn't realloc.
+
+#### Memory ceiling for HOGFeatures (2 fixes)
+
+`cv::HOGDescriptor::compute()` slides the configured window across
+the full image and produces one descriptor per slide — descriptor
+storage grows ~`O(w·h)`. At 4K it's ~800 MB on the OpenCV side and
+~420 MB of `int16` tensor on the OpenVX side, large enough to OOM
+CI runners and to dominate the actual kernel cost with allocator
+pressure.
+
+- **openvx-mark `HOGFeatures`** (`src/benchmarks/node_extraction.cpp`):
+  effective input dims capped at 1024×768 (the classic
+  HOG-pedestrian-detect resolution) — yields a ~36 MB `int16`
+  feature tensor instead of 420 MB at 4K.
+- **opencv-mark `HOGFeatures`** (`cv_extraction.cpp`): same 1024×768
+  cap applied to keep the float `descriptors` vector ≤ 80 MB.
+
+The per-window cost is what the benchmark measures, so capping window
+count doesn't change what the cross-impl comparison answers.
+
+#### Correctness — TensorMatMul bias actually zero (1 fix)
+
+`TensorMatMul` (`src/benchmarks/node_tensor.cpp`) was passing a
+freshly-created `vx_tensor` as the bias input and claiming in the
+comment it was "zero-filled". OpenVX does **not** guarantee
+freshly-created tensors are zero-initialised — impls are free to
+return uninitialised pages for perf. Without an explicit write,
+the bias was effectively `garbage`, which would perturb the matmul
+output and break the verify path's cross-impl equivalence check.
+
+Fix: explicit `vxCopyTensorPatch(bias, ..., zeros, VX_WRITE_ONLY, ...)`
+in `setup_fn` so every impl actually sees zeros in the bias tensor.
+Also corrected the surrounding comment: "M² fp16" → "M² int16" to
+match the actual `VX_TYPE_INT16` storage.
+
+#### Tidy — log-dedup tail flush + script robustness (3 fixes)
+
+- **`BenchmarkContext` destructor now calls `resetLogDedup()`**
+  (`src/benchmark_context.cpp`). If the last benchmark of a run
+  ended with the log callback in a "suppressing duplicates" state,
+  the trailing `(previous message repeated N more times)` line
+  would never be emitted and the user would lose the tail of the
+  driver's diagnostic signal. The destructor flush guarantees the
+  count is always surfaced.
+- **`compare_three_way.sh --skip-amd` no longer breaks the OpenCV
+  run** (`scripts/compare_three_way.sh`). The script was running
+  opencv-mark from `$BUILD_AMD/opencv-mark/opencv-mark` even when
+  `--skip-amd` skipped the AMD configure/build entirely, so on a
+  clean checkout `--skip-amd` failed with "binary not found". Fix:
+  when `--skip-amd` is set, build opencv-mark inside the rustVX
+  tree instead (toggle `-DOPENVX_MARK_BUILD_OPENCV=ON` there) and
+  run opencv-mark from whichever build dir actually has it.
+- **`compare_three_way.sh` now honours `CARGO_TARGET_DIR`** for
+  resolving the rustVX library path. `build_rustvx.sh` already
+  supports the env var (IDEs / CI caches commonly redirect cargo
+  output to a shared tree); the comparison script was hard-coding
+  `$RUSTVX_SRC/target/release` and would fail with a misleading
+  "library not found" message in those setups. The resolution
+  logic now mirrors `build_rustvx.sh` exactly.
+
 ### Fixed — Enhanced-Vision FFI hardening (preempts strict-FFI segfaults)
 
 - **`HoughLinesP` output array now uses `VX_TYPE_LINE_2D`** (the

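The window counts and memory figures quoted in the CHANGELOG and in the HOGFeatures block comment can be reproduced from the HOG geometry. A minimal sketch, assuming the parameters shown in the diff (64x64 window, 16x16 block, 8x8 block stride, 8x8 cell, 9 bins, window stride 8x8, no padding) and the usual OpenCV descriptor-length formula; `HogGeom` and the helper functions are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// HOG geometry used by HOGFeatures in this commit (hypothetical struct).
struct HogGeom {
    std::size_t win = 64, block = 16, block_stride = 8, cell = 8, bins = 9, win_stride = 8;
};

// Floats per window: blocks per window x cells per block x bins.
std::size_t descriptorSize(const HogGeom& g) {
    const std::size_t blocks_per_side = (g.win - g.block) / g.block_stride + 1;   // 7
    const std::size_t cells_per_block = (g.block / g.cell) * (g.block / g.cell);  // 4
    return blocks_per_side * blocks_per_side * cells_per_block * g.bins;          // 1764
}

// Number of sliding-window positions with zero padding.
std::size_t windowCount(const HogGeom& g, std::size_t w, std::size_t h) {
    if (w < g.win || h < g.win) return 0;
    return ((w - g.win) / g.win_stride + 1) * ((h - g.win) / g.win_stride + 1);
}

// Total descriptor storage for an image, given the element size in bytes.
std::size_t descriptorBytes(const HogGeom& g, std::size_t w, std::size_t h,
                            std::size_t bytes_per_elem) {
    return windowCount(g, w, h) * descriptorSize(g) * bytes_per_elem;
}
```

At 1920x1080 this gives 29,824 windows (the "~30k") and about 52.6M floats (~210 MB); at 1024x768 it gives 10,769 windows and about 76 MB of float descriptors, the source of the "≤ ~80 MB" figure. With 2-byte elements the same geometry gives about 36 MiB at 1024x768 and about 419 MiB at 3840x2160, consistent with the ~36 MB / ~420 MB quoted for the OpenVX side, assuming its tensor uses the same window layout.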
opencv-mark/src/benchmarks/cv_extraction.cpp

Lines changed: 120 additions & 32 deletions
@@ -53,6 +53,7 @@
 // INT16_MIN for S16).
 
 #include "opencv_runner.h"
+#include <memory>
 #include <opencv2/imgproc.hpp>
 #include <opencv2/objdetect.hpp>
 #include <opencv2/core.hpp>
@@ -175,25 +176,40 @@ std::vector<OpenCVBenchmarkCase> registerCvExtractionBenchmarks() {
     // accumulation, although the binning happens later in OpenCV's
     // pipeline (during compute()). For benchmark purposes we time the
     // gradient step which dominates the per-pixel cost.
+    //
+    // The HOGDescriptor instance is captured in shared state and
+    // constructed once in setup_fn. Constructing a fresh
+    // cv::HOGDescriptor inside run_fn (the previous shape) walked
+    // OpenCV's default-init code path on every iteration, which on
+    // a busy bench is enough non-kernel overhead to bias the timing.
     {
+        struct HogCellsState {
+            cv::HOGDescriptor hog; // defaults: win 64x128, block 16x16, cell 8x8, 9 bins
+        };
+        auto state = std::make_shared<HogCellsState>();
+
         OpenCVBenchmarkCase bc;
         bc.name = "HOGCells";
         bc.category = "extraction";
         bc.feature_set = "enhanced_vision";
-        bc.setup_fn = [](uint32_t w, uint32_t h, OpenCVTestData& gen, CaseBuffers& bufs) -> bool {
+        bc.setup_fn = [state](uint32_t w, uint32_t h, OpenCVTestData& gen, CaseBuffers& bufs) -> bool {
             // HOG window must be a multiple of cell (8x8) and ≥ 16x16.
             const uint32_t ew = std::max<uint32_t>(16, (w / 8) * 8);
             const uint32_t eh = std::max<uint32_t>(16, (h / 8) * 8);
             bufs.input = gen.makeU8(ew, eh);
-            bufs.output.create(static_cast<int>(eh), static_cast<int>(ew), CV_32FC2); // mag
-            bufs.output_extra.create(static_cast<int>(eh), static_cast<int>(ew), CV_8UC2); // angle bins
+            bufs.output.create(static_cast<int>(eh), static_cast<int>(ew), CV_32FC2); // mag
+            bufs.output_extra.create(static_cast<int>(eh), static_cast<int>(ew), CV_8UC2); // angle bins
+            // Default-constructed HOGDescriptor lives in state; no
+            // run_fn-side construction. NOTE: HOGDescriptor is not
+            // thread-safe in OpenCV, but our runner is single-threaded
+            // per case so this is fine.
+            state->hog = cv::HOGDescriptor();
             return true;
         };
-        bc.run_fn = [](CaseBuffers& bufs) {
-            cv::HOGDescriptor hog; // defaults: win 64x128, block 16x16, cell 8x8, 9 bins
+        bc.run_fn = [state](CaseBuffers& bufs) {
             // computeGradient signature: (img, grad, qangle, paddingTL, paddingBR)
-            hog.computeGradient(bufs.input, bufs.output, bufs.output_extra,
-                                cv::Size(0, 0), cv::Size(0, 0));
+            state->hog.computeGradient(bufs.input, bufs.output, bufs.output_extra,
+                                       cv::Size(0, 0), cv::Size(0, 0));
         };
         bc.verify_fn = []() -> bool {
             cv::HOGDescriptor hog;
@@ -209,32 +225,70 @@ std::vector<OpenCVBenchmarkCase> registerCvExtractionBenchmarks() {
     }
 
     // HOGFeatures — U8 input, F32 descriptor vector (full HOG pipeline).
+    //
+    // Two preallocation moves vs the original shape:
+    // 1) cv::HOGDescriptor with the openvx-mark-matching parameters
+    //    is captured in shared state, not reconstructed per iter.
+    // 2) std::vector<float> descriptors is also captured + reserved
+    //    to its final size in setup_fn so hog.compute()'s resize()
+    //    below stays inside the reserved capacity — no realloc in
+    //    the timed loop.
+    //
+    // Also: cap the effective input dimensions to 1024x768.
+    // cv::HOGDescriptor::compute slides a 64x64 window with stride 8
+    // across the full image, producing one descriptor per window. At
+    // FHD that's ~30k windows × 1764 floats/win ≈ 50M floats ≈ 200 MB
+    // of descriptors; at 4K ≈ 800 MB. Capping to 1024x768 (the
+    // classic HOG-pedestrian-detect resolution) keeps the descriptors
+    // vector ≤ ~80 MB while still being a meaningful workload — the
+    // per-window cost is what's being measured, so window count
+    // doesn't change the comparison answer.
     {
+        struct HogFeaturesState {
+            cv::HOGDescriptor hog{cv::Size(64, 64), // win
+                                  cv::Size(16, 16), // block
+                                  cv::Size(8, 8), // block stride
+                                  cv::Size(8, 8), // cell
+                                  9}; // nbins
+            std::vector<float> descriptors;
+        };
+        auto state = std::make_shared<HogFeaturesState>();
+
         OpenCVBenchmarkCase bc;
         bc.name = "HOGFeatures";
         bc.category = "extraction";
         bc.feature_set = "enhanced_vision";
-        bc.setup_fn = [](uint32_t w, uint32_t h, OpenCVTestData& gen, CaseBuffers& bufs) -> bool {
+        bc.setup_fn = [state](uint32_t w, uint32_t h, OpenCVTestData& gen, CaseBuffers& bufs) -> bool {
+            // Cap rationale: see block comment above.
+            constexpr uint32_t MAX_HOG_W = 1024;
+            constexpr uint32_t MAX_HOG_H = 768;
+            const uint32_t cw = std::min<uint32_t>(w, MAX_HOG_W);
+            const uint32_t ch = std::min<uint32_t>(h, MAX_HOG_H);
             // Round up to a HOG window stride (8x8). cv::HOGDescriptor
             // defaults to a 64x128 window; we use 64x64 to match the
             // openvx-mark benchmark and feed an image that's at least
             // that big.
-            const uint32_t ew = std::max<uint32_t>(64, (w / 8) * 8);
-            const uint32_t eh = std::max<uint32_t>(64, (h / 8) * 8);
+            const uint32_t ew = std::max<uint32_t>(64, (cw / 8) * 8);
+            const uint32_t eh = std::max<uint32_t>(64, (ch / 8) * 8);
             bufs.input = gen.makeU8(ew, eh);
+
+            // Reserve the descriptors vector to the size compute() will
+            // produce: getDescriptorSize() returns the per-window length,
+            // and the number of windows = win_per_row × win_per_col
+            // with stride (8,8) and no padding.
+            const size_t per_win = state->hog.getDescriptorSize();
+            const size_t wins_per_row = (ew >= 64) ? ((ew - 64) / 8 + 1) : 1;
+            const size_t wins_per_col = (eh >= 64) ? ((eh - 64) / 8 + 1) : 1;
+            state->descriptors.clear();
+            state->descriptors.reserve(per_win * wins_per_row * wins_per_col);
             return true;
         };
-        bc.run_fn = [](CaseBuffers& bufs) {
-            // Match openvx-mark's HOGFeatures parameters:
-            // window 64×64, block 16×16, block stride 8×8, cell 8×8, 9 bins
-            cv::HOGDescriptor hog(cv::Size(64, 64), // win
-                                  cv::Size(16, 16), // block
-                                  cv::Size(8, 8), // block stride
-                                  cv::Size(8, 8), // cell
-                                  9); // nbins
-            std::vector<float> descriptors;
-            hog.compute(bufs.input, descriptors, cv::Size(8, 8), cv::Size(0, 0));
-            (void)descriptors.size();
+        bc.run_fn = [state](CaseBuffers& bufs) {
+            // compute() resizes descriptors to the exact output length —
+            // since we reserved to that exact size in setup_fn the
+            // resize is a no-op (no realloc), so the timing measures
+            // only the kernel work.
+            state->hog.compute(bufs.input, state->descriptors, cv::Size(8, 8), cv::Size(0, 0));
         };
         bc.verify_fn = []() -> bool {
             cv::Mat in(64, 64, CV_8UC1, cv::Scalar(0));
@@ -250,29 +304,47 @@ std::vector<OpenCVBenchmarkCase> registerCvExtractionBenchmarks() {
     }
 
     // HoughLinesP — U8 (binary) in, vector<Vec4i> lines out.
+    //
+    // The output lines vector is captured in shared state and reserved
+    // to a sensible upper bound in setup_fn. Without this, every timed
+    // call would land cv::HoughLinesP's first push_back inside the
+    // measurement window (vector allocation + copies of any line
+    // segments accumulated so far).
     {
+        struct HoughState {
+            std::vector<cv::Vec4i> lines;
+        };
+        auto state = std::make_shared<HoughState>();
+
         OpenCVBenchmarkCase bc;
         bc.name = "HoughLinesP";
         bc.category = "extraction";
         bc.feature_set = "enhanced_vision";
-        bc.setup_fn = [](uint32_t w, uint32_t h, OpenCVTestData& gen, CaseBuffers& bufs) -> bool {
+        bc.setup_fn = [state](uint32_t w, uint32_t h, OpenCVTestData& gen, CaseBuffers& bufs) -> bool {
             bufs.input = gen.makeU8(w, h);
             // HoughLinesP wants a binary (edge) image; threshold the random
             // input so we get a meaningful set of edge pixels. Threshold
             // inside setup_fn so cv::HoughLinesP only times the Hough step
             // itself.
             cv::threshold(bufs.input, bufs.output_extra, 200, 255, cv::THRESH_BINARY);
+            // 4096 is a generous cap for a random-edge image at any
+            // resolution we exercise; the worst-case observed in
+            // local runs is ~few hundred segments. Reserve once, reuse.
+            state->lines.clear();
+            state->lines.reserve(4096);
             return true;
         };
-        bc.run_fn = [](CaseBuffers& bufs) {
-            std::vector<cv::Vec4i> lines;
-            cv::HoughLinesP(bufs.output_extra, lines,
+        bc.run_fn = [state](CaseBuffers& bufs) {
+            // clear() preserves capacity; HoughLinesP will append into
+            // the reserved storage without realloc as long as the
+            // detected line count stays under 4096.
+            state->lines.clear();
+            cv::HoughLinesP(bufs.output_extra, state->lines,
                             /*rho=*/1.0,
                             /*theta=*/CV_PI / 180.0,
                             /*threshold=*/50,
                             /*minLineLength=*/30,
                             /*maxLineGap=*/10);
-            (void)lines.size();
         };
         bc.verify_fn = []() -> bool {
             // Step image with a vertical white bar → at least one line found.
@@ -291,33 +363,49 @@ std::vector<OpenCVBenchmarkCase> registerCvExtractionBenchmarks() {
     // We compute local maxima using cv::dilate (max filter over 3x3),
     // then keep pixels equal to their local max and set the rest to
     // INT16_MIN.
+    //
+    // keep_mask was previously allocated by an in-loop Mat expression
+    // (`bufs.input >= bufs.input_extra`) which allocates a fresh
+    // CV_8UC1 the size of the image every iteration. Preallocate it
+    // in shared state and fill via cv::compare to keep run_fn
+    // allocation-free.
     {
+        struct NmsState {
+            cv::Mat keep_mask; // CV_8UC1, preallocated in setup_fn
+        };
+        auto state = std::make_shared<NmsState>();
+
         OpenCVBenchmarkCase bc;
         bc.name = "NonMaxSuppression";
         bc.category = "extraction";
         bc.feature_set = "enhanced_vision";
-        bc.setup_fn = [](uint32_t w, uint32_t h, OpenCVTestData& gen, CaseBuffers& bufs) -> bool {
+        bc.setup_fn = [state](uint32_t w, uint32_t h, OpenCVTestData& gen, CaseBuffers& bufs) -> bool {
             bufs.input = gen.makeS16(w, h);
             bufs.input_extra.create(static_cast<int>(h), static_cast<int>(w), CV_16SC1);
             bufs.output.create(static_cast<int>(h), static_cast<int>(w), CV_16SC1);
+            state->keep_mask.create(static_cast<int>(h), static_cast<int>(w), CV_8UC1);
             return true;
         };
-        bc.run_fn = [](CaseBuffers& bufs) {
+        bc.run_fn = [state](CaseBuffers& bufs) {
             static const cv::Mat se = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(3, 3));
             // Local max via dilate; pixel kept iff input == local max.
             cv::dilate(bufs.input, bufs.input_extra, se,
                        cv::Point(-1, -1), 1, cv::BORDER_REPLICATE);
-            cv::Mat keep_mask = (bufs.input >= bufs.input_extra); // CV_8UC1 mask
-            bufs.output.setTo(static_cast<int16_t>(-32768)); // INT16_MIN
-            bufs.input.copyTo(bufs.output, keep_mask);
+            // cv::compare writes into the preallocated mask in place —
+            // no Mat allocation in the timed loop. CMP_GE = "input >=
+            // input_extra" → 255 where input is a local max, else 0.
+            cv::compare(bufs.input, bufs.input_extra, state->keep_mask, cv::CMP_GE);
+            bufs.output.setTo(static_cast<int16_t>(-32768)); // INT16_MIN
+            bufs.input.copyTo(bufs.output, state->keep_mask);
         };
         bc.verify_fn = []() -> bool {
             cv::Mat in(64, 64, CV_16SC1, cv::Scalar(10));
             in.at<int16_t>(32, 32) = 1000;
             cv::Mat dilated, out(64, 64, CV_16SC1, cv::Scalar(-32768));
             const cv::Mat se = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(3, 3));
             cv::dilate(in, dilated, se, cv::Point(-1, -1), 1, cv::BORDER_REPLICATE);
-            cv::Mat mask = (in >= dilated);
+            cv::Mat mask;
+            cv::compare(in, dilated, mask, cv::CMP_GE);
             in.copyTo(out, mask);
             // Center should keep its 1000 value.
             return out.at<int16_t>(32, 32) == 1000;

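For readers without OpenCV at hand, the NonMaxSuppression kernel in the last hunk can be written as a dependency-free reference. This is a hypothetical sketch, not code from the repository: `nms3x3` and its row-major `std::vector` buffers are assumptions, chosen to mirror `cv::dilate` with a 3x3 rect and `BORDER_REPLICATE`, then `cv::compare(..., cv::CMP_GE)`, then the masked `copyTo`. All buffers are sized by the caller, in the same spirit as preallocating `keep_mask` in `setup_fn`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical reference for the NonMaxSuppression kernel. Buffers are
// row-major, w*h elements, and must be sized by the caller, so this
// function performs no allocation.
void nms3x3(const std::vector<int16_t>& in, std::vector<int16_t>& local_max,
            std::vector<uint8_t>& keep_mask, std::vector<int16_t>& out,
            int w, int h) {
    auto clampi = [](int v, int lo, int hi) { return std::min(std::max(v, lo), hi); };
    // Step 1: 3x3 local max with replicated borders (like cv::dilate).
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            int16_t m = std::numeric_limits<int16_t>::min();
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    const int yy = clampi(y + dy, 0, h - 1);
                    const int xx = clampi(x + dx, 0, w - 1);
                    m = std::max(m, in[yy * w + xx]);
                }
            }
            local_max[y * w + x] = m;
        }
    }
    // Step 2: mask = (in >= local_max) as 255/0, like cv::compare(..., CMP_GE).
    for (int i = 0; i < w * h; ++i) {
        keep_mask[i] = in[i] >= local_max[i] ? 255 : 0;
    }
    // Step 3: fill with INT16_MIN, then copy kept pixels through the mask.
    std::fill(out.begin(), out.end(), std::numeric_limits<int16_t>::min());
    for (int i = 0; i < w * h; ++i) {
        if (keep_mask[i]) out[i] = in[i];
    }
}
```

Because the comparison is `>=` rather than `>`, every pixel of a flat plateau equals its local max and is kept. On the 64x64 image used by `verify_fn`, the constant background of 10 survives everywhere except the eight neighbours of the 1000 peak, which are replaced with INT16_MIN.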