|
| 1 | +# Features To Add |
| 2 | + |
| 3 | +## Verification Audit Recommendations |
| 4 | + |
| 5 | +Recommendations from a verification audit of all 52 benchmark verify functions. |
| 6 | + |
| 7 | +## Priority 1: Strengthen Filter Tests with Non-Uniform Input |
| 8 | + |
| 9 | +All 8 filter verify functions use constant-value input (all pixels = 100), making them unable to distinguish a working filter from a simple copy or no-op. |
| 10 | + |
| 11 | +**Affected benchmarks:** Box3x3, Gaussian3x3, Median3x3, Erode3x3, Dilate3x3, Sobel3x3, CustomConvolution, NonLinearFilter |
| 12 | + |
| 13 | +**Recommended fixes:** |
| 14 | +- **Box3x3** — Input with a single bright pixel (255) surrounded by zeros. Output center should be ~28 (255/9). |
| 15 | +- **Gaussian3x3** — Input with a single bright pixel (255) surrounded by zeros. Output center should be well below 255; with the common [1 2 1; 2 4 2; 1 2 1]/16 kernel it should be ~64 (255 x 4/16). |
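| 16 | +- **Median3x3** — Input with salt-and-pepper noise on a uniform background. Where noise pixels are isolated (fewer than 5 per 3x3 window), the output should equal the background value at those positions. |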
| 17 | +- **Erode3x3** — Input with an isolated bright pixel in a dark field. Erode should remove it (output = 0 at that position). |
| 18 | +- **Dilate3x3** — Input with an isolated dark pixel in a bright field. Dilate should fill it (output = 255 at that position). |
| 19 | +- **Sobel3x3** — Input with a horizontal edge (top half = 0, bottom half = 255). Verify dy gradient is non-zero at the edge. |
| 20 | +- **CustomConvolution** — Use a non-identity kernel (e.g., edge-detect) and verify output differs from input, ideally by comparing against values computed by hand for that kernel. |
| 21 | +- **NonLinearFilter** — Use a pattern where min/median/max produce distinct, verifiable results. |
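The impulse-input idea above can be sketched in a few lines. This is a minimal illustration, not the benchmark's actual verify code: `box3x3`, `make_impulse`, and `verify_box3x3` are hypothetical helpers, images are plain lists of lists, and border pixels are simply left unchanged (the real border mode is not stated in this document).

```python
def box3x3(img):
    """Reference 3x3 box filter on interior pixels; borders are left unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            total = sum(img[y + dy][x + dx]
                        for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            out[y][x] = total // 9
    return out


def make_impulse(size=5, value=255):
    """Dark image with a single bright pixel at the center."""
    img = [[0] * size for _ in range(size)]
    img[size // 2][size // 2] = value
    return img


def verify_box3x3(src, dst, tol=1):
    """Pass only if the center output is close to impulse / 9."""
    c = len(src) // 2
    expected = src[c][c] // 9
    return abs(dst[c][c] - expected) <= tol


src = make_impulse()
filtered_ok = verify_box3x3(src, box3x3(src))              # working filter
copy_ok = verify_box3x3(src, [row[:] for row in src])      # plain copy

# With constant input, the filter output is identical to a copy,
# which is why the current constant-value tests cannot tell them apart.
flat = [[100] * 5 for _ in range(5)]
flat_unchanged = box3x3(flat) == flat
```

Here the working filter passes, the copy fails, and the constant-input case shows no difference between the two, which is the weakness the audit describes.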
| 22 | + |
| 23 | +## Priority 2: Use Non-Identity Geometric Transforms |
| 24 | + |
| 25 | +WarpAffine, WarpPerspective, and Remap all use identity transforms, so output trivially equals input. A copy operation would pass. |
| 26 | + |
| 27 | +**Affected benchmarks:** WarpAffine, WarpPerspective, Remap |
| 28 | + |
| 29 | +**Recommended fixes:** |
| 30 | +- **WarpAffine** — Use a known translation (e.g., shift by 10 pixels) and verify the pixel value appears at the expected offset. |
| 31 | +- **WarpPerspective** — Use a known simple perspective transform and verify pixel displacement. |
| 32 | +- **Remap** — Use a coordinate mapping that flips or shifts the image and verify output positions. |
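A translation check along the lines suggested for WarpAffine might look like the sketch below. `translate` and `verify_translation` are hypothetical reference helpers, and the mapping direction (whether the matrix maps input to output coordinates or the reverse) is an assumption that should be checked against the implementation under test.

```python
def translate(img, tx, ty, fill=0):
    """Reference translation: dst[y][x] = src[y - ty][x - tx], fill elsewhere."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sx, sy = x - tx, y - ty
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = img[sy][sx]
    return out


def verify_translation(src, dst, marker_xy, tx, ty):
    """The marker must appear at the shifted position and vanish from the old one."""
    mx, my = marker_xy
    moved = dst[my + ty][mx + tx] == src[my][mx]
    cleared = dst[my][mx] != src[my][mx]
    return moved and cleared


# Dark 32x32 image with one marker pixel at (x=5, y=7).
src = [[0] * 32 for _ in range(32)]
src[7][5] = 200

shifted_ok = verify_translation(src, translate(src, 10, 0), (5, 7), 10, 0)
copy_ok = verify_translation(src, [row[:] for row in src], (5, 7), 10, 0)
```

Checking both the new and the old marker position is what makes a plain copy fail.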
| 33 | + |
| 34 | +## Priority 3: Verify Feature Detector Output |
| 35 | + |
| 36 | +HarrisCorners, FastCorners, and OpticalFlowPyrLK only check that graph execution succeeds without verifying detected features. |
| 37 | + |
| 38 | +**Affected benchmarks:** HarrisCorners, FastCorners, OpticalFlowPyrLK |
| 39 | + |
| 40 | +**Recommended fixes:** |
| 41 | +- **HarrisCorners** — Use a checkerboard or cross pattern with obvious corners. Verify the output array is non-empty. |
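| 42 | +- **FastCorners** — Same approach. Verify at least one corner is detected on a known pattern. An isolated bright rectangle may be a safer pattern than a checkerboard, since FAST's segment test tends not to fire at X-junctions where four squares meet. |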
| 43 | +- **OpticalFlowPyrLK** — Verify that tracked keypoint positions shift in the expected direction between frames. |
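As a rough sketch of these checks, the helpers below build a pattern with known corners and validate detector and tracker output. Everything here is an assumption made for illustration: the function names are hypothetical, detections are taken to be lists of `(x, y)` tuples, and a bright rectangle is used in place of a checkerboard because its four corner coordinates are trivially known.

```python
def make_bright_rect(size=64, x0=16, y0=16, x1=47, y1=47, lo=0, hi=255):
    """Dark image with one bright rectangle; returns (image, true corner list)."""
    img = [[lo] * size for _ in range(size)]
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            img[y][x] = hi
    corners = [(x0, y0), (x1, y0), (x0, y1), (x1, y1)]
    return img, corners


def verify_corners(detected, expected, radius=3):
    """Pass if at least one detection lies within `radius` of a true corner."""
    return any(abs(dx - ex) <= radius and abs(dy - ey) <= radius
               for dx, dy in detected
               for ex, ey in expected)


def verify_flow_direction(old_pts, new_pts, expected_shift, tol=1.0):
    """Pass if every tracked point moved by roughly `expected_shift`."""
    if not old_pts or len(old_pts) != len(new_pts):
        return False
    sx, sy = expected_shift
    return all(abs((nx - ox) - sx) <= tol and abs((ny - oy) - sy) <= tol
               for (ox, oy), (nx, ny) in zip(old_pts, new_pts))


img, corners = make_bright_rect()
empty_ok = verify_corners([], corners)               # no detections
near_ok = verify_corners([(17, 15)], corners)        # one hit near (16, 16)
flow_ok = verify_flow_direction([(10, 10), (20, 20)],
                                [(15.2, 10.1), (24.9, 19.8)], (5, 0))
```

For the optical flow case, the second frame would be the first frame shifted by a known amount, so `expected_shift` is known in advance.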
| 44 | + |
| 45 | +## Priority 4: Multi-Pixel Sampling for Single-Pixel Checks |
| 46 | + |
| 47 | +Several tests only check a single output pixel. A bug affecting other regions would go undetected. |
| 48 | + |
| 49 | +**Affected benchmarks:** ChannelExtract, ChannelCombine, Phase, ScaleImage_Half, ScaleImage_Double |
| 50 | + |
| 51 | +**Recommended fixes:** |
| 52 | +- Sample at least 3-4 positions (e.g., center, corners, mid-edges) to verify the operation is consistent across the image. |
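One way to express multi-position sampling is a small helper that returns the center, corners, and mid-edges, plus a check that compares each sampled output pixel with an expected value. The names `sample_positions` and `verify_at_samples` are hypothetical, and the channel-extract example uses a synthetic gradient image chosen so that the expected value differs at every position. Some operations leave border pixels undefined, in which case the sample points would need to be moved inward.

```python
def sample_positions(width, height):
    """Center, four corners, and four mid-edge points as (x, y) tuples."""
    cx, cy = width // 2, height // 2
    r, b = width - 1, height - 1
    return [(cx, cy),
            (0, 0), (r, 0), (0, b), (r, b),
            (cx, 0), (cx, b), (0, cy), (r, cy)]


def verify_at_samples(dst, expected_fn, tol=0):
    """Compare dst[y][x] with expected_fn(x, y) at every sample position."""
    h, w = len(dst), len(dst[0])
    return all(abs(dst[y][x] - expected_fn(x, y)) <= tol
               for x, y in sample_positions(w, h))


# Synthetic RGB image where G encodes the row index.
W = H = 64
rgb = [[(x, y, x + y) for x in range(W)] for y in range(H)]
green = [[px[1] for px in row] for row in rgb]       # correct channel extract

# Simulated bug that only corrupts the bottom-right corner.
buggy = [row[:] for row in green]
buggy[H - 1][W - 1] = 0

good_ok = verify_at_samples(green, lambda x, y: y)
buggy_ok = verify_at_samples(buggy, lambda x, y: y)
center_only_ok = buggy[H // 2][W // 2] == H // 2     # single-pixel check misses it
```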
| 53 | + |
| 54 | +## Priority 5: Strengthen Remaining Weak Checks |
| 55 | + |
| 56 | +- **LBP** — Currently only checks `imageNonZero`. Should verify specific LBP pattern values for a known input. |
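| 57 | +- **EqualizeHist** — Currently checks that all output pixels are equal within +/-1. Could additionally verify that the output value matches the expected equalized level for uniform input. The audit suggests ~128 for full-range equalization, but the result for a single-valued histogram depends on how the implementation normalizes the CDF, so the expected level should be confirmed against the reference implementation. |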
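For the LBP item, a known-input check could compare the output against a code computed by hand. The sketch below uses the textbook 8-neighbor LBP with two labeled assumptions: a neighbor contributes a 1 bit when it is greater than or equal to the center, and bits are ordered clockwise from the top-left neighbor starting at the least significant bit. The implementation under test may use a different comparison or bit order, so the expected value would need to be adapted.

```python
# Neighbor offsets (dy, dx), clockwise from top-left. ASSUMED bit order.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, y, x):
    """Basic 8-neighbor LBP code at (y, x); bit i is set if neighbor i >= center."""
    center = img[y][x]
    code = 0
    for i, (dy, dx) in enumerate(OFFSETS):
        if img[y + dy][x + dx] >= center:
            code |= 1 << i
    return code


# Known 3x3 input: bright top row and left neighbor, dark elsewhere.
patch = [[10, 10, 10],
         [10,  5,  1],
         [ 1,  1,  1]]
patch_code = lbp_code(patch, 1, 1)   # bits 0, 1, 2, 7 set

# A flat patch sets every bit under the >= convention.
flat = [[7] * 3 for _ in range(3)]
flat_code = lbp_code(flat, 1, 1)
```

Under these assumptions the patch yields 1 + 2 + 4 + 128 = 135, a value that a non-zero check alone would not pin down.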
| 58 | + |
| 59 | +## Comparison Report Enhancements |
| 60 | + |
| 61 | +Status of features for the polished comparison report (both C++ `--compare` and Python `compare_reports.py`): |
| 62 | + |
| 63 | +### Implemented |
| 64 | + |
| 65 | +- **System info section** — Shows CPU, cores, RAM, OS. Detects same vs different hardware with a mismatch warning. |
| 66 | +- **Conformance & Scores table** — Side-by-side Vision Score (geometric mean MP/s), conformance PASS/FAIL with kernel counts. |
| 67 | +- **Category sub-scores** — Per-category geometric mean comparison with % change column. |
| 68 | +- **Summary with per-category breakdown** — Regression/improvement/unchanged counts, broken down by category (e.g., "3 regressions in filters"). |
| 69 | +- **Detailed results with MP/s** — Both median latency (ms) and throughput (MP/s) for each implementation, plus change % and status. |
| 70 | +- **Benchmarks only in one report** — Lists benchmarks present in one file but not the other, so nothing is silently dropped. |
| 71 | +- **Stability caveat flags** — Marks rows where either side had a coefficient of variation (CV) above 15%, with a footnote explaining that such comparisons are unreliable. |
| 72 | +- **CSV output from C++** — Generates both `.md` and `.csv` from the C++ `--compare` path. |
| 73 | +- **Vision Score from JSON** — Python script reads precomputed `overall_vision_score` from JSON instead of incorrectly summing MP/s. |
| 74 | +- **Missing kernels detail** — Shows missing kernel lists side by side when conformance differs. |
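The score and stability calculations mentioned in this list can be illustrated with a short sketch. It is not taken from `compare_reports.py`; the function names are hypothetical, and the 15% limit is the only value drawn from the list above.

```python
import math
import statistics


def geometric_mean(values):
    """Geometric mean of positive throughput values (MP/s)."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def pct_change(baseline, candidate):
    """Relative change of candidate versus baseline, in percent."""
    return (candidate - baseline) / baseline * 100.0


def cv_percent(samples):
    """Coefficient of variation: sample standard deviation over mean, in percent."""
    return statistics.stdev(samples) / statistics.mean(samples) * 100.0


def is_unstable(samples, limit_pct=15.0):
    """Flag a row whose timing samples vary too much to compare reliably."""
    return cv_percent(samples) > limit_pct


throughputs = [100.0, 400.0]
score = geometric_mean(throughputs)      # geometric mean, not a sum
naive_sum = sum(throughputs)             # what summing MP/s would give instead

category_change = pct_change(200.0, 220.0)
steady = is_unstable([10.0, 10.0, 10.0])
noisy = is_unstable([8.0, 10.0, 12.0])
```

The geometric mean keeps one very fast kernel from dominating the score, which is the usual reason to prefer it over a sum or arithmetic mean for throughput aggregates.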
| 75 | + |
| 76 | +### Future Enhancements |
| 77 | + |
| 78 | +- **Configurable regression threshold** — The 5% threshold for regression/improvement is hardcoded. Add a `--threshold` CLI option to both C++ and Python. |
| 79 | +- **Statistical significance testing** — When iterations > 1, perform confidence interval or t-test analysis to determine if differences are statistically meaningful. |
| 80 | +- **Multi-resolution scaling comparison** — Compare scaling efficiency between implementations (how well each handles higher resolutions). |
| 81 | +- **Chart/graph output** — Generate bar charts or SVG visualizations for throughput comparison. |
| 82 | +- **N-way comparison** — Support comparing 3+ implementations in a single report (currently optimized for pairwise). |
| 83 | +- **Grouped-by-category view** — Option to group the detailed results table by category instead of sorting by change %. |
| 84 | +- **Historical trend tracking** — Compare against a series of reports over time to detect gradual regressions. |
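The configurable threshold could be wired up on the Python side roughly as follows. This is a sketch under stated assumptions: the `--threshold` flag name comes from the list above, but `build_parser` and `classify` are hypothetical, and the change is assumed to be measured on median latency, so a positive change counts as a regression.

```python
import argparse


def build_parser():
    """CLI parser exposing the regression threshold instead of hardcoding 5%."""
    parser = argparse.ArgumentParser(description="Compare two benchmark reports")
    parser.add_argument("--threshold", type=float, default=5.0,
                        help="percent change treated as regression/improvement")
    return parser


def classify(baseline_ms, candidate_ms, threshold_pct=5.0):
    """Classify a latency change; ASSUMES higher latency is worse."""
    change = (candidate_ms - baseline_ms) / baseline_ms * 100.0
    if change > threshold_pct:
        return "regression"
    if change < -threshold_pct:
        return "improvement"
    return "unchanged"


default_args = build_parser().parse_args([])
loose_args = build_parser().parse_args(["--threshold", "10"])

at_default = classify(100.0, 107.0, default_args.threshold)
at_loose = classify(100.0, 107.0, loose_args.threshold)
faster = classify(100.0, 90.0, default_args.threshold)
```

Keeping the default at 5.0 preserves the current behavior when the flag is omitted.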