FIX: handle N-dim printed arrays in DTChecker (closes #21) #218
MukundaKatta wants to merge 2 commits into
The output checker's try_convert_printed_array helper only handled 1D and 2D printed numpy arrays. For 3D and higher, numpy inserts blank lines between sub-arrays, and the previous '[' splitting strategy produced malformed output that failed eval and broke the comparison. Switch to a regex pass that inserts commas between adjacent numeric tokens (handling negative/exp/nan/inf) and between adjacent ']' '[' brackets across any whitespace, so arbitrary-rank arrays round-trip. Refs scipy#21
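To make the failure mode concrete, here is a minimal sketch of why printed N-dim output cannot be evaluated directly. The string below is the text NumPy prints for `np.arange(8).reshape(2, 2, 2)`; the helper name `parses_as_literal` is hypothetical and not part of scipy_doctest.

```python
import ast

# Text of the form NumPy prints for np.arange(8).reshape(2, 2, 2):
# no commas, and a blank line between the two 2x2 sub-arrays.
PRINTED_3D = "[[[0 1]\n  [2 3]]\n\n [[4 5]\n  [6 7]]]"


def parses_as_literal(text: str) -> bool:
    """Return True if `text` is a valid Python literal (hypothetical helper)."""
    try:
        ast.literal_eval(text)
    except (SyntaxError, ValueError):
        return False
    return True


print(parses_as_literal(PRINTED_3D))  # False: "0 1" is not valid syntax
print(parses_as_literal("[[[0, 1], [2, 3]], [[4, 5], [6, 7]]]"))  # True
```

Once commas are reinserted, the blank line between sub-arrays is harmless, since newlines inside brackets are ordinary whitespace to the parser.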
ev-br left a comment
Thanks for the PR!
First I'll note that #21 has actually been fixed by #216, and the remaining problem which you're addressing is specific to printed arrays: removal of <BLANKLINE> and reinserting commas don't play together, and fall into an infinite recursion.
Thus the problem looks similar to #50
The added tests look good, but for the fix itself I'm going to have to ask you to avoid regex parsing of numeric values. It'd be better to continue looking for square brackets and reinsert commas if want.startswith("[").
_NUM_TOKEN = (
    r'[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?[jJ]?'
    r'|[-+]?(?:nan|inf|infinity)'
)
Please no regex parsing of numeric values. That way lies --- quoting the original author of pytest-doctestplus checker --- byzantine complexity.
Would be great to keep try_convert_printed_array only looking at square brackets.
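As an illustration of the complexity concern, the sketch below checks the quoted `_NUM_TOKEN` pattern against a few element spellings that NumPy can print. The helper `matches_num_token` is hypothetical, and how the PR's surrounding code actually applied the pattern is not shown here, so this only demonstrates which tokens the pattern recognizes as a whole.

```python
import re

# The pattern quoted in the review comment above.
_NUM_TOKEN = (
    r'[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?[jJ]?'
    r'|[-+]?(?:nan|inf|infinity)'
)


def matches_num_token(token: str) -> bool:
    """Return True if the whole token matches _NUM_TOKEN (hypothetical helper)."""
    return re.fullmatch(_NUM_TOKEN, token) is not None


print(matches_num_token("1.5e-3"))  # True
print(matches_num_token("-inf"))    # True
print(matches_num_token("1.+2.j"))  # False: a printed complex element
print(matches_num_token("--"))      # False: default masked-array placeholder
```

Each unrecognized spelling would need its own branch in the pattern, whereas a bracket-only scan never has to know what an element looks like.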
Per @ev-br review on scipy#218: avoid regex parsing of numeric values. Rewrite try_convert_printed_array as a single char-scan that only looks at square brackets and whitespace -- replacing each whitespace run with ', ' between values/sub-arrays and dropping it next to brackets. Handles N-dim arrays (incl. the blank lines numpy prints between sub-arrays) without recursion.
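The scan described in this commit message can be sketched roughly as follows. This is not the code from the PR: the function name `insert_commas` is hypothetical, and using `ast.literal_eval` to check the result is an assumption made only for this example (it does not accept `nan` or `inf`).

```python
import ast


def insert_commas(printed: str) -> str:
    """Reinsert commas into a printed array, looking only at brackets and whitespace."""
    out = []
    i, n = 0, len(printed)
    while i < n:
        ch = printed[i]
        if not ch.isspace():
            out.append(ch)
            i += 1
            continue
        # Consume the whole whitespace run, including blank lines.
        j = i
        while j < n and printed[j].isspace():
            j += 1
        prev = out[-1] if out else ""
        nxt = printed[j] if j < n else ""
        # Drop the run at either end of the text, right after '[' and
        # right before ']'; anywhere else it separates two items.
        if prev and nxt and prev != "[" and nxt != "]":
            out.append(", ")
        i = j
    return "".join(out)


printed = "[[[ 0 -1]\n  [ 2  3]]\n\n [[ 4  5]\n  [ 6  7]]]"
converted = insert_commas(printed)
print(converted)  # [[[0, -1], [2, 3]], [[4, 5], [6, 7]]]
print(ast.literal_eval(converted))
```

Because the scan never inspects the elements themselves, padding spaces, signs, and exponents pass through untouched, and the nesting depth does not matter.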
Summary
Closes #21.
DTChecker.try_convert_printed_array only handled rank 1 and 2 arrays via bracket splitting, so doctests printing rank>=3 numpy arrays could not be compared.
Fix
Replaced the bracket-splitting logic in scipy_doctest/impl.py with a regex pass that inserts commas between adjacent numeric tokens and between adjacent ]/[ pairs. Handles negatives, floats, exponents, nan/inf, any whitespace including blank lines between slabs. Works for arbitrary N-dim.
Files changed
scipy_doctest/impl.py: new regex-based parser
scipy_doctest/tests/module_cases.py: added rank_3_printed_array() and rank_4_printed_array() doctest cases
scipy_doctest/tests/test_ndim.py (new): 10 unit tests covering 1D/2D/3D/4D, floats, negatives, tolerance via np.allclose, and BLANKLINE repr
Test plan
ruff format --check and ruff check clean on changed code