
FIX: handle N-dim printed arrays in DTChecker (closes #21)#218

Draft
MukundaKatta wants to merge 2 commits into scipy:main from MukundaKatta:feat/ndim-array-doctest


@MukundaKatta


Summary

Closes #21. DTChecker.try_convert_printed_array only handled rank 1 and 2 arrays via bracket splitting, so doctests printing rank>=3 numpy arrays could not be compared.
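To see why rank >= 3 is different, here is a small illustration assuming NumPy's default print options: `str()` of a 3-D array separates its 2-D slabs with an empty line, which a doctest expected-output block must spell as `<BLANKLINE>`.

```python
import numpy as np

# A rank-3 array made of two 2x2 slabs.
a = np.arange(8).reshape(2, 2, 2)
printed = str(a)
print(printed)
# [[[0 1]
#   [2 3]]
#
#  [[4 5]
#   [6 7]]]

# The empty line between the slabs is what a doctest "want" string
# has to write as <BLANKLINE>.
assert "\n\n" in printed
```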

Fix

Replaced the bracket-splitting logic in scipy_doctest/impl.py with a regex pass that inserts commas between adjacent numeric tokens and between adjacent ]/[ pairs. It handles negatives, floats, exponents, nan/inf, and any whitespace, including the blank lines between slabs, so it works for arbitrary N-dim arrays.
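A minimal sketch of the regex pass described above, reusing the `_NUM_TOKEN` pattern quoted in the review thread. The function names `regex_convert` and `regex_to_array` are hypothetical and this is not the PR's actual code; `eval` with an emptied `__builtins__` stands in for the eval-based comparison and is not a real sandbox.

```python
import re

import numpy as np

# Numeric-token pattern as quoted from the PR diff.
_NUM_TOKEN = (
    r'[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?[jJ]?'
    r'|[-+]?(?:nan|inf|infinity)'
)


def regex_convert(printed):
    """Hypothetical: turn numpy's printed form into a Python list literal."""
    s = printed.strip()
    # Comma between a numeric token and the next token separated by whitespace.
    s = re.sub(rf'({_NUM_TOKEN})\s+(?={_NUM_TOKEN})', r'\1, ', s)
    # Comma between "]" and "[" across any whitespace, including blank lines.
    s = re.sub(r'\]\s*\[', '], [', s)
    return s


def regex_to_array(printed):
    # nan/inf are not Python literals, so supply them as names.
    namespace = {"__builtins__": {}, "nan": np.nan, "inf": np.inf}
    return np.array(eval(regex_convert(printed), namespace))


print(regex_convert(str(np.arange(8).reshape(2, 2, 2))))
```

Every new numeric spelling (complex values, other nan/inf forms, summarized arrays with `...`) needs another branch in `_NUM_TOKEN`, which is the growth in complexity the review pushes back on.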

Files changed

  • scipy_doctest/impl.py: new regex-based parser
  • scipy_doctest/tests/module_cases.py: added rank_3_printed_array() and rank_4_printed_array() doctest cases
  • scipy_doctest/tests/test_ndim.py (new): 10 unit tests covering 1D/2D/3D/4D arrays, floats, negatives, tolerance via np.allclose, and <BLANKLINE> in the printed repr

Test plan

  • 71 passed, 1 pre-existing xfail
  • ruff format --check and ruff check clean on changed code

The output checker's try_convert_printed_array helper only handled 1D
and 2D printed numpy arrays. For 3D and higher, numpy inserts blank
lines between sub-arrays, and the previous '[' splitting strategy
produced malformed output that failed eval and broke the comparison.

Switch to a regex pass that inserts commas between adjacent numeric
tokens (handling negative/exp/nan/inf) and between adjacent ']' '['
brackets across any whitespace, so arbitrary-rank arrays round-trip.

Refs scipy#21

@ev-br left a comment

Member


Thanks for the PR!

First, I'll note that #21 has actually been fixed by #216, and the remaining problem you're addressing is specific to printed arrays: removing <BLANKLINE> and reinserting commas don't play well together and end up in an infinite recursion.
Thus the problem looks similar to #50.
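For context, a small sketch using only the standard-library `doctest` module: the parser keeps `<BLANKLINE>` literally in an example's expected output, and the marker only becomes an empty line when it is stripped, here with the same regex that `doctest.OutputChecker.check_output` applies. The docstring below is a made-up example, not one of the PR's test cases.

```python
import doctest
import re

docstring = """
>>> import numpy as np
>>> a = np.arange(8).reshape(2, 2, 2)
>>> print(a)
[[[0 1]
  [2 3]]
<BLANKLINE>
 [[4 5]
  [6 7]]]
"""

# The parser does not execute anything; it only splits source and "want".
want = doctest.DocTestParser().get_examples(docstring)[-1].want
assert "<BLANKLINE>" in want

# Strip the marker the same way doctest's OutputChecker does, which turns it
# back into the empty line numpy actually prints between slabs.
cleaned = re.sub(r"(?m)^<BLANKLINE>\s*?$", "", want)
print(cleaned)
```

Any comma-reinsertion step therefore has to run on the cleaned text (or treat the marker as whitespace); otherwise `<BLANKLINE>` is just another token between the brackets.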

The added tests look good, but for the fix itself I'm going to have to ask you to avoid regex parsing of numeric values. It'd be better to keep looking for square brackets and reinsert commas if want.startswith("[").

Review comment on scipy_doctest/impl.py (outdated):
_NUM_TOKEN = (
r'[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?[jJ]?'
r'|[-+]?(?:nan|inf|infinity)'
)

Member


Please no regex parsing of numeric values. That way lies (to quote the original author of the pytest-doctestplus checker) byzantine complexity.

It would be great to keep try_convert_printed_array looking only at square brackets.

Per @ev-br's review on scipy#218, avoid regex parsing of numeric values. Rewrite
try_convert_printed_array as a single character scan that only looks at square
brackets and whitespace: each whitespace run becomes ', ' between values or
sub-arrays and is dropped next to brackets. This handles N-dim arrays
(including the blank lines numpy prints between sub-arrays) without recursion.
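A minimal sketch of the character scan this commit message describes, assuming the input is numpy's default `str()` output with any `<BLANKLINE>` markers already turned back into empty lines. `scan_printed_array` and `to_array` are hypothetical names, not the PR's code, and `eval` with an emptied `__builtins__` is for illustration only, not a sandbox.

```python
import numpy as np


def scan_printed_array(printed):
    """Hypothetical: add commas using only brackets and whitespace."""
    s = printed.strip()
    out = []
    i, n = 0, len(s)
    while i < n:
        if s[i].isspace():
            # Consume the whole whitespace run (spaces, newlines, blank lines).
            j = i
            while j < n and s[j].isspace():
                j += 1
            prev = out[-1] if out else ""
            nxt = s[j] if j < n else ""
            # Drop whitespace right after "[" or right before "]"; any other
            # run separates two values or two sub-arrays, so emit ", ".
            if prev != "[" and nxt != "]":
                out.append(", ")
            i = j
        else:
            out.append(s[i])
            i += 1
    return "".join(out)


def to_array(printed):
    # nan/inf are not Python literals, so supply them as names.
    namespace = {"__builtins__": {}, "nan": np.nan, "inf": np.inf}
    return np.array(eval(scan_printed_array(printed), namespace))


print(scan_printed_array("[[ 1 -2]\n [-3  4]]"))
```

Because the scan never inspects the characters of a number, signs, exponents, `nan`/`inf` and complex values pass through untouched, and a single linear pass needs no recursion regardless of rank.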

Development

Successfully merging this pull request may close these issues.

N-dim arrays with n> 2

2 participants