Found by the hypothesis property suite (#778) on its first run. Shrunk counterexample:
from goldenmatch.core.scorer import score_field
score_field("0000", "000000", "dice")
# ValueError: operands could not be broadcast together with shapes (2,) (3,)
Root cause: _dice_score_single / _jaccard_score_single in goldenmatch/core/scorer.py decode the two hex strings to byte arrays and call np.bitwise_and(bits_a, bits_b) with no length validation. The matrix variants (_dice_score_matrix / _jaccard_score_matrix) pad to max_len and are unaffected -- so the single-pair and batch paths disagree on the same inputs.
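The shape mismatch can be reproduced with NumPy alone. This is a minimal sketch: `decode_hex` is a hypothetical stand-in for however scorer.py turns a hex string into a byte array, assumed here to be a plain `bytes.fromhex` decode.

```python
import numpy as np


def decode_hex(hex_str: str) -> np.ndarray:
    # Hypothetical decode step: one uint8 element per pair of hex digits.
    return np.frombuffer(bytes.fromhex(hex_str), dtype=np.uint8)


bits_a = decode_hex("0000")    # 2 bytes -> shape (2,)
bits_b = decode_hex("000000")  # 3 bytes -> shape (3,)

try:
    np.bitwise_and(bits_a, bits_b)
    error_message = None
except ValueError as exc:
    # NumPy cannot broadcast (2,) against (3,), so the elementwise op raises.
    error_message = str(exc)

print(error_message)
```

Under that assumption the two arrays have shapes (2,) and (3,), which matches the shapes reported in the traceback above.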
Reachability: bloom encodings are fixed-length per PPRL config, so same-pipeline pairs are safe in practice. But score_field is public API, and cross-config or hand-fed inputs surface a raw numpy broadcasting ValueError instead of either a score or a typed rejection.
Suggested fix: zero-pad the shorter array to max_len in the single-pair helpers, mirroring the matrix variants (keeps single-vs-matrix parity on the same inputs). Alternative if padding is semantically wrong for PPRL: raise a typed ValueError("bloom filter length mismatch: ...") -- but then the matrix variants should reject too, not silently pad.
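The padding option could look roughly like the sketch below. All names here (`decode_hex`, `pad_pair`, `popcount`, `dice_padded`, `jaccard_padded`) are hypothetical and are not the helpers in scorer.py. Returning 0.0 when both filters have no bits set is also an assumption, so the real helpers' convention for empty filters should be kept instead if it differs.

```python
import numpy as np


def decode_hex(hex_str: str) -> np.ndarray:
    # Hypothetical decode step: one uint8 element per pair of hex digits.
    return np.frombuffer(bytes.fromhex(hex_str), dtype=np.uint8)


def pad_pair(a: np.ndarray, b: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    # Zero-pad both arrays on the right to the longer length (max_len).
    max_len = max(a.size, b.size)
    padded_a = np.zeros(max_len, dtype=np.uint8)
    padded_b = np.zeros(max_len, dtype=np.uint8)
    padded_a[: a.size] = a
    padded_b[: b.size] = b
    return padded_a, padded_b


def popcount(arr: np.ndarray) -> int:
    # Number of set bits across the whole byte array.
    return int(np.unpackbits(arr).sum())


def dice_padded(hex_a: str, hex_b: str) -> float:
    a, b = pad_pair(decode_hex(hex_a), decode_hex(hex_b))
    total = popcount(a) + popcount(b)
    if total == 0:
        return 0.0  # assumed convention for two empty filters
    return 2.0 * popcount(np.bitwise_and(a, b)) / total


def jaccard_padded(hex_a: str, hex_b: str) -> float:
    a, b = pad_pair(decode_hex(hex_a), decode_hex(hex_b))
    union = popcount(np.bitwise_or(a, b))
    if union == 0:
        return 0.0  # assumed convention for two empty filters
    return popcount(np.bitwise_and(a, b)) / union


print(dice_padded("0000", "000000"))
print(dice_padded("ff", "ff00"))
```

Zero-padding leaves the set-bit counts of the shorter filter unchanged, so the padded bytes contribute nothing to the intersection and only the longer filter's own bits to the union. Whether that is a meaningful similarity for filters built with different lengths is the PPRL question raised above.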
Regression tests already in place: tests/test_property_invariants.py::test_dice_mismatched_length_bug and ::test_jaccard_mismatched_length_bug are @pytest.mark.xfail(strict=True, raises=ValueError) -- they flip to XPASS (suite failure) when this is fixed; remove the markers and keep the assertions.
Related: check the TS twins diceCoefficient / jaccardSimilarity (packages/typescript/goldenmatch/src/core/scorer.ts) for the same gap -- the fast-check suite (#783) tests them on a same-length hex strategy, so their mismatched-length behavior is currently unverified. Whatever semantics the fix picks, both surfaces should match.