match: expose matched index pairs as NearestNeighborMatch.matched_indexes_ (#621) #897
Open
jbbqqf wants to merge 1 commit into uber:master from
…exes_ (uber#621) Capture the (from, to) pair mapping computed inside ``match()`` and expose it as a fitted attribute, so downstream callers can join matched pairs back onto upstream metadata and audit the matching outcome without re-running the algorithm. The attribute is a two-column DataFrame (``from``, ``to``) populated in both the replacement (NearestNeighbors) and no-replacement (loop) branches. Adds a regression test covering replace=True/ratio=2 and replace=False that fails on master with AttributeError and passes on this branch. Co-Authored-By: Claude Code <noreply@anthropic.com>
Collaborator
Thanks for your contribution @jbbqqf. Can you sign the CLA? We won't accept any changes without it. Thanks!
Proposed changes
Expose the matched index pairs from `NearestNeighborMatch.match()` as a fitted
attribute `matched_indexes_`, addressing the request in #621. The attribute is
a two-column `pandas.DataFrame` (`from`, `to`) where each row corresponds to
one matched pair. Useful for joining matched pairs back to upstream metadata
or auditing the matching outcome without re-running the algorithm.
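The join onto upstream metadata mentioned above might look like the following minimal sketch. It does not call causalml: `pairs` is a hand-written stand-in for `psm.matched_indexes_` (assumed to have the `from` and `to` columns described here), and `metadata`, its columns, and the index values are hypothetical.

```python
import pandas as pd

# Hypothetical upstream metadata, indexed like the data passed to match().
metadata = pd.DataFrame(
    {
        "customer_id": ["c10", "c11", "c12", "c13", "c14"],
        "region": ["north", "south", "north", "east", "south"],
    },
    index=[0, 1, 2, 3, 4],
)

# Stand-in for psm.matched_indexes_ (assumption: columns "from" and "to",
# one row per matched pair, values are index labels of the input data).
pairs = pd.DataFrame({"from": [0, 0, 1], "to": [2, 3, 4]})

# Attach the metadata of both sides of each pair by joining each column
# against the metadata index; suffixes keep the two sides apart.
joined = pairs.join(metadata.add_suffix("_from"), on="from").join(
    metadata.add_suffix("_to"), on="to"
)

print(joined)
```

Joining on index labels rather than positions keeps the lookup valid even when the upstream frame has a non-default index.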
Fixes #621 — Add matched indexes as NearestNeighborMatch class attribute
Types of changes
Context
The `match()` method already computes the `(from_idx_matched, to_idx_matched)`
pairs internally before joining them onto the data, but the local variables
were discarded after `return`. The user reports they need access to the pair
mapping, e.g. `psm.matched_indexes_` after `psm.match(...)`. This change
captures the full pair table (before the `from`-side de-duplication that the
existing return value does) so a single `from` index can appear multiple
times against distinct `to` indices when `ratio > 1`.
Changes
- `causalml/match.py`: `NearestNeighborMatch.match()`: capture the full `(from, to)` pair
  table for both the `replace=True` (NearestNeighbors) and `replace=False`
  (caliper-loop) paths, then assign it to `self.matched_indexes_` before
  returning. Inline comments explain why the pair table is captured
  pre-dedup (the existing return value de-duplicates the from-side, which
  loses the row-level pairing under `ratio > 1`).
- `tests/test_match.py`: `test_nearest_neighbor_match_exposes_matched_indexes`
  covering both branches. It fails on `master` with
  `AttributeError: ... has no attribute 'matched_indexes_'` and passes on
  this branch.
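To illustrate why the pair table is captured before de-duplication, here is a toy sketch. It is not the code in `causalml/match.py`: a brute-force nearest-neighbour search over made-up propensity scores stands in for the `replace=True` path, and the index labels, scores, and variable names are all assumptions.

```python
import numpy as np
import pandas as pd

# Made-up propensity scores; index labels play the role of row indices.
treated = pd.Series([0.30, 0.34], index=[10, 11])
control = pd.Series([0.29, 0.32, 0.35, 0.90], index=[20, 21, 22, 23])
ratio = 2

# Brute-force stand-in for a nearest-neighbour search with replacement:
# absolute score distance from every treated unit to every control unit.
dist = np.abs(treated.to_numpy()[:, None] - control.to_numpy()[None, :])
nearest = np.argsort(dist, axis=1)[:, :ratio]  # positions of the closest controls

# Full pair table: one row per (from, to) pair, so each treated index
# appears `ratio` times.
pairs = pd.DataFrame(
    {
        "from": np.repeat(treated.index.to_numpy(), ratio),
        "to": control.index.to_numpy()[nearest].ravel(),
    }
)

# De-duplicating the from-side keeps one row per treated unit and
# silently drops the second matched control for each of them.
deduped = pairs.drop_duplicates(subset="from")

print(pairs)
print(deduped)
```

In this toy data control `21` is matched to both treated units, which is only visible in the full pair table.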
Reproduce BEFORE/AFTER yourself (copy-paste)
What I ran locally
- `pytest tests/test_match.py -v` → 5/5 passed (was 4/4 before;
  added 1 regression test).
- `pytest tests/test_match.py::test_nearest_neighbor_match_exposes_matched_indexes`
  on `origin/master` → FAIL (`AttributeError: ... no attribute 'matched_indexes_'`).
- `black --fast causalml/match.py tests/test_match.py` → clean (Black 26.x
  with `--fast` is required because the kit's runtime is Python 3.13 while
  Black 26 targets py314 by default; the formatting is identical).
Edge cases tested
- `replace=True, ratio=2` (NearestNeighbors path): `matched_indexes_` populated; from-count `.max() ≥ 2` confirms pair-level granularity
- `replace=False, ratio=1` (caliper-loop path): `matched_indexes_` populated with correct schema
- `set(matched_indexes_['from'])` and `['to']` are subsets of `matched.index`
Risk / blast radius
Purely additive: a new public attribute. Existing return value is unchanged,
so any consumer relying on `match()`'s current behavior is unaffected. The
attribute is a `DataFrame` that holds at most the same number of rows as the
returned matched DataFrame, so memory overhead is negligible.
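A consumer who wants to confirm these properties on their own data could run checks along the lines of the sketch below. It mirrors the schema, subset, and row-count statements in this description rather than the actual test in `tests/test_match.py`; the helper name `pair_table_problems` and the toy frames are hypothetical.

```python
import pandas as pd


def pair_table_problems(pairs: pd.DataFrame, matched: pd.DataFrame) -> list:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # Schema: exactly the two columns described in this PR.
    if list(pairs.columns) != ["from", "to"]:
        problems.append(f"unexpected columns: {list(pairs.columns)}")
        return problems
    # Both sides of every pair should be present in the returned matched frame.
    for side in ("from", "to"):
        if not pairs[side].isin(matched.index).all():
            problems.append(f"'{side}' contains indices missing from matched.index")
    # Size bound stated in the risk section above.
    if len(pairs) > len(matched):
        problems.append("pair table has more rows than the matched frame")
    return problems


# Toy stand-ins (assumed shapes): a matched frame and its pair table.
matched = pd.DataFrame({"is_treatment": [1, 1, 0, 0, 0]}, index=[10, 11, 20, 21, 22])
pairs = pd.DataFrame({"from": [10, 10, 11, 11], "to": [20, 21, 22, 21]})

print(pair_table_problems(pairs, matched))
```

Returning a list of problems instead of raising on the first one makes it easier to see every violated property in a single run.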
Release note
PR drafted with assistance from Claude Code. The change was reviewed manually
against `causalml/match.py` and the existing `tests/test_match.py` patterns.
The reproducer block above was used during development; it is the same one a
reviewer can paste verbatim.