match: expose matched index pairs as NearestNeighborMatch.matched_indexes_ (#621) #897

Open
jbbqqf wants to merge 1 commit into uber:master from jbbqqf:feat/621-matched-indexes-attribute

jbbqqf commented May 9, 2026

Proposed changes

Expose the matched index pairs from NearestNeighborMatch.match() as a fitted
attribute matched_indexes_, addressing the request in #621. The attribute is
a two-column pandas.DataFrame (from, to) where each row corresponds to
one matched pair. Useful for joining matched pairs back to upstream metadata
or auditing the matching outcome without re-running the algorithm.
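
As an illustration of the join use case, here is a minimal sketch using only pandas. The `pairs` frame is a hand-built stand-in for `matched_indexes_` (it does not come from a real `match()` call), and `meta`, `segment`, and `attach_metadata` are hypothetical names chosen for the example.

```python
import pandas as pd

# Hypothetical upstream metadata, indexed like the frame passed to match().
meta = pd.DataFrame(
    {"segment": ["a", "b", "a", "c", "b"]},
    index=[10, 11, 12, 13, 14],
)

# Hand-built stand-in for psm.matched_indexes_: one row per matched pair.
pairs = pd.DataFrame({"from": [10, 10, 11], "to": [12, 13, 14]})


def attach_metadata(pairs: pd.DataFrame, meta: pd.DataFrame) -> pd.DataFrame:
    """Look up metadata for both sides of each pair, suffixing the columns."""
    out = pairs.join(meta.add_suffix("_from"), on="from")
    out = out.join(meta.add_suffix("_to"), on="to")
    return out


joined = attach_metadata(pairs, meta)
print(joined)
```

Because the pair table keeps one row per pair, the join preserves which `to` row each `from` row was matched against.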

Fixes #621 (Add matched indexes as NearestNeighborMatch class attribute)

Types of changes

  • New feature (non-breaking change which adds functionality)

Context

The match() method already computes the (from_idx_matched, to_idx_matched)
pairs internally before joining them onto the data, but those local variables
are discarded on return. The reporter of #621 needs access to the pair
mapping, e.g. psm.matched_indexes_ after psm.match(...). This change
captures the full pair table before the from-side de-duplication that the
existing return value applies, so a single from index can appear multiple
times against distinct to indices when ratio > 1.
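
The difference between the pre-dedup pair table and the de-duplicated from side can be sketched on toy data. This is not the code in causalml/match.py; the index labels and the `neighbors` array are made up for illustration, standing in for the neighbor indices a ratio=2 search would produce.

```python
import numpy as np
import pandas as pd

# Toy stand-ins: two from-side rows (index labels 0 and 1), ratio = 2.
# neighbors[i, k] is the to-side index label of the k-th match of from-row i.
from_idx = np.array([0, 1])
neighbors = np.array([[5, 6],
                      [6, 7]])
ratio = neighbors.shape[1]

# Pair table: repeat each from index once per neighbor, row-major.
pairs = pd.DataFrame({
    "from": np.repeat(from_idx, ratio),
    "to": neighbors.ravel(),
})

# De-duplicating the from side keeps which rows matched,
# but no longer says which to-row each one was paired with.
from_dedup = np.unique(pairs["from"])

print(pairs)
print(from_dedup)
```

Here `pairs` has four rows while `from_dedup` has two entries, which is the row-level pairing information the new attribute is meant to retain.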

Changes

  • causalml/match.py
    • NearestNeighborMatch.match(): capture the full (from, to) pair
      table for both the replace=True (NearestNeighbors) and replace=False
      (caliper-loop) paths, then assign it to self.matched_indexes_ before
      returning. Inline comments explain why the pair table is captured
      pre-dedup (the existing return value de-duplicates the from-side, which
      loses the row-level pairing under ratio > 1).
    • Class docstring updated with the new fitted attribute.
  • tests/test_match.py
    • New regression test test_nearest_neighbor_match_exposes_matched_indexes
      covering both branches. It fails on master with
      AttributeError: ... has no attribute 'matched_indexes_' and passes on
      this branch.

Reproduce BEFORE/AFTER yourself (copy-paste)

# --- one-time setup ---
git clone https://github.com/uber/causalml.git /tmp/repro-621 && cd /tmp/repro-621
python -m venv .venv && source .venv/bin/activate
pip install -e '.[test]'

# --- BEFORE (origin/master) ---
git checkout origin/master
python - <<'PY'
import numpy as np, pandas as pd
from causalml.match import NearestNeighborMatch
np.random.seed(0)
df = pd.DataFrame({"ps": np.random.rand(50)})
df["treatment"] = (np.arange(50) < 20).astype(int)
psm = NearestNeighborMatch(replace=True, ratio=2, random_state=0)
matched = psm.match(df, treatment_col="treatment", score_cols=["ps"])
print("matched rows:", len(matched))
print("has matched_indexes_:", hasattr(psm, "matched_indexes_"))
PY
# Expected: matched rows: <int>, has matched_indexes_: False

# --- AFTER (this PR) ---
git fetch https://github.com/jbbqqf/causalml.git feat/621-matched-indexes-attribute
git checkout FETCH_HEAD
pip install -e '.[test]'  # re-build Cython extensions
python - <<'PY'
import numpy as np, pandas as pd
from causalml.match import NearestNeighborMatch
np.random.seed(0)
df = pd.DataFrame({"ps": np.random.rand(50)})
df["treatment"] = (np.arange(50) < 20).astype(int)
psm = NearestNeighborMatch(replace=True, ratio=2, random_state=0)
matched = psm.match(df, treatment_col="treatment", score_cols=["ps"])
print("matched rows:", len(matched))
print("has matched_indexes_:", hasattr(psm, "matched_indexes_"))
print(psm.matched_indexes_.head())
print("from-counts (max):", psm.matched_indexes_["from"].value_counts().max())
PY
# Expected: matched rows: <int>, has matched_indexes_: True, prints a (from, to) DataFrame, max from-count >= 2

What I ran locally

  • pytest tests/test_match.py -v → 5/5 passed (was 4/4 before;
    added 1 regression test).
  • pytest tests/test_match.py::test_nearest_neighbor_match_exposes_matched_indexes
    on origin/master → FAIL (AttributeError: ... no attribute 'matched_indexes_').
  • black --fast causalml/match.py tests/test_match.py → clean (Black 26.x
    with --fast is required because the kit's runtime is Python 3.13 while
    Black 26 targets py314 by default; the formatting is identical).

Edge cases tested

| # | Scenario | Verified by |
|---|----------|-------------|
| 1 | replace=True, ratio=2 (NearestNeighbors path) | matched_indexes_ populated; from-count.max() ≥ 2 confirms pair-level granularity |
| 2 | replace=False, ratio=1 (caliper-loop path) | matched_indexes_ populated with correct schema |
| 3 | Indices align with returned matched DataFrame | set(matched_indexes_['from']) and ['to'] are subsets of matched.index |
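
The schema and alignment checks in rows 2 and 3 could be expressed roughly as below. This is a sketch on hand-built frames, not the test added in tests/test_match.py; `pairs_are_consistent` is a hypothetical helper name, and `pairs` stands in for `matched_indexes_`.

```python
import pandas as pd


def pairs_are_consistent(pairs: pd.DataFrame, matched: pd.DataFrame) -> bool:
    """Check the (from, to) schema and that every index appears in matched."""
    if list(pairs.columns) != ["from", "to"]:
        return False
    known = set(matched.index)
    return set(pairs["from"]) <= known and set(pairs["to"]) <= known


# Stand-in for the DataFrame returned by match(): rows 0, 1 and 5, 6, 7.
matched = pd.DataFrame({"ps": [0.2, 0.4, 0.21, 0.3, 0.41]}, index=[0, 1, 5, 6, 7])

# Stand-in for matched_indexes_.
good_pairs = pd.DataFrame({"from": [0, 0, 1], "to": [5, 6, 7]})
bad_pairs = pd.DataFrame({"from": [0], "to": [99]})  # 99 is not in matched

ok = pairs_are_consistent(good_pairs, matched)
not_ok = pairs_are_consistent(bad_pairs, matched)
print(ok, not_ok)
```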

Risk / blast radius

Purely additive: a new public attribute. Existing return value is unchanged,
so any consumer relying on match()'s current behavior is unaffected. The
attribute is a DataFrame that holds at most the same number of rows as the
returned matched DataFrame, so memory overhead is negligible.

Release note

Expose matched index pairs from `NearestNeighborMatch.match()` via the new
`matched_indexes_` fitted attribute (a two-column DataFrame of `from`, `to`).

PR drafted with assistance from Claude Code. The change was reviewed manually
against causalml/match.py and the existing tests/test_match.py patterns.
The reproducer block above was used during development; it is the same one a
reviewer can paste verbatim.

…exes_ (uber#621)

Capture the (from, to) pair mapping computed inside ``match()`` and
expose it as a fitted attribute, so downstream callers can join matched
pairs back onto upstream metadata and audit the matching outcome
without re-running the algorithm. The attribute is a two-column
DataFrame (``from``, ``to``) populated in both the replacement
(NearestNeighbors) and no-replacement (loop) branches.

Adds a regression test covering replace=True/ratio=2 and replace=False
that fails on master with AttributeError and passes on this branch.

Co-Authored-By: Claude Code <noreply@anthropic.com>
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

@jeongyoonlee
Collaborator

Thanks for your contribution @jbbqqf. Can you sign the CLA? We won't accept any changes without it. Thanks!
