# Remove unused _input_columns method from Linker #2845

@RobinL

Summary

The _input_columns method in splink/internals/linker.py (lines 186-245) is only used in one place and can be replaced with simpler existing code.

Current Usage

The method is only called in find_blocking_rules_below_threshold_comparison_count():

column_expressions = linker._input_columns(
    include_unique_id_col_names=False,
    include_additional_columns_to_retain=False,
)

Problem

  1. The docstring says the function should use "columns used by the ComparisonLevels", but _input_columns returns all input dataframe columns
  2. _settings_obj._columns_used_by_comparisons already exists and does exactly what the docstring describes
  3. The 60-line method is overly complex for what's needed

Proposed Fix

Replace lines 247-250 in find_brs_with_comparison_counts_below_threshold.py:

if column_expressions is None:
    column_expressions = linker._settings_obj._columns_used_by_comparisons

Then delete the _input_columns method from linker.py.

Note

_columns_used_by_comparisons returns List[str] directly, so the loop that follows, which converts each InputColumn to a string, is no longer needed.
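
A minimal sketch of the proposed fallback, using stand-in classes rather than the real Splink objects. FakeSettings, FakeLinker and resolve_column_expressions are hypothetical names for illustration only; the one thing taken from the issue is that _settings_obj._columns_used_by_comparisons yields a list of strings.

```python
from typing import List, Optional


class FakeSettings:
    """Hypothetical stand-in for the linker's settings object."""

    def __init__(self, columns: List[str]) -> None:
        # Assumed shape: a plain list of column names, as the issue describes.
        self._columns_used_by_comparisons = columns


class FakeLinker:
    """Hypothetical stand-in for Linker; only _settings_obj is modelled."""

    def __init__(self, settings: FakeSettings) -> None:
        self._settings_obj = settings


def resolve_column_expressions(
    linker: FakeLinker, column_expressions: Optional[List[str]] = None
) -> List[str]:
    # Fall back to the comparison columns when the caller passes nothing.
    if column_expressions is None:
        column_expressions = linker._settings_obj._columns_used_by_comparisons
    # The values are already strings, so no conversion loop is needed.
    return list(column_expressions)


linker = FakeLinker(FakeSettings(["first_name", "surname", "dob"]))
default_columns = resolve_column_expressions(linker)
explicit_columns = resolve_column_expressions(linker, ["postcode"])
```

With this shape, an explicit column_expressions argument is passed through unchanged, and the default path needs no knowledge of InputColumn.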
