Dead code: ~28 unused methods/functions safe to remove (lucxor + phosphors), plus uncalled public-API & cleanup follow-ups #43

@timosachsenberg

Summary

A systematic dead-code audit found ~28 unused methods/functions (~960 LOC) that are safe to remove, plus a set of uncalled public-API methods and minor cleanups that need a maintainer decision. Findings were cross-checked with multiple static tools, per-symbol call-tracing, an adversarial "try to prove it's reachable" pass, and an empirical deletion test (delete everything in Tier 1, run the full suite → all tests still pass).

The audit also surfaced dead code that name-based tools (vulture, grep) cannot find on their own — e.g. Peak.to_dict, which looks "used" only because an unrelated class (LucXorConfig) has a method of the same name.
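A minimal sketch of why a name-only check misses this case. Only the names Peak.to_dict and LucXorConfig.to_dict come from the audit; the method bodies and the save() caller are hypothetical stand-ins, not the real onsite code.

```python
import ast

# Hypothetical stand-in source: two unrelated classes share a method name,
# and only the LucXorConfig one is ever called.
SOURCE = '''
class Peak:
    def to_dict(self):
        return {"mz": 1.0}

class LucXorConfig:
    def to_dict(self):
        return {"fragment_tol": 0.5}

def save(config: LucXorConfig):
    return config.to_dict()
'''


def referenced_attribute_names(source: str) -> set[str]:
    """Every attribute name accessed anywhere, ignoring the receiver's type."""
    tree = ast.parse(source)
    return {n.attr for n in ast.walk(tree) if isinstance(n, ast.Attribute)}


def defined_methods(source: str) -> list[str]:
    """Qualified 'Class.method' names for every method in the source."""
    out = []
    for cls in ast.walk(ast.parse(source)):
        if isinstance(cls, ast.ClassDef):
            for item in cls.body:
                if isinstance(item, ast.FunctionDef):
                    out.append(f"{cls.name}.{item.name}")
    return out


names = referenced_attribute_names(SOURCE)
# A name-based tool flags a method only if its bare name is never referenced.
name_based_dead = [m for m in defined_methods(SOURCE) if m.split(".")[1] not in names]
print(name_based_dead)  # []
```

The empty result is the point: config.to_dict() makes the bare name to_dict look referenced, so Peak.to_dict is never flagged even though nothing calls it. Catching it requires knowing the receiver's type, which is what the false-negative hunt checked by hand.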

Methodology (how "certain" was established)

  1. vulture (--min-confidence 60) over onsite/ and onsite/ + tests/.
  2. Custom AST reference analysis — every function/method definition cross-referenced against every attribute-access, bareword, string literal, and getattr/setattr/hasattr string argument across onsite/ + tests/. Confirmed each Tier-1 symbol has zero references.
  3. pycg call-graph was attempted but is broken on Python 3.12 (ImportManagerError on even a trivial file), so it was not usable here — agent call-tracing + the deletion test substitute for it.
  4. Per-symbol classification + an independent adversarial refutation pass (one skeptic per candidate, instructed to find any reachability: dynamic dispatch, CLI entry points, public-API export, polymorphism, serialization hooks, threading targets, docs/examples). This correctly rescued several candidates (see Tier 2/2b).
  5. False-negative hunt for code the name-matching missed: same-name collisions across classes, whole dead subsystems, and receiver-type mismatches.
  6. Empirical deletion test: in a throwaway worktree, delete the entire Tier-1 set and run the full test suite on the real data/1.mzML.
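Step 2 can be sketched roughly as follows. This is an illustrative reconstruction, not the script used for the audit: the function names (collect_definitions, collect_references, unreferenced) and the two-file example are assumptions made for the sketch.

```python
import ast


def collect_definitions(tree: ast.AST) -> set[str]:
    """Names of all functions and methods defined in the tree."""
    return {
        n.name
        for n in ast.walk(tree)
        if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


def collect_references(tree: ast.AST) -> set[str]:
    """Anything that could refer to a symbol by name."""
    refs = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Attribute):      # obj.name
            refs.add(node.attr)
        elif isinstance(node, ast.Name):         # bareword
            refs.add(node.id)
        elif isinstance(node, ast.alias):        # import name / from x import name
            refs.add(node.name)
        elif isinstance(node, ast.Constant) and isinstance(node.value, str):
            refs.add(node.value)                 # string literals, incl. getattr(obj, "name")
    return refs


def unreferenced(sources: dict[str, str]) -> set[str]:
    """Defined names that no file in `sources` refers to in any form."""
    trees = [ast.parse(src, filename=name) for name, src in sources.items()]
    defined = set().union(*(collect_definitions(t) for t in trees))
    referenced = set().union(*(collect_references(t) for t in trees))
    return defined - referenced


# Hypothetical two-file example.
EXAMPLE = {
    "mod.py": '''
def used(): return 1
def via_getattr(): return 2
def orphan(): return 3
''',
    "caller.py": '''
import mod
mod.used()
getattr(mod, "via_getattr")()
''',
}
print(sorted(unreferenced(EXAMPLE)))  # ['orphan']
```

Like vulture, this matches on bare names only, so it shares the same-name collision blind spot that step 5 exists to cover.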

Scope of "certain": Tier 1 is verified to have no internal caller and no dynamic/entry-point reachability, and its removal leaves the suite green. The receiver-mismatch check covered intra-onsite collisions and the low-reference subset; a method whose name also happens to be an attribute on an external (pyopenms/numpy/dict) object outside that subset is not exhaustively excluded.


Tier 1 — Confirmed dead, safe to remove

Empirically verified: deleting all of the below and running the full suite → 178 passed in 244.78s (identical to the 178-pass baseline on the unmodified tree).

Whole dead subsystems

onsite/lucxor/parallel.py — only parallel_psm_processing, PSMProcessingWorker, and get_optimal_thread_count are reachable (imported by cli.py). The rest is never instantiated or called anywhere:

  • class ScoringWorker (process_psms, score_peptide)
  • class NormalDensityWorker (process_all, calculate_density)
  • class ModelParameterWorker (process_all, calculate_parameters)
  • class SpectrumMatchingWorker (match_spectrum_peptide, process_psm_batch)
  • function parallel_process
  • function parallel_spectrum_matching

onsite/lucxor/globals.py — the entire globals dataclass is dead; only the module-level function get_decoy_symbol is imported elsewhere:

  • globals.init_globals, globals.record_flr_estimates, globals.assign_flr, globals.clear_psms (+ the class's real_psms/decoy_psms/flr_estimate_map fields). (record_flr_estimates is a false-negative for name tools — the live one is FLRCalculator.record_flr_estimates.)

onsite/lucxor/mass_provider.py — unused mass helpers (the live API is get_modification_mass / get_phospho_*):

  • get_residue_mass, get_residue_mass_fast, get_mass_array

Individual dead methods/functions

File                    Symbol                        Note
lucxor/flr.py           FLRCalculator.normal_density  superseded by inline vectorized KDE in eval_tick_marks
lucxor/flr.py           FLRCalculator.get_local_auc   docstring says "kept for backwards compatibility"; no caller
lucxor/peak.py          Peak.from_dict                Peak is always built via constructor
lucxor/peak.py          Peak.to_dict                  false negative: looked "used" because LucXorConfig.to_dict is the live one
lucxor/peptide.py       Peptide._has_decoy_symbols
lucxor/peptide.py       Peptide._find_closest_peak
lucxor/peptide.py       Peptide._log_gaussian_prob
lucxor/psm.py           PSM._extract_scan_number
lucxor/psm.py           PSM._get_modified_peptide
lucxor/psm.py           PSM._validate_permutation
lucxor/psm.py           PSM._kill_thread_results
lucxor/psm.py           PSM._calc_theoretical_masses
lucxor/spectrum.py      Spectrum.find_index_by_mz
lucxor/spectrum.py      Spectrum.find_peaks_in_range
phosphors/phosphors.py  _generate_isomer_profiles
phosphors/phosphors.py  _expected_fragment_mzs
phosphors/phosphors.py  get_occurrence_probability
phosphors/phosphors.py  calculate_phosphors_score

Tier 2 — Public-API surface, no internal caller (maintainer decision)

These are public methods on classes the published pyonsite library exports (CoreProcessor/LucXor, PSM, Peptide, PyLuciPHOr2 via onsite/lucxor/__init__.py __all__). They have no internal caller, so they are either intended external API or candidates for removal. The deletion test does not settle this (it passes either way). Recommend: document & keep, or remove per design intent.

  • cli.py: PyLuciPHOr2.initialize_model (duplicates the inline HCD/CID branch in run() at ~cli.py:879; could be DRY'd by having run() call it)
  • core.py: CoreProcessor.process_all_psms (documented with examples in docs/algorithms/lucxor.md), CoreProcessor.get_results, CoreProcessor.write_results
  • peptide.py: Peptide.get_precursor_mz, get_precursor_mass_pyopenms, calc_theoretical_masses, calc_score_cid, calc_score_hcd, is_decoy_pep
  • psm.py: PSM.from_peptide_id, generate_permutations_stage2, get_results, normalize_spectrum, reduce_nl_peak, get_spectrum_peaks, is_decoy_permutation

Tier 2b — reachable only via returned model instances

  • models.py: ModelData_CID.clear_arrays, ModelData_HCD.percentile_trim. No internal caller, but reachable through instances returned by the exported CIDModel/HCDModel. Keep unless confirmed unused by downstream.

Tier 3 — Uncertain (human judgment)

No internal caller and not part of the public API surface, but the adversarial pass declined to mark them definitively dead. Likely dead; please confirm:

  • lucxor/flr.py: FLRCalculator.get_global_auc
  • lucxor/flr.py: FLRCalculator.assign_flr_from_mapping

Secondary cleanups (low risk)

Unused imports (vulture ≥90%): json (lucxor/cli.py:11), islice (lucxor/psm.py:11), ALGORITHM_CID/ALGORITHM_HCD (lucxor/models.py:16), AA_DECOY_MAP (lucxor/peptide.py:15; lucxor/psm.py:19), DECOY_AMINO_ACIDS/MIN_DELTA_SCORE/NEUTRAL_LOSSES (lucxor/psm.py:19), Peak1D (phosphors/phosphors.py:4).

Unused local variables (vulture 100%): min_threads (lucxor/parallel.py:287 — a kwarg get_optimal_thread_count never uses), add_ion_types / max_ion_charge (phosphors/phosphors.py:894-895).

Unused module constants: lucxor/constants.py defines many constants never read anywhere (e.g. ALGORITHM_CID/HCD, DALTONS, PPM_UNITS, PEPXML, TSV, the WRITE_* flags, PEPPROPHET, MASCOTIONSCORE, XCORR, MIN_DELTA_SCORE, ION_TYPES, NEUTRAL_LOSSES, SCORE_TYPES, WATER, PROTON, …). Worth a pass to prune.
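A pruning pass over the constants module could start from something like the sketch below. It is an assumption-laden illustration: the helper names and the example constants (USED_LIMIT, UNUSED_FLAG, DERIVED) are hypothetical and do not appear in lucxor/constants.py.

```python
import ast


def module_constants(source: str) -> set[str]:
    """Top-level UPPER_CASE names assigned in a constants module."""
    names = set()
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign):
            targets = node.targets
        elif isinstance(node, ast.AnnAssign):
            targets = [node.target]
        else:
            continue
        for target in targets:
            if isinstance(target, ast.Name) and target.id.isupper():
                names.add(target.id)
    return names


def names_read(source: str) -> set[str]:
    """Names that are actually read: loaded barewords and attribute accesses."""
    read = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
            read.add(node.id)
        elif isinstance(node, ast.Attribute):
            read.add(node.attr)
    return read


def unread_constants(constants_src: str, other_sources: list[str]) -> set[str]:
    """Constants never read in the constants module itself or anywhere else."""
    read = names_read(constants_src)
    for src in other_sources:
        read |= names_read(src)
    return module_constants(constants_src) - read


# Hypothetical example data.
CONSTANTS = '''
USED_LIMIT = 10
UNUSED_FLAG = True
DERIVED = USED_LIMIT * 2
'''
CALLER = '''
from constants import USED_LIMIT, UNUSED_FLAG
import constants
print(USED_LIMIT, constants.DERIVED)
'''
print(sorted(unread_constants(CONSTANTS, [CALLER])))  # ['UNUSED_FLAG']
```

Imports deliberately do not count as reads here, so a constant that is imported but never used (as with several of the unused imports listed above) still shows up as prunable.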


Verification artifacts

  • Baseline (unmodified): 178 passed in ~256s.
  • After deleting the full Tier-1 set in a worktree: 178 passed in 244.78s (same data/1.mzML) — no regressions.
  • Static signals: vulture + custom AST reference analysis agreed on the zero-reference set; false-negative hunt added Peak.to_dict and globals.record_flr_estimates.
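The deletion step of the empirical test can be sketched as below. This is not the tooling used for the audit; remove_symbols and the sample source are hypothetical, and the sketch only handles top-level functions and methods one class level deep.

```python
import ast


def remove_symbols(source: str, qualified_names: set[str]) -> str:
    """Drop the named top-level functions ('func') and methods ('Class.method')."""
    tree = ast.parse(source)
    doomed: list[tuple[int, int]] = []  # 1-based, inclusive line ranges

    def mark(node, qualname: str) -> None:
        if qualname in qualified_names:
            # Decorators sit above node.lineno, so start from the earliest one.
            first = min([node.lineno] + [d.lineno for d in node.decorator_list])
            doomed.append((first, node.end_lineno))

    funcs = (ast.FunctionDef, ast.AsyncFunctionDef)
    for node in tree.body:
        if isinstance(node, funcs):
            mark(node, node.name)
        elif isinstance(node, ast.ClassDef):
            for item in node.body:
                if isinstance(item, funcs):
                    mark(item, f"{node.name}.{item.name}")

    lines = source.splitlines(keepends=True)
    # Delete from the bottom up so earlier line numbers stay valid.
    for first, last in sorted(doomed, reverse=True):
        del lines[first - 1:last]
    return "".join(lines)


# Hypothetical sample: only Peak.to_dict should disappear.
SAMPLE = '''\
class Peak:
    def __init__(self, mz):
        self.mz = mz

    def to_dict(self):
        return {"mz": self.mz}


class LucXorConfig:
    def to_dict(self):
        return {}
'''
trimmed = remove_symbols(SAMPLE, {"Peak.to_dict"})
ast.parse(trimmed)  # raises SyntaxError if the deletion broke the file
```

Qualified names matter here: deleting by bare name would also remove the live LucXorConfig.to_dict. A class whose only member is deleted would be left with an empty body and fail to parse, so whole dead classes need to be removed as a unit. The trimmed files would then be written into a throwaway checkout (for example one created with git worktree add) and the full suite run there.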

Note: build() in CIDModel/HCDModel and the to_dict/_build_charge_model/get_charge_model/etc. model methods were investigated as collision suspects and confirmed live (polymorphic dispatch), so they are intentionally not listed above.
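For illustration, the kind of polymorphic dispatch that keeps such methods live looks roughly like this. The class and method names come from the note above, but the bodies, the MODELS table, and run() are hypothetical; the actual dispatch code in onsite is not shown in this issue.

```python
class CIDModel:
    def build(self):
        return "cid"


class HCDModel:
    def build(self):
        return "hcd"


# Hypothetical selection table: the concrete class is chosen at run time.
MODELS = {"CID": CIDModel, "HCD": HCDModel}


def run(kind: str) -> str:
    model = MODELS[kind]()
    # A single call site reaches CIDModel.build or HCDModel.build
    # depending on `kind`, so neither implementation is dead.
    return model.build()


print([run("CID"), run("HCD")])  # ['cid', 'hcd']
```

Here the shared name is a genuine use of both methods, unlike the Peak.to_dict case, where the shared name belongs to an unrelated class that is never the receiver.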
