Summary
A systematic dead-code audit found ~28 unused methods/functions (~960 LOC) that are safe to remove, plus a set of uncalled public-API methods and minor cleanups that need a maintainer decision. Findings were cross-checked with multiple static tools, per-symbol call-tracing, an adversarial "try to prove it's reachable" pass, and an empirical deletion test (delete everything in Tier 1, run the full suite → all tests still pass).
The audit also surfaced dead code that name-based tools (vulture, grep) cannot find on their own — e.g. Peak.to_dict, which looks "used" only because an unrelated class (LucXorConfig) has a method of the same name.
Methodology (how "certain" was established)
- vulture (--min-confidence 60) over onsite/ and onsite/ + tests/.
- Custom AST reference analysis: every function/method definition cross-referenced against every attribute-access, bareword, string literal, and getattr/setattr/hasattr string argument across onsite/ + tests/. Confirmed each Tier-1 symbol has zero references.
- pycg call-graph generation was attempted but fails on Python 3.12 (ImportManagerError on even a trivial file), so it was not usable here; agent call-tracing and the deletion test substitute for it.
- Per-symbol classification + an independent adversarial refutation pass (one skeptic per candidate, instructed to find any reachability: dynamic dispatch, CLI entry points, public-API export, polymorphism, serialization hooks, threading targets, docs/examples). This correctly rescued several candidates (see Tier 2/2b).
- False-negative hunt for code the name-matching missed: same-name collisions across classes, whole dead subsystems, and receiver-type mismatches.
- Empirical deletion test: in a throwaway worktree, delete the entire Tier-1 set and run the full test suite on the real data/1.mzML.
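The custom AST reference analysis in the list above can be approximated with the standard-library ast module. This is a minimal sketch, not the audit's actual script: the function names (defined_names, referenced_names, unreferenced) and the example sources are hypothetical, and string literals are matched only when the whole literal equals a symbol name.

```python
import ast


def defined_names(tree):
    """Names of every function/method definition in a parsed module."""
    return {
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


def referenced_names(tree):
    """Every name that could refer to a definition: attribute accesses,
    bare names, and string literals (covers getattr/setattr/hasattr args)."""
    refs = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Attribute):
            refs.add(node.attr)
        elif isinstance(node, ast.Name):
            refs.add(node.id)
        elif isinstance(node, ast.Constant) and isinstance(node.value, str):
            refs.add(node.value)
    return refs


def unreferenced(sources):
    """Definitions that no source in the set refers to by name."""
    trees = [ast.parse(src) for src in sources]
    defs = set().union(*(defined_names(t) for t in trees))
    refs = set().union(*(referenced_names(t) for t in trees))
    return defs - refs


# Hypothetical two-file example: 'dynamic' is only reached through getattr.
EXAMPLE = [
    "def live():\n    return 1\n\ndef dead():\n    return 2\n\ndef dynamic():\n    return 3\n",
    "import mod\nmod.live()\ngetattr(mod, 'dynamic')()\n",
]
print(unreferenced(EXAMPLE))  # {'dead'}
```

Because the match is purely by name, a result from this kind of check is a lower bound on dead code: any symbol that shares its name with a live one is silently treated as referenced.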
Scope of "certain": Tier 1 is verified to have no internal caller and no dynamic/entry-point reachability, and its removal leaves the suite green. The receiver-mismatch check covered intra-onsite collisions and the low-reference subset; a method whose name also happens to be an attribute on an external (pyopenms/numpy/dict) object outside that subset is not exhaustively excluded.
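The same-name collision check can be sketched as follows: list every method name that is defined by more than one class, since a reference to that name may be credited to the wrong class by name-based tools. This is an illustrative sketch only; colliding_methods and the stub class bodies are hypothetical, and it does not resolve receiver types, which still has to be done by reading the call sites.

```python
import ast
from collections import defaultdict


def colliding_methods(source):
    """Map each method name defined in more than one class to those classes."""
    owners = defaultdict(set)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    owners[item.name].add(node.name)
    return {
        name: sorted(classes)
        for name, classes in owners.items()
        if len(classes) > 1
    }


# Stub versions of the two classes named in the audit (bodies are placeholders).
SOURCE = """
class Peak:
    def to_dict(self): ...
    def from_dict(self): ...

class LucXorConfig:
    def to_dict(self): ...
"""
print(colliding_methods(SOURCE))  # {'to_dict': ['LucXorConfig', 'Peak']}
```

Each reported name then needs a per-call-site look at the receiver to decide which class's method is actually live.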
Tier 1 — Confirmed dead, safe to remove
Empirically verified: deleting all of the below and running the full suite → 178 passed in 244.78s (identical to the 178-pass baseline on the unmodified tree).
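The audit does not state how the Tier-1 symbols were deleted. One hypothetical way to script the removal step is an ast.NodeTransformer that drops definitions by qualified name; the sketch below assumes Python 3.9+ (for ast.unparse), and RemoveSymbols, strip_symbols, and the sample source are illustrative names, not project code. ast.unparse discards comments and original formatting, so this suits a throwaway worktree rather than a commit.

```python
import ast


class RemoveSymbols(ast.NodeTransformer):
    """Drop functions, methods, and classes whose qualified name is listed."""

    def __init__(self, doomed):
        self.doomed = set(doomed)
        self.scope = []  # enclosing class names

    def _qualname(self, name):
        return ".".join(self.scope + [name])

    def visit_ClassDef(self, node):
        if self._qualname(node.name) in self.doomed:
            return None  # remove the whole class
        self.scope.append(node.name)
        self.generic_visit(node)
        self.scope.pop()
        if not node.body:
            node.body = [ast.Pass()]  # keep an emptied class syntactically valid
        return node

    def visit_FunctionDef(self, node):
        if self._qualname(node.name) in self.doomed:
            return None
        return node  # function bodies are left untouched

    visit_AsyncFunctionDef = visit_FunctionDef


def strip_symbols(source, doomed):
    """Return source with the listed symbols removed."""
    tree = RemoveSymbols(doomed).visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


# Placeholder source using symbol names from the Tier-1 list.
SAMPLE = """
class Peak:
    def __init__(self, mz):
        self.mz = mz
    def to_dict(self):
        return {"mz": self.mz}

class LucXorConfig:
    def to_dict(self):
        return {}

def get_mass_array():
    return []
"""
print(strip_symbols(SAMPLE, {"Peak.to_dict", "get_mass_array"}))
```

Using qualified names matters here: removing Peak.to_dict must leave LucXorConfig.to_dict in place. Running the full test suite on the stripped tree is still the step that provides the evidence.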
Whole dead subsystems
onsite/lucxor/parallel.py — only parallel_psm_processing, PSMProcessingWorker, and get_optimal_thread_count are reachable (imported by cli.py). The rest is never instantiated or called anywhere:
- class ScoringWorker (process_psms, score_peptide)
- class NormalDensityWorker (process_all, calculate_density)
- class ModelParameterWorker (process_all, calculate_parameters)
- class SpectrumMatchingWorker (match_spectrum_peptide, process_psm_batch)
- function parallel_process
- function parallel_spectrum_matching
onsite/lucxor/globals.py — the entire globals dataclass is dead; only the module-level function get_decoy_symbol is imported elsewhere:
globals.init_globals, globals.record_flr_estimates, globals.assign_flr, globals.clear_psms (+ the class's real_psms/decoy_psms/flr_estimate_map fields). (record_flr_estimates is a false-negative for name tools — the live one is FLRCalculator.record_flr_estimates.)
onsite/lucxor/mass_provider.py — unused mass helpers (the live API is get_modification_mass / get_phospho_*):
get_residue_mass, get_residue_mass_fast, get_mass_array
Individual dead methods/functions
| File | Symbol | Note |
|---|---|---|
| lucxor/flr.py | FLRCalculator.normal_density | superseded by inline vectorized KDE in eval_tick_marks |
| lucxor/flr.py | FLRCalculator.get_local_auc | docstring says "kept for backwards compatibility"; no caller |
| lucxor/peak.py | Peak.from_dict | Peak is always built via constructor |
| lucxor/peak.py | Peak.to_dict | false-negative: looked "used" because LucXorConfig.to_dict is the live one |
| lucxor/peptide.py | Peptide._has_decoy_symbols | |
| lucxor/peptide.py | Peptide._find_closest_peak | |
| lucxor/peptide.py | Peptide._log_gaussian_prob | |
| lucxor/psm.py | PSM._extract_scan_number | |
| lucxor/psm.py | PSM._get_modified_peptide | |
| lucxor/psm.py | PSM._validate_permutation | |
| lucxor/psm.py | PSM._kill_thread_results | |
| lucxor/psm.py | PSM._calc_theoretical_masses | |
| lucxor/spectrum.py | Spectrum.find_index_by_mz | |
| lucxor/spectrum.py | Spectrum.find_peaks_in_range | |
| phosphors/phosphors.py | _generate_isomer_profiles | |
| phosphors/phosphors.py | _expected_fragment_mzs | |
| phosphors/phosphors.py | get_occurrence_probability | |
| phosphors/phosphors.py | calculate_phosphors_score | |
Tier 2 — Public-API surface, no internal caller (maintainer decision)
These are public methods on classes the published pyonsite library exports (CoreProcessor/LucXor, PSM, Peptide, PyLuciPHOr2 via onsite/lucxor/__init__.py __all__). They have no internal caller, so they are either intended external API or candidates for removal. The deletion test does not settle this (it passes either way). Recommend: document & keep, or remove per design intent.
cli.py — PyLuciPHOr2.initialize_model (duplicates the inline HCD/CID branch in run() at ~cli.py:879; could be DRY'd by having run() call it)
core.py — CoreProcessor.process_all_psms (documented with examples in docs/algorithms/lucxor.md), CoreProcessor.get_results, CoreProcessor.write_results
peptide.py — Peptide.get_precursor_mz, get_precursor_mass_pyopenms, calc_theoretical_masses, calc_score_cid, calc_score_hcd, is_decoy_pep
psm.py — PSM.from_peptide_id, generate_permutations_stage2, get_results, normalize_spectrum, reduce_nl_peak, get_spectrum_peaks, is_decoy_permutation
Tier 2b — reachable only via returned model instances
models.py — ModelData_CID.clear_arrays, ModelData_HCD.percentile_trim. No internal caller, but reachable through instances returned by the exported CIDModel/HCDModel. Keep unless confirmed unused by downstream.
Tier 3 — Uncertain (human judgment)
No internal caller and not part of the public API surface, but the adversarial pass declined to mark them definitively dead. Likely dead; please confirm:
lucxor/flr.py — FLRCalculator.get_global_auc
lucxor/flr.py — FLRCalculator.assign_flr_from_mapping
Secondary cleanups (low risk)
Unused imports (vulture ≥90%): json (lucxor/cli.py:11), islice (lucxor/psm.py:11), ALGORITHM_CID/ALGORITHM_HCD (lucxor/models.py:16), AA_DECOY_MAP (lucxor/peptide.py:15; lucxor/psm.py:19), DECOY_AMINO_ACIDS/MIN_DELTA_SCORE/NEUTRAL_LOSSES (lucxor/psm.py:19), Peak1D (phosphors/phosphors.py:4).
Unused local variables (vulture 100%): min_threads (lucxor/parallel.py:287 — a kwarg get_optimal_thread_count never uses), add_ion_types / max_ion_charge (phosphors/phosphors.py:894-895).
Unused module constants: lucxor/constants.py defines many constants never read anywhere (e.g. ALGORITHM_CID/HCD, DALTONS, PPM_UNITS, PEPXML, TSV, the WRITE_* flags, PEPPROPHET, MASCOTIONSCORE, XCORR, MIN_DELTA_SCORE, ION_TYPES, NEUTRAL_LOSSES, SCORE_TYPES, WATER, PROTON, …). Worth a pass to prune.
Verification artifacts
- Baseline (unmodified): 178 passed in ~256s.
- After deleting the full Tier-1 set in a worktree: 178 passed in 244.78s (same data/1.mzML), no regressions.
- Static signals: vulture + custom AST reference analysis agreed on the zero-reference set; false-negative hunt added Peak.to_dict and globals.record_flr_estimates.
Note: build() in CIDModel/HCDModel and the to_dict/_build_charge_model/get_charge_model/etc. model methods were investigated as collision suspects and confirmed live (polymorphic dispatch), so they are intentionally not listed above.