Fastcsp updates performance updates #1971

Open
lbluque wants to merge 5 commits into fastcsp_updates from fastcsp_updates_perf

@lbluque lbluque commented Apr 18, 2026

Summary

Performance and parallelism improvements for the FastCSP workflow.

  • Per-Z parallelism in filtering: filter.py now submits one SLURM job per (molecule, Z-value) pair instead of one per molecule. Since structures with different Z values can't match during deduplication, this is semantically correct and dramatically increases parallelism for molecules with many structures. Per-Z results are concatenated afterward.

  • Per-Z parallelism in Genarris processing: process_generated.py now submits one job per Z-value directory (mol/conf/z) rather than one per conformer. Structure-to-row conversion is now parallelized via p_map instead of a sequential loop. CIF-to-structure conversion in filtering was also switched from serial .apply() to p_map.

  • Proper CPU propagation: num_cpus is now threaded through from SLURM config to swifter, p_map, and connectivity validation calls (previously hardcoded to 70 or 1). Eval SLURM defaults bumped from 1 CPU / 10 GB to 16 CPUs / 64 GB.

  • Minor: fixed the parquet file discovery filter logic in eval.py (the suffix check now runs before the name check), plus formatting cleanups.
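For reviewers, the per-Z split in the first bullet can be sketched roughly as below. This is a minimal illustration, not the code in filter.py: the record layout (`molecule`, `z`, `fp` keys), the function names, and the toy `same_structure` matcher are all hypothetical, and the loop over buckets stands in for submitting one SLURM job per bucket.

```python
from collections import defaultdict


def group_by_molecule_and_z(structures):
    """Bucket structure records by (molecule, Z); one bucket maps to one job."""
    groups = defaultdict(list)
    for s in structures:
        groups[(s["molecule"], s["z"])].append(s)
    return dict(groups)


def deduplicate(group, is_match):
    """Keep the first structure of each set of matching structures."""
    unique = []
    for s in group:
        if not any(is_match(s, u) for u in unique):
            unique.append(s)
    return unique


def filter_per_z(structures, is_match):
    """Deduplicate each (molecule, Z) bucket on its own, then concatenate."""
    groups = group_by_molecule_and_z(structures)
    results = []
    # In a real run each bucket would be its own job; here it is a plain loop.
    for key in sorted(groups):
        results.extend(deduplicate(groups[key], is_match))
    return results


def same_structure(a, b):
    """Toy matcher (assumption): structures with different Z never match."""
    return (a["molecule"], a["z"], a["fp"]) == (b["molecule"], b["z"], b["fp"])


structures = [
    {"molecule": "ACETAC", "z": 2, "fp": "a"},
    {"molecule": "ACETAC", "z": 2, "fp": "a"},  # duplicate of the first
    {"molecule": "ACETAC", "z": 4, "fp": "a"},  # same fingerprint, different Z
    {"molecule": "GLYCIN", "z": 2, "fp": "b"},
]
per_z = filter_per_z(structures, same_structure)
print(len(per_z))
```

Because the matcher never pairs structures with different Z, deduplicating bucket by bucket keeps the same structures as deduplicating the whole list at once, which is the property the split relies on.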

Performance Benchmark Report

Environment

  • Machine: 96-core CPU (no GPU)
  • Iterations per config: 1
  • Data: ACETAC + GLYCIN from /home/lbluque/test-free-energy
  • Modes: Sequential (96 cores, 1 job at a time) vs Parallel (6 nodes × 16 cores)
  • Branches: fastcsp_updates (baseline) vs fastcsp_updates_perf (optimized)

Total End-to-End Improvement

baseline-sequential vs perf-parallel (full improvement)

| Stage | baseline-seq (s) | perf-par (s) | Speedup |
|---|---|---|---|
| process_generated | 1228.4 | 1039.9 | 1.18x faster |
| filter | 627.3 | 41.4 | 15.14x faster |
| evaluate | 635.0 | 1196.2 | 1.88x slower |
| total | 2490.8 | 2277.5 | 1.09x faster |
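For anyone checking the Speedup column, the ratios can be recomputed from the two timing columns with a small helper like the one below. The function name is hypothetical and this is not the benchmark script itself. Values recomputed from the rounded timings can differ from the reported ones in the last digit (the filter row is one such case), presumably because the report used unrounded timings.

```python
def describe_speedup(baseline_s: float, candidate_s: float) -> str:
    """Format the ratio of two wall-clock times as 'N.NNx faster' or 'N.NNx slower'."""
    if candidate_s <= baseline_s:
        return f"{baseline_s / candidate_s:.2f}x faster"
    return f"{candidate_s / baseline_s:.2f}x slower"


# Timings in seconds (baseline-seq, perf-par), copied from the table above.
timings = {
    "process_generated": (1228.4, 1039.9),
    "filter": (627.3, 41.4),
    "evaluate": (635.0, 1196.2),
    "total": (2490.8, 2277.5),
}
for stage, (baseline_s, perf_s) in timings.items():
    print(f"{stage}: {describe_speedup(baseline_s, perf_s)}")
```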

Notes

  • Sequential mode: All SLURM jobs run one-at-a-time in the main process (96 cores available)
  • Parallel mode: Simulates 6 SLURM nodes with 16 cores each using multiprocessing.Pool with CPU affinity pinning
  • The perf branch splits work at Z-value granularity (more smaller jobs) vs baseline per-molecule/conformer (fewer larger jobs)
  • Timings measure wall-clock computation time (no real SLURM overhead)
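The parallel mode described in these notes could be approximated along the following lines. This is a sketch under assumptions rather than the harness used for the numbers above: the function names are hypothetical, pinning relies on `os.sched_setaffinity` (Linux only, skipped elsewhere), and each pool worker stands in for one simulated node.

```python
import multiprocessing as mp
import os


def node_cpu_sets(n_nodes: int, cores_per_node: int) -> list[set[int]]:
    """Split CPU ids into one contiguous block per simulated node."""
    return [
        set(range(i * cores_per_node, (i + 1) * cores_per_node))
        for i in range(n_nodes)
    ]


def _pin_worker(cpu_queue) -> None:
    """Pool initializer: each worker takes one CPU block and pins itself to it."""
    cpus = cpu_queue.get()
    # sched_setaffinity is Linux-only; on other platforms the worker runs unpinned.
    if hasattr(os, "sched_setaffinity"):
        # Only request CPUs this process is actually allowed to use.
        allowed = cpus & os.sched_getaffinity(0)
        if allowed:
            os.sched_setaffinity(0, allowed)


def run_on_simulated_nodes(func, jobs, n_nodes=6, cores_per_node=16):
    """Run func over jobs with one pinned worker process per simulated node."""
    cpu_queue = mp.Queue()
    for cpus in node_cpu_sets(n_nodes, cores_per_node):
        cpu_queue.put(cpus)
    with mp.Pool(n_nodes, initializer=_pin_worker, initargs=(cpu_queue,)) as pool:
        return pool.map(func, jobs)
```

A call such as `run_on_simulated_nodes(run_job, job_list)` would then spread the jobs over six workers, each restricted to its own 16-core block. `func` must be a module-level function so it can be pickled, and on platforms that spawn rather than fork the call belongs under an `if __name__ == "__main__":` guard.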

@meta-cla meta-cla Bot added the cla signed label Apr 18, 2026
@lbluque lbluque requested a review from gvahe April 18, 2026 00:32
@lbluque lbluque added the refactor and patch (Patch version release) labels Apr 18, 2026
angle_tol: float = 5,
remove_duplicates: bool = False,
root_unrelaxed: Path | None = None,
num_cpus: int = 70,
We should not hard-code 70 here. It would be better to take the value from the config.
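One possible shape for that, as a sketch only: resolve the CPU count from the config first, then from the SLURM environment, then from the machine. The function name and the `{"slurm": {"cpus_per_task": ...}}` config layout are assumptions for illustration, not the project's actual schema. `SLURM_CPUS_PER_TASK` is the variable SLURM sets when `--cpus-per-task` is requested.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def resolve_num_cpus(config: Mapping, env: Mapping[str, str] | None = None) -> int:
    """Pick a CPU count from config, then the SLURM environment, then the machine."""
    env = os.environ if env is None else env
    # Hypothetical config layout: {"slurm": {"cpus_per_task": 16}}.
    from_config = config.get("slurm", {}).get("cpus_per_task")
    if from_config is not None:
        return int(from_config)
    # Set by SLURM inside a job when --cpus-per-task is requested.
    from_env = env.get("SLURM_CPUS_PER_TASK")
    if from_env:
        return int(from_env)
    # Last resort: the core count reported by Python, or 1 if unknown.
    return os.cpu_count() or 1
```

The signature above could then default to `num_cpus: int | None = None` and call the resolver when no value is passed, so the same number reaches swifter, p_map, and the connectivity validation.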
