Fastcsp updates performance updates by lbluque · Pull Request #1971 · facebookresearch/fairchem

lbluque · 2026-04-18T00:31:50Z

Summary

Performance and parallelism improvements for the FastCSP workflow.

Per-Z parallelism in filtering: filter.py now submits one SLURM job per (molecule, Z-value) pair instead of one per molecule. Since structures with different Z values can't match during deduplication, this is semantically correct and dramatically increases parallelism for molecules with many structures. Per-Z results are concatenated afterward.
Per-Z parallelism in Genarris processing: process_generated.py restructures job submission to iterate per Z-value directory (mol/conf/z) rather than per conformer. Structure-to-row conversion is now parallelized via p_map instead of a sequential loop. CIF-to-structure conversion in filtering also switched from serial .apply() to p_map.
Proper CPU propagation: num_cpus is now threaded through from SLURM config to swifter, p_map, and connectivity validation calls (previously hardcoded to 70 or 1). Eval SLURM defaults bumped from 1 CPU / 10 GB to 16 CPUs / 64 GB.
Minor: Fixed the parquet file discovery filter in eval.py (the suffix check now runs before the name check), plus formatting cleanups.
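The per-Z split described above can be sketched roughly as follows. This is a minimal illustration, not the code in filter.py: the row layout (`molecule`, `z`, `fingerprint` keys), the function names, and the use of a thread pool in place of SLURM job submission are all assumptions made for the example, and the fingerprint comparison is only a stand-in for real structure matching.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition_by_molecule_and_z(rows):
    """Group rows into independent work units keyed by (molecule, z)."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["molecule"], row["z"])].append(row)
    return dict(groups)


def deduplicate(rows):
    """Stand-in for structure matching: keep the first row per fingerprint."""
    seen = set()
    unique = []
    for row in rows:
        if row["fingerprint"] not in seen:
            seen.add(row["fingerprint"])
            unique.append(row)
    return unique


def filter_per_z(rows, max_workers=4):
    """Deduplicate each (molecule, z) group separately, then concatenate."""
    groups = partition_by_molecule_and_z(rows)
    keys = sorted(groups)  # deterministic order for the concatenated output
    # Each group is an independent job; a thread pool stands in for SLURM here.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(deduplicate, (groups[k] for k in keys)))
    return [row for group in results for row in group]
```

Because rows with different Z values are never compared, splitting on (molecule, z) gives the same surviving rows as deduplicating each molecule as a whole, provided the matcher would never have matched across Z values in the first place.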

Performance Benchmark Report

Environment

Machine: 96-core CPU (no GPU)
Iterations per config: 1
Data: ACETAC + GLYCIN from /home/lbluque/test-free-energy
Modes: Sequential (96 cores, 1 job at a time) vs Parallel (6 nodes × 16 cores)
Branches: fastcsp_updates (baseline) vs fastcsp_updates_perf (optimized)

Total End-to-End Improvement

baseline-sequential vs perf-parallel (full improvement)

Stage	baseline-seq (s)	perf-par (s)	Speedup
process_generated	1228.4	1039.9	1.18x faster
filter	627.3	41.4	15.14x faster
evaluate	635.0	1196.2	1.88x slower
total	2490.8	2277.5	1.09x faster
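For reference, the Speedup column can be recomputed from the rounded timings in the table. The helper names below are made up for this sketch. Note that recomputing the filter row from the rounded values gives 15.15x rather than the reported 15.14x, presumably because the report was generated from unrounded timings.

```python
# (baseline-seq seconds, perf-par seconds), copied from the table above.
timings = {
    "process_generated": (1228.4, 1039.9),
    "filter": (627.3, 41.4),
    "evaluate": (635.0, 1196.2),
    "total": (2490.8, 2277.5),
}


def speedup(baseline_s, perf_s):
    """Ratio of baseline time to optimized time (>1 means faster)."""
    return baseline_s / perf_s


def describe(baseline_s, perf_s):
    """Format the ratio the same way the table does."""
    ratio = speedup(baseline_s, perf_s)
    if ratio >= 1:
        return f"{ratio:.2f}x faster"
    return f"{1 / ratio:.2f}x slower"


for stage, (baseline_s, perf_s) in timings.items():
    print(f"{stage}: {describe(baseline_s, perf_s)}")
```

The evaluate stage accounts for roughly half of the perf-parallel total (1196.2 s of 2277.5 s), which is why the end-to-end gain is only about 1.09x despite the large filter speedup.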

Notes

Sequential mode: All SLURM jobs run one-at-a-time in the main process (96 cores available)
Parallel mode: Simulates 6 SLURM nodes with 16 cores each using multiprocessing.Pool with CPU affinity pinning
The perf branch splits work at Z-value granularity (more smaller jobs) vs baseline per-molecule/conformer (fewer larger jobs)
Timings measure wall-clock computation time (no real SLURM overhead)
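The parallel mode described in the notes could look something like the sketch below. It is not the benchmark harness used for this report: the function names are hypothetical, and it assumes a Linux host, since `os.sched_setaffinity` is not available on other platforms (the pinning step is skipped there).

```python
import multiprocessing
import os


def core_block(node_index, cores_per_node):
    """Core IDs assigned to one simulated node, e.g. node 1 -> cores 16..31."""
    start = node_index * cores_per_node
    return set(range(start, start + cores_per_node))


def _pin_worker(index_queue, cores_per_node):
    """Pool initializer: each worker claims one node index and pins itself."""
    node_index = index_queue.get()
    if hasattr(os, "sched_setaffinity"):  # Linux only
        # Only request cores this process is actually allowed to use.
        wanted = core_block(node_index, cores_per_node) & os.sched_getaffinity(0)
        if wanted:
            os.sched_setaffinity(0, wanted)


def simulate_nodes(func, tasks, num_nodes=6, cores_per_node=16):
    """Run func over tasks using one pinned worker process per simulated node."""
    ctx = multiprocessing.get_context()
    index_queue = ctx.Queue()
    for node_index in range(num_nodes):
        index_queue.put(node_index)
    with ctx.Pool(
        processes=num_nodes,
        initializer=_pin_worker,
        initargs=(index_queue, cores_per_node),
    ) as pool:
        return pool.map(func, tasks)
```

With the defaults, six workers each own a disjoint block of 16 cores, matching the 6 nodes × 16 cores layout on a 96-core machine. When calling `simulate_nodes` from a script, `func` must be a picklable top-level function and the call should sit under an `if __name__ == "__main__":` guard.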

lbluque · 2026-04-18T00:35:01Z

    angle_tol: float = 5,
    remove_duplicates: bool = False,
    root_unrelaxed: Path | None = None,
+    num_cpus: int = 70,


We should not hard-code 70 here. Better to take it from the config.
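One possible shape for that, as a sketch only: the `slurm` / `cpus_per_task` config keys and the `resolve_num_cpus` helper are hypothetical and may not match the actual FastCSP config layout.

```python
import os


def resolve_num_cpus(config, default=None):
    """Pick the CPU count from the config instead of a hard-coded constant.

    Falls back to `default`, then to the number of CPUs on the host.
    """
    slurm_cfg = config.get("slurm", {})  # hypothetical config section
    value = slurm_cfg.get("cpus_per_task", default)  # hypothetical key
    if value is None:
        value = os.cpu_count() or 1
    value = int(value)
    if value < 1:
        raise ValueError(f"num_cpus must be >= 1, got {value}")
    return value
```

The resolved value can then be passed explicitly as `num_cpus` to each call site, so the function signature no longer needs a machine-specific default.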

lbluque added 5 commits April 15, 2026 15:10

use num_cpus for swifter structure matcher

c6a6537

pass ncpus in filter

e9ae2dc

process genarris outputs by z

3e2fe00

filter jobs by z val

805e3ef

fix passing num_cpus to pmg

9982599

meta-cla Bot added the cla signed label Apr 18, 2026

lbluque requested a review from gvahe April 18, 2026 00:32

lbluque added the refactor and patch (patch version release) labels Apr 18, 2026

lbluque wants to merge 5 commits into fastcsp_updates from fastcsp_updates_perf
