Skip to content

Rueping catalyst conformational sampling and GSM workflow #88

@joshkamm

Description

@joshkamm

Summary

This issue tracks the work done on the alex/adaptation branch to implement a complete conformational sampling and GSM (Growing String Method) workflow for the Rueping catalyst system, specifically for studying the cationic 2-aza-Cope rearrangement reaction.

Development Timeline

Challenge: Maintaining the Hydrogen Bond

The central challenge in this work was maintaining the critical hydrogen bond between the catalyst and substrate during conformer generation.

Attempt 1: Direct Conformer Generation

  • Initially tried generating conformers of the full catalyst-substrate complex using OpenBabel
  • Problem: OpenBabel didn't recognize the hydrogen bond between the substrate and catalyst, causing the H-bond to frequently break during conformer generation
  • This produced geometries that were chemically unreasonable for the reaction mechanism

Attempt 2: Combinatorial Assembly

  • Tried generating conformers for the catalyst and substrate fragments separately (rueping_combinatorial.py)
  • Used SMARTS patterns to identify H-bond donor (O=44, H=77) and acceptor (N=12) atoms
  • Problem: Ran into issues when trying to combine the fragment conformers back together while maintaining proper H-bond geometry

Solution: Boron Bridging Strategy

  • Overcame the H-bond maintenance problem by temporarily replacing the H-bond with a covalent O-B-N bridge
  • This forces the conformer generator to maintain the connection throughout sampling
  • After conformer generation, replace B with H to restore the hydrogen bond
  • Key insight: Skipping force field optimization and going directly to xTB preserves better H-bond geometry:
    • With MMFF force field: 166° → 143° → 154° (damaged then partially recovered)
    • With direct xTB: maintains ~166° H-bond geometry

Work Completed

1. DevContainer and Development Environment Setup

  • Added Dockerfile and updated devcontainer.json for pixi-based development (configure dev containers #66)
  • Added Claude Code as a devcontainer feature for AI-assisted development
  • Added X11 dependencies (xorg-libxrender, xorg-libxext) to fix OpenBabel plugin loading in minimal Linux environments

2. Rueping Example Implementation (examples/rueping/)

Fragment Preparation

  • Added RUEPING-CATALYST.xyz (78 atoms) and RUEPING-SUBSTRATE.xyz (51 atoms) as separate fragment files
  • Added full_system.xyz with the complete catalyst-substrate complex
  • Implemented SMARTS-based identification of H-bond donor and acceptor atoms

Conformer Generation Workflow (rueping.py, rueping_test_100.py)

Final working workflow using the boron bridge strategy:

  1. Generate conformers with B bridge (preserves H-bond connection)
  2. Replace B with H after conformer generation
  3. Remove duplicate conformers based on RMSD
  4. Optimize directly with xTB (skip force field - it damages H-bond geometry)
  5. Compute final energies

GSM Pathway Calculations

  • rueping_gsm.py: GSM script for cationic 2-aza-Cope rearrangement
  • analyze_gsm_results.py: Analysis script that extracts energy profiles, calculates barrier heights and reaction energies, and identifies lowest barrier pathway

3. HPC Integration (SLURM)

  • run_conformers.slurm: 100 conformer generation job (12h, 64 CPUs)
  • run_gsm.slurm: GSM pathway calculations (24h, 64 CPUs)
  • submit_all.sh: Automated submission script with dependency handling
    • Handles pixi install on head node automatically
    • Verifies Python and xTB availability before submission
    • Provides detailed submission summary with log file paths
  • Added comprehensive README with workflow documentation and expected runtimes

4. Bug Fixes and Performance Improvements

xTB Parallelization Fix

  • Set OMP_NUM_THREADS=1 before imports to prevent thread oversubscription
  • xTB was using multiple OpenMP threads per conformer, causing ~10x slowdown when parallelizing with ProcessPoolExecutor

Scaling Bug Fixes (3 → 100 conformers)

  • replace_boron_with_hydrogen: Manually remove B-N bond instead of XYZ roundtrip reperception
  • get_unique_conformer_ids: Use RDKit AlignMol with explicit identity atom map instead of CalcRMS substructure matching
  • Added compute_rmsd helper function for coordinate-based RMSD with proper rigid-body alignment
  • Fixed RuntimeError: No sub-structure match found when scaling to 100 conformers

pixi Environment Fixes

  • Automatically detect and handle missing sha errors
  • Regenerate lock file with --locked=false if needed

5. GSM Module Updates (src/conformational_sampling/gsm.py)

  • Uncommented TS optimization in stk_de_gsm function
  • Minor refinements for the Rueping workflow

Files Changed

  • 20 files changed, +2,290 lines, -135 lines
  • New example directory: examples/rueping/ with complete workflow
  • DevContainer configuration updates
  • README documentation updates
  • GSM module refinements

Testing

  • Successfully tested with 3 conformers (fast validation)
  • Configured for 100 conformer HPC runs
  • RMSD threshold documentation clarified (smaller = more conformers kept)

Potential Next Steps

1. String and Transition State Filtering

Add filtering for the GSM strings and transition states based on:

  • Energy profile analysis (barrier heights, reaction energies)
  • Bonding changes during the reaction (verify correct bonds are forming/breaking)
  • Similar to the filtering approach in the original workflow

2. Improved Potential Energy Surface

Explore options for improving the underlying PES accuracy:

  • g-xTB: Once the implementation of this new method is released, it may offer improved accuracy while maintaining computational efficiency
  • Meta's universal models of atoms: Recently released ML potentials that could provide better accuracy than xTB
  • DFT methods: More accurate but computationally expensive; may be feasible for final refinement of top candidates

Related Issues

Metadata

Metadata

Assignees

No one assigned

    Labels

    documentationImprovements or additions to documentationenhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions