| 1 | +# ElemCo.jl Copilot Instructions |
| 2 | + |
| 3 | +This repository contains **ElemCo.jl** (*elemcoil*), a Julia package for electronic structure calculations and quantum chemistry computations, with a focus on coupled cluster methods and electron correlation techniques. |
| 4 | + |
| 5 | +## Project Overview |
| 6 | + |
| 7 | +ElemCo.jl is a scientific computing package that provides: |
| 8 | +- Coupled cluster methods (CCSD, DCSD, CCSDT, etc.) |
| 9 | +- Density-fitted Hartree-Fock (DF-HF) calculations |
| 10 | +- Post-Hartree-Fock methods including MP2 |
| 11 | +- Quantum chemistry interfaces (Molpro, TREXIO, FCIDUMP) |
| 12 | +- Advanced tensor operations and orbital tools |
| 13 | +- DMRG (Density Matrix Renormalization Group) integration |
| 14 | +- Full Configuration Interaction (FCI) with Selected CI and Heat-Bath CI |
| 15 | + |
| 16 | +## Architecture Overview |
| 17 | + |
| 18 | +### Core Data Flow |
| 19 | +1. **System Definition** → 2. **Integrals** → 3. **SCF** → 4. **CC/Post-HF** → 5. **Properties** |
| 20 | + - `MSystems` (molecular geometry, basis sets) → `Integrals` (FCIDUMP or DF) → `DFHF`/`BOHF` (orbitals) → `CoupledCluster` (amplitudes) → `CCTools` (properties) |
| 21 | + |
| 22 | +### Central State Object: `ECInfo` |
| 23 | +- **Location**: `src/infos/ecinfos.jl` |
| 24 | +- **Purpose**: Global state container for all calculations |
| 25 | +- **Key fields**: |
| 26 | + - `EC.system`: Molecular system (`MSystems.MolecularSystem`) |
| 27 | + - `EC.fd`: FCIDUMP integrals (`FciDumps.FciDump`) |
| 28 | + - `EC.options`: All calculation options (nested structure: `scf`, `cc`, `cholesky`, `wf`, etc.) |
| 29 | + - `EC.space`: Orbital space dictionary (`'o'` = occupied, `'v'` = virtual, etc.) |
| 30 | +- **Usage**: Always passed as the first argument: `function_name(EC::ECInfo, ...)` |
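| | + |
| | +A minimal sketch of the first-argument convention. `ToyOptions`, `ToyInfo`, and `count_orbitals` are hypothetical stand-ins invented for illustration; they do not reproduce the real `ECInfo` layout or any ElemCo.jl function. |
| | + |
| | +```julia |
| | +# Hypothetical stand-ins for ECInfo and its nested options. |
| | +Base.@kwdef mutable struct ToyOptions |
| | +  thr::Float64 = 1.e-10 |
| | +  maxit::Int = 50 |
| | +end |
| | + |
| | +Base.@kwdef mutable struct ToyInfo |
| | +  options::ToyOptions = ToyOptions() |
| | +  space::Dict{Char,Vector{Int}} = Dict{Char,Vector{Int}}() |
| | +end |
| | + |
| | +# The state object comes first, followed by function-specific arguments. |
| | +function count_orbitals(EC::ToyInfo, sp::Char) |
| | +  return length(EC.space[sp]) |
| | +end |
| | + |
| | +EC = ToyInfo() |
| | +EC.space['o'] = [1, 2, 3]      # occupied orbital indices |
| | +EC.space['v'] = [4, 5, 6, 7]   # virtual orbital indices |
| | +n_occ = count_orbitals(EC, 'o') |
| | +``` |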
| 31 | + |
| 32 | +### Module Organization |
| 33 | +``` |
| 34 | +src/ |
| 35 | +├── ElemCo.jl # Main module, includes all submodules, defines macros |
| 36 | +├── infos/ # ECInfo, Options, ECMethod |
| 37 | +├── system/ # MolecularSystem, BasisSet, Elements |
| 38 | +├── integrals/ # FciDump, DumpTools, DFTools |
| 39 | +├── scf/ # DFHF, BOHF, DFMCSCF, OrbTools, FockFactory |
| 40 | +├── cc/ # CoupledCluster, CCTools, Drivers, DMRG |
| 41 | +├── fci/ # FCI, Davidson, Selected CI, Heat-Bath CI |
| 42 | +├── solvers/ # DIIS, Davidson |
| 43 | +├── tools/ # TensorTools, QMTensors, MIO, Utils |
| 44 | +└── interfaces/ # Molpro, TREXIO, Molden |
| 45 | +``` |
| 46 | + |
| 47 | +## Code Style and Format |
| 48 | + |
| 49 | +### Julia Conventions |
| 50 | + |
| 51 | +**General Style:** |
| 52 | +- Use 2-space indentation consistently |
| 53 | +- Follow Julia standard naming conventions: |
| 54 | + - `snake_case` for functions and variables |
| 55 | + - `PascalCase` for types and modules |
| 56 | + - `UPPER_CASE` for constants |
| 57 | +- Line length: aim for 80-100 characters, but scientific formulas may exceed this |
| 58 | +- Use descriptive variable names, especially for physical quantities |
| 59 | + |
| 60 | +**Function Documentation:** |
| 61 | +- Use Julia docstrings with triple quotes `"""` |
| 62 | +- Include mathematical formulas using LaTeX notation when relevant |
| 63 | +- Document parameters, return values, and provide examples for complex functions |
| 64 | +- Write equations and units as LaTeX in double backticks, e.g., `[``units``]` for units |
| 65 | + |
| 66 | +**Example:** |
| 67 | +````julia |
| 68 | +""" |
| 69 | +    calc_ccsd_energy(EC::ECInfo, T1, T2) |
| 70 | + |
| 71 | +Calculate the CCSD correlation energy using cluster amplitudes. |
| 72 | + |
| 73 | +# Arguments |
| 74 | +- `EC::ECInfo`: Electronic structure information object |
| 75 | +- `T1`: Single excitation amplitudes [``T_i^a``] |
| 76 | +- `T2`: Double excitation amplitudes [``T_{ij}^{ab}``] |
| 77 | + |
| 78 | +# Returns |
| 79 | +- `Float64`: CCSD correlation energy in atomic units |
| 80 | + |
| 81 | +# Example |
| 82 | +```julia |
| 83 | +T1 = load2idx(EC, "T_vo") |
| 84 | +T2 = load4idx(EC, "T_vvoo") |
| 85 | +E_corr = calc_ccsd_energy(EC, T1, T2) |
| 86 | +``` |
| 87 | +""" |
| 88 | +```` |
| 89 | + |
| 90 | +**Macros:** |
| 91 | +- The package uses domain-specific macros extensively (e.g., `@dfhf`, `@cc`, `@set`) |
| 92 | +- Macro names are generally lowercase with underscores (e.g., `@print_input`); `@ECinit` and `@tryECinit` are exceptions |
| 93 | +- Reserved variable names: `fcidump`, `geometry`, `basis` |
| 94 | +- Always include `@print_input` at the beginning of input scripts |
| 95 | + |
| 96 | +**Tensor Operations:** |
| 97 | +- Use `@mtensor` macro for tensor contractions (wraps `TensorOperations.@tensor`) |
| 98 | +- Follow Einstein summation notation in comments |
| 99 | +- Include LaTeX expressions for tensor equations in comments |
| 100 | +- Example: `# R_e^m += D_{id}^{el} (\\hat v_{ml}^{di}-\\hat v_{lm}^{di})` |
| 101 | +- Example code: `@mtensor A[p,q,L] = B[p,r,L] * C[r,q]` |
| 102 | +- Use `@mview` for memory-efficient array views (based on `StridedViews`) |
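| | + |
| | +To make the index notation concrete: in `A[p,q,L] = B[p,r,L] * C[r,q]` the repeated index `r` is summed over. The sketch below spells that sum out with the standard library only, as a reference for what the macro expression computes. It does not use `@mtensor`, and the name `contract_reference` is made up for illustration. |
| | + |
| | +```julia |
| | +using LinearAlgebra |
| | + |
| | +# A[p,q,L] = sum_r B[p,r,L] * C[r,q], evaluated as one matrix product per L. |
| | +function contract_reference(B::Array{Float64,3}, C::Matrix{Float64}) |
| | +  np, nr, nL = size(B) |
| | +  if size(C, 1) != nr |
| | +    throw(DimensionMismatch("index r has length $nr in B but $(size(C, 1)) in C")) |
| | +  end |
| | +  A = zeros(np, size(C, 2), nL) |
| | +  for L in 1:nL |
| | +    # In-place product into slice L; the views avoid copying the slices. |
| | +    mul!(view(A, :, :, L), view(B, :, :, L), C) |
| | +  end |
| | +  return A |
| | +end |
| | + |
| | +B = reshape(collect(1.0:24.0), 2, 3, 4) |
| | +C = [1.0 0.0; 0.0 1.0; 1.0 1.0] |
| | +A = contract_reference(B, C) |
| | +``` |
| | + |
| | +A reference like this can serve as a cross-check in tests when a contraction is rewritten. |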
| 103 | + |
| 104 | +### Domain-Specific Language (DSL) |
| 105 | +ElemCo uses a **macro-based DSL** for its user-facing API (see `src/ElemCo.jl` lines 250-600): |
| 106 | + |
| 107 | +**Key Macros:** |
| 108 | +- `@ECinit` / `@tryECinit` - Initialize `EC::ECInfo` global state |
| 109 | +- `@print_input` - Print the input script for reproducibility |
| 110 | +- `@dfhf` / `@dfuhf` / `@dfmcscf` - Run SCF calculations, store orbitals in `EC.options.wf.orb` |
| 111 | +- `@cc <method>` - Run CC calculations (automatically calls `@dfints` if needed) |
| 112 | +- `@dfcc <method>` - Run CC with on-the-fly density fitting |
| 113 | +- `@set <opt> <key>=<val>` - Set options (e.g., `@set scf thr=1.e-14 maxit=100`) |
| 114 | +- `@fci` - Run FCI calculation |
| 115 | + |
| 116 | +**Reserved Variables:** |
| 117 | +- `fcidump::String` - Path to FCIDUMP file |
| 118 | +- `geometry::String` - Molecular geometry (Cartesian or Z-matrix) |
| 119 | +- `basis::String` or `Dict` - Basis set specification |
| 120 | +- `EC::ECInfo` - Global state object (auto-created by macros) |
| 121 | + |
| 122 | +**Example Input Pattern:** |
| 123 | +```julia |
| 124 | +using ElemCo |
| 125 | +@print_input # Always first! |
| 126 | + |
| 127 | +geometry = "O 0 0 0; H 0 0 1.8; H 0 1.8 0" |
| 128 | +basis = "cc-pVDZ" |
| 129 | +@dfhf # Run HF, stores orbitals |
| 130 | +@cc dcsd # Run DCSD using stored orbitals |
| 131 | +``` |
| 132 | + |
| 133 | +### Module Structure |
| 134 | + |
| 135 | +**File Organization:** |
| 136 | +- Main module: `src/ElemCo.jl` |
| 137 | +- Submodules organized by functionality: |
| 138 | + - `cc/` - Coupled cluster methods (see `drivers.jl` for entry points) |
| 139 | + - `scf/` - Self-consistent field methods |
| 140 | + - `integrals/` - Integral handling and transformations |
| 141 | + - `system/` - Molecular systems and basis sets |
| 142 | + - `tools/` - Utilities and tensor operations |
| 143 | + - `interfaces/` - External program interfaces (Molpro, TREXIO) |
| 144 | + - `fci/` - Full CI implementation |
| 145 | + |
| 146 | +**Constants and Physical Units:** |
| 147 | +- Define physical constants in `Constants` module |
| 148 | +- Include proper units in docstrings: `[``m~s^{-1}``]` |
| 149 | +- Use atomic units as the default unit system |
| 150 | + |
| 151 | +## Development Guidelines |
| 152 | + |
| 153 | +### Testing |
| 154 | +- Tests are located in `test/` directory |
| 155 | +- Use descriptive test names that indicate the method being tested |
| 156 | +- Test files follow the pattern `system_method.jl` (e.g., `h2o_dcsd.jl`) |
| 157 | +- Tests use `@testset` with energy comparisons and numerical thresholds |
| 158 | +- Standard test pattern: |
| 159 | +```julia |
| 160 | +@testset "System Method Test" begin |
| 161 | + epsilon = 1.e-6 |
| 162 | + E_ref = -75.6457645933 # Reference energy |
| 163 | + |
| 164 | + @print_input |
| 165 | + fcidump = joinpath(@__DIR__, "files", "system.FCIDUMP") |
| 166 | + energies = @cc method |
| 167 | + |
| 168 | + @test abs(energies["METHOD"] - E_ref) < epsilon |
| 169 | +end |
| 170 | +``` |
| 171 | +- Run tests with: `julia --project=. test/runtests.jl` |
| 172 | +- Quick tests available via: `julia --project=. test/runtests.jl quick` |
| 173 | + |
| 174 | +### Dependencies |
| 175 | +- Minimize external dependencies |
| 176 | +- Use `LinearAlgebra`, `TensorOperations` for mathematical operations |
| 177 | +- HDF5 for data storage, XML for configuration files |
| 178 | +- `libcint_jll` for integral calculations |
| 179 | + |
| 180 | +### Performance Considerations |
| 181 | +- **Type stability is essential**: All performance-critical functions must be type-stable |
| 182 | + - Ensure return types are inferrable from input types at compile time |
| 183 | + - Use `@code_warntype` to check for type instabilities |
| 184 | + - Avoid abstract types in struct fields (use parametric types or concrete types) |
| 185 | + - Use `Val{N}` for dimension-dependent code (see `mioload` in `src/tools/myio.jl`) |
| 186 | + - Example: Return `Array{Float64,N}` not `Array` from functions |
| 187 | +- Use in-place operations where possible (functions ending with `!`) |
| 188 | +- Leverage BLAS operations via `LinearAlgebra` |
| 189 | +- Memory management is crucial for large tensor operations |
| 190 | +- Use `load4idx()` (`load3idx()`, `load2idx()`, etc.) and `save4idx()` (`save3idx()`, `save2idx()`, etc.) for tensor disk I/O |
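| | + |
| | +The `Val{N}` advice can be illustrated with a small sketch. `unstable_zeros` and `stable_zeros` are hypothetical helpers written for this example; they are not taken from ElemCo.jl and are not meant to show how `mioload` is implemented. |
| | + |
| | +```julia |
| | +using Test |
| | + |
| | +# Rank given as a runtime Int: the compiler cannot know the array rank. |
| | +unstable_zeros(n::Int, N::Int) = zeros(Float64, ntuple(_ -> n, N)) |
| | + |
| | +# Rank given as a type parameter: the rank is known at compile time. |
| | +stable_zeros(n::Int, ::Val{N}) where {N} = zeros(Float64, ntuple(_ -> n, Val(N))) |
| | + |
| | +# @inferred throws if the inferred return type differs from the actual one. |
| | +A = @inferred stable_zeros(3, Val(4)) |
| | + |
| | +# Inspect what inference derives for the runtime-Int variant. |
| | +inferred_unstable = only(Base.return_types(unstable_zeros, (Int, Int))) |
| | +println("stable:   ", typeof(A)) |
| | +println("unstable: ", inferred_unstable) |
| | +``` |
| | + |
| | +The same check can be done interactively with `@code_warntype unstable_zeros(3, 4)`. |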
| 191 | + |
| 192 | +### Type Stability Checking with JET |
| 193 | + |
| 194 | +Use **JET.jl** for comprehensive type stability analysis. The analysis script is in `profile/jet.jl`. |
| 195 | + |
| 196 | +**Running JET Analysis:** |
| 197 | +```bash |
| 198 | +julia --project=. profile/jet.jl |
| 199 | +``` |
| 200 | + |
| 201 | +**How it works:** |
| 202 | +- Uses `@report_opt` to analyze optimization issues and runtime dispatches |
| 203 | +- Targets all ElemCo modules to catch type instabilities across the codebase |
| 204 | +- Reports "possible errors" which are typically runtime dispatches due to type instability |
| 205 | + |
| 206 | +**Fixing Type Instabilities - Key Principles:** |
| 207 | + |
| 208 | +1. **Minimize type annotations**: Do NOT add return type annotations as a first solution |
| 209 | +2. **Find and fix the root cause**: Trace the instability back to its origin |
| 210 | +3. **Common root causes:** |
| 211 | + - Functions returning abstract types (e.g., `Matrix{T} where T` instead of `Matrix{Float64}`) |
| 212 | + - Closures or fields annotated with the abstract type `Function` (e.g., `f::Function`), which prevents inference |
| 213 | + - Type-unstable data flowing through multiple function calls |
| 214 | + - Reading data from files/interfaces without concrete type conversion |
| 215 | + |
| 216 | +4. **Fixing strategies (in order of preference):** |
| 217 | + - Fix the source function to return concrete types |
| 218 | + - Add explicit type conversion at data boundaries (e.g., `Matrix{Float64}(data)`) |
| 219 | + - Use concrete types in struct fields |
| 220 | + - Only as last resort: add return type annotations |
| 221 | + |
| 222 | +5. **Known acceptable instabilities:** |
| 223 | + - `kwcall` runtime dispatch (inherent Julia limitation with keyword arguments) |
| 224 | + - Dynamic dispatch in initialization code (not performance-critical) |
| 225 | + |
| 226 | +**Example - Fixing at the source:** |
| 227 | +```julia |
| 228 | +# BAD: Adding annotation to hide the problem |
| 229 | +function process_data(data)::Matrix{Float64} |
| 230 | + return compute(data) # compute() returns abstract type |
| 231 | +end |
| 232 | + |
| 233 | +# GOOD: Fix compute() to return concrete type |
| 234 | +function compute(data) |
| 235 | + result = some_operation(data) |
| 236 | + return Matrix{Float64}(result) # Convert at the source |
| 237 | +end |
| 238 | + |
| 239 | +function process_data(data) |
| 240 | + return compute(data) # Now type-stable without annotation |
| 241 | +end |
| 242 | +``` |
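| | + |
| | +A second sketch covers the "concrete types in struct fields" strategy. Both struct names and the `apply` helper are hypothetical and exist only for this illustration. |
| | + |
| | +```julia |
| | +# Abstract field type: every call through `residual` is dispatched at runtime. |
| | +struct AbstractFieldSolver |
| | +  residual::Function |
| | +end |
| | + |
| | +# Parametric field: the function type is part of the struct type, |
| | +# so calls through `residual` can be inferred. |
| | +struct ParametricSolver{F} |
| | +  residual::F |
| | +end |
| | + |
| | +apply(s, x) = s.residual(x) |
| | + |
| | +abstract_field = fieldtype(AbstractFieldSolver, :residual) |
| | +concrete_field = fieldtype(ParametricSolver{typeof(sin)}, :residual) |
| | +y = apply(ParametricSolver(sin), 0.0) |
| | +``` |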
| 243 | + |
| 244 | +**After making changes:** |
| 245 | +- Re-run `profile/jet.jl` to verify improvements |
| 246 | +- Run test suite to ensure correctness: `julia --project=. test/runtests.jl` |
| 247 | + |
| 248 | +## Quantum Chemistry Specifics |
| 249 | + |
| 250 | +### Mathematical Notation |
| 251 | +- Use standard quantum chemistry notation |
| 252 | +- Greek letters for spin indices (α, β) |
| 253 | +- Latin letters for spatial orbitals (i,j,k... occupied, a,b,c... virtual) |
| 254 | +- Tensor indices follow physicist's notation |
| 255 | + |
| 256 | +### Method Implementations |
| 257 | +- Coupled cluster amplitudes: T1 (singles), T2 (doubles), T3 (triples) |
| 258 | +- Density matrices: 1RDM, 2RDM with proper symmetry |
| 259 | +- Fock matrices with density fitting approximations |
| 260 | +- Molecular orbital coefficients and transformations |
| 261 | + |
| 262 | +### Input File Format |
| 263 | +Standard input files start with `using ElemCo` and `@print_input`, followed by one of these options: |
| 264 | +```julia |
| 265 | +using ElemCo |
| 266 | +@print_input |
| 267 | + |
| 268 | +# Option 1: Using FCIDUMP file |
| 269 | +fcidump = "path/to/file.FCIDUMP" |
| 270 | +@cc dcsd |
| 271 | + |
| 272 | +# Option 2: Define molecular system |
| 273 | +geometry = "H 0.0 0.0 0.0 |
| 274 | + H 0.0 0.0 1.0" |
| 275 | +basis = "cc-pVDZ" |
| 276 | +@dfhf |
| 277 | +@cc dcsd |
| 278 | + |
| 279 | +# Option 3: Using ccdriver function |
| 280 | +EC = ECInfo() |
| 281 | +energies = ElemCo.ccdriver(EC, "ccsd(t)"; fcidump="file.FCIDUMP") |
| 282 | +``` |
| 283 | + |
| 284 | +**Key Input Patterns:** |
| 285 | +- Always include `@print_input` for reproducibility |
| 286 | +- Use `fcidump`, `geometry`, `basis` as reserved variable names |
| 287 | +- Methods: `dcsd`, `ccsd`, `ccsd(t)`, `λccsd(t)`, `mp2`, etc. |
| 288 | +- Options set via `@set` macro: `@set scf maxit=50` |
| 289 | +- Occupation can be specified: `@cc dcsd occa="1-5" occb="1-4"` |
| 290 | + |
| 291 | +## Common Patterns |
| 292 | + |
| 293 | +### Error Handling |
| 294 | +- Use Julia's exception system |
| 295 | +- Provide meaningful error messages with context |
| 296 | +- Include suggestions for fixing common user errors |
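| | + |
| | +As an illustration of an error message that carries context and a suggested fix, here is a minimal sketch. `require_fcidump` is a hypothetical helper, not an ElemCo.jl function, and the file name is a placeholder assumed not to exist. |
| | + |
| | +```julia |
| | +# Hypothetical input check: fail early with the offending path and a hint. |
| | +function require_fcidump(path::AbstractString) |
| | +  if !isfile(path) |
| | +    throw(ArgumentError("FCIDUMP file \"$path\" not found; " * |
| | +                        "check the `fcidump` variable or pass an absolute path")) |
| | +  end |
| | +  return path |
| | +end |
| | + |
| | +# Capture the message the user would see. |
| | +msg = try |
| | +  require_fcidump("does_not_exist.FCIDUMP") |
| | +catch err |
| | +  sprint(showerror, err) |
| | +end |
| | +println(msg) |
| | +``` |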
| 297 | + |
| 298 | +### Logging and Output |
| 299 | +- Use `println()` for user-facing output |
| 300 | +- Include timing information for expensive operations |
| 301 | +- Progress reporting for iterative methods |
| 302 | +- ASCII art headers for major calculation sections |
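| | + |
| | +A minimal sketch of per-iteration progress output with timing, using only `Printf` and `@elapsed` from the standard library. The function `iterate_demo` and its shrinking residual are placeholders for a real iterative solver. |
| | + |
| | +```julia |
| | +using Printf |
| | + |
| | +# Hypothetical solver loop: prints iteration, residual and time per step. |
| | +function iterate_demo(maxit::Int, thr::Float64) |
| | +  res = 1.0 |
| | +  println("Iter     Residual     Time [s]") |
| | +  for it in 1:maxit |
| | +    t = @elapsed (res *= 0.1)   # stand-in for one expensive iteration |
| | +    @printf("%4d   %10.3e   %8.2e\n", it, res, t) |
| | +    if res < thr |
| | +      return it |
| | +    end |
| | +  end |
| | +  return maxit |
| | +end |
| | + |
| | +nit = iterate_demo(20, 1.e-6) |
| | +``` |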
| 303 | + |
| 304 | +### Memory Management |
| 305 | +- Use the `NOTHING4idx` constant (`NOTHING3idx`, `NOTHING2idx`, etc.) for clearing large tensors |
| 306 | +- Implement scratch directory management (default: system temp dir + "elemcojlscr") |
| 307 | +- Handle temporary files appropriately |
| 308 | +- Memory-mapped I/O for large tensors via `MIO` module (`miosave`, `mioload`, `miommap`) |
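| | + |
| | +The idea behind memory-mapped tensor I/O can be sketched with Julia's standard `Mmap` library. This is not the `MIO` API: the signatures of `miosave`, `mioload`, and `miommap` are not shown in this document, so the example writes a raw, header-less file to a temporary path instead. |
| | + |
| | +```julia |
| | +using Mmap |
| | + |
| | +A = rand(4, 4, 4) |
| | +path, io = mktemp() |
| | +write(io, A)    # raw column-major Float64 data, no header |
| | +close(io) |
| | + |
| | +# Map the file instead of reading it; pages are loaded on access. |
| | +B = open(path, "r") do f |
| | +  Mmap.mmap(f, Array{Float64,3}, (4, 4, 4)) |
| | +end |
| | + |
| | +same = (B == A) |
| | +``` |
| | + |
| | +Because the file carries no header here, the element type and dimensions must be known to the reader. |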
| 309 | + |
| 310 | +## Contributing Guidelines |
| 311 | + |
| 312 | +### Code Reviews |
| 313 | +- Ensure mathematical correctness of implementations |
| 314 | +- Verify numerical stability and convergence |
| 315 | +- Check performance implications of changes |
| 316 | +- Validate against reference implementations when available |
| 317 | + |
| 318 | +### Documentation |
| 319 | +- Update docstrings for any API changes |
| 320 | +- Include examples in documentation |
| 321 | +- Mathematical derivations should be clear and complete |
| 322 | +- Reference papers and methods appropriately |
| 323 | + |
| 324 | +### Backward Compatibility |
| 325 | +- Maintain compatibility with existing input files |
| 326 | +- Deprecate features gracefully with warnings |
| 327 | +- Preserve numerical results for regression testing |
| 328 | + |
| 329 | +Remember: This is scientific software where correctness and numerical stability are paramount. Always validate implementations against established quantum chemistry references and test with multiple molecular systems. |