Skip to content

Add Windows build support with clang-cl and vcpkg#450

Open
lorisercole wants to merge 54 commits into
mainfrom
feature/ler/windows-build-part-1
Open

Add Windows build support with clang-cl and vcpkg#450
lorisercole wants to merge 54 commits into
mainfrom
feature/ler/windows-build-part-1

Conversation

@lorisercole
Copy link
Copy Markdown
Contributor

@lorisercole lorisercole commented Apr 20, 2026

Summary

First PR in a series to introduce native Windows support for QDK/Chemistry. The project now builds and passes all tests on Windows using clang-cl (MSVC Build Tools) + vcpkg for native dependencies.

Strategy

Compiler: clang-cl — the LLVM Clang frontend with MSVC-compatible ABI. This gives us GCC/Clang C++20 feature support while linking against the MSVC runtime, which is required for Python extension modules on Windows.

Dependencies: vcpkg in manifest mode (vcpkg.json), with a custom OpenBLAS overlay port that builds LAPACK from C sources (no Fortran toolchain needed). By default, dependencies are statically linked using the x64-windows-static-md triplet — all library code is linked directly into _core.pyd, eliminating DLL bundling and the need to call os.add_dll_directory() at import time. A -DynamicDeps switch is available to fall back to shared libraries if needed.

Build scripts: Two PowerShell scripts under .pipelines/pip-scripts/:

  • windows-build-clang-cmake.ps1 — separate C++ build + Python pip install
  • windows-build-clang-pip.ps1 — single-step pip install via scikit-build-core

Both scripts handle the full toolchain setup: VS Build Tools detection/installation, vcvarsall environment, vcpkg bootstrap, and Python pip install.

CMake flags: Per-config compile flags (Debug, Release, RelWithDebInfo) are set explicitly for both MSVC (clang-cl / cl) and GCC/Clang toolchains. Dependencies handle their own warning suppression — no global -Wno-* overrides are applied. Coverage flags (--coverage, -fprofile-arcs) are guarded with AND NOT MSVC since clang-cl uses MSVC-style option parsing despite reporting CMAKE_CXX_COMPILER_ID=Clang.

Testing: vcpkg also provides catch2 and gtest, so C++ tests build on Windows without system-installed test frameworks. A MACIS_ENABLE_TESTS option controls building MACIS tests (ON by default, overridable via -D).

Windows-specific issues discovered and fixed

Data model differences (LLP64 vs LP64)

  • HDF5 size_t serialization: On Windows LLP64, unsigned long is 4 bytes (vs 8 on Linux LP64). NATIVE_ULONG was silently truncating 64-bit values. Fixed by using NATIVE_UINT64 with a static_assert(sizeof(size_t) == 8).
  • HDF5 bool serialization: hbool_t is unsigned int (4 bytes) but C++ bool is 1 byte. Reading bool* with NATIVE_HBOOL was UB. Fixed by using hbool_t intermediaries with static_cast.

DLL loading (Python 3.8+)

  • Python 3.8+ changed DLL search behavior: .pyd dependencies are only found in the .pyd's own directory or via os.add_dll_directory(). PATH is not searched. See this comment.
  • With static linking (default), this is a non-issue — everything is in _core.pyd, like on our Linux/macOS wheels.
  • With dynamic linking, vcpkg DLLs are bundled next to _core.pyd via CMake install(). An explicit list of the 7 required DLLs (openblas, hdf5, fmt, etc.) is installed — OpenMP runtime (libomp) is excluded (locally available from LLVM toolchain, handled by delvewheel for release wheels). The QDK_BUNDLE_RUNTIME_DLLS option (ON by default) auto-detects the vcpkg triplet's VCPKG_LIBRARY_LINKAGE and can be disabled for CI pipelines using delvewheel repair.
  • QDK_DLL_DIR environment variable is kept as an escape hatch for non-vcpkg setups.

Encoding (cp1252 vs UTF-8)

  • Windows defaults to cp1252 encoding. All Python text I/O, subprocess calls, and the license-header pre-commit hook now explicitly use encoding="utf-8". Q# circuit diagrams contain special characters that fail under cp1252.
  • stdout/stderr are reconfigured to UTF-8 at package import time.
  • Subprocess invocations propagate PYTHONIOENCODING=utf-8 to child processes.
  • A dedicated round-trip regression test verifies that Unicode content survives file serialization and stdout capture end-to-end.

File locking

  • Windows does not allow deleting files with open handles (unlike Linux which permits unlinking). Fixed ~24 test instances in Python (using NamedTemporaryFile(delete=False) + manual unlink()) and 2 C++ test cases (scoping std::ifstream in blocks so the destructor closes the handle before std::filesystem::remove()).

spdlog stdout capture

  • On Windows, spdlog::stdout_color_mt() caches a Windows HANDLE at construction. When pytest's capfd redirects fd 1 via dup2(), the cached HANDLE bypasses capture. Added a custom stdout_fd_sink (in the logger module) that writes via fwrite(stdout) through the C runtime's fd layer, giving Windows the same behavior Linux gets by default.

OpenMP

  • GauXC's XC integrator uses element-by-element #pragma omp atomic accumulation on shared matrices. Under LLVM's libomp, this causes non-deterministic floating-point summation order, leading to NaN/divergence in SCF with >2 threads. Does not manifest with GCC/libgomp. Workaround: disable GAUXC_ENABLE_OPENMP on MSVC while keeping OpenMP enabled for the rest of the project (MACIS, our own code). See this GauXC issue.
  • On Windows with clang-cl, CMake resolves OpenMP to LLVM's libomp runtime (-Xclang -fopenmp + libomp.lib). The libomp.dll must be present at runtime from the VS LLVM tools directory.

blaspp/lapackpp on Windows

  • blaspp's BLASFinder.cmake never caches blaspp_defs_ (which holds -DBLAS_FORTRAN_ADD_). On CMake reconfigure, detection is skipped so blaspp_defines becomes empty, causing defines.h to lose all Fortran mangling definitions. Fixed via a git patch that caches blaspp_defs_ at detection time and restores it on reconfigure.
  • lapackpp's MSVC config.h omits #include <complex>, causing compile failures in lobpcgxx. Fixed by force-including <complex> for lapackpp targets on Windows.

C++ code quality fixes (exposed by clang-cl warnings)

  • -Wreorder-ctor: Fixed initializer list order to match declaration order in 5 source files.
  • -Winconsistent-missing-override: Added override specifiers to virtual function redeclarations in DynamicalCorrelationCalculator, StabilityChecker, Macis, and OneBodyIntegralEngine subclasses.
  • -Wdefaulted-function-deleted: Explicitly =delete copy/move assignment in Structure (const data members) and default ctor in ERIMultiplexer (base class has no default ctor).
  • Uninitialized Shell::rpowers: Shell::from_json() used default initialization leaving rpowers indeterminate for non-ECP shells. Fixed by value-initializing (Shell sh{}). Also fixed the BasisSetMap test which had wrong JSON key ("shells" instead of "electron_shells") and used a single-shell element making the test a no-op.

CMake / compiler flags

  • _USE_MATH_DEFINES required for M_PI etc. under MSVC (now handled in dependencies)
  • BLAS_FORTRAN_ADD_ override for Fortran name mangling — no trailing underscore on Windows (now handled in dependencies via patched blaspp)
  • -fPIC skipped on Windows (position-independent code is the default for DLLs)

Misc

  • .gitattributes enforcing LF line endings repo-wide
  • Non-interactive matplotlib backend (Agg) for headless test environments (avoids Tcl/Tk dependency crash on Windows)
  • Skip PySCF-dependent tests when unavailable (PySCF doesn't build on Windows)
  • MACIS uint128_t polyfill: added ==, <<, |= operators for MSVC (which lacks __uint128_t)
  • Pre-commit config excludes .patch files from trailing-whitespace and end-of-file fixers
  • pybind11 discovery fix: query python -m pybind11 --cmakedir and set pybind11_DIR explicitly to avoid vcpkg's find_package wrapper intercepting the search

Considerations for future development

  • Native MSVC cl.exe support: The codebase is ~80% ready. OpenMP 2.0 limitation with MSVC's /openmp (may need /openmp:llvm).
  • CI/CD pipelines: Windows wheel builds should use delvewheel repair (the Windows equivalent of auditwheel) instead of CMake DLL bundling — it resolves the full PE dependency tree automatically.
  • GauXC OpenMP data race: The atomic accumulation issue has been reported upstream. The current workaround (disabling OpenMP for GauXC) loses parallelism in the XC integrator.
  • File encoding: Any new file I/O in Python must use encoding="utf-8" explicitly. Relying on the default locale encoding will silently break on Windows.
  • File locking: Any code that creates a temp file and passes the path to C++ must close the Python handle first. NamedTemporaryFile(delete=False) + manual unlink() is the pattern. Python 3.12+ offers delete_on_close=False as a cleaner alternative.
  • unsigned long is 32-bit on Windows: Never use unsigned long for 64-bit values in portable code. Use uint64_t / size_t instead. This affects HDF5 type mappings in particular.
  • PySCF: Does not build on Windows. Tests that depend on it must be gated with availability checks.

Two issues caused 27 test failures on Windows:

1. NATIVE_ULONG / size_t mismatch: On Windows LLP64, unsigned long is
   4 bytes while size_t is 8 bytes. Use NATIVE_UINT64 instead, which is
   unambiguously 64-bit on all platforms.

2. NATIVE_HBOOL / bool mismatch: HDF5's hbool_t is unsigned int
   (4 bytes), but C++ bool is 1 byte. Reading/writing a bool* with
   NATIVE_HBOOL is undefined behavior. Use hbool_t intermediaries with
   static_cast at each serialization site.
On Windows, std::filesystem::remove() fails if the file still has an
open handle — unlike Linux, which allows unlinking open files.

Two tests opened an std::ifstream to verify FCIDUMP file contents but
called remove() while the stream was still in scope, causing:

  "remove: The process cannot access the file because it is being
   used by another process"

Fix by scoping each ifstream in a block so its destructor closes the
file handle before the remove() call.

Affected tests:
  - HamiltonianTest.SparseContainerFCIDUMP
  - HamiltonianTest.FCIDUMPActiveSpaceConsistency
Shell::from_json() used default initialization (`Shell sh`), leaving
the rpowers array indeterminate for non-ECP shells — the JSON round-
trip never writes rpowers for regular shells. Both BasisHasher and
BasisEqChecker read rpowers unconditionally, so comparing a database-
loaded shell (zero-initialized via `Shell sh{0}`) against a JSON-
deserialized shell was undefined behavior. Fix by value-initializing
(`Shell sh{}`), matching the pattern already used elsewhere.

Also fix BasisSetMap test which had three bugs:
- Reversed basis_json["shells"] but the actual key is "electron_shells"
  (operator[] silently created an empty entry, making the reverse a no-op)
- Used Hydrogen (1 shell in STO-3G), so even reversing the correct key
  would be a no-op; switch to Lithium which has 3 shells
- Test only passed on Linux by accident: uninitialized rpowers garbage
  happened to make the hashes differ
Three issues prevented the Python package from building and importing
on Windows when linking to a pre-installed C++ library:

1. pybind11 not found: vcpkg's find_package wrapper intercepts the
   search and misses pybind11 installed in pip's isolated build
   environment.  Fix by querying `python -m pybind11 --cmakedir` and
   setting pybind11_DIR explicitly before find_package.

2. DLL load failure at import: Python 3.8+ no longer searches PATH for
   DLL dependencies of extension modules.  Add a Windows-only block in
   __init__.py that reads the QDK_DLL_DIR environment variable and
   calls os.add_dll_directory() for each path before _core is imported.

3. Test script updates: use Ninja generator, clang-cl compilers, vcpkg
   toolchain, CMAKE_PREFIX_PATH for the pre-built C++ library, uv for
   venv management, and QDK_DLL_DIR for runtime DLL resolution.
In Windows the default encoding is cp1252, not UTF-8. This caused tests to fail.
NamedTemporaryFile with the default delete=True keeps an exclusive file
handle on Windows, preventing C++ code from opening the same path.

Use delete=False so the handle is released when the with-block exits,
then clean up manually with Path.unlink() in a finally block.  This
pattern is needed wherever a temp file is created in Python and then
passed to the C++ library for reading or writing.

Affected tests: test_basis_set, test_orbitals, test_stability,
test_noise_models (24 instances total).

Note: when the minimum Python version is raised to 3.12+, this can be
simplified using delete_on_close=False, which releases the file handle
on close but auto-deletes on context manager exit — removing the need
for manual unlink() calls.
The check_license_headers.py script prints a ✅ emoji on success, but
Python defaults to cp1252 encoding for stdout on Windows, which cannot
encode it.  Reconfigure stdout to UTF-8 at the top of the script.
On Windows, spdlog::stdout_color_mt() creates a wincolor_stdout_sink
that caches a Windows HANDLE via GetStdHandle(STD_OUTPUT_HANDLE) at
construction time.  All subsequent writes go through WriteFile(HANDLE)
which bypasses the C runtime's file descriptor layer entirely.

When pytest's capfd redirects fd 1 via dup2(), the cached HANDLE still
points to the original stdout, so all Logger output bypasses capture —
causing ~25 test failures across test_logger.py,
test_constants_documentation.py, and test_scf.py.

On Linux this is not a problem because stdout_color_sink_mt is aliased
to ansicolor_stdout_sink, which writes via fwrite(stdout) through the
C runtime's fd layer.

Add a custom stdout_fd_sink for Windows that writes via fwrite(stdout),
giving Windows the same fd-based write path that Linux gets by default.
The sink is used only on Windows (#ifdef _WIN32); Linux/macOS continue
to use stdout_color_mt() unchanged.
Windows uses cp1252 as the default locale encoding, which silently
corrupts non-ASCII text in open(), read_text(), write_text(), and
subprocess.run(text=True) calls.
All text I/O in the package and its test suite now passes
encoding="utf-8" explicitly, and subprocess invocations additionally
propagate PYTHONIOENCODING=utf-8 to child processes so that both
parent and child agree on the codec regardless of the system locale.
A dedicated round-trip regression test verifies that Unicode content
(φ, Å, ΔE, αβγ, 你好) survives file serialization and stdout capture
end-to-end on Windows.
The uv-installed CPython 3.14 on Windows ships with an incomplete
Tcl/Tk installation (missing tcl_findLibrary). Matplotlib defaults
to the TkAgg backend, which crashes when trying to create a Tk
window during circuit diagram plotting tests.

Force the non-interactive Agg backend in conftest.py before any
pyplot import. This is standard practice for CI/headless test
environments and avoids the Tk dependency entirely.
This eliminates CRLF warnings on Windows and ensures consistent line endings across platforms.
Run each example script in a fresh TemporaryDirectory so output
files (e.g. .h5, .json) don't pollute the source tree. On failure
the temp directory is preserved for debugging; on success it is
cleaned up.
Using more than 2 OpenMP threads causes some tests to fail (SCF not converging, ...) due to numerical instabilities.
See: https://gist.github.com/lorisercole/c8081e5f4966c6bf9a2e64734c781fd3
The culprit appears to be GauXC's XC integrator, that uses element-by-element #pragma omp atomic accumulation on shared matrices (inc_by_submat_atomic in util.hpp).
Under LLVM's libomp this causes non-deterministic floating-point summation order, leading to NaN/divergence in SCF with >2 threads.
The issue does not manifest with GCC/libgomp due to its more conservative thread scheduling.
An issue on GauXC repo will be opened.

Temporary fix: disable GAUXC_ENABLE_OPENMP on MSVC while keeping OpenMP enabled for the rest of the project (MACIS, our own code).
@lorisercole lorisercole marked this pull request as ready for review April 20, 2026 20:34
Copilot AI review requested due to automatic review settings April 20, 2026 20:34
Copilot AI review requested due to automatic review settings April 24, 2026 14:16
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 56 out of 57 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread cpp/CMakeLists.txt Outdated
Comment thread cpp/CMakeLists.txt Outdated
Comment thread cpp/src/qdk/chemistry/data/hdf5_serialization.cpp
Comment thread .pipelines/pip-scripts/windows-build-clang-cmake.ps1 Outdated
Comment thread python/tests/conftest.py
@wavefunction91 wavefunction91 self-assigned this Apr 24, 2026
Copilot AI review requested due to automatic review settings April 29, 2026 13:35
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 56 out of 57 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread python/CMakeLists.txt
Comment thread .pipelines/pip-scripts/windows-build-clang-cmake.ps1 Outdated
Comment thread cpp/cmake/qdk-uarch.cmake Outdated
Comment thread cpp/cmake/third_party.cmake Outdated
Comment thread cpp/cmake/third_party.cmake Outdated
Comment thread cpp/src/qdk/chemistry/algorithms/microsoft/scf/tests/util_tests.cpp
Comment thread cpp/CMakeLists.txt Outdated
Comment thread external/macis/src/lobpcgxx/CMakeLists.txt Outdated
Comment thread python/src/qdk_chemistry/__init__.py
Comment on lines +23 to +35
if _sys.platform == "win32":
# Allow users / CI to point to extra DLL directories via a semicolon-
# separated environment variable (e.g. the vcpkg bin directory).
# Note: DLLs bundled next to _core.pyd are found automatically by the
# Windows loader, so no add_dll_directory() is needed for those.
_dll_dirs = _os.environ.get("QDK_DLL_DIR", "")
_dll_dir_handles = []
for _d in _dll_dirs.split(";"):
_d = _d.strip()
if _d and _os.path.isdir(_d):
_dll_dir_handles.append(_os.add_dll_directory(_d)) # type: ignore[attr-defined]
del _dll_dirs

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@idavis Is there a better way to handle this for python libraries on Windows platforms?

Comment thread external/macis/cmake/macis-uarch.cmake Outdated
Copilot AI review requested due to automatic review settings May 4, 2026 14:35
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 56 out of 57 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread cpp/cmake/third_party.cmake
blaspp's BLASFinder.cmake never caches blaspp_defs_ (which holds
-DBLAS_FORTRAN_ADD_). On reconfigure, detection is skipped (run_=false)
so blaspp_defs_ is empty, and the main CMakeLists overwrites
blaspp_defines with empty (CACHE INTERNAL implies FORCE), causing
defines.h to lose all Fortran mangling definitions.

Add a git patch that caches blaspp_defs_ at the end of detection and
restores it in both cached-settings paths.

Also fix lobpcgxx CMakeLists for Windows/clang-cl:
- Force-include <complex> for lapackpp (MSVC config.h omits it)
- Suppress all warnings from blaspp/lapackpp with -w (clang-cl)
We can save build time by not building MACIS tests if not needed (e.g. in CI pipelines).
…exer

Structure has const data members, making assignment impossible; mark
copy/move assignment as =delete to match the immutable design.
ERIMultiplexer's default ctor is implicitly deleted (base class ERI has
no default ctor); mark it =delete explicitly.
Fixes -Winconsistent-missing-override warnings from clang-cl for
pure virtual redeclarations in DynamicalCorrelationCalculator,
StabilityChecker, Macis, and OneBodyIntegralEngine subclasses.
Copilot AI review requested due to automatic review settings May 7, 2026 14:06
@lorisercole lorisercole force-pushed the feature/ler/windows-build-part-1 branch from 2a2760c to e4cdf81 Compare May 7, 2026 14:06
- Replace string(APPEND) with set() for per-config flags; add
  RelWithDebInfo config and mirror C flags from CXX flags.
- Remove the global add_compile_options(-Wno-...) block and the
  per-target re-enable block — dependencies handle their own warnings.
- Remove MSVC-specific workarounds now handled in dependencies
  (_USE_MATH_DEFINES, /FI complex, BLAS_FORTRAN_ADD_,
  -Wno-implicit-function-declaration).
- Update gauxc hash containing clang-cl fixes.
- Enable MACIS_ENABLE_TESTS by default (overridable via -D).
@lorisercole lorisercole force-pushed the feature/ler/windows-build-part-1 branch from e4cdf81 to 0b7b049 Compare May 7, 2026 14:09
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 68 out of 69 changed files in this pull request and generated 2 comments.

Comment thread python/CMakeLists.txt
Comment on lines 201 to 205
# Install the Python extension module
install(TARGETS _core
LIBRARY DESTINATION .
COMPONENT _core
)

# Ensure venv is activated
if (Test-Path .\venv\Scripts\activate.ps1) {
.\venv\Scripts\activate
@lorisercole
Copy link
Copy Markdown
Contributor Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants