Multithread image tests #752

forsyth2 · 2025-10-21T19:09:13Z

Summary

Objectives:

Multithread the image tests to reduce runtime.

Issue resolution:

Closes [Feature]: Parallelize image checker #724

Select one: This pull request is...

a bug fix: increment the patch version
a small improvement: increment the minor version
a new feature: increment the minor version
an incompatible (non-backwards compatible) API change: increment the major version

Small Change

To merge, I will use "Squash and merge". That is, this change should be a single commit.
Logic: I have visually inspected the entire pull request myself.
Pre-commit checks: All the pre-commits checks have passed.

forsyth2 · 2025-10-21T19:10:44Z

Action items:

Make sure separate logs print without mixed-together printing to the command line.
Get all 6 tests passing.
Produce a complete test_images_summary.md

forsyth2 · 2025-10-21T22:06:40Z

Rebased after merging #751

forsyth2

The code now appears to be working.

I was able to run with 1 passed in 946.73s (0:15:46) rather than the previous 1 passed in 3566.36s (0:59:26). So, that's roughly a 4x speedup.

Before merging this though, we should:

Reset test/integration/utils.py
Do a thorough visual code inspection. Claude was very helpful in drafting this, but we should do a manual inspection of the code it produced.
Determine how to best run this (e.g., from a login node or by allocating a compute node)

forsyth2 · 2025-10-21T23:55:11Z

tests/integration/test_images.py

+
    try:
-        # Run tests in parallel
+        # Run tests in parallel using ProcessPoolExecutor for isolated stdout/stderr


Switched from multi-threading to multi-processing:

Claude's original note was in favor of multi-threading:

Note from Claude: used ThreadPoolExecutor instead of ProcessPoolExecutor because:

Threads share memory, making it easier to work with the shared expansions dict

If set_up_and_run_image_checker is I/O bound (file operations, network), threads will be efficient

If it's CPU-bound, you could switch to ProcessPoolExecutor for better parallelism

But it turns out that complicates the log-writing (e.g., the log file for bundles might contain log lines for the v2 legacy test).

Now, with multi-processing, Claude notes:

ProcessPoolExecutor creates multiple processes on the same node, not multiple nodes. Each process runs on a separate CPU core of the same machine. This is different from distributed computing across nodes.

Threads (ThreadPoolExecutor): Multiple threads within one process, all sharing the same memory space
Processes (ProcessPoolExecutor): Multiple separate processes on the same machine, each with isolated memory
Multiple nodes: Would require something like mpi4py, dask.distributed, or a job scheduler

So running this on a login node is still fine from a resource perspective (it's just using multiple cores on that one node), though depending on your HPC policies, you might still want to run it on a compute node to avoid overloading the login node.

That is, we may want to consider running the test from a compute node instead.

forsyth2

Visual inspection looks mostly ok; made a couple notes.

I'm thinking we can test this alongside the known working tests during this E3SM Unified testing period. If it doesn't encounter any further problems, it should be good to merge. Since it's just test code and nothing user-facing, it doesn't particularly matter if it's included in the final release or not.

forsyth2 · 2025-10-22T17:50:21Z

tests/integration/test_images.py

+        output = captured_output.getvalue()
+
+        # Write to individual log file
+        log_filename = f"test_{test_name}.log"


Maybe make this logs/... so the logs are in a subdir for easy deletion (rather than rm test_*.log) or review.

forsyth2 · 2025-10-22T17:53:04Z

tests/integration/test_images.py

+    construct_markdown_summary_table(ordered_results_dict, "test_images_summary.md")
+
+    print("\nTest Summary:")
+    print(f"{'Test':<50} {'Total':>10} {'Correct':>10} {'Status':>10}")


Add a comment about using alignment specifiers, for clarity.

forsyth2 · 2025-11-03T23:08:48Z

I've tested Perlmutter and Chrysalis on compute nodes.

I don't have recent results from using a dev environment at the moment, but I appear able to activate a dev environment and run the unit tests on a Compute node.

Now that I've tested using these commits on Chrysalis, Perlmutter, and Compy and using compute nodes, I think this should be good to merge.

forsyth2 · 2025-11-03T23:50:02Z

I remembered seeing a deadlock warning on Compy, but I don't recall seeing that on the other machines. In any case, Claude suggested the fixes in 9e2776d for that. I will test on Compy.

forsyth2 · 2025-11-04T01:14:01Z

The spawn option works, but was 2.4x slower. Below, from Claude:

Let me break down how 'forkserver' works:

The Three Multiprocessing Methods

'fork' (default on Linux/Unix)

Parent Process (with threads) 
    |
    fork() --> Child Process (inherits threads - DANGEROUS!)

Problem: The child gets a copy of the parent's memory, including thread states
Threads in the parent were mid-operation, but in the child they're frozen/dead
This can cause deadlocks if child tries to use locks those threads held

'spawn' (default on Windows)

Parent Process
    |
    spawn() --> Brand New Python Process
                    |
                    Import everything from scratch
                    Initialize all modules
                    Load your code
                    Finally run your function

Safe: Child has no threads because it starts completely fresh
Slow: Must re-import all modules, re-initialize everything (your 15 min runtime!)

'forkserver' (hybrid approach)

Step 1 (happens once at startup):
Parent Process --> Start Clean Server Process (no threads yet!)

Step 2 (for each worker):
Clean Server Process
    |
    fork() --> Worker Process (inherits clean state, no threads!)

Key Insight

The trick is timing:

Before your test code runs and creates threads, the forkserver starts a pristine Python process
That server process has all the basic Python machinery but no application threads
When you need workers, they fork from the clean server, not from your thread-heavy parent process

Visual Timeline

Time -->

'fork':
[Parent imports libs, threads start] --> [fork - COPIES THREADS] --> Danger!

'spawn':  
[Parent imports libs, threads start] --> [new process] --> [re-import everything] --> Slow!

'forkserver':
[Start clean server] --> [Parent imports libs, threads start] --> [server forks] --> Fast & Safe!

Why It's Faster Than 'spawn'

The server process can fork (copy-on-write memory) because it has no threads to worry about. So workers get:

All the imported modules (numpy, pandas, etc.) via copy-on-write ✓
No thread baggage ✓
Much faster startup than re-importing everything ✓

Trade-off

'forkserver' has a small one-time startup cost (starting the server), but then each worker creation is much faster than 'spawn' because it uses fork's efficient copy-on-write mechanism.

For your use case with 6 workers, 'forkserver' should be significantly faster than 'spawn' while remaining safe.

forsyth2 · 2025-11-04T01:33:41Z

Yes, forkserver gives us comparable speeds to before using spawn, without the deadlock concern. 69a6487 switches to that.

forsyth2 · 2025-11-04T01:34:34Z

I've tested the new commits on Compy, so now I think this should be good to merge.

forsyth2 self-assigned this Oct 21, 2025

forsyth2 added the Testing Files in `tests` modified label Oct 21, 2025

Attempt to multithread the image tests

0e9b692

forsyth2 force-pushed the multithread-image-tests branch from d848654 to 0e9b692 Compare October 21, 2025 22:06

Working multiprocessing

a81a7c7

forsyth2 commented Oct 21, 2025

View reviewed changes

Reset test utils

b8881c7

forsyth2 commented Oct 22, 2025

View reviewed changes

forsyth2 marked this pull request as ready for review November 3, 2025 23:09

forsyth2 added 2 commits November 3, 2025 17:34

Address comments

76bac29

Prevent deadlocks by using spawn

9e2776d

Use forkserver for better performance

69a6487

forsyth2 merged commit e4e1ed2 into main Nov 4, 2025
6 checks passed

forsyth2 deleted the multithread-image-tests branch November 4, 2025 01:36

Multithread image tests #752

Multithread image tests #752

Uh oh!

Conversation

forsyth2 commented Oct 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Small Change

Uh oh!

forsyth2 commented Oct 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

forsyth2 commented Oct 21, 2025

Uh oh!

forsyth2 left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

forsyth2 Oct 21, 2025

Choose a reason for hiding this comment

Uh oh!

forsyth2 left a comment

Choose a reason for hiding this comment

Uh oh!

forsyth2 Oct 22, 2025

Choose a reason for hiding this comment

Uh oh!

forsyth2 Oct 22, 2025

Choose a reason for hiding this comment

Uh oh!

forsyth2 commented Nov 3, 2025

Uh oh!

forsyth2 commented Nov 3, 2025

Uh oh!

forsyth2 commented Nov 4, 2025

The Three Multiprocessing Methods

'fork' (default on Linux/Unix)

'spawn' (default on Windows)

'forkserver' (hybrid approach)

Key Insight

Visual Timeline

Why It's Faster Than 'spawn'

Trade-off

Uh oh!

forsyth2 commented Nov 4, 2025

Uh oh!

forsyth2 commented Nov 4, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

forsyth2 commented Oct 21, 2025 •

edited

Loading

forsyth2 commented Oct 21, 2025 •

edited

Loading

forsyth2 left a comment •

edited

Loading