Skip to content

Conversation

@forsyth2
Copy link
Collaborator

@forsyth2 forsyth2 commented Oct 21, 2025

Summary

Objectives:

  • Multithread the image tests to reduce runtime.

Issue resolution:

Select one: This pull request is...

  • a bug fix: increment the patch version
  • a small improvement: increment the minor version
  • a new feature: increment the minor version
  • an incompatible (non-backwards compatible) API change: increment the major version

Small Change

  • To merge, I will use "Squash and merge". That is, this change should be a single commit.
  • Logic: I have visually inspected the entire pull request myself.
  • Pre-commit checks: All the pre-commits checks have passed.

@forsyth2 forsyth2 self-assigned this Oct 21, 2025
@forsyth2 forsyth2 added the Testing Files in `tests` modified label Oct 21, 2025
@forsyth2
Copy link
Collaborator Author

forsyth2 commented Oct 21, 2025

Action items:

  • Make sure separate logs print without mixed-together printing to the command line.
  • Get all 6 tests passing.
  • Produce a complete test_images_summary.md

@forsyth2 forsyth2 force-pushed the multithread-image-tests branch from d848654 to 0e9b692 Compare October 21, 2025 22:06
@forsyth2
Copy link
Collaborator Author

Rebased after merging #751

Copy link
Collaborator Author

@forsyth2 forsyth2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code now appears to be working.

I was able to run with 1 passed in 946.73s (0:15:46) rather than the previous 1 passed in 3566.36s (0:59:26). So, that's roughly a 4x speedup.

Before merging this though, we should:

  • Reset test/integration/utils.py
  • Do a thorough visual code inspection. Claude was very helpful in drafting this, but we should do a manual inspection of the code it produced.
  • Determine how to best run this (e.g., from a login node or by allocating a compute node)


try:
# Run tests in parallel
# Run tests in parallel using ProcessPoolExecutor for isolated stdout/stderr
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched from multi-threading to multi-processing:

Claude's original note was in favor of multi-threading:

Note from Claude: used ThreadPoolExecutor instead of ProcessPoolExecutor because:

  • Threads share memory, making it easier to work with the shared expansions dict
  • If set_up_and_run_image_checker is I/O bound (file operations, network), threads will be efficient
  • If it's CPU-bound, you could switch to ProcessPoolExecutor for better parallelism

But it turns out that complicates the log-writing (e.g., the log file for bundles might contain log lines for the v2 legacy test).

Now, with multi-processing, Claude notes:

ProcessPoolExecutor creates multiple processes on the same node, not multiple nodes. Each process runs on a separate CPU core of the same machine. This is different from distributed computing across nodes.

Threads (ThreadPoolExecutor): Multiple threads within one process, all sharing the same memory space
Processes (ProcessPoolExecutor): Multiple separate processes on the same machine, each with isolated memory
Multiple nodes: Would require something like mpi4py, dask.distributed, or a job scheduler

So running this on a login node is still fine from a resource perspective (it's just using multiple cores on that one node), though depending on your HPC policies, you might still want to run it on a compute node to avoid overloading the login node.

That is, we may want to consider running the test from a compute node instead.

Copy link
Collaborator Author

@forsyth2 forsyth2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Visual inspection looks mostly ok; made a couple notes.

I'm thinking we can test this alongside the known working tests during this E3SM Unified testing period. If it doesn't encounter any further problems, it should be good to merge. Since it's just test code and nothing user-facing, it doesn't particularly matter if it's included in the final release or not.

output = captured_output.getvalue()

# Write to individual log file
log_filename = f"test_{test_name}.log"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe make this logs/... so the logs are in a subdir for easy deletion (rather than rm test_*.log) or review.

construct_markdown_summary_table(ordered_results_dict, "test_images_summary.md")

print("\nTest Summary:")
print(f"{'Test':<50} {'Total':>10} {'Correct':>10} {'Status':>10}")
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment about using alignment specifiers, for clarity.

@forsyth2
Copy link
Collaborator Author

forsyth2 commented Nov 3, 2025

I've tested Perlmutter and Chrysalis on compute nodes.

I don't have recent results from using a dev environment at the moment, but I appear able to activate a dev environment and run the unit tests on a Compute node.

Now that I've tested using these commits on Chrysalis, Perlmutter, and Compy and using compute nodes, I think this should be good to merge.

@forsyth2 forsyth2 marked this pull request as ready for review November 3, 2025 23:09
@forsyth2
Copy link
Collaborator Author

forsyth2 commented Nov 3, 2025

I remembered seeing a deadlock warning on Compy, but I don't recall seeing that on the other machines. In any case, Claude suggested the fixes in 9e2776d for that. I will test on Compy.

@forsyth2
Copy link
Collaborator Author

forsyth2 commented Nov 4, 2025

The spawn option works, but was 2.4x slower. Below, from Claude:

Let me break down how 'forkserver' works:

The Three Multiprocessing Methods

'fork' (default on Linux/Unix)

Parent Process (with threads) 
    |
    fork() --> Child Process (inherits threads - DANGEROUS!)
  • Problem: The child gets a copy of the parent's memory, including thread states
  • Threads in the parent were mid-operation, but in the child they're frozen/dead
  • This can cause deadlocks if child tries to use locks those threads held

'spawn' (default on Windows)

Parent Process
    |
    spawn() --> Brand New Python Process
                    |
                    Import everything from scratch
                    Initialize all modules
                    Load your code
                    Finally run your function
  • Safe: Child has no threads because it starts completely fresh
  • Slow: Must re-import all modules, re-initialize everything (your 15 min runtime!)

'forkserver' (hybrid approach)

Step 1 (happens once at startup):
Parent Process --> Start Clean Server Process (no threads yet!)

Step 2 (for each worker):
Clean Server Process
    |
    fork() --> Worker Process (inherits clean state, no threads!)

Key Insight

The trick is timing:

  1. Before your test code runs and creates threads, the forkserver starts a pristine Python process
  2. That server process has all the basic Python machinery but no application threads
  3. When you need workers, they fork from the clean server, not from your thread-heavy parent process

Visual Timeline

Time -->

'fork':
[Parent imports libs, threads start] --> [fork - COPIES THREADS] --> Danger!

'spawn':  
[Parent imports libs, threads start] --> [new process] --> [re-import everything] --> Slow!

'forkserver':
[Start clean server] --> [Parent imports libs, threads start] --> [server forks] --> Fast & Safe!

Why It's Faster Than 'spawn'

The server process can fork (copy-on-write memory) because it has no threads to worry about. So workers get:

  • All the imported modules (numpy, pandas, etc.) via copy-on-write ✓
  • No thread baggage ✓
  • Much faster startup than re-importing everything ✓

Trade-off

'forkserver' has a small one-time startup cost (starting the server), but then each worker creation is much faster than 'spawn' because it uses fork's efficient copy-on-write mechanism.

For your use case with 6 workers, 'forkserver' should be significantly faster than 'spawn' while remaining safe.

@forsyth2
Copy link
Collaborator Author

forsyth2 commented Nov 4, 2025

Yes, forkserver gives us comparable speeds to before using spawn, without the deadlock concern. 69a6487 switches to that.

@forsyth2
Copy link
Collaborator Author

forsyth2 commented Nov 4, 2025

I've tested the new commits on Compy, so now I think this should be good to merge.

@forsyth2 forsyth2 merged commit e4e1ed2 into main Nov 4, 2025
6 checks passed
@forsyth2 forsyth2 deleted the multithread-image-tests branch November 4, 2025 01:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Testing Files in `tests` modified

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature]: Parallelize image checker

2 participants