Multithread image tests #752
Conversation
Action items:

Force-pushed from d848654 to 0e9b692.
Rebased after merging #751.
The code now appears to be working.
I was able to run with 1 passed in 946.73s (0:15:46) rather than the previous 1 passed in 3566.36s (0:59:26). So, that's roughly a 4x speedup.
Before merging this though, we should:
- Reset `test/integration/utils.py`
- Do a thorough visual code inspection. Claude was very helpful in drafting this, but we should do a manual inspection of the code it produced.
- Determine how best to run this (e.g., from a login node or by allocating a compute node)
```python
try:
    # Run tests in parallel using ProcessPoolExecutor for isolated stdout/stderr
```
Switched from multi-threading to multi-processing:
Claude's original note was in favor of multi-threading:
Note from Claude: used ThreadPoolExecutor instead of ProcessPoolExecutor because:
- Threads share memory, making it easier to work with the shared expansions dict
- If set_up_and_run_image_checker is I/O bound (file operations, network), threads will be efficient
- If it's CPU-bound, you could switch to ProcessPoolExecutor for better parallelism
But it turns out multi-threading complicates the log-writing (e.g., the log file for the bundles test might contain log lines for the v2 legacy test).
Now, with multi-processing, Claude notes:
ProcessPoolExecutor creates multiple processes on the same node, not multiple nodes. Each process runs on a separate CPU core of the same machine. This is different from distributed computing across nodes.
- Threads (ThreadPoolExecutor): multiple threads within one process, all sharing the same memory space
- Processes (ProcessPoolExecutor): multiple separate processes on the same machine, each with isolated memory
- Multiple nodes: would require something like mpi4py, dask.distributed, or a job scheduler

So running this on a login node is still fine from a resource perspective (it's just using multiple cores on that one node), though depending on your HPC policies, you might still want to run it on a compute node to avoid overloading the login node.
That is, we may want to consider running the test from a compute node instead.
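The process-pool pattern discussed above could look roughly like this. This is a sketch, not the PR's actual code: `run_one_test` is a hypothetical stand-in for the PR's real worker, `set_up_and_run_image_checker`, and the test names are illustrative.

```python
import concurrent.futures


def run_one_test(test_name: str) -> tuple[str, bool]:
    # Hypothetical stand-in for set_up_and_run_image_checker.
    # Because each worker is a separate process, its stdout/stderr and
    # log writes cannot interleave with another test's output.
    return (test_name, True)


if __name__ == "__main__":
    test_names = ["bundles", "v2_legacy", "v3"]  # illustrative names
    results = {}
    # ProcessPoolExecutor runs workers as separate processes on one node,
    # each on its own CPU core with isolated memory.
    with concurrent.futures.ProcessPoolExecutor(max_workers=3) as pool:
        futures = {pool.submit(run_one_test, name): name for name in test_names}
        for future in concurrent.futures.as_completed(futures):
            name, passed = future.result()
            results[name] = passed
    print(results)
```

Since each worker is its own process, per-test output capture stays isolated, which is exactly the property that motivated the switch away from threads.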
forsyth2 left a comment:
Visual inspection looks mostly ok; made a couple notes.
I'm thinking we can test this alongside the known working tests during this E3SM Unified testing period. If it doesn't encounter any further problems, it should be good to merge. Since it's just test code and nothing user-facing, it doesn't particularly matter if it's included in the final release or not.
tests/integration/test_images.py (outdated):

```python
output = captured_output.getvalue()

# Write to individual log file
log_filename = f"test_{test_name}.log"
```
Maybe make this `logs/...` so the logs live in a subdirectory, which is easier to delete or review than scattered `test_*.log` files.
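That suggestion might look like the sketch below; the `logs` directory name and `log_path` helper are hypothetical, not from the PR.

```python
import os

LOG_DIR = "logs"  # hypothetical subdirectory name


def log_path(test_name: str) -> str:
    # Grouping per-test logs under one directory makes cleanup a single
    # `rm -r logs` instead of matching scattered test_*.log files.
    os.makedirs(LOG_DIR, exist_ok=True)
    return os.path.join(LOG_DIR, f"test_{test_name}.log")


print(log_path("bundles"))
```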
```python
construct_markdown_summary_table(ordered_results_dict, "test_images_summary.md")

print("\nTest Summary:")
print(f"{'Test':<50} {'Total':>10} {'Correct':>10} {'Status':>10}")
```
Add a comment about using alignment specifiers, for clarity.
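For reference, a commented version of what those alignment specifiers do (the row values here are made up for illustration):

```python
# Python format-spec mini-language: '<' left-aligns and '>' right-aligns
# within the given minimum field width, so rows line up under the header.
header = f"{'Test':<50} {'Total':>10} {'Correct':>10} {'Status':>10}"
row = f"{'test_bundles':<50} {12:>10} {11:>10} {'FAIL':>10}"
print(header)
print(row)
```

Both strings come out the same width (50 + 10 + 10 + 10 plus three separating spaces), which is what keeps the columns aligned.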
Now that I've tested these commits on Chrysalis, Perlmutter, and Compy using compute nodes, I think this should be good to merge.
I remembered seeing a deadlock warning on Compy, but I don't recall seeing that on the other machines. In any case, Claude suggested the fixes in 9e2776d for that. I will test on Compy.
Claude's breakdown of how 'forkserver' works:

The three multiprocessing start methods:
- 'fork' (default on Linux/Unix)
- 'spawn' (default on Windows)
- 'forkserver' (hybrid approach)

Key insight: the trick is timing.

Why it's faster than 'spawn': the server process can fork (copy-on-write memory) because it has no threads to worry about.

Trade-off: 'forkserver' has a small one-time startup cost (starting the server), but then each worker creation is much faster than 'spawn' because it uses fork's efficient copy-on-write mechanism. For your use case with 6 workers, 'forkserver' should be significantly faster than 'spawn' while remaining safe.
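A minimal sketch of opting into 'forkserver' (Unix-only; the `square` task is just a placeholder, not the PR's worker function):

```python
import concurrent.futures
import multiprocessing


def square(x: int) -> int:
    return x * x


if __name__ == "__main__":
    # 'forkserver': one clean, thread-free server process starts up front;
    # each worker is then forked from it via cheap copy-on-write, avoiding
    # both plain fork's thread-safety hazards and spawn's per-worker cost.
    ctx = multiprocessing.get_context("forkserver")
    with concurrent.futures.ProcessPoolExecutor(max_workers=2, mp_context=ctx) as pool:
        print(list(pool.map(square, range(5))))  # [0, 1, 4, 9, 16]
```

Passing the context via `mp_context` keeps the start-method choice local to this pool instead of changing the global default with `multiprocessing.set_start_method`.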
Yes,
I've tested the new commits on Compy, so now I think this should be good to merge.
Summary

This pull request is a small change.