Skip to content

#7: Fix iteration breakdown, add frequency analysis, and plot dropped nodes #9

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 16 commits into
base: master
Choose a base branch
from

Conversation

cwschilly
Copy link
Contributor

@cwschilly cwschilly commented Mar 26, 2025

Fixes #7
Fixes #8

Also refactors the Python code to be more modular and adds both function timers and a new plot for dropped nodes.

The plot the dropped nodes for previous runs, run from the project dir:

python detection/plot_dropped_nodes.py -s slownode.log -a slownodenanalysis.log -o /path/to/output_dir

Where:

-s <slownode.log>         - The driver output (from slow_node.cc)
-a <slownodeanalysis.log> - The analysis output (from detect_slow_nodes.py)

@cwschilly cwschilly linked an issue Mar 26, 2025 that may be closed by this pull request
@cwschilly
Copy link
Contributor Author

@nlslatt @lifflander Should we ignore the first iteration during analysis?

@nlslatt
Copy link

nlslatt commented Mar 28, 2025

@nlslatt @lifflander Should we ignore the first iteration during analysis?

We should probably do one more than the requested number of iterations and completely throw out the first one. I think the first iteration should not be printed or added to the total time.

@nlslatt
Copy link

nlslatt commented Mar 28, 2025

@cwschilly Are you planning to fix the sensors node name problem as a separate PR or part of this? There are a lot of things that need to be addressed before I can start collecting new data and I'd really like to start collecting it now.

@nlslatt
Copy link

nlslatt commented Mar 28, 2025

It would be helpful if the list of slowest iterations included the iteration number and not just the time.

@cwschilly
Copy link
Contributor Author

@cwschilly Are you planning to fix the sensors node name problem as a separate PR or part of this? There are a lot of things that need to be addressed before I can start collecting new data and I'd really like to start collecting it now.

@nlslatt I can put everything in this PR. Going to try to get everything fixed up by the end of the day--I'll ping you when it's ready

@nlslatt
Copy link

nlslatt commented Mar 28, 2025

@cwschilly Jonathan had asked me for the barrier after the random initialization (i.e., before the for loop within runBenchmark), which is where I've been using one

@cwschilly cwschilly requested a review from nlslatt March 31, 2025 15:29
@nlslatt
Copy link

nlslatt commented Mar 31, 2025

@cwschilly The node names in the sensors files are still incorrect. All nodes have been given the name of the first node in the allocation.

@cwschilly cwschilly changed the title #7: Fix iteration breakdown and analysis #7: Fix iteration breakdown, add frequency analysis, and plot dropped nodes Apr 9, 2025
@cwschilly cwschilly requested a review from nlslatt April 9, 2025 17:11
Comment on lines 26 to 27
parser.add_argument('-a', '--analysis', help='Absolute or relative path to the output from detect_slow_nodes', required=True)
parser.add_argument('-o', '--output', help='Absolute or relative path to the output from detect_slow_nodes', default=None)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The help needs to distinguish between the original output (to be used as input) and the new output.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the new output a directory for the plot? That's not clear above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, sorry about that. -o is the path to where you want to save the plot.

@cwschilly cwschilly self-assigned this Apr 15, 2025
@cwschilly cwschilly requested a review from nlslatt April 15, 2025 18:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Read/write CPU freq and sensors before and after benchmark Fix iteration breakdown and analysis
3 participants