run scaling studies in parallel on consistent node sets #151

We discussed this in the Pavilion training on 2/12/2020 and 2/13/2020.

We would like to run a performance scaling study at multiple power-of-2 node counts, where all tests at a given scale use the same nodes and tests run in parallel as much as possible. I'll give a small example, but we'd like this to work with arbitrary scales and repetition counts.

Suppose we have a machine with 9 nodes. We want to run a scaling study with 3 repetitions at each scale of 1, 2, 4, and 8 nodes, with each run in its own independent Slurm job, so there are 12 jobs; I'll name each job by its scale and a letter. Each run takes one time unit. The table below shows one possible sequence of which job is running on which node, with "X" meaning unrelated jobs.

Time | cn1 | cn2 | cn3 | cn4 | cn5 | cn6 | cn7 | cn8 | cn9
-----|-----|-----|-----|-----|-----|-----|-----|-----|-----
1 | 8a | 8a | 8a | 8a | 8a | 8a | 8a | 8a | 1a
2 | 8b | 8b | 8b | 8b | 8b | 8b | 8b | 8b | 1b
3 | 4a | 4a | 4a | 4a | X | X | X | X | X
4 | 8c | 8c | 8c | 8c | 8c | 8c | 8c | 8c | X
5 | 4b | 4b | 4b | 4b | 2a | 2a | X | X | 1c
6 | 4c | 4c | 4c | 4c | 2b | 2b | X | X | X
7 | X | X | X | X | 2c | 2c | X | X | X
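To make the request concrete outside of Pavilion, here is a rough sketch of how the 12 jobs above could be pinned to fixed node sets with plain `sbatch`. The node sets are copied from the table; `run_test.sh` and the `sbatch_commands` helper are hypothetical names used only for illustration, and this is not how Pavilion does (or should) implement it. The script only builds the command lines, it does not submit anything.

```python
from string import ascii_lowercase

# One fixed node set per scale, taken from the valid table above.
NODE_SETS = {
    8: [f"cn{i}" for i in range(1, 9)],   # cn1..cn8
    4: [f"cn{i}" for i in range(1, 5)],   # cn1..cn4
    2: ["cn5", "cn6"],
    1: ["cn9"],
}


def sbatch_commands(node_sets, repetitions, script="run_test.sh"):
    """Build one sbatch command per (scale, repetition).

    Every repetition of a scale requests the same explicit node list,
    so the scheduler can only vary *when* a job runs, not *where*.
    `script` is a placeholder for whatever actually runs the test.
    """
    if repetitions > len(ascii_lowercase):
        raise ValueError("this sketch only names up to 26 repetitions")
    commands = []
    for scale, nodes in sorted(node_sets.items(), reverse=True):
        if len(nodes) != scale:
            raise ValueError(f"scale {scale} needs exactly {scale} nodes")
        for letter in ascii_lowercase[:repetitions]:
            commands.append([
                "sbatch",
                f"--job-name={scale}{letter}",
                f"--nodes={scale}",
                f"--nodelist={','.join(nodes)}",
                script,
            ])
    return commands


if __name__ == "__main__":
    for cmd in sbatch_commands(NODE_SETS, repetitions=3):
        print(" ".join(cmd))
```

With these node sets, jobs at the same scale serialize because they ask for the same nodes, while scales with disjoint sets (4, 2, and 1 here) are free to run side by side.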

This table shows an invalid sequence, because runs of the same size change which nodes they get:

Time | cn1 | cn2 | cn3 | cn4 | cn5 | cn6 | cn7 | cn8 | cn9
-----|-----|-----|-----|-----|-----|-----|-----|-----|-----
1 | 8a | 8a | 8a | 8a | 8a | 8a | 8a | 8a | 1a
2 | 1b | 8b | 8b | 8b | 8b | 8b | 8b | 8b | 8b
3 | 4a | 4a | 4a | 4a | 4b | 4b | 4b | 4b | X
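The rule I'm after can be written down as a small check. This is only a sketch of the constraint, with hypothetical helper names (`nodes_per_job`, `inconsistent_scales`); it assumes job names are a scale followed by a single letter and that each job occupies one time row, as in the tables above.

```python
from collections import defaultdict

VALID = """
Time | cn1 | cn2 | cn3 | cn4 | cn5 | cn6 | cn7 | cn8 | cn9
-----|-----|-----|-----|-----|-----|-----|-----|-----|-----
1 | 8a | 8a | 8a | 8a | 8a | 8a | 8a | 8a | 1a
2 | 8b | 8b | 8b | 8b | 8b | 8b | 8b | 8b | 1b
3 | 4a | 4a | 4a | 4a | X | X | X | X | X
4 | 8c | 8c | 8c | 8c | 8c | 8c | 8c | 8c | X
5 | 4b | 4b | 4b | 4b | 2a | 2a | X | X | 1c
6 | 4c | 4c | 4c | 4c | 2b | 2b | X | X | X
7 | X | X | X | X | 2c | 2c | X | X | X
"""

INVALID = """
Time | cn1 | cn2 | cn3 | cn4 | cn5 | cn6 | cn7 | cn8 | cn9
-----|-----|-----|-----|-----|-----|-----|-----|-----|-----
1 | 8a | 8a | 8a | 8a | 8a | 8a | 8a | 8a | 1a
2 | 1b | 8b | 8b | 8b | 8b | 8b | 8b | 8b | 8b
3 | 4a | 4a | 4a | 4a | 4b | 4b | 4b | 4b | X
"""


def nodes_per_job(table):
    """Map each job name to the set of nodes it occupies in the table."""
    lines = table.strip().splitlines()
    nodes = [cell.strip() for cell in lines[0].split("|")][1:]
    jobs = defaultdict(set)
    for line in lines[2:]:  # skip the header and the separator row
        cells = [cell.strip() for cell in line.split("|")][1:]
        for node, job in zip(nodes, cells):
            if job != "X":  # "X" marks unrelated jobs
                jobs[job].add(node)
    return jobs


def inconsistent_scales(table):
    """Return the scales whose repetitions did not all get the same nodes."""
    node_sets = defaultdict(set)
    for job, nodes in nodes_per_job(table).items():
        scale = int(job[:-1])  # "8a" -> 8
        node_sets[scale].add(frozenset(nodes))
    return sorted(s for s, sets in node_sets.items() if len(sets) > 1)


if __name__ == "__main__":
    print(inconsistent_scales(VALID))    # []
    print(inconsistent_scales(INVALID))  # [1, 4, 8]
```

The first table passes because every repetition of a scale lands on one node set; the second fails for scales 1, 4, and 8.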

Thanks for the training and your hard work on Pavilion 2. Let me know what additional information you need.
