Feature/slurm remote support #250
Draft
mhrtmnn wants to merge 30 commits into
Job scheduling logic shall be moved from {Compose,HLS}-Task into the
Slurm object. For this, some additional information is required.
The preamble copies all files that are required for the current job to the SLURM node; the postamble copies all generated artefacts back from the node.
The absolute paths to both scripts may be supplied via the keys "PreambleScript" and "PostambleScript", respectively, in the SLURM JSON cfg file.
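As a rough illustration of the two keys, the following Python sketch parses a cfg fragment and checks that any supplied script path is absolute. Only the key names "PreambleScript" and "PostambleScript" come from the commit message. The example paths, the helper name `load_transfer_scripts`, the flat layout of the JSON object, and the choice to treat a missing key as "not set" are assumptions made for this example; the actual tapasco parser may behave differently.

```python
import json
from pathlib import PurePosixPath

# Hypothetical cfg fragment: only the two key names come from the commit message.
EXAMPLE_CFG = """
{
  "PreambleScript": "/opt/slurm-scripts/preamble.sh",
  "PostambleScript": "/opt/slurm-scripts/postamble.sh"
}
"""

def load_transfer_scripts(cfg_text):
    """Return (preamble, postamble); a missing key yields None (assumption)."""
    cfg = json.loads(cfg_text)
    scripts = []
    for key in ("PreambleScript", "PostambleScript"):
        value = cfg.get(key)
        if value is None:
            scripts.append(None)
            continue
        path = PurePosixPath(value)
        # The commit message asks for absolute paths, so reject relative ones.
        if not path.is_absolute():
            raise ValueError(f"{key} must be an absolute path, got {value!r}")
        scripts.append(path)
    return tuple(scripts)

preamble, postamble = load_transfer_scripts(EXAMPLE_CFG)
print(preamble, postamble)
```

Validating the paths when the config is loaded would surface a misconfigured script before a job is submitted, rather than on the compute node.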
Previously, a job would be broken into its tasks, and a new tapasco job would be created for each task. These jobs were then executed on the SLURM cluster. Refactor this, such that the original job is executed on the SLURM cluster as-is, which simplifies the SLURM logic.
Since the SLURM cluster now processes whole jobs (instead of single tasks), the dependencies (preamble) and produced artefacts (postamble) of multiple platform/architecture pairs may need to be transferred.
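To illustrate why whole-job execution widens the transfer set, here is a minimal Python sketch that enumerates every architecture/platform pair of a job and lists one directory per pair for the preamble to copy. The function name `transfer_dirs`, the `<base>/<architecture>/<platform>` layout, and the example values are hypothetical and chosen only for this example; they are not taken from the tapasco code base.

```python
from itertools import product
from pathlib import PurePosixPath

def transfer_dirs(base, architectures, platforms):
    """List one directory per architecture/platform pair (hypothetical layout)."""
    root = PurePosixPath(base)
    return [root / arch / plat for arch, plat in product(architectures, platforms)]

# Example values only: one architecture combined with two platforms.
dirs = transfer_dirs("/work/core", ["axi4mm"], ["vc709", "pynq"])
for d in dirs:
    print(d)
```

With single tasks, each transfer covered exactly one pair; with whole jobs, the list grows with the product of the two sets.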
Executing Tapasco in SLURM mode, e.g.
Tapasco can be installed via a SLURM job script like the following:
Note: Building toolflow via
This pull request extends the SLURM support of tapasco such that remote compute nodes can be used for carrying out HLS and compose jobs.
The required architecture consists of three networked machines:
Host (front end):
Runs a tapasco instance that takes in the user CLI arguments and collects all required files for the selected job (e.g. kernel source files for HLS jobs or IPCores for compose jobs). These dependencies are copied over the network to a separate node referred to as Workstation. The artefacts that are generated by a job (e.g. IPCore for HLS, bitstream for compose) are copied back to the Host once the job finishes.
Workstation:
In the simplest case a network attached storage. It is required, since in the general case we cannot directly push files to the SLURM compute node. Thus, the files are deposited in a known directory on this node, and the SLURM compute node can pull the files from here by itself.
SLURM node (back end):
Login node to the compute node that has SLURM control tools such as sbatch and squeue installed. The compute node runs its own tapasco instance.
The above setup is configurable through a JSON config file. This PR contains an example file at toolflow/vivado/common/SLURM/ESA.json that describes an ESA internal compute node. Different configurations can be selected via tapasco CLI options at the Host, for example --slurm ESA.
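The data flow between the three machines can be sketched with local directories standing in for each of them. This is only a simulation under stated assumptions: real transfers go over the network, the job itself is replaced by a placeholder that writes a dummy artefact, and all names (`run_remote_job`, the `in`/`out`/`work` subdirectories, `artefact.txt`) are hypothetical rather than taken from the PR. It also assumes that artefacts travel back via the Workstation, which the description does not state explicitly.

```python
import shutil
import tempfile
from pathlib import Path

def copy_tree(src, dst):
    """Copy the contents of src into dst, creating dst if needed."""
    shutil.copytree(src, dst, dirs_exist_ok=True)

def run_remote_job(host_in, host_out, workstation, node):
    # Host: push dependencies to the staging directory on the Workstation.
    copy_tree(host_in, workstation / "in")
    # Preamble on the SLURM node: pull dependencies from the Workstation.
    copy_tree(workstation / "in", node / "work")
    # Placeholder for the actual job; it only records which files it received.
    (node / "out").mkdir(parents=True, exist_ok=True)
    names = sorted(p.name for p in (node / "work").iterdir())
    (node / "out" / "artefact.txt").write_text(",".join(names))
    # Postamble on the SLURM node: push artefacts back to the Workstation.
    copy_tree(node / "out", workstation / "out")
    # Host: fetch the artefacts once the job has finished.
    copy_tree(workstation / "out", host_out)

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    host_in, host_out = root / "host_in", root / "host_out"
    host_in.mkdir()
    (host_in / "kernel.cpp").write_text("// kernel source")
    run_remote_job(host_in, host_out, root / "workstation", root / "node")
    result = (host_out / "artefact.txt").read_text()
    print(result)
```

The point of the intermediate directory is that neither side has to push directly to the other: the Host and the SLURM node each only read from and write to the Workstation.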