Enabling sbatch file re-use. #1739

Open
alexnwang opened this issue Jul 7, 2023 · 2 comments

Comments

@alexnwang

I'm interested in re-using sbatch files to re-submit jobs that have crashed. However, the .sh SBATCH file and the .pkl file are both tied to a single SLURM_JOBID, which makes re-using the .sh file to relaunch a job infeasible.

It'd be appreciated if there were some way to relax this requirement so that the files are not tied to the JOBID.

@gwenzek
Contributor

gwenzek commented Sep 19, 2023

It's also a long-standing pain point for me, but I need to think a bit more about this.
The job ID is useful because it makes sacct and squeue information directly relatable to the on-disk files.
But it also means that restarting is a pain.

The nicest way would be to modify the sbatch file itself so that you can run sbatch on it several times.
One workaround would be a CLI command, e.g. submitit restart 102984, that would restart a previous submitit job file.

@alexnwang
Author

Yeah, I just set up my directories so that if I submit a job again with the exact same parameters, it runs in the same dir. Running out of the same dir lets it pick up where it left off, with another set of submitit files corresponding to the re-run.
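
A minimal sketch of that directory layout, for anyone wanting to try the same workaround. It assumes the job writes checkpoints named checkpoint_<step>.pt into its run dir; that naming, and the helpers run_dir_for and latest_checkpoint, are hypothetical and not part of submitit.

```python
import hashlib
import json
from pathlib import Path
from typing import Optional


def run_dir_for(params: dict, root: Path) -> Path:
    """Map a parameter set to a stable directory under root (hypothetical helper)."""
    # Sorting keys makes the serialized form independent of dict ordering,
    # so the same parameters always hash to the same directory name.
    canonical = json.dumps(params, sort_keys=True, default=str)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]
    return root / f"run_{digest}"


def latest_checkpoint(run_dir: Path) -> Optional[Path]:
    """Return the checkpoint with the highest step number, or None if there is none."""
    # Assumed naming: checkpoint_<step>.pt; sort numerically, not lexically.
    checkpoints = sorted(
        run_dir.glob("checkpoint_*.pt"),
        key=lambda p: int(p.stem.split("_")[1]),
    )
    return checkpoints[-1] if checkpoints else None


if __name__ == "__main__":
    params = {"lr": 0.1, "seed": 1}
    run_dir = run_dir_for(params, Path("runs"))
    run_dir.mkdir(parents=True, exist_ok=True)
    resume_from = latest_checkpoint(run_dir)  # None on the first submission
    print(run_dir, resume_from)
```

The resulting run_dir could then be passed as the folder argument of submitit.AutoExecutor, so each re-submission adds a new set of job-ID-named files next to the existing checkpoints.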
