Commit 7b1a296

Upgraded to version 0.9.4
1 parent 59b7546 commit 7b1a296

13 files changed

Lines changed: 148 additions & 27 deletions

README.md

Lines changed: 29 additions & 10 deletions
@@ -5,11 +5,10 @@
 [![Supported Python Versions](https://img.shields.io/badge/python-3.8+-blue)](https://pypi.org/project/rich/)
 [![Twitter Follow](https://img.shields.io/twitter/follow/maharshigor.svg?style=social)](https://twitter.com/maharshigor)

-
 A _makeshift_ toolkit, built on top of [submitit](https://github.com/facebookincubator/submitit), to launch SLURM jobs over a range of hyperparameters from the command line. It is designed to be used with existing Python scripts and interactively monitor their status.

-
 __`submititnow` provides two command-line tools:__
+
 * `slaunch` to launch a Python script as SLURM job(s).
 * `jt` (job-tracker) to interactively monitor the jobs.

@@ -23,27 +22,39 @@ Let's say you have a Python script [`examples/annotate_queries.py`](examples/ann
 python examples/annotate_queries.py --model='BERT-LARGE-uncased' \
 --dataset='NaturalQuestions' --fold='dev'
 ```
+
 You can launch a job that runs this script over a SLURM cluster using the following:
+
+```bash
+slaunch examples/annotate_queries.py \
+--mem="16g" --gres="gpu:rtxa4000:1" \
+--model='BERT-LARGE-uncased' --dataset='NaturalQuestions' --fold='dev'
+```
+
+You can put all the slurm params in a config file and pass it to `slaunch` using `--slurm_config` flag. For example, the above command can be written as:
+
 ```bash
 slaunch examples/annotate_queries.py \
---slurm_mem="16g" --slurm_gres="gpu:rtxa4000:1" \
+--config="examples/configs/gpu.json" \
 --model='BERT-LARGE-uncased' --dataset='NaturalQuestions' --fold='dev'
 ```

 ### __Launching multiple jobs with parameter-sweep__

 ```bash
 slaunch examples/annotate_queries.py \
---slurm_mem="16g" --slurm_gres="gpu:rtxa4000:1" \
+--config="examples/configs/gpu.json" \
 --sweep fold model \
 --model 'BERT-LARGE-uncased' 'Roberta-uncased' 'T5-cased-small' \
 --dataset='NaturalQuestions' --fold 'dev' 'train'
 ```
+
 This will launch a total of 6 jobs with the following configuration:

 ![Slaunch Terminal Response](docs/imgs/slaunch_annotate_queries.png)

 ### __Any constraints on the target Python script that we launch?__
+
 The target Python script must have the following format:

 ```python
@@ -68,31 +79,35 @@ if __name__ == '__main__':

 ```

-## **`jt`** :   Looking up info on previously launched experiments:
+## __`jt`__ :   Looking up info on previously launched experiments:

 As instructed in the above screenshot of the Launch response, user can utilize the `jt` (short for `job-tracker`) command to monitor the job progress.

-### **`jt jobs EXP_NAME [EXP_ID]`**
+### __`jt jobs EXP_NAME [EXP_ID]`__

 Executing `jt jobs examples.annotate_queries 227720` will give the following response:

 ![jt jobs EXP_NAME EXP_ID Terminal Response](docs/imgs/jt_annotate_queries_expid.png)

 In fact, user can also lookup all `examples.annotate_queries` jobs simply by removing `[EXP_ID]` from the previous command:
-```
+
+```bash
 jt jobs examples.annotate_queries
 ```
+
 ![jt jobs EXP_NAME Terminal Response](docs/imgs/jt_annotate_queries.png)

-### **`jt {err, out} JOB_ID`**
+### __`jt {err, out} JOB_ID`__
+
 __Looking up stderr and stdout of a Job__

 Executing `jt out 227720_2` reveals the `stdout` output of the corresponding Job:

 ![jt out JOB_ID Terminal Response](docs/imgs/jt_out_job_id.png)
 Similarly, `jt err 227720_2` reveals the `stderr` logs.

-### **`jt sh JOB_ID`**
+### __`jt sh JOB_ID`__
+
 __Looking up SBATCH script for a Job__

 The submitit tool internally creates an SBATCH shell script per experiment to launch the jobs on a SLURM cluster. This command outputs this `submission.sh` file for inspection.
@@ -102,23 +117,27 @@ Executing `jt sh 227720_2` reveals the following:
 ![jt out JOB_ID Terminal Response](docs/imgs/jt_sh_job_id.png)

 ### **`jt ls`**
+
 Finally, user can use `jt ls` to simply list the experiments maintained by the `submititnow` tool.

-<img src="docs/imgs/jt_ls.png" width=30%>
+![jt_ls](docs/imgs/jt_ls.png)

 The experiment names output by this command can then be passed into the `jt jobs` command.

 ## __Installing__
+
 Python 3.8+ is required.

 ```bash
 pip install -U git+https://github.com/maharshi95/submititnow.git
 ```

 ## **Experiment API**
+
 Sometimes the `slaunch` command-line tool is not enough. For example, one may want to launch a job with customized parameter-sweep configurations, or vary a certain parameter (e.g. `output_filepath`) for each job in the launch. In such cases, one can use the Experiment API provided by `submititnow` to launch jobs from Python scripts and also get the benefits of being able to track them with `jt`.

 [examples/launch_demo_script.py](examples/launch_demo_script.py) provides a demo of how to use the `Experiment` API to launch a job with customized parameter-sweep configurations.
+
 ```bash
 python examples/launch_demo_script.py
 ```
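
The parameter-sweep example in the README diff above names two swept parameters (`fold` and `model`) with two and three values respectively, and states that 6 jobs result. That count is what a Cartesian product over the swept values would give. The following is a minimal sketch of that expansion, not the code `slaunch` actually runs; the helper name `expand_sweep` and the dictionary layout are assumptions made for illustration.

```python
from itertools import product

# Values copied from the README example; only the parameters listed
# after --sweep are expanded, the rest stay fixed for every job.
sweep_values = {
    "fold": ["dev", "train"],
    "model": ["BERT-LARGE-uncased", "Roberta-uncased", "T5-cased-small"],
}
fixed = {"dataset": "NaturalQuestions"}


def expand_sweep(sweep, fixed_params):
    """Hypothetical helper: one parameter dict per combination of swept values."""
    keys = list(sweep)
    return [
        {**fixed_params, **dict(zip(keys, combo))}
        for combo in product(*(sweep[key] for key in keys))
    ]


jobs = expand_sweep(sweep_values, fixed)
print(len(jobs))
for job in jobs:
    print(job)
```

Each resulting dict corresponds to one invocation of the target script with those keyword arguments.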

bin/jt

File mode changed: 100644 → 100755

bin/py-srun

Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+
+import json
+import subprocess
+import argparse
+
+from submititnow.umiacs.handlers import profile_handlers
+
+parser = argparse.ArgumentParser()
+parser.add_argument("config", type=str)
+parser.add_argument("shell", nargs="+", default="zsh")
+args = parser.parse_args()
+
+
+def removeprefix(var: str, prefix: str):
+    return var[len(prefix) :] if var.startswith(prefix) else var
+
+
+def load_config(config_filename: str):
+    with open(config_filename) as f:
+        config = json.load(f)
+    if "profile" in config:
+        profile = config.pop("profile")
+        config = profile_handlers[profile](config)
+
+    return {
+        removeprefix(key, "slurm_").replace("_", "-"): value
+        for key, value in config.items()
+    }
+
+
+cmd_args = load_config(args.config)
+
+
+# Make Bash command
+cmd = "srun"
+for key, value in cmd_args.items():
+    cmd += f" --{key}={value}"
+cmd += " --job-name=llms"
+shell_cmd = " ".join(args.shell)
+cmd += f" --pty {shell_cmd}"
+
+print(cmd)
+
+subprocess.run(cmd, shell=True)
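
One detail in `bin/py-srun` is worth checking: the `shell` positional is declared with `nargs="+"` and `default="zsh"`. With `nargs="+"`, argparse requires at least one value, so the default can never apply; and if a plain string default were ever used, `" ".join("zsh")` would yield `z s h`. The sketch below demonstrates that behaviour and one possible alternative (`nargs="*"` with a list default). The alternative is an assumption about the intended behaviour, not part of the commit, and `build_parser` is a hypothetical helper.

```python
import argparse


def build_parser(nargs, default):
    """Hypothetical helper that mirrors the two positionals of bin/py-srun."""
    parser = argparse.ArgumentParser()
    parser.add_argument("config", type=str)
    parser.add_argument("shell", nargs=nargs, default=default)
    return parser


# As declared in the commit: the shell argument is mandatory, so omitting
# it makes argparse exit with a usage error instead of falling back to zsh.
strict = build_parser("+", "zsh")
try:
    strict.parse_args(["gpu.json"])
    strict_accepts_missing_shell = True
except SystemExit:
    strict_accepts_missing_shell = False

# Possible alternative (assumption): zero or more tokens, list default,
# so that " ".join(args.shell) works in both cases.
lenient = build_parser("*", ["zsh"])
default_shell = " ".join(lenient.parse_args(["gpu.json"]).shell)
custom_shell = " ".join(lenient.parse_args(["gpu.json", "bash"]).shell)

print(strict_accepts_missing_shell, default_shell, custom_shell)
```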

bin/slaunch

Lines changed: 4 additions & 2 deletions
@@ -137,8 +137,10 @@ if __name__ == "__main__":
         job_desc_function=job_description_function,
         submititnow_dir=args.submititnow_dir,
     )
-    experiment.register_profile_handler("clip", handlers.clip_profile_handler)
-    experiment.register_profile_handler("scavenger", handlers.scavenger_profile_handler)
+    for name, handler in handlers.profile_handlers.items():
+        experiment.register_profile_handler(name, handler)
+
+

     slurm_params = options.get_slurm_params(args)

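The `bin/slaunch` change replaces two hard-coded registrations with a loop over a `profile_handlers` mapping, so adding a profile only requires a new entry in that mapping. A rough sketch of the pattern follows. The handler bodies and the `ExperimentStub` class are stand-ins invented for illustration; only the names `profile_handlers`, `register_profile_handler`, `clip_profile_handler`, and `scavenger_profile_handler` come from the diff.

```python
from typing import Callable, Dict

Handler = Callable[[dict], dict]


def clip_profile_handler(params: dict) -> dict:
    # Hypothetical body: the real handler's behaviour is not shown in this commit.
    return {**params, "example_profile": "clip"}


def scavenger_profile_handler(params: dict) -> dict:
    # Hypothetical body, as above.
    return {**params, "example_profile": "scavenger"}


# Single registry: new profiles are added here and picked up by the loop below.
profile_handlers: Dict[str, Handler] = {
    "clip": clip_profile_handler,
    "scavenger": scavenger_profile_handler,
}


class ExperimentStub:
    """Stand-in for the real Experiment class; models only the registration hook."""

    def __init__(self) -> None:
        self.profile_handlers: Dict[str, Handler] = {}

    def register_profile_handler(self, name: str, handler: Handler) -> None:
        self.profile_handlers[name] = handler


experiment = ExperimentStub()
for name, handler in profile_handlers.items():
    experiment.register_profile_handler(name, handler)

print(sorted(experiment.profile_handlers))
```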
examples/.config.json

Lines changed: 0 additions & 4 deletions
This file was deleted.

examples/configs/gpu.json

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+{
+    "profile": "scavenger",
+    "gres": "gpu:rtxa4000:1",
+    "mem": "16G"
+}
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+{
+    "profile": "clip",
+    "gres": "gpu:1",
+    "mem": "4G"
+}
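
To show how a config such as `examples/configs/gpu.json` could end up as `srun` flags, here is a small sketch modelled on `load_config` in `bin/py-srun`: pop the `profile` key, let the matching handler rewrite the remaining parameters, then strip any `slurm_` prefix and turn underscores into hyphens. The handler body and its `slurm_qos` value are placeholders, since the real handlers in `submititnow.umiacs.handlers` are not part of this diff. The sketch also builds the command as an argument list and quotes it with `shlex.join`, which is a substitution for the string concatenation used in the script.

```python
import json
import shlex

# Contents of examples/configs/gpu.json as added in this commit.
GPU_CONFIG = '{"profile": "scavenger", "gres": "gpu:rtxa4000:1", "mem": "16G"}'


def scavenger_handler(config: dict) -> dict:
    # Placeholder: adds an illustrative key so the prefix stripping is visible.
    return {**config, "slurm_qos": "example-qos"}


def to_srun_flags(config_text: str, handlers: dict) -> list:
    """Turn a JSON config into a list of --key=value flags."""
    config = json.loads(config_text)
    if "profile" in config:
        profile = config.pop("profile")
        config = handlers[profile](config)
    flags = []
    for key, value in config.items():
        # Same normalisation as removeprefix(key, "slurm_").replace("_", "-").
        name = key[len("slurm_"):] if key.startswith("slurm_") else key
        flags.append(f"--{name.replace('_', '-')}={value}")
    return flags


argv = ["srun", *to_srun_flags(GPU_CONFIG, {"scavenger": scavenger_handler}), "--pty", "zsh"]
print(shlex.join(argv))
```

Passing `argv` to `subprocess.run` without `shell=True` would avoid re-parsing the command through a shell, at the cost of not supporting shell syntax in the trailing command.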

setup.py

Lines changed: 2 additions & 1 deletion
@@ -7,7 +7,7 @@

 setuptools.setup(
     name="submititnow",
-    version="0.9.3",
+    version="0.9.4",
     author="Maharshi Gor",
     author_email="maharshigor@gmail.com",
     description="A package to make submitit easier to use",
@@ -27,6 +27,7 @@
         "rich-cli>=1.8.0",
         "rich>=12.6.0",
         "tqdm>=4.0.0",
+        "scandir>=1.10.0",
     ],
     python_requires=">=3.8",
 )

submititnow/cli.py

Lines changed: 7 additions & 3 deletions
@@ -13,8 +13,12 @@


 def show_file_content(filepath: str):
-    rich_print("[bold bright_yellow]Reading file:[/bold bright_yellow] [bold cyan]{}[/bold cyan]\n".format(filepath))
-    with open(filepath, "r", newline='') as fp:
+    rich_print(
+        "[bold bright_yellow]Reading file:[/bold bright_yellow] [bold cyan]{}[/bold cyan]\n".format(
+            filepath
+        )
+    )
+    with open(filepath, "r", newline="") as fp:
         text = fp.read()
     for line in text.split("\n"):
         line_buffer = io.StringIO()
@@ -81,7 +85,7 @@ def _display_job_submission_status_on_console(exp: Experiment, wait_until: str):
     rich_print(f"\t:ledger: "
                f"Submitit logs : {exp.logs_dir}\n")

-    rich_print(f"[bold yellow] Execute the following command to monitor the jobs:[/bold yellow]\n")
+    rich_print("[bold yellow] Execute the following command to monitor the jobs:[/bold yellow]\n")
     rich_print(f"\t[bold bright_white]jt jobs {exp.exp_name} {exp.exp_id}[/bold bright_white]\n")
     # fmt: on

submititnow/experiment_lib.py

Lines changed: 0 additions & 2 deletions
@@ -22,7 +22,6 @@ def __init__(
         job_desc_function: Optional[Callable] = None,
         submititnow_dir: Optional[str] = None,
     ):
-
         self.submititnow_dir = (
             Path(submititnow_dir) if submititnow_dir else utils.SUBMITITNOW_ROOT_DIR
         )
@@ -90,7 +89,6 @@ def launch(
         )

         if slurm_profile := slurm_params.get("slurm_profile"):
-
             del slurm_params["slurm_profile"]

             if slurm_profile in self.profile_handlers: