A _makeshift_ toolkit, built on top of [submitit](https://github.com/facebookincubator/submitit), to launch SLURM jobs over a range of hyperparameters from the command line. It is designed to work with existing Python scripts and to let you monitor their status interactively.

__`submititnow` provides two command-line tools:__

* `slaunch` to launch a Python script as SLURM job(s).
* `jt` (job-tracker) to interactively monitor the jobs.
@@ -23,27 +22,39 @@ Let's say you have a Python script [`examples/annotate_queries.py`](examples/ann
You can put all the SLURM parameters in a config file and pass it to `slaunch` using the `--slurm_config` flag. For example, the above command can be written as:
### __Any constraints on the target Python script that we launch?__
The target Python script must have the following format:
```python
@@ -68,31 +79,35 @@ if __name__ == '__main__':
```
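The body of the block above is elided by the diff hunk, so the required format is not fully visible here. As a minimal sketch, assuming the script exposes an argparse-based `add_arguments` function and a `main(args)` entry point (both names, and the `--model` and `--num_beams` options, are illustrative assumptions rather than confirmed requirements of `slaunch`), a launchable script might look like:

```python
import argparse


def add_arguments(parser=None):
    """Register this script's command-line options.

    The function name and both options are hypothetical placeholders.
    """
    parser = parser or argparse.ArgumentParser(description="Illustrative target script.")
    parser.add_argument("--model", default="base", help="hypothetical hyperparameter")
    parser.add_argument("--num_beams", type=int, default=1, help="hypothetical hyperparameter")
    return parser


def main(args):
    """Run one job for a single hyperparameter combination."""
    summary = f"model={args.model} num_beams={args.num_beams}"
    print(summary)
    return summary


if __name__ == "__main__":
    # Running the file directly still works, independent of any launcher.
    main(add_arguments().parse_args())
```

Keeping argument registration separate from `main` means a launcher can, in principle, build the parser itself and call `main` once per hyperparameter combination.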
-## **`jt`** : Looking up info on previously launched experiments:
+## __`jt`__ : Looking up info on previously launched experiments:

As the launch response in the screenshot above indicates, you can use the `jt` (short for `job-tracker`) command to monitor job progress.

-### **`jt jobs EXP_NAME [EXP_ID]`**
+### __`jt jobs EXP_NAME [EXP_ID]`__

Executing `jt jobs examples.annotate_queries 227720` will give the following response:

You can also look up all `examples.annotate_queries` jobs by omitting `[EXP_ID]` from the previous command:

```bash
jt jobs examples.annotate_queries
```

-### **`jt {err, out} JOB_ID`**
+### __`jt {err, out} JOB_ID`__

__Looking up stderr and stdout of a Job__

Executing `jt out 227720_2` reveals the `stdout` output of the corresponding Job:

Similarly, `jt err 227720_2` reveals the `stderr` logs.
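The examples above use an `EXP_ID` of `227720` and a `JOB_ID` of `227720_2`, which looks like SLURM's job-array convention `<array_job_id>_<task_index>`. Assuming that reading is right, a small hypothetical helper (`parse_job_id` is not part of `submititnow`) could split a job ID into its two parts:

```python
from typing import NamedTuple, Optional


class JobRef(NamedTuple):
    exp_id: str                 # array job ID, assumed to match EXP_ID
    task_index: Optional[int]   # index within the array, None if absent


def parse_job_id(job_id: str) -> JobRef:
    """Split '<exp_id>_<task_index>' (or a bare '<exp_id>') into parts."""
    exp_id, sep, task = job_id.partition("_")
    if not exp_id.isdigit() or (sep and not task.isdigit()):
        raise ValueError(f"unrecognized job ID: {job_id!r}")
    return JobRef(exp_id, int(task) if sep else None)


if __name__ == "__main__":
    print(parse_job_id("227720_2"))
```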

-### **`jt sh JOB_ID`**
+### __`jt sh JOB_ID`__

__Looking up SBATCH script for a Job__

The submitit tool internally creates an SBATCH shell script per experiment to launch the jobs on a SLURM cluster. This command outputs this `submission.sh` file for inspection.
@@ -102,23 +117,27 @@ Executing `jt sh 227720_2` reveals the following:

### **`jt ls`**

Finally, you can use `jt ls` to list the experiments maintained by the `submititnow` tool.

-<img src="docs/imgs/jt_ls.png" width=30%>
+

The experiment names output by this command can then be passed into the `jt jobs` command.

Sometimes the `slaunch` command-line tool is not enough. For example, you may want to launch a job with a customized parameter-sweep configuration, or vary a certain parameter (e.g. `output_filepath`) for each job in the launch. In such cases, you can use the `Experiment` API provided by `submititnow` to launch jobs from Python scripts while still being able to track them with `jt`.

[examples/launch_demo_script.py](examples/launch_demo_script.py) provides a demo of how to use the `Experiment` API to launch a job with customized parameter-sweep configurations.
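
To illustrate what a customized sweep with a per-job `output_filepath` might involve, here is a minimal, library-independent sketch. `expand_grid`, the grid keys, and the `outputs/job_N.jsonl` naming are hypothetical; it does not use the actual `Experiment` API, for which the linked demo script is the reference.

```python
import itertools
from pathlib import Path


def expand_grid(grid, output_dir="outputs"):
    """Expand a dict of value lists into one parameter dict per job.

    Each job also gets its own output_filepath so results never collide.
    """
    names = sorted(grid)  # fixed ordering keeps job indices reproducible
    jobs = []
    for index, values in enumerate(itertools.product(*(grid[n] for n in names))):
        params = dict(zip(names, values))
        params["output_filepath"] = str(Path(output_dir) / f"job_{index}.jsonl")
        jobs.append(params)
    return jobs


if __name__ == "__main__":
    for params in expand_grid({"model": ["small", "large"], "num_beams": [1, 4]}):
        print(params)
```

Each resulting dict could then be handed to whatever launch mechanism is in use, one job per dict.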