
Commit 27d9404

fix: Revise slurm scripts and update for NSS github (#85)
# Summary

Consolidate slurm submission into a single `submit_slurm_jobs.sh` that supports individual dataset experiments, multiple custom datasets, or using the named short and long groups. Also updates instructions and scripts to operate from the github repo version of NSS.

- Consolidate `submit_single_dataset.sh` into `submit_slurm_jobs.sh` via new `--dataset-urls` flag, eliminating duplicated submission logic
- Replace complex per-config index arithmetic with packed comma-separated arrays (`PACKED_DATASETS`, `PACKED_CONFIGS`) that map 1:1 to `SLURM_ARRAY_TASK_ID`
- Simplify `slurm_srun.sh` to a thin `srun` pass-through, relying on `--export=ALL` instead of manually forwarding each variable
- Rename `NMP_DIR` to `NSS_DIR` and update all paths to reflect the standalone Safe-Synthesizer repo structure
- Add `--time-limit`, `--train-time-limit`, `--generate-time-limit`, `--max-concurrent-slurm-jobs`, `--dry-run`, and `--dataset-urls` CLI flags to `submit_slurm_jobs.sh`
- Remove per-config time-limit associative arrays (`CONFIG_TIME_LIMITS_SHORT/LONG`) in favor of explicit CLI flags
- Add upfront environment variable validation in `slurm_nss_matrix.sh` with actionable error messages

## Pre-Review Checklist

<!-- These checks should be completed before a PR is reviewed, -->
<!-- but you can submit a draft early to indicate that the issue is being worked on. -->

Ensure that the following pass:

- [x] `make format && make lint` or via prek validation.
- [ ] `make test` passes locally
- [ ] `make test-e2e` passes locally
- [ ] `make test-ci-container` passes locally (recommended)

## Pre-Merge Checklist

<!-- These checks need to be completed before a PR is merged, -->
<!-- but as PRs often change significantly during review, -->
<!-- it's OK for them to be incomplete when review is first requested. -->

- [ ] New or updated tests for any fix or new behavior
- [X] Updated documentation for new features and behaviors, including docstrings for API docs.

## Other Notes

<!-- Please add the issue number that should be closed when this PR is merged. -->

- Tested with a few end_to_end and two_stage mode slurm runs
- Copied from internal MR that was not merged before the fork to this repo
- Hand written
- Reviewed by cursor agent and hand fixed findings
- PR summary generated by cursor agent

---------

Signed-off-by: Kendrick Boyd <kendrickb@nvidia.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
1 parent 45e74ae commit 27d9404

6 files changed

Lines changed: 401 additions & 739 deletions

File tree

script/slurm/README.md

Lines changed: 92 additions & 86 deletions
Original file line number | Diff line number | Diff line change
@@ -3,14 +3,16 @@
33

44
### NeMo Safe Synthesizer Slurm Jobs
55

6-
This directory contains scripts to launch matrix Slurm jobs for NeMo Safe Synthesizer. Jobs are submitted via `submit_slurm_jobs.sh`, which launches a containerized `srun` (`slurm_srun.sh`) that executes the matrix runner (`slurm_nss_matrix.sh`). All paths and defaults are configured in one place: `env_variables.sh`.
6+
This directory contains scripts to launch Slurm jobs for NeMo Safe Synthesizer experimentation.
7+
The contents of this directory are often specific to internal NVIDIA Slurm clusters, but are shared here as inspiration for others who might be using Slurm to run hyperparameter experiments with NeMo Safe Synthesizer.
8+
9+
Jobs are submitted via `submit_slurm_jobs.sh`, which launches a containerized `srun` (`slurm_srun.sh`) that executes the matrix runner (`slurm_nss_matrix.sh`). All paths and defaults are configured in one place: `env_variables.sh`.
710

811
### Files
9-
- `env_variables.sh`: single source of truth for user, paths, configs, and time limits.
10-
- `submit_slurm_jobs.sh`: submits Slurm array jobs for each config and dataset group. Supports two-stage TRAIN→GEN pipeline.
11-
- `submit_single_dataset.sh`: submits jobs for one dataset/path. Supports two-stage TRAIN→GEN pipeline.
12-
- `slurm_srun.sh`: wraps `srun` with container image and mounts.
13-
- `slurm_nss_matrix.sh`: picks dataset/index/run and launches the python entrypoint inside the container. Honors `PHASE=train|generate`.
12+
- `env_variables.sh`: Single source of truth for user and paths.
13+
- `submit_slurm_jobs.sh`: Submits Slurm array jobs for each config and dataset. Supports two-stage TRAIN→GEN pipeline.
14+
- `slurm_nss_matrix.sh`: Picks dataset and config and launches the python entrypoint inside the container. Honors `NSS_PHASE=train|generate|end_to_end`.
15+
- `slurm_srun.sh`: Wraps `srun` with container image and mounts. It is mostly a pass-through; the primary logic lives in `submit_slurm_jobs.sh` and `slurm_nss_matrix.sh`.
1416

1517
Pipeline entrypoints (invoked by Slurm scripts) via uv:
1618
- `uv run safe-synthesizer run --run-path <path>` (full end-to-end pipeline)
@@ -21,12 +23,21 @@ Pipeline entrypoints (invoked by Slurm scripts) via uv:
2123

2224
- Slurm Cluster Access: Ensure you have access to the Slurm clusters. You can verify this by running `ssh cs-oci-ord-login-01.nvidia.com` in your terminal (VPN connection required). For an introduction to Slurm, see [these onboarding resources](https://confluence.nvidia.com/display/HWINFCSSUP/Onboarding+to+Clusters).
2325
- NIM API Key: You will need a `NIM_API_KEY` to run column classification. If you do not have one, you can generate it at [build.nvidia.com](https://build.nvidia.com) using your `nvidian` organization account.
24-
- Enroot Credentials Follow https://confluence.nvidia.com/display/HWINFCSSUP/Using+Containers#UsingContainers-SettingupEnrootCredentials. You should add the lines for all 3 of `nvcr.io`, `authn.nvidia.com`, and `gitlab-master.nvidia.com`.
26+
- Enroot Credentials: Follow https://confluence.nvidia.com/display/HWINFCSSUP/Using+Containers#UsingContainers-SettingupEnrootCredentials. You should add the lines for all 3 of `nvcr.io`, `authn.nvidia.com`, and `gitlab-master.nvidia.com`.
27+
- Clone Safe-Synthesizer
28+
```bash
29+
export USER_NAME="$USER" # Or hardcode username in slurm
30+
export LUSTRE_DIR="/lustre/fsw/portfolios/llmservice/users/${USER_NAME}"
31+
cd $LUSTRE_DIR
32+
git clone git@github.com:NVIDIA-NeMo/Safe-Synthesizer.git
33+
cd Safe-Synthesizer
34+
```
2535
- uv and python install in the slurm cluster
26-
- This is a strongly recommended setup, but is not be the only way to get things working.
36+
- DO NOT FOLLOW the general CONTRIBUTING.md or README.md instructions for installation and setup, unless you understand exactly what's being installed where and how that interacts with the distributed nature of a slurm cluster.
37+
- The following setup is strongly recommended, but is not the only way to get things working.
2738
- The key issues about working in slurm we need to address
28-
- /home/$USER is quite small (10 GB) and not recommended for accessing data, easily filled up by uv cache
29-
- Slurm jobs may run in containers with different $HOME (and different users/uids)
39+
- /home/$USER is quite small (10 GB) and not recommended for accessing data (easily filled up by uv cache)
40+
- Slurm jobs may run in containers with different $HOME locations (and different users/uids)
3041
- Thus we put uv and python in your user directory in /lustre and not in /home/$USER
3142
```bash
3243
export USER_NAME="$USER" # Or hardcode username in slurm
@@ -38,8 +49,13 @@ export UV_CACHE_DIR="${LUSTRE_DIR}/.cache/uv"
3849
export UV_PYTHON_INSTALL_DIR="${LUSTRE_DIR}/.local/share/uv/python"
3950
export UV_PYTHON_BIN_DIR="${LUSTRE_DIR}/.local/bin"
4051
export UV_TOOL_DIR="${LUSTRE_DIR}/.local/share/uv/tools"
41-
# Install python 3.11 (as required by NSS) in a location `uv` is aware of
42-
uv python install 3.11
52+
# With the above env vars, the usual make command should work.
53+
# Note this may be quite slow the first time due to very slow network
54+
# connectivity on slurm to download from pypi, but subsequent executions
55+
# (such as startup for your jobs) should be much faster since uv will
56+
# pull cached wheels from UV_CACHE_DIR.
57+
# (Be sure to run from the root of the Safe-Synthesizer repo)
58+
make bootstrap-nss cu128
4359
```
4460

4561
#### Nice to have
@@ -87,14 +103,13 @@ chmod 600 /lustre/fsw/portfolios/llmservice/users/${USER_NAME}/.api_tokens.sh
87103

88104
### Configure
89105
Edit `env_variables.sh` to match your environment. Key items:
90-
- `CONFIGS=(...)`: base names of YAML configs to run (without `.yaml`).
106+
- `CONFIGS=(...)`: base names of YAML configs to run (without `.yaml`), or provide them via the `--configs` argument to `submit_slurm_jobs.sh`.
91107
- `CONFIG_DIR`: directory where config files live.
92108
- `BASE_LOG_DIR`: where Slurm logs will be written.
93-
- `NMP_DIR`: path to this repository.
109+
- `NSS_DIR`: path to this repository.
94110
- `ADAPTER_PATH`: base path for workdirs (each run creates a subdirectory with adapter, logs, and outputs).
95111
- `VLLM_CACHE_ROOT`, `UV_CACHE_DIR`, `UV_PYTHON_INSTALL_DIR`, `UV_PYTHON_BIN_DIR`, `UV_TOOL_DIR`, `HF_HOME`: cache locations to avoid stressing login nodes.
96112
- `NSS_SHARED_DIR`: location of shared files such as benchmark data and container images, see section below for details.
97-
- Time limits: `CONFIG_TIME_LIMITS_SHORT` and `CONFIG_TIME_LIMITS_LONG` associative maps. Keys are matched by pattern (`unsloth`, `dp`), falling back to `max`.
98113

99114
NSS CLI Environment Variables (used by `safe-synthesizer` CLI via pydantic-settings):
100115
- `NSS_ARTIFACTS_PATH`: Base directory for artifacts (aliased from `ADAPTER_PATH`).
@@ -104,98 +119,88 @@ NSS CLI Environment Variables (used by `safe-synthesizer` CLI via pydantic-setti
104119
- `NSS_LOG_FILE`: Path to log file.
105120

106121
Note: Associative arrays/arrays aren't exported to child processes, so only `submit_slurm_jobs.sh` uses them directly.
122+
When needed, arrays are converted to a comma-delimited value in an environment variable that is passed through to `slurm_nss_matrix.sh`.
123+
This is used for `PACKED_DATASETS` and `PACKED_CONFIGS`, which contain the information for all jobs within the array.
124+
In `slurm_nss_matrix.sh`, each job extracts the dataset and config that it should run based on the `SLURM_ARRAY_TASK_ID` environment variable.
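
The packing and unpacking described above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of `submit_slurm_jobs.sh` or `slurm_nss_matrix.sh`: the helper names `join_by_comma` and `pick_entry`, the sample dataset and config names, and the default task ID are assumptions made for the example, and the runs dimension is omitted for brevity.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: pack per-job entries into comma-delimited env vars,
# then recover one job's dataset/config from SLURM_ARRAY_TASK_ID.
set -euo pipefail

datasets=(adult census)   # sample names, for illustration only
configs=(unsloth dp)

# Submit side: build one entry per array task (dataset x config).
packed_datasets=()
packed_configs=()
for d in "${datasets[@]}"; do
  for c in "${configs[@]}"; do
    packed_datasets+=("$d")
    packed_configs+=("$c")
  done
done

# Join all arguments with commas (uses the first character of IFS).
join_by_comma() { local IFS=,; echo "$*"; }

# Plain strings can be exported to child processes; bash arrays cannot.
export PACKED_DATASETS="$(join_by_comma "${packed_datasets[@]}")"
export PACKED_CONFIGS="$(join_by_comma "${packed_configs[@]}")"

# Runner side: split the packed string and select the entry for this task.
pick_entry() {  # usage: pick_entry <packed-string> <index>
  local -a items
  IFS=',' read -r -a items <<< "$1"
  echo "${items[$2]}"
}

# Slurm sets SLURM_ARRAY_TASK_ID inside an array job; default to 2 for a local demo.
SLURM_ARRAY_TASK_ID="${SLURM_ARRAY_TASK_ID:-2}"
DATASET="$(pick_entry "$PACKED_DATASETS" "$SLURM_ARRAY_TASK_ID")"
CONFIG="$(pick_entry "$PACKED_CONFIGS" "$SLURM_ARRAY_TASK_ID")"
echo "task ${SLURM_ARRAY_TASK_ID}: dataset=${DATASET} config=${CONFIG}"
```

Note that a comma delimiter assumes that no dataset name, path, or URL contains a comma; a value that does would need a different delimiter or escaping.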
125+
126+
### Submit jobs
127+
128+
129+
Run the submit script (flags are order-independent) from this directory:
107130

108-
### Submit jobs (matrix across dataset groups)
109-
Run the matrix submitter (flags are order-independent) from this directory:
110131
```bash
111-
bash submit_slurm_jobs.sh [--configs c1,c2] [--runs N] [--partition P] [--exp-name NAME] [--dataset-group short|long] [--sleep-sec S] [--pipeline-mode two_stage|end_to_end] [--submit-mode array|sequential] [--wandb-project PROJECT]
132+
bash submit_slurm_jobs.sh [--configs c1,c2] [--dataset-urls name1,url1,path1] [--dataset-group short|long] [--runs N] [--exp-name NAME] [--pipeline-mode two_stage|end_to_end] [--partition P] [--wandb-project PROJECT] [--max-concurrent-slurm-jobs N] [--time-limit TIME] [--train-time-limit TIME] [--generate-time-limit TIME] [--dry-run]
112133

113-
# Example: two-stage (TRAIN→GEN) across "short" datasets (array mode; default)
114-
bash submit_slurm_jobs.sh --exp-name matrix_exp --dataset-group short --runs 1 --partition polar4 --pipeline-mode two_stage
134+
# Example: end_to_end with 2 hour time limit across "short" datasets
135+
bash submit_slurm_jobs.sh --exp-name short_end_to_end --dataset-group short --runs 1 --partition polar4 --pipeline-mode end_to_end --time-limit 2:00:00
115136

116-
# Example: two-stage (TRAIN→GEN) sequential per dataset/run (GEN after TRAIN)
117-
bash submit_slurm_jobs.sh --exp-name matrix_seq --dataset-group short --runs 1 --partition polar4 --pipeline-mode two_stage --submit-mode sequential
137+
# Example: two-stage (TRAIN→GEN) across "short" datasets with 1 hour train time limit and 30 minute generate time limit
138+
bash submit_slurm_jobs.sh --exp-name short_two_stage --dataset-group short --runs 1 --partition polar4 --pipeline-mode two_stage --train-time-limit 1:00:00 --generate-time-limit 0:30:00
118139

119-
# Example: end-to-end (single job per run) sequential per dataset/run
120-
bash submit_slurm_jobs.sh --exp-name matrix_e2e_seq --dataset-group short --runs 1 --partition polar4 --pipeline-mode end_to_end --submit-mode sequential
140+
# Example: Adult data (defined in NVIDIA internal dataset_registry.yaml), three configs, 5 runs each on polar4, use different wandb project from the exp name
141+
bash submit_slurm_jobs.sh \
142+
--dataset-urls adult \
143+
--configs unsloth,dp,dp_usg_guidance \
144+
--runs 5 \
145+
--partition polar4 \
146+
--exp-name regex_adult \
147+
--pipeline-mode two_stage \
148+
--wandb-project other_adult
149+
150+
# Example: arbitrary path/url (not a named dataset from the dataset_registry.yaml), 1 config, 10 runs, with max 3 jobs running at a time
151+
bash submit_slurm_jobs.sh \
152+
--dataset-urls "https://raw.githubusercontent.com/gretelai/gretel-blueprints/refs/heads/main/sample_data/financial_transactions.csv" \
153+
--configs unsloth \
154+
--runs 10 \
155+
--partition polar,polar3,polar4 \
156+
--exp-name financial_repeats \
157+
--pipeline-mode end_to_end \
158+
--max-concurrent-slurm-jobs 3
121159
```
122160

123161
- CONFIGS source: By default, configs come from `CONFIGS=(...)` in `env_variables.sh`. Override with `--configs c1,c2` (base names without `.yaml`).
124-
- RUNS: Number of runs per dataset-config pair.
125-
- PARTITION: Slurm partition to use. See partition info in your cluster docs.
126-
- `EXP_NAME`: Experiment namespace for logs/outputs.
127-
- `DATASET_GROUP`: `short` or `long` (selects built-in dataset sets and time limits).
128-
- `SLEEP_SEC`: Pause between submissions to reduce image import contention.
129-
- `PIPELINE_MODE`: `two_stage` (TRAIN→GEN with dependency) or `end_to_end` (single job).
130-
- `SUBMIT_MODE`: `array` (submit arrays) or `sequential` (submit jobs with dependencies per dataset/run).
131-
- `WANDB_PROJECT`: Name of the Weights & Biases project to track experiments. Defaults to the experiment name if not specified.
132-
133-
How many jobs will run concurrently?
134-
135-
For the built-in `short` group there are currently 17 datasets (see `slurm_nss_matrix.sh`).
136-
- In `two_stage` mode with arrays, the submitter launches one TRAIN array and one GEN array. GEN tasks are linked to corresponding TRAIN tasks via `aftercorr`. Effective max concurrency is cluster/partition limited, but GEN tasks won’t start until their matching TRAIN tasks succeed.
137-
- In `end_to_end` mode, a single array is submitted of size `num_datasets * RUNS * NUM_CONFIGS`.
138-
139-
How long will my jobs take?
162+
- `--runs`: Number of runs per dataset-config pair.
163+
- `--partition`: Slurm partition(s) to use. See partition info in your cluster docs.
164+
- `--exp-name`: Experiment namespace for logs/outputs.
165+
- `--dataset-group`: `short` or `long` (selects built-in dataset sets).
166+
Mutually exclusive with `--dataset-urls`.
167+
- `--dataset-urls`: comma-separated list of named datasets from the registry, file paths, or URLs.
168+
Mutually exclusive with `--dataset-group`.
169+
- `--pipeline-mode`: `two_stage` (TRAIN→GEN with dependency) or `end_to_end` (single job).
170+
- `--wandb-project`: Name of the Weights & Biases project to track experiments.
171+
Defaults to `--exp-name` if not specified.
172+
173+
174+
### How many jobs will run concurrently?
175+
176+
In general, the number of concurrent jobs depends on cluster GPU availability and the Fair Share for the PPP.
177+
178+
- In `two_stage` mode, the submitter launches one TRAIN array and one GENERATE array. GENERATE tasks are linked to corresponding TRAIN tasks via `aftercorr`. Effective max concurrency is cluster/partition limited, but GENERATE tasks won’t start until their matching TRAIN tasks succeed.
179+
- In `end_to_end` mode, a single array is submitted of size `# datasets * runs * # configs`.
180+
181+
The `--max-concurrent-slurm-jobs N` param can be used to further restrict concurrent jobs.
182+
The limit applies within each array, so in `end_to_end` mode at most N jobs run simultaneously.
183+
In `two_stage` mode, up to 2*N jobs might run: N from the TRAIN array and N from the GENERATE array.
184+
Using `--max-concurrent-slurm-jobs` is recommended for large experiments to reduce bursting and be friendlier to other users.
185+
Consider using a max of 2-3x the current allocation for llmservice_sdg_research PPP in the cluster to avoid bursting and rapidly dropping our Fair Share for everyone.
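
As a rough worked example of the arithmetic above, the sketch below computes the array size and a throttled array spec. The counts are made-up sample values, and the assumption that `--max-concurrent-slurm-jobs` maps onto Slurm's `%N` array throttle is ours; check `submit_slurm_jobs.sh` (for example with `--dry-run`) for the actual `sbatch` invocation. The commands are only printed, never submitted.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of array sizing; values below are illustrative only.
set -euo pipefail

num_datasets=3
num_configs=2
runs=2
max_concurrent=3   # assumed to correspond to --max-concurrent-slurm-jobs

# One array task per (dataset, run, config) combination.
array_size=$(( num_datasets * runs * num_configs ))

# Standard sbatch syntax: "--array=0-11%3" runs at most 3 tasks at once.
array_spec="0-$(( array_size - 1 ))%${max_concurrent}"

echo "end_to_end:         sbatch --array=${array_spec} <job script>"
echo "two_stage TRAIN:    sbatch --array=${array_spec} <train job script>"
# aftercorr starts GENERATE task i only after TRAIN task i has succeeded.
echo "two_stage GENERATE: sbatch --array=${array_spec} --dependency=aftercorr:<train job id> <generate job script>"

# The throttle is per array, so two arrays can run up to 2*N tasks together.
two_stage_max_running=$(( 2 * max_concurrent ))
echo "two_stage worst-case concurrent tasks: ${two_stage_max_running}"
```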
186+
187+
### How long will my jobs take?
188+
140189
With `num_input_records_to_sample=25000`
141190
- For the baseline config, the longest job typically finishes within 80 minutes. Total wall time estimate: `60 * RUNS` minutes.
142191
- For the `dp` config, the longest job typically finishes within 120 minutes. Total wall time estimate: `120 * RUNS` minutes.
143192

144193

145194
### Logs and outputs
146-
- Slurm logs: `${BASE_LOG_DIR}/${EXP_NAME}/short|long/<config>/slurm_%A_%a.{out,err}`
195+
- Slurm logs: `${BASE_LOG_DIR}/${EXP_NAME}/slurm_%A_%a.{out,err}`
147196
- You can tail logs while jobs run:
148197
```bash
149-
tail -f ${BASE_LOG_DIR}/${EXP_NAME}/short/<config>/slurm_*.out
198+
tail -f ${BASE_LOG_DIR}/${EXP_NAME}/slurm_*.out
150199
```
151200
- W&B logging: set the `WANDB_MODE` to `online` to additionally log experiment configs and metrics to W&B. Make sure to export your `WANDB_API_KEY` (request an account [here](https://confluence.nvidia.com/display/AIALGO/Weights+and+Biases+%28WandB%29+Enterprise+Account)) in `${LUSTRE_DIR}/.api_tokens.sh`. There is an optional flag `--wandb-project` to specify a W&B project name if you don't want to use the experiment name.
152201

153202
- When running in `two_stage` mode, avoid submitting multiple commands that run simultaneously, because a unique adapter path cannot be guaranteed for each run. As a result, two runs might be logged as one on W&B.
154203

155-
### One-off single dataset runs
156-
For quick testing of a specific CSV with selected configs and N runs, run from this directory using `submit_single_dataset.sh`.
157-
158-
Usage:
159-
```bash
160-
bash submit_single_dataset.sh --dataset-urls PATH_OR_URL [--configs c1,c2] [--runs N] [--partition P] [--exp-name NAME] [--dataset-group short|long] [--sleep-sec S] [--submit-mode array|sequential] [--pipeline-mode two_stage|end_to_end] [--wandb-project PROJECT]
161-
162-
# Example: Adobe-2k, three configs, 5 runs each on polar4 (array-based two-stage)
163-
bash submit_single_dataset.sh \
164-
--dataset-urls /lustre/fsw/portfolios/llmservice/users/${USER_NAME}/safe-synthetics/cleaned/Adobe-2k.csv \
165-
--configs unsloth,dp,dp_usg_guidance \
166-
--runs 5 \
167-
--partition polar4 \
168-
--exp-name regex_adobe2k \
169-
--dataset-group short \
170-
--sleep-sec 5 \
171-
--submit-mode array \
172-
--pipeline-mode two_stage \
173-
--wandb-project regex_adobe2k
174-
175-
# Example: sequential per-run two-stage (GEN depends on its TRAIN)
176-
bash submit_single_dataset.sh \
177-
--dataset-urls /lustre/fsw/portfolios/llmservice/users/${USER_NAME}/safe-synthetics/cleaned/Adobe-2k.csv \
178-
--configs unsloth,dp \
179-
--runs 3 \
180-
--partition polar4 \
181-
--exp-name demo_exp \
182-
--dataset-group short \
183-
--sleep-sec 3 \
184-
--submit-mode sequential \
185-
--pipeline-mode two_stage
186-
187-
# Tail logs
188-
tail -f /lustre/fsw/portfolios/llmservice/users/${USER_NAME}/nss_results/regex_adobe2k/short/*/slurm_*.out
189-
```
190-
191-
Notes:
192-
- The script honors time limits from `env_variables.sh` based on config name patterns (`unsloth`, `dp`, fallback `max`).
193-
- Set `DATASET_GROUP` to `long` to use the long time limits.
194-
- The dataset path is passed via `DATASET_URLS` and will be used directly by the runner.
195-
- In `two_stage` mode, the TRAIN job creates a workdir at `--run-path` containing the adapter and config. The GEN job resumes from the same workdir and writes uniquely-timestamped output files (e.g., `synthetic_data_20260114T123456.csv`) allowing multiple generation runs from the same trained adapter.
196-
197-
198-
199204
### Monitoring and cancellation
200205
```bash
201206
squeue -u ${USER_NAME}
@@ -207,6 +212,7 @@ scancel <jobid>
207212
Use W&B by setting `WANDB_MODE=online` in `env_variables.sh` and add your W&B token to `.api_tokens.sh`.
208213

209214
### Troubleshooting
215+
210216
- "USER_NAME is not set": run `export USER_NAME=...` and retry.
211217
- Missing token file/key: create `${LUSTRE_DIR}/.api_tokens.sh` with `NIM_API_KEY` and `chmod 600`.
212218
- Missing config files: verify `CONFIGS` in `env_variables.sh` and files in `CONFIG_DIR`.
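
Several of the failures above come down to an unset environment variable. A minimal sketch of an upfront check with actionable messages might look like the following; the function name `require_env` and the exact wording of the message are assumptions for illustration and are not taken from `slurm_nss_matrix.sh`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: report every missing variable at once, with a hint on how to fix it.
require_env() {
  local missing=0 name
  for name in "$@"; do
    # ${!name:-} is bash indirect expansion: the value of the variable named by $name.
    if [ -z "${!name:-}" ]; then
      echo "ERROR: ${name} is not set; run 'export ${name}=...' or set it in env_variables.sh" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Demo in a subshell so the caller's environment is left untouched.
if ( unset NSS_DIR; USER_NAME=demo; require_env USER_NAME NSS_DIR ) 2>/dev/null; then
  validation_result="ok"
else
  validation_result="missing"
fi
echo "validation result: ${validation_result}"
```

In a real script the call would typically be `require_env USER_NAME NSS_DIR || exit 1` near the top, so a job fails immediately instead of partway through a run.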

script/slurm/env_variables.sh

Lines changed: 3 additions & 15 deletions
Original file line number | Diff line number | Diff line change
@@ -12,11 +12,11 @@ export NSS_SHARED_DIR="/lustre/fsw/portfolios/llmservice/users/kendrickb/shared_
1212

1313
## change the following if you want them to be different
1414
CONFIGS=(unsloth dp dp_usg_guidance) # the jobs will run all datasets with these configs
15-
export NMP_DIR="/lustre/fsw/portfolios/llmservice/users/${USER_NAME}/nmp" # where the nmp repo is located
16-
export NSS_SLURM_DIR="${NMP_DIR}/packages/nemo_safe_synthesizer/script/slurm" # slurm scripts location (inside repo)
15+
export NSS_DIR="/lustre/fsw/portfolios/llmservice/users/${USER_NAME}/Safe-Synthesizer" # where the nss repo is located
16+
export NSS_SLURM_DIR="${NSS_DIR}/script/slurm" # slurm scripts location (inside repo)
1717
export CONFIG_DIR="${NSS_SLURM_DIR}" # where the config files are located
1818
export BASE_LOG_DIR="${LUSTRE_DIR}/nss_results" # where you want the slurm logs to be saved, each job will have err and out files
19-
export ADAPTER_PATH="${LUSTRE_DIR}/nmp/exp/adapters" # base path for run directories (each run creates a subdirectory via --run-path)
19+
export ADAPTER_PATH="${LUSTRE_DIR}/nss_results/adapters" # base path for run directories (each run creates a subdirectory via --run-path)
2020
export VLLM_CACHE_ROOT="${LUSTRE_DIR}/.cache/vllm/" # where the vllm cache is saved, this is to prevent the login node from blowing up
2121
export UV_CACHE_DIR="${LUSTRE_DIR}/.cache/uv"
2222
export UV_PYTHON_INSTALL_DIR="${LUSTRE_DIR}/.local/share/uv/python"
@@ -33,15 +33,3 @@ export WANDB_MODE="disabled" # "online", "offline" or "disabled"
3333
# NSS_LOG_FORMAT - Log format ("json" or "plain")
3434
# NSS_LOG_FILE - Path to log file
3535
export NSS_ARTIFACTS_PATH="${ADAPTER_PATH}"
36-
37-
# time limits for the short and long jobs
38-
declare -A CONFIG_TIME_LIMITS_SHORT
39-
declare -A CONFIG_TIME_LIMITS_LONG
40-
41-
CONFIG_TIME_LIMITS_SHORT[unsloth]="00:40:00"
42-
CONFIG_TIME_LIMITS_SHORT[dp]="02:00:00"
43-
CONFIG_TIME_LIMITS_SHORT[max]="04:00:00" #fallback if config names do not include unsloth or dp
44-
45-
CONFIG_TIME_LIMITS_LONG[unsloth]="01:20:00"
46-
CONFIG_TIME_LIMITS_LONG[dp]="02:00:00"
47-
CONFIG_TIME_LIMITS_LONG[max]="04:00:00" #fallback if config names do not include unsloth or dp
