<!-- SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION
& AFFILIATES. All rights reserved. -->
<!-- SPDX-License-Identifier: Apache-2.0 -->
<!-- Thank you for contributing to Safe Synthesizer! -->
# Summary
<!-- Brief description of changes -->
Set the default `WANDB_MODE` to `online`. This setup is only for internal
use, and as far as I know all internal users run W&B in online mode, so
it makes sense to switch the default.
## Pre-Merge Checklist
<!-- These checks need to be completed before a PR is merged, -->
<!-- but as PRs often change significantly during review, -->
<!-- it's OK for them to be incomplete when review is first requested.
-->
- [ ] New or updated tests for any fix or new behavior
- [x] Updated documentation for new features and behaviors, including
docstrings for API docs.
## Other Notes
<!-- Please add the issue number that should be closed when this PR is
merged. -->
- Closes #<issue>
---------
Signed-off-by: nina-xu <19981858+nina-xu@users.noreply.github.com>
script/slurm/README.md
Lines changed: 9 additions & 4 deletions
@@ -24,6 +24,7 @@ Pipeline entrypoints (invoked by Slurm scripts) via uv:
- Slurm Cluster Access: Ensure you have access to the Slurm clusters. You can verify this by running `ssh cs-oci-ord-login-01.nvidia.com` in your terminal (VPN connection required). For an introduction to Slurm, see [these onboarding resources](https://confluence.nvidia.com/display/HWINFCSSUP/Onboarding+to+Clusters).
- An LLM inference endpoint and the API Key: You will need a `NSS_INFERENCE_KEY` to run column classification, if using the default `NSS_INFERENCE_ENDPOINT`. If you do not have one, you can generate it at [build.nvidia.com](https://build.nvidia.com).
+ - Weights & Biases API Key: W&B logging is enabled by default (`WANDB_MODE=online`). You will need a `WANDB_API_KEY` (request an account [here](https://confluence.nvidia.com/display/AIALGO/Weights+and+Biases+%28WandB%29+Enterprise+Account)). Set `WANDB_MODE=disabled` in `env_variables.sh` to skip W&B.
- Enroot Credentials: Follow https://confluence.nvidia.com/display/HWINFCSSUP/Using+Containers#UsingContainers-SettingupEnrootCredentials. You should add the lines for all 3 of `nvcr.io`, `authn.nvidia.com`, and `gitlab-master.nvidia.com`.
- 2) Create your API token file with `NSS_INFERENCE_KEY` and restrict permissions, recommended to inclue`HF_TOKEN` to avoid throttling by HF Hub and, if you're using W&B, `WANDB_API_KEY`:
+ 2) Create your API token file and restrict permissions. `NSS_INFERENCE_KEY` and `WANDB_API_KEY` are required by default. `HF_TOKEN` is recommended to avoid throttling by HF Hub:
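The README's own example for this step sits outside the diff shown here. As a minimal sketch, assuming the file lives at `${LUSTRE_DIR}/.api_tokens.sh` (the path the W&B note below uses) and with placeholder key values, a hypothetical helper `write_api_tokens` could create the file and make it readable only by you:

```bash
# Hypothetical helper (not part of the repo): write placeholder tokens to
# DIR/.api_tokens.sh and restrict the file to mode 600.
# The function body runs in a subshell, so the umask change does not leak.
write_api_tokens() (
  set -eu
  dir="${1:?usage: write_api_tokens DIR}"
  token_file="${dir}/.api_tokens.sh"
  # umask 077 makes a newly created file private from the start.
  umask 077
  cat > "${token_file}" <<'EOF'
export NSS_INFERENCE_KEY="<your-inference-key>"
export WANDB_API_KEY="<your-wandb-api-key>"
# Recommended: avoids throttling by HF Hub
export HF_TOKEN="<your-hf-token>"
EOF
  # Re-apply in case the file already existed with looser permissions.
  chmod 600 "${token_file}"
)
```

Usage would be `write_api_tokens "${LUSTRE_DIR}"`, followed by replacing the placeholders with real keys. The explicit `chmod` matters because `umask` only affects files at creation time.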
@@ -191,7 +196,7 @@ Consider using a max of 2-3x the current allocation for llmservice_sdg_research
```bash
tail -f ${BASE_LOG_DIR}/${EXP_NAME}/slurm_*.out
```
- - W&B logging: set the `WANDB_MODE` to `online` to additionally log experiment configs and metrics to W&B. Make sure to export your `WANDB_API_KEY` (request an account [here](https://confluence.nvidia.com/display/AIALGO/Weights+and+Biases+%28WandB%29+Enterprise+Account)) in `${LUSTRE_DIR}/.api_tokens.sh`. There is an optional flag `--wandb-project` to specify a W&B project name if you don't want to use the experiment name.
+ - W&B logging: `WANDB_MODE` is set to `online` by default, so experiment configs and metrics are also logged to W&B. Make sure to export your `WANDB_API_KEY` (request an account [here](https://confluence.nvidia.com/display/AIALGO/Weights+and+Biases+%28WandB%29+Enterprise+Account)) in `${LUSTRE_DIR}/.api_tokens.sh`. The optional `--wandb-project` flag sets the W&B project name; without it, the experiment name is used.
- When running in `two_stage` mode, be mindful not to submit multiple bash commands that run simultaneously: we can't guarantee a unique adapter path for each run, so two runs might be logged as one on W&B.
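One way to avoid overlapping `two_stage` runs is to chain them with a Slurm job dependency so each run starts only after the previous one ends. This is a sketch under assumptions: each run is submitted as a single `sbatch` script, the script names are placeholders, and `submit_sequentially` is a hypothetical helper, not something this repo provides. If your workflow submits several jobs per run, it would need adapting.

```bash
# Hypothetical helper: submit batch scripts one after another so they never run concurrently.
submit_sequentially() {
  local prev_id="" job_id script
  for script in "$@"; do
    if [[ -z "${prev_id}" ]]; then
      job_id=$(sbatch --parsable "${script}") || return 1
    else
      # afterany: start only once the previous job has ended, whatever its exit state.
      job_id=$(sbatch --parsable --dependency="afterany:${prev_id}" "${script}") || return 1
    fi
    # --parsable prints "jobid" or "jobid;cluster"; keep only the job ID.
    job_id="${job_id%%;*}"
    echo "submitted ${script} as job ${job_id}"
    prev_id="${job_id}"
  done
}
```

For example, `submit_sequentially run_a.sh run_b.sh`. Swapping `afterany` for `afterok` would instead start the next job only if the previous one succeeded.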
@@ -234,7 +239,7 @@ Log directory resolution order (first match wins):
### Collect results
- Use W&B by setting `WANDB_MODE=online` in `env_variables.sh` and add your W&B token to `.api_tokens.sh`.
+ W&B is enabled by default with `WANDB_MODE=online` in `env_variables.sh`. Make sure to add your W&B token to `.api_tokens.sh`. To skip W&B, set `WANDB_MODE=disabled` instead.
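Because W&B is now on by default, a missing key is easy to hit. As a sketch, assuming `env_variables.sh` and `.api_tokens.sh` have already been sourced into the shell, a hypothetical `check_wandb_config` function could fail fast before any job is submitted:

```bash
# Hypothetical pre-flight check: run after sourcing env_variables.sh and .api_tokens.sh.
check_wandb_config() {
  # An unset WANDB_MODE falls back to "online", matching the new default.
  local mode="${WANDB_MODE:-online}"
  # Only online mode needs the key here; modes such as disabled or offline skip the check.
  if [[ "${mode}" == "online" && -z "${WANDB_API_KEY:-}" ]]; then
    echo "WANDB_MODE=online but WANDB_API_KEY is empty: add it to .api_tokens.sh or set WANDB_MODE=disabled" >&2
    return 1
  fi
  echo "W&B mode: ${mode}"
}
```

Calling `check_wandb_config || exit 1` at the top of a submission script would stop before any Slurm job is queued.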