Commit 491e4b6
committed
Add broadcast_sigterm_every_n_steps Trainer parameter
Allow users to control how often SIGTERM status is broadcast across
ranks, reducing CPU-GPU sync overhead for fast training loops while
preserving the default every-step behavior.
Closes #214871 parent bb7820f commit 491e4b6
3 files changed
Lines changed: 51 additions & 1 deletion
File tree
- src/lightning/pytorch
- loops
- trainer
- tests/tests_pytorch/loops
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
94 | 94 | | |
95 | 95 | | |
96 | 96 | | |
| 97 | + | |
97 | 98 | | |
98 | 99 | | |
99 | 100 | | |
| |||
297 | 298 | | |
298 | 299 | | |
299 | 300 | | |
300 | | - | |
| 301 | + | |
| 302 | + | |
| 303 | + | |
| 304 | + | |
301 | 305 | | |
302 | 306 | | |
303 | 307 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
129 | 129 | | |
130 | 130 | | |
131 | 131 | | |
| 132 | + | |
132 | 133 | | |
133 | 134 | | |
134 | 135 | | |
| |||
300 | 301 | | |
301 | 302 | | |
302 | 303 | | |
| 304 | + | |
| 305 | + | |
| 306 | + | |
| 307 | + | |
| 308 | + | |
| 309 | + | |
| 310 | + | |
303 | 311 | | |
304 | 312 | | |
305 | 313 | | |
| |||
447 | 455 | | |
448 | 456 | | |
449 | 457 | | |
| 458 | + | |
| 459 | + | |
| 460 | + | |
| 461 | + | |
| 462 | + | |
450 | 463 | | |
451 | 464 | | |
452 | 465 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
228 | 228 | | |
229 | 229 | | |
230 | 230 | | |
| 231 | + | |
| 232 | + | |
| 233 | + | |
| 234 | + | |
| 235 | + | |
| 236 | + | |
| 237 | + | |
| 238 | + | |
| 239 | + | |
| 240 | + | |
| 241 | + | |
| 242 | + | |
| 243 | + | |
| 244 | + | |
| 245 | + | |
| 246 | + | |
| 247 | + | |
| 248 | + | |
| 249 | + | |
| 250 | + | |
| 251 | + | |
| 252 | + | |
| 253 | + | |
| 254 | + | |
| 255 | + | |
| 256 | + | |
| 257 | + | |
| 258 | + | |
| 259 | + | |
| 260 | + | |
| 261 | + | |
| 262 | + | |
| 263 | + | |
0 commit comments