Commit 83138a0
fix: catalyst teardown timeout — nvidia RM takes ~160s on GV100
The warm_swap phase (unbinding nvsov from the Titan V) triggers nvidia-470
RM teardown, which takes ~160s for HBM2 dealloc + falcon halt. The
previous 10s UNBIND_TIMEOUT killed the child process prematurely,
so the rebind raced with the still-running kernel teardown.
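The failure mode above is the usual kill-on-timeout supervision pattern. Below is a minimal sketch of that pattern. It assumes the supervisor polls its helper with `try_wait` and sends SIGKILL once the deadline passes. `wait_with_timeout` and `WaitOutcome` are hypothetical names, not the code in `core/cylinder/src/vfio`, and `sleep` stands in for the unbind helper.

```rust
use std::process::{Child, Command, ExitStatus};
use std::thread;
use std::time::{Duration, Instant};

/// Result of supervising a helper process (hypothetical type, for illustration).
#[derive(Debug)]
enum WaitOutcome {
    /// The child exited on its own before the deadline.
    Exited(ExitStatus),
    /// The deadline passed; the child was killed and reaped.
    TimedOut,
}

/// Poll `child` until it exits or `timeout` elapses. On timeout the child is
/// killed and the caller moves on, even though work the child started
/// (here, kernel-side driver teardown) may still be in progress.
fn wait_with_timeout(child: &mut Child, timeout: Duration) -> std::io::Result<WaitOutcome> {
    let deadline = Instant::now() + timeout;
    loop {
        if let Some(status) = child.try_wait()? {
            return Ok(WaitOutcome::Exited(status));
        }
        if Instant::now() >= deadline {
            child.kill()?; // SIGKILL on Unix
            child.wait()?; // reap so no zombie is left behind
            return Ok(WaitOutcome::TimedOut);
        }
        thread::sleep(Duration::from_millis(50));
    }
}

fn main() -> std::io::Result<()> {
    // `sleep 5` stands in for a slow unbind; a 200 ms budget is far too short.
    let mut slow = Command::new("sleep").arg("5").spawn()?;
    let outcome = wait_with_timeout(&mut slow, Duration::from_millis(200))?;
    println!("{:?}", outcome);
    Ok(())
}
```

With a 10s budget, a ~160s unbind always lands in the `TimedOut` arm. Any rebind issued after that point can overlap the teardown the killed child had already started.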
- Add CATALYST_TEARDOWN_TIMEOUT (200s) for nvidia RM teardown
- Use extended timeout for catalyst warm_swap unbind
- Raise RPC timeout from 180s to 420s
- Raise HANDOFF_DEADLINE from 150s to 400s
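One way to apply the extended timeout only to the catalyst warm_swap unbind is shown in the sketch below. It is an assumption-labeled sketch: the constant values come from the list above, but the `UnbindPhase` enum and the `unbind_timeout` function are hypothetical and do not reflect how the real code selects the timeout.

```rust
use std::time::Duration;

// Values from the commit message; how they are wired up here is hypothetical.
const UNBIND_TIMEOUT: Duration = Duration::from_secs(10);
const CATALYST_TEARDOWN_TIMEOUT: Duration = Duration::from_secs(200);

/// Hypothetical classification of unbind requests.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UnbindPhase {
    /// Ordinary unbind with no heavyweight driver teardown.
    Ordinary,
    /// Catalyst warm_swap: unbinding triggers nvidia RM teardown (~160s on GV100).
    CatalystWarmSwap,
}

/// Pick the unbind timeout: the extended budget is used only for warm_swap,
/// so ordinary unbinds still fail fast.
fn unbind_timeout(phase: UnbindPhase) -> Duration {
    match phase {
        UnbindPhase::Ordinary => UNBIND_TIMEOUT,
        UnbindPhase::CatalystWarmSwap => CATALYST_TEARDOWN_TIMEOUT,
    }
}

fn main() {
    for phase in [UnbindPhase::Ordinary, UnbindPhase::CatalystWarmSwap] {
        println!("{:?}: {:?}", phase, unbind_timeout(phase));
    }
}
```

Scoping the 200s budget to one phase keeps a genuinely hung ordinary unbind from blocking for minutes.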
Validated: full catalyst pipeline completes end-to-end on Titan V —
insmod (400ms), BAR0 capture (78,627 registers), warm_swap (~160s),
snapshot persisted, frozen .ko archived, rmmod clean (100ms).
Co-authored-by: Cursor <cursoragent@cursor.com>

Parent: dfff34a
3 files changed
Lines changed: 18 additions & 8 deletions
File tree
- crates
- core/cylinder/src/vfio
- server/src/pure_jsonrpc/handler/dispatch
Diff hunks (line contents not captured; ranges reconstructed from the line-number columns):
- `@@ -37,10 +37,14 @@`: 7 additions, 3 deletions
- `@@ -1030,15 +1030,20 @@`: 7 additions, 2 deletions
- `@@ -1077,9 +1077,10 @@`: 4 additions, 3 deletions