Commit 4d945fd
perf: add DeepGEMM cache, multithreaded loading, and context length limit
- Mount deepgemm_cache volume to persist JIT-compiled kernels across restarts
- Add --model-loader-extra-config for multithreaded model loading (64 threads)
- Set --context-length 202000 to avoid an EAGLE off-by-two crash near the max position embeddings limit
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 3352b5e
1 file changed
Lines changed: 4 additions & 0 deletions
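Mounting `deepgemm_cache` as a volume only helps because a JIT compiler looks for previously compiled kernels on disk before compiling again. The sketch below is a generic, hypothetical model of that lookup, not DeepGEMM's code: `cached_compile` and `fake_compile` are illustrative names, and the real cache key and directory layout may differ. It shows why a restarted container that sees the same cache directory skips compilation.

```python
import hashlib
import os
import tempfile
from pathlib import Path
from typing import Callable


def cached_compile(source: str, cache_dir: Path, compile_fn: Callable[[str], bytes]) -> bytes:
    """Return the compiled artifact for `source`, compiling only on a cache miss (hypothetical helper)."""
    # Key on a hash of the kernel source; a real JIT would also hash
    # compiler flags, GPU architecture, and library version.
    key = hashlib.sha256(source.encode()).hexdigest()
    path = cache_dir / f"{key}.bin"
    if path.exists():
        return path.read_bytes()  # hit: reuse a kernel compiled by an earlier process
    binary = compile_fn(source)  # miss: pay the JIT cost once
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Write to a temp file, then rename, so a crash never leaves a half-written entry.
    fd, tmp = tempfile.mkstemp(dir=cache_dir)
    with os.fdopen(fd, "wb") as f:
        f.write(binary)
    os.replace(tmp, path)
    return binary


if __name__ == "__main__":
    calls = []

    def fake_compile(src: str) -> bytes:
        calls.append(src)  # count how often the expensive step actually runs
        return src.upper().encode()

    with tempfile.TemporaryDirectory() as d:
        cache = Path(d) / "deepgemm_cache"  # stands in for the mounted volume
        for _restart in range(2):
            cached_compile("gemm_fp8_kernel", cache, fake_compile)
        print(len(calls))  # 1: the second "restart" hits the on-disk cache
```

Without the volume, `cache` would start empty in every new container and each restart would recompile.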
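The commit passes 64 threads through `--model-loader-extra-config`, but the captured diff does not show the exact JSON keys the serving framework expects, so they are not reproduced here. Instead, this minimal sketch shows the general technique under that assumption: a thread pool that loads checkpoint shards concurrently and merges them. `load_checkpoint` and the `load_shard` callback are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Tensor = object  # placeholder for whatever a shard loader returns


def load_checkpoint(
    shard_paths: List[str],
    load_shard: Callable[[str], Dict[str, Tensor]],
    num_threads: int = 64,
) -> Dict[str, Tensor]:
    """Load checkpoint shards concurrently and merge them into one state dict (hypothetical helper)."""
    state: Dict[str, Tensor] = {}
    # Threads pay off because shard loading is largely I/O-bound and Python
    # releases the GIL during blocking reads; map() yields results in input order.
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        for shard in pool.map(load_shard, shard_paths):
            overlap = state.keys() & shard.keys()
            if overlap:
                raise ValueError(f"duplicate tensor names across shards: {sorted(overlap)}")
            state.update(shard)
    return state
```

Merging on the calling thread keeps `state` free of concurrent writes, so no lock is needed.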
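The commit message does not state the model's position limit or the number of EAGLE draft tokens, so the numbers below are placeholders. This simplified, hypothetical model only illustrates the arithmetic behind capping `--context-length` slightly below the position limit. Speculative draft tokens occupy positions past the end of the sequence, and the "off-by-two" from the commit message is modeled as a fixed overshoot of 2.

```python
def max_position_touched(seq_len: int, draft_tokens: int, overshoot: int = 2) -> int:
    """Highest position index one speculative step may touch (hypothetical model).

    The last real token sits at seq_len - 1; draft tokens take the next
    `draft_tokens` positions, and `overshoot` models the off-by-two indexing.
    """
    return seq_len - 1 + draft_tokens + overshoot


def safe_context_length(max_position_embeddings: int, draft_tokens: int, overshoot: int = 2) -> int:
    """Largest seq_len whose speculative step stays inside the position table."""
    # Solve max_position_touched(seq_len, ...) <= max_position_embeddings - 1 for seq_len.
    return max_position_embeddings - draft_tokens - overshoot


if __name__ == "__main__":
    limit, drafts = 1024, 4  # placeholder values, not the deployed model's
    ctx = safe_context_length(limit, drafts)
    print(ctx, max_position_touched(ctx, drafts))  # 1018 1023
```

Substituting the real model's position limit and draft length would give an upper bound to compare against the chosen 202000.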
Diff hunks (line contents were not captured):
@@ -83,11 +83,14 @@  3 lines added (new lines 86, 87, 93)
@@ -111,6 +114,7 @@  1 line added (new line 117)