Commit a0b614b
privacy-filter: cap GPU memory + release cache to stop VRAM leak
privacy-filter is an inline HF Transformers token-classification server
(`pipeline(..., device_map="auto")`) with no memory bound. Under steady
traffic the CUDA caching allocator's reserved memory ratchets up and is
never released, so the process slowly hoards GPU 7, which it shares with
Qwen3-VL, FLUX, embeddings, reranker and whisper. Observed ~93 GB
held on an H200 for a model that needs ~1-2 GB.
As privacy-filter fills the card (free ~50 GB -> ~0 over 1-2 days) the
largest co-tenant, Qwen3-VL (~49 GB at --gpu-memory-utilization 0.35),
can no longer load and crash-loops with
`torch.AcceleratorError: CUDA error: out of memory`. The same leak OOM'd
embeddings/whisper on 2026-05-25. Hits both small-models hosts (gpu11,
gpu02) since they run identical config.
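
The figures quoted in this message can be cross-checked with a little arithmetic. The sketch below is illustrative only: it assumes the 140 GB H200 capacity stated further down, and the helper names `share_gb` and `leak_rate_gb_per_hour` are made up for this example. The leak rate is an average derived from "free ~50 GB -> ~0 over 1-2 days", not a measured value.

```python
H200_GB = 140  # card capacity as stated in the commit message


def share_gb(fraction: float, total_gb: float = H200_GB) -> float:
    """GB that a given fraction of the card corresponds to."""
    return round(fraction * total_gb, 1)


def leak_rate_gb_per_hour(leaked_gb: float, days: float) -> float:
    """Average growth rate if `leaked_gb` of free memory disappears over `days`."""
    return round(leaked_gb / (days * 24), 2)


qwen_gb = share_gb(0.35)             # Qwen3-VL at --gpu-memory-utilization 0.35
cap_gb = share_gb(0.10)              # privacy-filter under a 0.10 memory fraction
slow = leak_rate_gb_per_hour(50, 2)  # ~50 GB lost over 2 days
fast = leak_rate_gb_per_hour(50, 1)  # ~50 GB lost over 1 day

print(qwen_gb, cap_gb, slow, fast)   # 49.0 14.0 1.04 2.08
```

Qwen3-VL's ~49 GB only fits while roughly the whole ~50 GB of free memory stays free, so a leak averaging 1-2 GB per hour is enough to push it into the crash loop within a day or two.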
Fix (inline server + container env):
- empty_cache() after every request (core fix): returns cached-but-unused
CUDA blocks to the driver so reserved memory stops ratcheting.
- set_per_process_memory_fraction(GPU_MEM_FRACTION, 0) (fail-safe): hard
ceiling so the process self-OOMs/restarts instead of starving neighbours.
Default 0.10 (~14 GB on a 140 GB H200), env-tunable.
- torch.inference_mode() around inference: no autograd state retained.
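
A minimal sketch of how the three measures above might fit together, not the actual server code. `cap_gpu_memory`, `run_request` and the `classify` callable are hypothetical names standing in for the inline server's startup hook, request handler and HF pipeline. The guarded import is only there so the sketch also runs on a machine without PyTorch or CUDA.

```python
import contextlib
import os

try:
    import torch
except ImportError:  # assumption: lets the sketch run without PyTorch installed
    torch = None

# Env var and 0.10 default as named in the commit message.
GPU_MEM_FRACTION = float(os.environ.get("GPU_MEM_FRACTION", "0.10"))


def cuda_ready() -> bool:
    """True only when PyTorch is importable and a CUDA device is visible."""
    return torch is not None and torch.cuda.is_available()


def cap_gpu_memory(fraction: float = GPU_MEM_FRACTION, device: int = 0) -> bool:
    """Fail-safe: allocations beyond `fraction` of the device raise an OOM error."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError(f"GPU_MEM_FRACTION must be in (0, 1], got {fraction}")
    if not cuda_ready():
        return False  # nothing to cap on a CPU-only host
    torch.cuda.set_per_process_memory_fraction(fraction, device)
    return True


def run_request(classify, text):
    """Run one request; `classify` stands in for the HF pipeline callable."""
    guard = torch.inference_mode() if torch is not None else contextlib.nullcontext()
    try:
        with guard:  # no autograd state is recorded during inference
            return classify(text)
    finally:
        # Core fix: hand cached-but-unused blocks back to the driver,
        # even when the request raised.
        if cuda_ready():
            torch.cuda.empty_cache()
```

Calling `cap_gpu_memory()` once at startup and routing every request through `run_request` keeps the release in a `finally` block, so a failed request still returns its cached memory.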
Interim mitigation already applied: recreating the container frees the
leaked VRAM, but the leak recurs in ~1-2 days; this commit makes the fix
permanent.
Ship via the normal tag + compose/up redeploy of small-models.yaml.

1 parent f8ad79e commit a0b614b
1 file changed
Lines changed: 48 additions & 23 deletions