changelog.md (21 additions, 0 deletions)
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),

## [Unreleased]
## [0.0.13] - 2025-03-13

### Fixed - GPU Memory Leak and Duplicate Model Runs

**Critical: Prevents OOM errors and eliminates redundant inference calls**
- **GPU Memory Cleanup**: Added `_cleanup_gpu_memory()` function that runs `gc.collect()` and `torch.cuda.empty_cache()` between test iterations to prevent out-of-memory errors with local models
  - Only activates for the `transformers` provider (GPU models)
  - Optional verbose mode shows memory stats after cleanup
- **Eliminate Duplicate Model Runs**: `analyze_streamed_steps()` now captures the response directly from `FinalAnswerStep.output`, removing the redundant `agent.run()` call
  - Previously, the model was invoked twice per test case: once via streaming and once via `agent.run()`
  - This halves inference time and GPU memory usage per test
  - The function now returns a 4-tuple: `(tools_used, final_answer_called, steps_count, response)`
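The single-pass analysis above can be sketched as follows. The 4-tuple shape and the idea of reading `FinalAnswerStep.output` come from the entry; the `ActionStep`/`FinalAnswerStep` classes here are minimal stand-ins for the real smolagents step objects, and the loop body is an assumption about how the stream is consumed:

```python
from dataclasses import dataclass, field


@dataclass
class ActionStep:
    """Stand-in for a streamed step that may carry tool calls."""
    tool_names: list = field(default_factory=list)


@dataclass
class FinalAnswerStep:
    """Stand-in for the terminal step carrying the model's answer."""
    output: str = ""


def analyze_streamed_steps(steps):
    """Return (tools_used, final_answer_called, steps_count, response)."""
    tools_used, final_answer_called, steps_count, response = [], False, 0, None
    for step in steps:
        steps_count += 1
        if isinstance(step, FinalAnswerStep):
            final_answer_called = True
            response = step.output  # capture here; no second agent.run() needed
        elif isinstance(step, ActionStep):
            tools_used.extend(step.tool_names)
    return tools_used, final_answer_called, steps_count, response
```

For example, a stream of one tool call followed by a final answer yields `(["search"], True, 2, "42")` for `[ActionStep(["search"]), FinalAnswerStep("42")]`.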
**Files Modified:**

- `smoltrace/core.py` - GPU cleanup + response capture from streaming
- `tests/test_core.py` - Updated mocks for 4-tuple return
- `tests/test_core_additional.py` - Updated mocks for 4-tuple return
- `tests/test_final_coverage_push.py` - Updated mocks for 4-tuple return
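The test-file updates amount to teaching each mock the new return shape. A hypothetical sketch (not the repository's actual test code) of what "updated mocks for 4-tuple return" looks like in practice:

```python
from unittest.mock import MagicMock

# Any mock standing in for analyze_streamed_steps must now return the
# 4-tuple (tools_used, final_answer_called, steps_count, response),
# so call sites that unpack four values keep working.
mock_analyze = MagicMock(
    return_value=(["search"], True, 2, "final answer text")
)
tools_used, final_answer_called, steps_count, response = mock_analyze([])
```

Mocks that still returned the old 3-tuple would raise `ValueError: not enough values to unpack` at the call site, which is why all three test modules needed the same change.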