Optimize memory management in streaming parsers and encoding #2
Merged
Streaming parsers (StreamingParser, GeneralStreamingParser, GeneralStreamingParserNewlines) had three memory leak patterns:

1. compact_buffer() used Vec::drain(), which preserves peak allocation capacity even after removing most data. Added shrink_excess() to reclaim memory when capacity exceeds 4x length.
2. take_rows() used drain().collect(), leaving complete_rows at peak capacity. Now shrinks after draining.
3. finalize() left the internal buffer allocated after extracting the final rows. Now releases buffer memory since parsing is complete.

Also:
- Reduce rayon thread pool stack from 8 MiB to 2 MiB per thread (saves ~48 MiB of virtual memory across 8 persistent threads)
- Remove unnecessary field.clone() in the parallel encoder's encoding path
- Add ExactSizeIterator impls for RowIter, RowFieldIter, FieldIter

All 95 tests pass.

https://claude.ai/code/session_01QdJE1Gks1uipLWVupAwrbe
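The drain-then-shrink pattern described above can be sketched as follows. The helper's exact signature in streaming.rs is not shown in this PR page, so this is an illustrative reconstruction that also folds in the later review fix (a byte-based 1 KiB floor computed via size_of::<T>()):

```rust
use std::mem::size_of;

// Sketch of a shrink_excess-style helper (assumed signature): give back
// capacity once it exceeds 4x the live length, but never shrink below a
// 1 KiB floor measured in bytes, so small buffers are left alone.
fn shrink_excess<T>(v: &mut Vec<T>) {
    // Number of elements that fit in 1 KiB (at least 1 for very large T).
    let floor = (1024 / size_of::<T>().max(1)).max(1);
    if v.capacity() > floor && v.capacity() > v.len().saturating_mul(4) {
        // shrink_to keeps at least `len`; clamp to the floor to avoid
        // thrashing when the buffer refills on the next feed().
        v.shrink_to(v.len().max(floor));
    }
}

fn main() {
    // Simulate the take_rows() pattern: drain everything, then shrink.
    let mut complete_rows: Vec<u64> = (0..100_000).collect();
    let taken: Vec<u64> = complete_rows.drain(..).collect();

    // drain() alone leaves the peak allocation in place...
    assert!(complete_rows.capacity() >= 100_000);

    // ...so the parser shrinks afterwards to return the memory.
    shrink_excess(&mut complete_rows);
    assert!(complete_rows.capacity() < 100_000);
    assert_eq!(taken.len(), 100_000);
    println!("capacity after shrink: {}", complete_rows.capacity());
}
```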
- Add thiserror for BufferOverflow: implements the Display and Error traits required for idiomatic Rust library error types
- Add #[must_use] to key types: StructuralIndex, RowEnd, Newlines, BufferOverflow, StreamingParser, GeneralStreamingParser, GeneralStreamingParserNewlines, GeneralFieldBound, StreamingParserResource
- Add #[must_use] to getter methods: available_rows(), has_partial(), buffer_size(), row_count(), max_pattern_len()
- Fix import ordering in general.rs to pass cargo fmt
- All quality gates pass: cargo fmt, clippy -D warnings, 95 tests

https://claude.ai/code/session_01QdJE1Gks1uipLWVupAwrbe
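What the thiserror derive buys is the Display and Error impls; the equivalent is written out by hand below so the sketch compiles without the crate. The needed/capacity fields are purely illustrative assumptions, not the library's actual BufferOverflow layout:

```rust
use std::error::Error;
use std::fmt;

// Hand-written equivalent of `#[derive(thiserror::Error)]` plus an
// `#[error("...")]` message attribute. Field names are illustrative only.
#[derive(Debug)]
#[must_use]
pub struct BufferOverflow {
    pub needed: usize,
    pub capacity: usize,
}

impl fmt::Display for BufferOverflow {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "buffer overflow: needed {} bytes but capacity is {}",
            self.needed, self.capacity
        )
    }
}

// An empty impl suffices: Error only requires Debug + Display.
impl Error for BufferOverflow {}

fn main() {
    let e = BufferOverflow { needed: 4096, capacity: 1024 };
    println!("{e}");
    // Usable anywhere a trait-object error is expected.
    let boxed: Box<dyn Error> = Box::new(e);
    assert!(boxed.to_string().contains("4096"));
}
```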
Review fixes for PR #2:
- Fix RowIter::next(): increment row_idx in the trailing-row branch so ExactSizeIterator::len() returns 0 after exhaustion (was returning 1)
- Fix RowFieldIter::next(): same trailing-row row_idx fix
- Fix FieldIter::size_hint(): check the done flag so len() returns 0 after the last field is consumed (was returning 1)
- Fix shrink_excess(): use a byte-based 1 KiB floor via size_of::<T>() instead of an element count of 1024 (the doc said bytes, the code used elements)
- Add ExactSizeIterator tests for RowIter, RowFieldIter, FieldIter covering trailing rows and multi-field exhaustion
- Add shrink_excess tests: threshold, floor, ratio, large-element types
- Add finalize/reset memory release tests
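The len()-after-exhaustion bug class fixed above looks like this in miniature. This is a simplified stand-in for RowIter, not the real struct from simd_index.rs:

```rust
// Simplified stand-in for RowIter over (start, end) byte ranges.
struct Rows<'a> {
    bounds: &'a [(usize, usize)],
    idx: usize,
}

impl<'a> Iterator for Rows<'a> {
    type Item = (usize, usize);

    fn next(&mut self) -> Option<(usize, usize)> {
        let item = self.bounds.get(self.idx).copied();
        if item.is_some() {
            // The review fix in essence: every branch that yields an item
            // must also advance idx, or size_hint() sticks at 1 forever.
            self.idx += 1;
        }
        item
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        let remaining = self.bounds.len() - self.idx;
        (remaining, Some(remaining))
    }
}

// ExactSizeIterator just promises size_hint() is exact; len() is provided.
impl<'a> ExactSizeIterator for Rows<'a> {}

fn main() {
    let bounds = [(0, 4), (5, 9), (10, 12)];
    let mut rows = Rows { bounds: &bounds, idx: 0 };
    assert_eq!(rows.len(), 3);
    rows.next();
    assert_eq!(rows.len(), 2);
    while rows.next().is_some() {}
    assert_eq!(rows.len(), 0); // must reach 0, not get stuck at 1
    println!("ok");
}
```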
Summary
This PR improves memory efficiency across the CSV parsing and encoding pipeline by implementing proactive memory reclamation, reducing thread stack overhead, and optimizing buffer allocation patterns.
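For context on the thread-stack reduction: rayon's ThreadPoolBuilder::stack_size() takes the per-thread stack in bytes, and this PR lowers the pool's configured value from 8 MiB to 2 MiB. The same tradeoff can be demonstrated with std::thread, used here so the sketch carries no crate dependency:

```rust
use std::thread;

// 2 MiB per worker instead of the 8 MiB the pool previously requested.
// Safe when the work has shallow call stacks, as CSV field extraction does.
const WORKER_STACK: usize = 2 * 1024 * 1024;

fn main() {
    let handle = thread::Builder::new()
        .stack_size(WORKER_STACK)
        .spawn(|| (0..1_000u64).sum::<u64>())
        .expect("failed to spawn worker");

    let total = handle.join().expect("worker panicked");
    assert_eq!(total, 499_500);
    println!("total = {total}");
}
```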
Key Changes
Memory Management Improvements
- Added a shrink_excess() helper function in streaming.rs that reclaims excess vector capacity when it exceeds 4x the current length (with a 1 KiB floor). This prevents long-lived streaming parsers from monotonically growing to peak memory usage and never returning it to the OS.
- Applied shrink_excess() consistently across all three streaming parser implementations (StreamingParser, GeneralStreamingParser, GeneralStreamingParserNewlines) in both the feed() and take_rows() methods to prevent unbounded memory growth.
- Updated the finish() method in all streaming parsers to explicitly release buffers using Vec::new() instead of relying on Vec::clear(), which preserves allocations. Also properly reset partial_row_start and scan_pos.
- Updated the reset() method in StreamingParser to use Vec::new() instead of .clear() for actual memory release.

Error Handling
- Converted the BufferOverflow error type to the thiserror::Error derive macro with a proper error message, making it idiomatic for library code.
- Added #[must_use] attributes to BufferOverflow and all streaming parser structs to encourage proper error handling and prevent accidental ignoring of parser instances.

Performance Optimizations
- Reduced the rayon thread pool stack size from 8 MiB to 2 MiB per thread in parallel.rs. CSV field extraction has shallow call stacks, and the larger size wastes ~48 MiB of virtual memory across 8 persistent threads.
- Refactored encode_string_parallel() in lib.rs to eliminate unnecessary intermediate vector allocations by writing directly to the output buffer based on encoding/quoting requirements, reducing memory pressure during parallel encoding.

Iterator Improvements
- Implemented ExactSizeIterator for RowIter, RowFieldIter, and FieldIter in simd_index.rs to provide size hints and enable more efficient iteration patterns.
- Added #[must_use] attributes to query methods (available_rows(), has_partial(), buffer_size(), row_count(), max_pattern_len()) and data structures to prevent accidental ignoring of important information.

Dependencies
- Added thiserror = "2" for idiomatic error type derivation.

Implementation Details
The memory optimization strategy focuses on three areas:
- The shrink_excess() function uses a conservative threshold (4x) to avoid thrashing on small buffers while still reclaiming significant excess capacity.
- Using Vec::new() instead of .clear() ensures memory is actually returned to the OS, which is critical for long-running streaming parsers.

https://claude.ai/code/session_01QdJE1Gks1uipLWVupAwrbe
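The clear()-versus-reassignment distinction is easy to verify directly:

```rust
fn main() {
    let mut buffer: Vec<u8> = Vec::with_capacity(1 << 20); // 1 MiB peak
    buffer.extend_from_slice(b"partial row data");

    // clear() drops the elements but keeps the full allocation alive.
    buffer.clear();
    assert_eq!(buffer.len(), 0);
    assert!(buffer.capacity() >= 1 << 20);

    // Reassigning to Vec::new() frees the old allocation immediately;
    // this is the behavior finalize()/reset() want once parsing is done.
    buffer = Vec::new();
    assert_eq!(buffer.capacity(), 0);
    println!("capacity after reset: {}", buffer.capacity());
}
```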