Enhancement Request
Implement streaming downloads for very large tokenizer files to improve memory efficiency.
Current Behavior
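The downloadWithRetry function reads the entire tokenizer file into memory with ioutil.ReadAll, so peak memory use grows with the size of the file. This becomes a problem for very large models. (ioutil.ReadAll has also been deprecated since Go 1.16 in favor of io.ReadAll.)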
Proposed Enhancement
- Stream response body directly to temporary file
- Atomic move to final location after successful download
- Configurable memory threshold for switching to streaming mode
- Progress reporting for large downloads
Implementation Details
- Use io.Copy (or io.CopyBuffer with a fixed-size buffer) instead of ReadAll
- Write to a temp file on the same filesystem as the destination so the final rename is atomic
- Verify the checksum incrementally while streaming (if one is available)
- Clean up temp files on failure
Benefits
- Reduced memory footprint
- Support for tokenizers larger than available memory (bounded by disk space instead)
- Better performance for memory-constrained environments
- Improved reliability for large model downloads