[ENH] Stream large tokenizers to disk instead of loading to memory #42

@tazarov

Enhancement Request

Implement streaming downloads for very large tokenizer files to improve memory efficiency.

Current Behavior

The downloadWithRetry function loads the entire tokenizer into memory using ioutil.ReadAll, so peak memory grows with the size of the file. This could be problematic for very large models.

Proposed Enhancement

  • Stream response body directly to temporary file
  • Atomic move to final location after successful download
  • Configurable memory threshold for switching to streaming mode
  • Progress reporting for large downloads

Implementation Details

  • Use io.Copy (or io.CopyBuffer with an explicit buffer) instead of ReadAll
  • Write to temp file in same filesystem for atomic rename
  • Verify checksum while streaming (if available)
  • Clean up temp files on failure

Benefits

  • Reduced memory footprint
  • Support for arbitrarily large tokenizers
  • Better performance in memory-constrained environments
  • Improved reliability for large model downloads

Metadata

Assignees

No one assigned

Labels

enhancement (New feature or request)
