perf(file): add buffer_size to File.open() to reduce wasteful pre-reads #6876
Greptile Summary
This PR adds a `buffer_size` parameter to `File.open()`.
Confidence Score: 4/5
Safe to merge: all findings are P2 style/design concerns with no correctness impact on current callers.
Only P2 findings were raised: a design coupling, where the same `buffer_size` value controls both the BufReader capacity and the small-file eager-download threshold, and a redundant double BufReader in image_file_metadata.rs. There are no P0/P1 issues. All new callers correctly use `download_small_files=false`, so the threshold coupling is benign today.
src/daft-file/src/file.rs: the dual use of `buf_size` on line 75 is worth revisiting if future callers need `download_small_files=true` with a non-default buffer.
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["File.open(buffer_size?)"] --> B["PyDaftFile._from_file_reference(buffer_size?)"]
B --> C["DaftFile::load_blocking(download_small_files=false, buffer_size?)"]
D["PyFileReference.__enter__()"] --> E["DaftFile::load_blocking(download_small_files=true, buffer_size=None→16MB)"]
C --> F{"supports_range?"}
E --> F
F -- "No" --> G["read_full_content() → Memory cursor"]
F -- "Yes, file_size ≤ buf_size & download_small_files" --> G
F -- "Yes, streaming" --> H["BufReader::with_capacity(buf_size) → ObjectReader cursor"]
subgraph call_sites["buffer_size call sites"]
I["MIME sniff / size() → BUFFER_SIZE_SNIFF 4KB"]
J["Image/Audio/Video metadata → BUFFER_SIZE_METADATA 64KB"]
K["Full decode / from_path → None → BUFFER_SIZE_FULL 16MB"]
end
I --> C
J --> C
K --> E
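The branching in the flowchart can be sketched as a small pure function. This is a minimal illustration, not Daft's Rust implementation: `choose_read_strategy` and its return values are hypothetical names, and the constants assume binary units (4 KiB, 64 KiB, 16 MiB) for the sizes named in the diagram.

```python
from typing import Optional, Tuple

KIB = 1024
MIB = 1024 * KIB

# Sizes taken from the flowchart; binary units are an assumption.
BUFFER_SIZE_SNIFF = 4 * KIB
BUFFER_SIZE_METADATA = 64 * KIB
BUFFER_SIZE_FULL = 16 * MIB


def choose_read_strategy(
    supports_range: bool,
    file_size: int,
    download_small_files: bool,
    buffer_size: Optional[int] = None,
) -> Tuple[str, Optional[int]]:
    """Hypothetical model of the load_blocking decision shown in the flowchart.

    Returns ("memory", None) when the whole file is read eagerly, or
    ("streaming", capacity) when a buffered range reader is used.
    """
    # None falls back to the 16MB default, as in the diagram.
    buf_size = BUFFER_SIZE_FULL if buffer_size is None else buffer_size

    # Without range support the only option is a full download.
    if not supports_range:
        return ("memory", None)

    # The same buf_size doubles as the small-file threshold.
    if download_small_files and file_size <= buf_size:
        return ("memory", None)

    return ("streaming", buf_size)
```

Under this model, a 1 MiB file opened with `download_small_files=True` is downloaded eagerly at the default buffer size but streamed when `buffer_size=BUFFER_SIZE_SNIFF`, which is the coupling the review flags.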
Merging this PR will not alter performance
Changes Made
`DaftFile::load()` hardcoded a 16MB BufReader buffer for all calls, so metadata reads (needing ~1KB) and MIME sniffing (needing 16 bytes) each prefetched 16MB.
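The over-fetch effect can be reproduced in miniature with Python's standard `io.BufferedReader`, used here only as an analogy for a buffered reader and not as Daft's code path. `CountingRaw` and `bytes_fetched_for_sniff` are hypothetical helpers: the raw stream counts how many bytes the buffering layer pulls from it when the caller asks for a 16-byte header.

```python
import io

KIB = 1024
MIB = 1024 * KIB


class CountingRaw(io.RawIOBase):
    """In-memory raw stream that counts bytes handed to the buffering layer."""

    def __init__(self, data: bytes) -> None:
        super().__init__()
        self._data = data
        self._pos = 0
        self.bytes_fetched = 0

    def readable(self) -> bool:
        return True

    def readinto(self, b) -> int:
        # Serve as many bytes as the caller's buffer can hold.
        chunk = self._data[self._pos:self._pos + len(b)]
        b[:len(chunk)] = chunk
        self._pos += len(chunk)
        self.bytes_fetched += len(chunk)
        return len(chunk)


def bytes_fetched_for_sniff(buffer_size: int, sniff_len: int = 16) -> int:
    """Read a short header through a buffer and report the underlying fetch size."""
    raw = CountingRaw(bytes(16 * MIB))  # stand-in for a large remote object
    reader = io.BufferedReader(raw, buffer_size=buffer_size)
    header = reader.read(sniff_len)
    assert len(header) == sniff_len
    return raw.bytes_fetched


small = bytes_fetched_for_sniff(4 * KIB)
large = bytes_fetched_for_sniff(16 * MIB)
print(f"4 KiB buffer fetched {small} bytes; 16 MiB buffer fetched {large} bytes")
```

The caller receives 16 bytes in both runs, while the amount pulled from the underlying stream tracks the buffer size. When that stream is backed by object storage, those fetched bytes are network traffic.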
Added a `buffer_size` parameter through the full Rust→Python chain, letting each call site specify an appropriate size: internal callers now use 4KB for sniffing and 64KB for metadata. Fully backward-compatible: `None` preserves the existing 16MB default.
Related Issues