Description
Hi, I am using ReferenceFileSystem as a sort of virtual filesystem, similar to how you described it on Stack Overflow in "Does fsspec support virtual filesystems such as pyfilesystem".
It works great for my use case, but I have run into an issue: the _open API reads the entire file instead of streaming it.
filesystem_spec/fsspec/implementations/reference.py, lines 1102 to 1104 in 30af5e1
This behaviour is expected and is documented as such:
filesystem_spec/fsspec/implementations/reference.py, lines 597 to 599 in 30af5e1
I'm curious whether there is a specific reason _open was implemented to load the entire file instead of allowing streaming access. Could it be that I'm misusing ReferenceFileSystem? If not, I'd be happy to work on a PR to implement streaming support. Let me know if this would be useful!
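To make the distinction concrete, here is a minimal standard-library sketch of the two behaviours I mean. It is not the fsspec code: `open_eager`, `open_streaming`, and the `block_size` parameter are hypothetical names used only for illustration, and a local temporary file stands in for the referenced target.

```python
import io
import os
import tempfile


def open_eager(path):
    """Read the whole file into memory and wrap it in a file-like buffer."""
    with open(path, "rb") as f:
        return io.BytesIO(f.read())


def open_streaming(path, block_size=64 * 1024):
    """Return a buffered handle; bytes are pulled from disk only when read."""
    return open(path, "rb", buffering=block_size)


# Small temporary file used as a stand-in for a large referenced file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789" * 1000)
    demo_path = tmp.name

# Eager: all 10,000 bytes are held in memory before the caller reads anything.
eager = open_eager(demo_path)
eager_size = len(eager.getvalue())

# Streaming: the caller asks for four bytes and only a buffered chunk is read.
with open_streaming(demo_path) as stream:
    header = stream.read(4)

os.remove(demo_path)
```

For a reader such as a Parquet reader that only needs the footer and selected column chunks, the second form avoids holding the whole file in memory.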
EDIT: This is basically how I use it, so that pyarrow preserves the partitioning it infers from the file path:
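from fsspec.implementations.reference import ReferenceFileSystem
from pyarrow.dataset import dataset

path = "s3://bucket/parquets/first_name=Alice/5c6de-0.parquet"
fs = ReferenceFileSystem(fo={path: ["/path/to/a/local/cache"]})
ds = dataset(path, filesystem=fs, **kwargs)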
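The reason the original path matters is that the partition values live in its directory names. As a rough illustration, assuming Hive-style `key=value` directories (which is what the example path looks like), they could be recovered as sketched here. `hive_partitions` is a hypothetical helper, not pyarrow's implementation.

```python
from urllib.parse import urlsplit


def hive_partitions(path):
    """Collect key=value directory segments from a path into a dict."""
    parts = {}
    # Drop the final segment (the file name); inspect only the directories.
    for segment in urlsplit(path).path.split("/")[:-1]:
        key, sep, value = segment.partition("=")
        if sep and key:
            parts[key] = value
    return parts


partitions = hive_partitions(
    "s3://bucket/parquets/first_name=Alice/5c6de-0.parquet"
)
# partitions == {"first_name": "Alice"}
```

If the file were opened by its local cache path instead, these directory segments would be absent and the partition column could not be inferred.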