Commit 882bebe

Add local caching feature to improve performance for remote file access
1 parent 8f97939 commit 882bebe

1 file changed, with 56 additions and 0 deletions

README.md

Lines changed: 56 additions & 0 deletions
@@ -178,6 +178,62 @@ with pynwb.NWBHDF5IO(file=h, mode="r") as io:
print(test_timeseries)
```

## Using the Local Cache

LINDI includes a local cache that speeds up access to remote files by storing downloaded data chunks on disk. The cache uses SQLite as its storage backend and is most useful when the same remote datasets are read repeatedly.

**Basic cache usage**

```python
import lindi

# Create a local cache (defaults to ~/.lindi/cache)
local_cache = lindi.LocalCache()

# Or, instead, specify a custom cache directory
# local_cache = lindi.LocalCache(cache_dir="/path/to/custom/cache")

# Use the cache when loading remote files
h5_url = "https://api.dandiarchive.org/api/assets/11f512ba-5bcf-4230-a8cb-dc8d36db38cb/download/"
f = lindi.LindiH5pyFile.from_hdf5_file(h5_url, local_cache=local_cache)

# Repeated reads of the same data are served from the cache
# ('some_dataset' is a placeholder for a dataset path in the file)
data = f['some_dataset'][:]  # First access: downloads and caches
data = f['some_dataset'][:]  # Second access: retrieved from cache
```

**Cache with LINDI files**

The cache can also be used when working with LINDI JSON files that reference remote data:

```python
import lindi

# Create a local cache
local_cache = lindi.LocalCache()

# Load a LINDI file with caching enabled
f = lindi.LindiH5pyFile.from_lindi_file('example.nwb.lindi.json', local_cache=local_cache)

# Access data - first time will cache, subsequent times will be faster
data = f['processing/ecephys/LFP/LFP/data'][:1000]
```

**How the cache works**

- Cached chunks are keyed by remote URL, byte offset, and chunk size
- By default, the cache directory is located at `~/.lindi/cache`
- Individual chunks are limited to 900 MB due to SQLite constraints
- The cache persists across Python sessions, so later runs benefit from previously cached data
- Cache files are created and managed automatically by LINDI
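
To make the points above concrete, here is a minimal sketch of a SQLite-backed chunk cache keyed by URL, byte offset, and chunk size. It is an illustration only and is not LINDI's actual implementation: the class name `SketchChunkCache`, the `chunks.sqlite` file name, the table layout, and the example URL are all assumptions made for this example. The size limit is set to mirror the 900 MB figure mentioned above.

```python
import sqlite3
import tempfile
from contextlib import closing
from pathlib import Path
from typing import Optional

# Assumed limit, chosen to mirror the 900 MB figure quoted in this README.
MAX_CHUNK_BYTES = 900 * 1024 * 1024


class SketchChunkCache:
    """Illustrative SQLite chunk cache; NOT LINDI's real LocalCache."""

    def __init__(self, cache_dir: str, max_chunk_bytes: int = MAX_CHUNK_BYTES):
        Path(cache_dir).mkdir(parents=True, exist_ok=True)
        # Hypothetical database file name inside the cache directory.
        self._db_path = str(Path(cache_dir) / "chunks.sqlite")
        self._max_chunk_bytes = max_chunk_bytes
        with closing(sqlite3.connect(self._db_path)) as conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS chunks ("
                "url TEXT, byte_offset INTEGER, num_bytes INTEGER, data BLOB, "
                "PRIMARY KEY (url, byte_offset, num_bytes))"
            )
            conn.commit()

    def get(self, url: str, offset: int, size: int) -> Optional[bytes]:
        """Return the cached chunk, or None on a cache miss."""
        with closing(sqlite3.connect(self._db_path)) as conn:
            row = conn.execute(
                "SELECT data FROM chunks "
                "WHERE url = ? AND byte_offset = ? AND num_bytes = ?",
                (url, offset, size),
            ).fetchone()
        return None if row is None else bytes(row[0])

    def put(self, url: str, offset: int, data: bytes) -> bool:
        """Store a chunk; return False (and skip it) if it exceeds the limit."""
        if len(data) > self._max_chunk_bytes:
            return False
        with closing(sqlite3.connect(self._db_path)) as conn:
            conn.execute(
                "INSERT OR REPLACE INTO chunks VALUES (?, ?, ?, ?)",
                (url, offset, len(data), data),
            )
            conn.commit()
        return True


with tempfile.TemporaryDirectory() as tmp:
    # A tiny limit keeps the demonstration cheap.
    cache = SketchChunkCache(tmp, max_chunk_bytes=8)
    url = "https://example.org/file.nwb"  # placeholder URL
    miss_before = cache.get(url, 0, 4)
    stored = cache.put(url, 0, b"abcd")
    # A second object over the same directory stands in for a later session.
    hit_after = SketchChunkCache(tmp, max_chunk_bytes=8).get(url, 0, 4)
    rejected = not cache.put(url, 4, b"x" * 9)
```

Because the data lives in a file on disk rather than in memory, a new cache object pointed at the same directory sees chunks written earlier, which is the behavior the persistence bullet describes.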

**Cache benefits**

- Speeds up repeated access to the same remote datasets
- Reduces network bandwidth usage
- Enables faster iteration when developing and testing code with remote data
- Particularly effective when the same NWB files from the DANDI Archive are accessed multiple times

## Notes

This project was inspired by [kerchunk](https://github.com/fsspec/kerchunk) and [hdmf-zarr](https://hdmf-zarr.readthedocs.io/en/latest/index.html).