This repository stores datasets and related resources for experiments and tutorials. It contains curated real-world data, synthetic examples, Jupyter notebooks, and helper scripts.
The goal is to provide a centralized location for datasets used in demos or research. Datasets are organized to keep real data separate from generated examples while also tracking notebooks and scripts that operate on them.
data/
├── real_world/ # datasets collected from external sources
└── synthetic/ # generated data and templates
notebooks/ # exploration and analysis notebooks
scripts/ # utilities for generation or preprocessing
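To see what each category currently holds, the layout above can be walked with a few lines of Python. This is only a sketch: `list_datasets` is a hypothetical helper, not a script shipped in this repository, and it assumes it is run from the repository root so that `data/` resolves.

```python
from pathlib import Path

# Category folders taken from the layout above.
CATEGORIES = ("real_world", "synthetic")


def list_datasets(data_root):
    """Map each category folder to the sorted files found beneath it."""
    root = Path(data_root)
    found = {}
    for category in CATEGORIES:
        folder = root / category
        # A missing folder yields an empty list instead of raising.
        if folder.is_dir():
            files = [p for p in folder.rglob("*") if p.is_file()]
        else:
            files = []
        # Paths are reported relative to the data root, with forward slashes.
        found[category] = sorted(p.relative_to(root).as_posix() for p in files)
    return found


if __name__ == "__main__":
    for category, files in list_datasets("data").items():
        print(f"{category}: {len(files)} file(s)")
```

Directory-based formats are listed file by file, so a store made of many small chunks will produce a long list.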
Executed notebooks (e.g., demo_tensors.executed.ipynb) are not tracked. Run the clean notebook to reproduce outputs.
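One way to regenerate an executed copy is `jupyter nbconvert` with its `--execute` flag. The sketch below only builds the command. It assumes the clean notebook is named `demo_tensors.ipynb` and lives under `notebooks/`, which this README does not state explicitly. The helper names are hypothetical.

```python
import shlex
from pathlib import Path


def executed_name(clean_notebook):
    """Derive the executed file name, e.g. demo_tensors.executed.ipynb."""
    return f"{Path(clean_notebook).stem}.executed.ipynb"


def build_execute_command(clean_notebook):
    """Build the nbconvert command that runs a notebook and saves a copy."""
    clean = Path(clean_notebook)
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",                 # keep the output as a notebook
        "--execute", str(clean),            # run every cell of the clean notebook
        "--output", executed_name(clean),   # name of the executed copy
        "--output-dir", str(clean.parent),  # write it next to the source
    ]


if __name__ == "__main__":
    # Assumed path; adjust to the notebook you want to reproduce.
    command = build_execute_command("notebooks/demo_tensors.ipynb")
    # Printed rather than executed, so the sketch has no side effects.
    print(shlex.join(command))
```

Pass the list to `subprocess.run(command, check=True)` to actually execute the notebook. Because the executed copy is not tracked, it can be deleted and rebuilt at any time.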
Large dataset files should be stored with Git LFS. Install Git LFS before cloning so binary files are fetched correctly:
git lfs install
Track new file types as needed. For example:
git lfs track "*.csv"
This repository tracks common dataset formats with LFS:
git lfs track "*.zarr" "*.h5" "*.npz" "*.safetensors"
Commit the resulting .gitattributes file along with your data. If you cloned the repository before installing Git LFS, run git lfs pull to download the tracked content.
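A clone made without Git LFS contains small text pointer files in place of the real data, and loading one usually fails with a confusing parse error. The sketch below detects them by checking for the header line that the Git LFS pointer format begins with. `is_lfs_pointer` and `unfetched_files` are hypothetical helpers, and the default patterns simply mirror the ones named in this README.

```python
from pathlib import Path

# Git LFS pointer files are small text files that start with this line.
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"

# Patterns mentioned in this README; extend as more types are tracked.
TRACKED_PATTERNS = ("*.csv", "*.zarr", "*.h5", "*.npz", "*.safetensors")


def is_lfs_pointer(path):
    """Return True if the file on disk is an LFS pointer, not real data."""
    with open(path, "rb") as handle:
        return handle.read(len(LFS_HEADER)) == LFS_HEADER


def unfetched_files(root, patterns=TRACKED_PATTERNS):
    """List tracked files under root that are still LFS pointers."""
    root = Path(root)
    if not root.is_dir():
        return []
    hits = set()
    for pattern in patterns:
        for candidate in root.rglob(pattern):
            # Skip directories that happen to match a pattern.
            if candidate.is_file() and is_lfs_pointer(candidate):
                hits.add(candidate.relative_to(root).as_posix())
    return sorted(hits)
```

Calling `unfetched_files("data")` from the repository root returns an empty list once the content has been downloaded. Any paths it reports still need `git lfs pull`.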
This project is licensed under the MIT License. See the LICENSE file for details.
A Streamlit dashboard is included for exploring the datasets interactively. Launch it with:
streamlit run scripts/dashboard.py