~453B uncompressed tokens of spiking neural activity data recorded from rodents (tokens = neurons × time bins). Unless otherwise noted, the data consist of spike counts in 20 ms time bins recorded from each neuron.
This repository contains the code and instructions for building the dataset from scratch. The final dataset itself is hosted in this public HF repository.
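As a toy illustration of the tokenization scheme above (one token per neuron per 20 ms time bin), the sketch below bins hypothetical spike times into 20 ms windows; all spike times and numbers here are made up for illustration and are not drawn from the actual datasets:

```python
import numpy as np

# Hypothetical example: three neurons recorded for 1 second.
# Spike times (in seconds) for each neuron.
spike_times = [
    np.array([0.005, 0.013, 0.250]),
    np.array([0.100, 0.900]),
    np.array([0.450]),
]

bin_size = 0.020                      # 20 ms bins
duration = 1.0                        # session length in seconds
n_bins = int(duration / bin_size)     # 50 bins
edges = np.arange(n_bins + 1) * bin_size

# Spike counts per neuron per bin: shape (n_neurons, n_bins)
counts = np.stack([np.histogram(st, bins=edges)[0] for st in spike_times])

# One token per (neuron, time-bin) cell
n_tokens = counts.size
print(counts.shape, n_tokens)  # (3, 50) 150
```

Summing `counts.size` over every session of every component dataset is what yields the per-dataset token counts in the table below.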
The current component datasets and token counts per dataset are as follows:
| Name | Tokens | Source | Details | Species | Subjects | Sessions |
|---|---|---|---|---|---|---|
| VBN | 153,877,057,200 | dandi:000713 | link | mouse | 81 | 153 |
| IBL | 69,147,814,139 | dandi:000409 | link | mouse | 115 | 347 |
| SHIELD | 61,890,305,241 | dandi:001051 | link | mouse | 27 | 99 |
| VCN | 36,681,686,005 | dandi:000021 | link | mouse | 32 | 32 |
| VCN-2 | 30,600,253,445 | dandi:000022 | link | mouse | 26 | 26 |
| V2H | 24,600,171,007 | dandi:000690 | link | mouse | 25 | 25 |
| Petersen | 15,510,368,376 | dandi:000059 | link | rat | 5 | 24 |
| Oddball | 14,653,641,118 | dandi:000253 | link | mouse | 14 | 14 |
| Illusion | 13,246,412,456 | dandi:000248 | link | mouse | 12 | 12 |
| Huszar | 8,812,474,629 | dandi:000552 | link | mouse | 17 | 65 |
| Steinmetz | 7,881,422,592 | dandi:000017 | link | mouse | 10 | 39 |
| Le Merre | 3,903,005,243 | dandi:001260 | link | mouse | 41 | 74 |
| Peyrache | 2,198,184,372 | dandi:000056 | link | mouse | 7 | 40 |
| Prince | 1,921,336,974 | dandi:001371 | link | mouse | 7 | 66 |
| Senzai | 1,433,511,102 | dandi:000166 | link | mouse | 19 | 19 |
| Finkelstein | 1,313,786,316 | dandi:000060 | link | mouse | 9 | 98 |
| Grosmark | 1,158,299,763 | dandi:000044 | link | rat | 4 | 8 |
| Giocomo | 1,083,328,404 | dandi:000053 | link | mouse | 34 | 349 |
| Steinmetz-2 | 684,731,334 | figshare:7739750 | link | mouse | 3 | 3 |
| Jaramillo | 581,535,289 | dandi:000986 | link | mouse | 5 | 15 |
| Mehrotra | 465,402,824 | dandi:000987 | link | mouse | 3 | 14 |
| Iurilli | 388,791,426 | dandi:000931 | link | mouse | 1 | 1 |
| Gonzalez | 366,962,209 | dandi:000405 | link | rat | 5 | 276 |
| Li | 260,807,325 | dandi:000010 | link | mouse | 23 | 99 |
| Fujisawa | 132,563,010 | dandi:000067 | link | rat | 3 | 10 |
Total number of tokens: 452,793,851,799
The combined dataset takes up about 453 GB on disk when stored as memory-mapped .arrow files. The HF datasets library uses .arrow files for local caching, so you will need at least this much free disk space to use the dataset.
Please see the auto-generated requirements.txt file.
The data directory contains all the information needed to download and preprocess the individual component datasets and push them to the HF datasets hub (quick links to the subdirectories for the component datasets are provided in the Details column of the table above). You can use these as a starting point if you would like to add more datasets to the mix; adding further dandisets should be particularly easy based on the current examples. When creating the component datasets, we split long sessions (>10M tokens) into equal-sized chunks of no more than 10M tokens each. This makes data loading more efficient and prevents errors when creating and uploading HF datasets.
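The chunk-size arithmetic for splitting long sessions can be sketched as follows. `split_session` is a hypothetical helper written for this illustration, not a function from the repo:

```python
import math

def split_session(n_tokens: int, max_tokens: int = 10_000_000) -> list[int]:
    """Split a session of n_tokens into the fewest roughly equal chunks,
    each no larger than max_tokens (chunk sizes differ by at most one token)."""
    n_chunks = math.ceil(n_tokens / max_tokens)
    base, rem = divmod(n_tokens, n_chunks)
    # distribute the remainder so sizes stay as equal as possible
    return [base + 1 if i < rem else base for i in range(n_chunks)]

# A 25M-token session becomes three ~8.33M-token chunks
print(split_session(25_000_000))  # [8333334, 8333333, 8333333]
```

In the actual pipeline the split is applied along the time axis of each session; this sketch only shows the size calculation.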
Once we have created the individual component datasets, we merge them into a single dataset with the merge_datasets.py script. This also shuffles the combined dataset, creates a separate test split (1% of the data), and pushes the dataset to the HF datasets hub (please note that due to the size of the dataset, it can take several hours to push the dataset to the HF datasets hub). If you would like to add more datasets to the mix, simply add their HF dataset repository names to the repo_list in merge_datasets.py.
visualize_datasets.py provides some basic functionality for visualizing random samples from the datasets as a sanity check:

```
python visualize_datasets.py --repo_name 'eminorhan/v2h' --n_examples 9
```

This will randomly sample n_examples examples from the corresponding dataset and visualize them as below, where x is the time axis (binned into 20 ms windows) and the y axis represents the recorded units:
Users also have the option to visualize n_examples random examples from each component dataset by calling:

```
python visualize_datasets.py --plot_all --n_examples 9
```

This will save the visualizations for all component datasets in a folder called rasters, as in here.
