This repository holds the Python scripts and data used for data analysis and figure creation for the Reward Competition Extension project, along with various other useful things. Full disclosure: this repository was reorganized without double-checking dependencies (paths to data or to other code), so be aware that some things may not run as-is.
For a tutorial on how most of the ephys data was processed, see `General_ephys_notebooks\Tutorial_for_multirecording_spikeanalysis.ipynb`.
I recommend downloading the whole repo and running the notebook step by step from the base directory, so you can see how the `multirecording_spikeanalysis.py` script works.
-
Base Directory:
- `multirecording_spikeanalysis.py`: Current version of the Padilla-Coreano Lab ephys script
- `multirecording_spikeanalysis_edit.py`: Lots of minor spacing edits, plus an added `create_spiketrain_df` function
- `multirecording_spikeanalysis_edit2.py`: Same as `multirecording_spikeanalysis_old.py`, but with the `create_spiketrain_df` function added
- `multirecording_spikeanalysis_edit3.py`: Same as `multirecording_spikeanalysis_edit2.py`, but with a 1-line edit to the `create_spiketrain_df` function
- `multirecording_spikeanalysis_old.py`: Original ephys script I based the edits on
- `spikeanal.py`: Similar to `multirecording_spikeanalysis_old.py`, but also:
  - spacing edits
  - `w_assessment` function edited from `except TypeError: 'NaN'` to `else: 'not significant'`
  - `smoothing_window` defaults to 250 instead of None
- `rce_pilot_2_per_video_trial_labels.xlsx`: Spreadsheet of event outcomes (e.g. win/lose/rewarded) and timestamps for Cohort 2
- `rce_pilot_3_alone_comp_per_video_trial_labels.xlsx`: Spreadsheet of event outcomes (e.g. win/lose/rewarded) and timestamps for Cohort 3
- `combined_excel_file.xlsx`: Python merge of the 2 behavior spreadsheets (from `General_ephys_notebooks\Merge_spreadsheets.ipynb`)
- `ms conversion.txt`: Notes on how the various timestamps relate to each other
-
Behavioral_clustering: Leo created an unsupervised ML clustering from 30 s windows (10 s before, 10 s during, 10 s after competitions) of SLEAP data (velocity, position, and direction of both mice). The notebooks under this folder attempt to analyze how many, and which, units were responsive to each cluster. Some clusters are characterized by both mice being at the reward port; others by one mouse at the port and the other in the corner facing away. There should probably be mPFC units that are responsive to specific states like those.
- The only analysis completed in this project so far is the transition probability matrix, which doesn't use ephys data, just cluster timestamps. I was still trying to figure out the best way to work with the data, and how exactly to ask which units change their firing rate in response to each cluster. Part of the difficulty is that each occurrence of each cluster has a variable length, and the comparative baseline could be the 30 s window in question, all 30 s windows, or the whole recording, with or without the cluster itself included. You could potentially concatenate all occurrences of each cluster into one long array and compare clusters to each other, but those arrays would also have different lengths. I think the best baseline would be the 30 s windows excluding the cluster in question.
- The other issue raised in this analysis was the duration of each cluster and the duration between clusters. Because we know the windows are 30 s long, I created windows by finding the earliest timestamp, adding 30,100 ms to it, treating that span as the first window, then starting the next window at the next timestamp. I believe this was pretty successful. Also, because some clusters were so short, there is a function `process_timestamps_nested` which uses 2 other functions: `combine_intervals` (if cluster 1 occurs and then occurs again less than 250 ms later, I assumed the gap was noise and merged the 2 timestamp ranges into 1 long range) and `remove_short_intervals` (if a cluster occurrence lasts < 250 ms, it is considered noise and dropped). 250 ms was an arbitrary but agreed-upon duration chosen by Tyler, Meghan, & Nancy.
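The interval cleanup described above can be sketched roughly like this (a minimal reimplementation, not the notebook's actual code; it assumes each cluster's timestamp ranges are `(start_ms, end_ms)` tuples sorted by start time):

```python
def combine_intervals(intervals, max_gap_ms=250):
    """Merge consecutive occurrences of a cluster separated by < max_gap_ms.

    A gap that short is assumed to be noise, so the two ranges become one.
    """
    if not intervals:
        return []
    merged = [list(intervals[0])]
    for start, end in intervals[1:]:
        if start - merged[-1][1] < max_gap_ms:
            merged[-1][1] = max(merged[-1][1], end)  # gap too small: one long bout
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]

def remove_short_intervals(intervals, min_dur_ms=250):
    """Drop cluster occurrences shorter than min_dur_ms (treated as noise)."""
    return [(s, e) for s, e in intervals if e - s >= min_dur_ms]
```

Run in that order (merge first, then drop), so two short bouts close together survive as one merged bout rather than being discarded individually.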
- After the noise has been reduced and the windows have been created, there are still gaps between cluster timestamp ranges (see `valid_differences_array` in `Test_funcs.ipynb`). The most common gap by far is 69-70 ms. I believe the clustering was done on every 3rd video frame to speed up processing; if each video frame is ~34 ms, then the 69-70 ms gaps are just the 2 frames dropped between one cluster's end and the next cluster's start. There are also plenty of 200-500 ms gaps that I can't explain. The 20-80 s gaps between clusters are simply the gaps between windows.
`Transition Matrix Example.xlsx`: An example of how `Transition_prob_matrix.ipynb` works
`rce_pilot_3_alone_comp_cluster_ranges.pkl`: A pickled df of the timestamps of each cluster. Each row is a subject's recording, and there are 5 columns of timestamp dictionaries. I mainly used 'cluster_timestamps_ranges_dict'.
- cluster_index_ranges_dict: I believe this is the video frame index; not really useful
- cluster_times_ranges_dict & trial_cluster_times_ranges_dict: I don't understand what these dicts are
- cluster_timestamps_ranges_dict & trial_cluster_timestamps_ranges_dict: These are the ephys 20 kHz timestamps, so to get to ms you divide by 20 (I use floor division). The difference between these 2 dicts/columns is that cluster... gives timestamps for each behavioral cluster, while trial_cluster... further divides the clusters by win/lose/tie. For example, if 'cluster 7' was typically characterized by 1 mouse near the port and 1 mouse in the back corner, the cluster is defined by the scene, not the subject, yet the neuronal activity would be expected to be completely different between the 2 mice. Because the rows are subject-specific, the 2nd dict tells you which role your current subject played in that cluster, e.g. 'win_7' & 'lose_7' instead of just '7'
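The 20 kHz-to-ms conversion mentioned above is just floor division by 20 (20,000 samples per second = 20 samples per ms). A minimal sketch, with `samples_to_ms` as a hypothetical helper name:

```python
SAMPLING_RATE_HZ = 20_000  # ephys sampling rate: 20 kHz

def samples_to_ms(timestamp_samples):
    """Convert a 20 kHz sample index to whole milliseconds (floor division)."""
    return timestamp_samples // (SAMPLING_RATE_HZ // 1000)  # 20 samples per ms
```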
`Transition_prob_matrix.ipynb`: Creates a transition probability matrix where the row is the origin cluster and the column is the cluster transitioned into. The idea: do some clusters typically transition to other clusters? Are some clusters simply transition states between 2 other clusters? Because clusters don't occur equally often (Cluster 1 might occur 40 times in a recording while Cluster 2 occurs only 5 times), simply plotting the observed transition counts as a heatmap wouldn't tell us much: the rows/columns of the most common clusters would appear hotter than the rest. So, in this notebook, an 'expected probability' matrix is created, then an 'observed count' matrix, then the observed counts are converted to an 'observed probability' matrix, and the two are subtracted to give an 'observed - expected' matrix. The next step I'm less convinced about, but I still think it's right: a probability going from 0.05 to 0.15 matters a lot more than one going from 0.45 to 0.55, so the 'observed - expected' matrix is converted to a proportional observed - expected matrix. This is the only completed notebook in this directory.
`Test_funcs.ipynb`: A place to test functions and the structure of data files
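The sequence of matrices described above can be sketched roughly as follows (a simplified reimplementation, not the notebook's actual code; in particular, defining the expected probability of a transition as P(origin) x P(destination), and the exact form of the proportional scaling, are assumptions based on the description):

```python
from collections import Counter

def transition_matrices(cluster_sequence):
    """Build observed, expected, and proportional transition probabilities
    (as dicts keyed by (origin, destination)) from an ordered list of labels."""
    states = sorted(set(cluster_sequence))
    transitions = Counter(zip(cluster_sequence, cluster_sequence[1:]))
    n_trans = sum(transitions.values())
    occ = Counter(cluster_sequence)
    total = len(cluster_sequence)
    # Expected P(a -> b) if transitions happened at random: P(a) * P(b)
    expected = {(a, b): (occ[a] / total) * (occ[b] / total)
                for a in states for b in states}
    observed = {(a, b): transitions[(a, b)] / n_trans
                for a in states for b in states}
    # Proportional difference: a change from 0.05 to 0.15 should count for more
    # than a change from 0.45 to 0.55, so scale the difference by the expectation
    proportional = {k: (observed[k] - expected[k]) / expected[k]
                    for k in observed if expected[k] > 0}
    return observed, expected, proportional
```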
-
General_ephys_notebooks:
`Tutorial_for_multirecording_spikeanalysis.ipynb`: This is the most important notebook in this repo
- There is lots of markdown, and there are comments explaining each step of the notebook
- Lines 1-8 are run at the beginning of almost all of my analyses to: import ephys data, import behavior data, create a dict from the behavior data, create a class object from the ephys data, and assign the behavior dict to the ephys class object
- The rest of the notebook doesn't provide an example of how to use the object's methods to analyze the data; instead, it explains the structure of the object and how some of the methods work
`Alert_Vs_Dispense.ipynb`: Separately looked at Cohort 2 Omission & Both_Rewarded recordings to determine whether neuronal responses differed between the first 5 seconds of the 10-second events (tone but no reward) and the last 5 seconds (after the reward dispensed). If neurons are responding to 'winning' during the 10 s event, but the mouse doesn't technically win until after the first 5 seconds, should we only be looking at the 2nd half of the event? Or are the neurons potentially deciding to win during the first 5 seconds, so we only need the first half? My conclusion: although the neurons responding to the first 5 seconds were regularly not the same ones responding to the 2nd 5 seconds, more neurons responded to the whole 10 seconds than to either half individually. So if your specific question is which neurons respond to 'winning' or 'losing', it may be most optimal to look at the whole 10 s window.
- ... Several more. Will update soon to explain the rest.
-
Move_edit_data_notebooks: Scripts for moving data files
`Delete_dot_phy.ipynb`: After manual curation of spikes with Phy, a `.phy` directory is created. This folder is incredibly large, with thousands of files and directories, and is just temporary storage for Phy. Phy states that these directories can be deleted and a new one will be made each time Phy is opened. Because this folder is redundant yet makes transferring data very difficult, this script goes through every recording and deletes the `.phy` folder.
`Move_RCE_Data.ipynb`: To use the `multirecording_spikeanalysis.py` script, all you need from each ephys recording are 3 files: `cluster_group.tsv`, `spike_clusters.npy`, `spike_times.npy`. This script moves just those 3 files plus their parent directories (the recording they came from), so you don't have to move the whole 5-8 GB directory for each recording.
-
Neuronal_classifying: Used the WaveMAP protocol to determine putative cell types (e.g. pyramidal vs interneuron)
- Source: Lee, K., Carr, N., Perliss, A., & Chandrasekaran, C. (2023). WaveMAP for identifying putative cell types from in vivo electrophysiology. STAR Protocols, 4(2), 102320. https://doi.org/10.1016/j.xpro.2023.102320
- Cool results! Uses the full spike-sorted data (~7 GB/recording) and essentially performs UMAP on the waveforms. The waveform is the only input (no spike rates or anything like that), yet it accurately and consistently groups fast-spiking units (putative interneurons) into a single, very separate group!
- Open the folder to view its own ReadMe for more information, but it's important to note that the data needed for this is very large and not stored in this repo. Also, because UMAP uses randomization, even with a seed set, the results differed slightly depending on the machine you ran it on, but the overall picture didn't change.
-
Newest_UMAP: The actual notebook and results of the WaveMAP protocol on this data
-
leo_poster: All of the scripts used to make the single-unit analysis figures for Leo's 2024 GRC poster, as well as the completed figures (although the labels/titles were edited in BioRender)
`Cohort2+3_Alone_Comp_Venn.ipynb`, `Cohort2+3_LinePlots.ipynb`, `Cohort2+3_PiePlots.ipynb`: These notebooks use all of Cohort 2 plus Alone Comp from Cohort 3 to make the Venn diagram, line plots, & pie plots
-
recordings:
- All of the processed ephys data. Pre-processed/raw data is ~7 GB per recording and stored elsewhere. These files store 20 kHz spike times for every unit in each recording. There are multiple subdirectories depending on the question you're asking. All of the recordings are under /from_cyborg/, but there are also copies in the other directories, like /all_non_novel/, which is what I typically used, since the analysis scripts currently can't handle recordings with more than 2 mice.
-
rubbish:
- Junk drawer. A place to store old/unused scripts, etc. that I didn't want to actually delete.