This repository holds the Python scripts and data used for data analysis and figure creation for the Reward Competition Extension project, along with various other useful things. Full disclosure: this repository was reorganized without double-checking dependencies (paths to data or to other code), so be aware that some things may not run as-is.
For a tutorial on how most of the ephys data was processed, see `General_ephys_notebooks\Tutorial_for_multirecording_spikeanalysis.ipynb`.
I recommend downloading the whole repo and running the notebook step by step from the base directory, so you can see how the `multirecording_spikeanalysis.py` script works.
-
Base Directory:
- `multirecording_spikeanalysis.py`: Current version of the Padilla-Coreano Lab ephys script
- `multirecording_spikeanalysis_edit.py`: Lots of minor spacing edits, plus an added `create_spiketrain_df` function
- `multirecording_spikeanalysis_edit2.py`: Same as `multirecording_spikeanalysis_old.py`, but with the `create_spiketrain_df` function added
- `multirecording_spikeanalysis_edit3.py`: Same as `multirecording_spikeanalysis_edit2.py`, but with a 1-line edit to the `create_spiketrain_df` function
- `multirecording_spikeanalysis_old.py`: Original ephys script I based the edits on
- `spikeanal.py`: Similar to `multirecording_spikeanalysis_old.py`, but also:
  - spacing edits
  - `w_assessment` function edited from `except TypeError: 'NaN'` to `else: 'not significant'`
  - `smoothing_window` defaults to 250 instead of None
- `rce_pilot_2_per_video_trial_labels.xlsx`: Spreadsheet of event outcomes (e.g. win/lose/rewarded) and timestamps for Cohort 2
- `rce_pilot_3_alone_comp_per_video_trial_labels.xlsx`: Spreadsheet of event outcomes (e.g. win/lose/rewarded) and timestamps for Cohort 3
- `combined_excel_file.xlsx`: Python merge of the 2 behavior spreadsheets (from `General_ephys_notebooks\Merge_spreadsheets.ipynb`)
- `ms conversion.txt`: Notes on how the various timestamps relate to each other
-
Behavioral_clustering: Leo created an unsupervised ML clustering from 30 s windows (10 s before, 10 s during, 10 s after competitions) of SLEAP data (velocity, position, and direction of both mice). The notebooks under this folder attempt to analyze how many, and which, units were responsive to each cluster. Some clusters are characterized by both mice being at the reward port; others by one mouse at the port and the other in the corner facing away. There should probably be mPFC units that are responsive to specific states like those.
- The only analysis completed in this project so far is the transition probability matrix, which doesn't use ephys data, just cluster timestamps. I was still trying to figure out the best way to work with the data, and how exactly to ask which units change their firing rate in response to each cluster. Part of the difficulty is that each occurrence of each cluster has a variable length, and the comparative baseline could be the 30 s window in question, all 30 s windows, or the whole recording, with or without the cluster itself included. You could potentially concatenate all occurrences of each cluster into one long array and compare clusters to each other, but those arrays would also have different lengths. I think the best baseline would be the 30 s windows excluding the cluster in question.
- The other issue raised in this analysis was the duration of each cluster and the duration between clusters. Because we know the windows are 30 s long, I created windows by finding the earliest timestamp, adding 30,100 ms to it, treating that span as the first window, then starting the next window at the next timestamp. I believe this was pretty successful. Also, because some clusters were so short, there is a function `process_timestamps_nested` which uses 2 other functions: `combine_intervals` (if cluster 1 occurs and then occurs again less than 250 ms later, I assumed the gap was noise and merged the 2 timestamp ranges into 1 long range) and `remove_short_intervals` (if a cluster occurrence lasts < 250 ms, it is considered noise and dropped). 250 ms was an arbitrary but agreed-upon duration chosen by Tyler, Meghan, & Nancy.
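The interval cleanup described above can be sketched roughly like this (a minimal reimplementation, not the notebook's actual code; it assumes each cluster's timestamp ranges are `(start_ms, end_ms)` tuples sorted by start time):

```python
def combine_intervals(intervals, max_gap_ms=250):
    """Merge consecutive occurrences of a cluster separated by < max_gap_ms.

    A gap that short is assumed to be noise, so the two ranges become one.
    """
    if not intervals:
        return []
    merged = [list(intervals[0])]
    for start, end in intervals[1:]:
        if start - merged[-1][1] < max_gap_ms:
            merged[-1][1] = max(merged[-1][1], end)  # gap too small: one long bout
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]

def remove_short_intervals(intervals, min_dur_ms=250):
    """Drop cluster occurrences shorter than min_dur_ms (treated as noise)."""
    return [(s, e) for s, e in intervals if e - s >= min_dur_ms]
```

Run in that order (merge first, then drop), so two short bouts close together survive as one merged bout rather than being discarded individually.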
- After the noise has been reduced and the windows have been created, there are still gaps between cluster timestamp ranges (see `valid_differences_array` in `Test_funcs.ipynb`). The most common gap by far is 69-70 ms. I believe the clustering was done on every 3rd video frame to speed up processing; if each video frame is ~34 ms, then the 69-70 ms gaps are just the 2 frames dropped between one cluster's end and the next cluster's start. There are also plenty of 200-500 ms gaps that I can't explain. The 20-80 s gaps between clusters are simply the gaps between windows.
`Transition Matrix Example.xlsx`: An example of how `Transition_prob_matrix.ipynb` works
`rce_pilot_3_alone_comp_cluster_ranges.pkl`: A pickled df of the timestamps of each cluster. Each row is a subject's recording, and there are 5 columns of timestamp dictionaries. I mainly used 'cluster_timestamps_ranges_dict'.
- cluster_index_ranges_dict: I believe this is the video frame index; not really useful
- cluster_times_ranges_dict & trial_cluster_times_ranges_dict: I don't understand what these dicts are
- cluster_timestamps_ranges_dict & trial_cluster_timestamps_ranges_dict: These are the ephys 20 kHz timestamps, so to get to ms you divide by 20 (I use floor division). The difference between these 2 dicts/columns is that cluster... gives timestamps for each behavioral cluster, while trial_cluster... further divides the clusters by win/lose/tie. For example, if 'cluster 7' was typically characterized by 1 mouse near the port and 1 mouse in the back corner, the cluster is defined by the scene, not the subject, yet the neuronal activity would be expected to be completely different between the 2 mice. Because the rows are subject-specific, the 2nd dict tells you which role your current subject played in that cluster, e.g. 'win_7' & 'lose_7' instead of just '7'
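The 20 kHz-to-ms conversion mentioned above is just floor division by 20 (20,000 samples per second = 20 samples per ms). A minimal sketch, with `samples_to_ms` as a hypothetical helper name:

```python
SAMPLING_RATE_HZ = 20_000  # ephys sampling rate: 20 kHz

def samples_to_ms(timestamp_samples):
    """Convert a 20 kHz sample index to whole milliseconds (floor division)."""
    return timestamp_samples // (SAMPLING_RATE_HZ // 1000)  # 20 samples per ms
```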
`Transition_prob_matrix.ipynb`: Creates a transition probability matrix where the row is the origin cluster and the column is the cluster transitioned into. The idea: do some clusters typically transition to other clusters? Are some clusters simply transition states between 2 other clusters? Because clusters don't occur equally often (Cluster 1 might occur 40 times in a recording while Cluster 2 occurs only 5 times), simply plotting the observed transition counts as a heatmap wouldn't tell us much: the rows/columns of the most common clusters would appear hotter than the rest. So, in this notebook, an 'expected probability' matrix is created, then an 'observed count' matrix, then the observed counts are converted to an 'observed probability' matrix, and the two are subtracted to give an 'observed - expected' matrix. The next step I'm less convinced about, but I still think it's right: a probability going from 0.05 to 0.15 matters a lot more than one going from 0.45 to 0.55, so the 'observed - expected' matrix is converted to a proportional observed - expected matrix. This is the only completed notebook in this directory.
`Test_funcs.ipynb`: A place to test functions and the structure of data files
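The sequence of matrices described above can be sketched roughly as follows (a simplified reimplementation, not the notebook's actual code; in particular, defining the expected probability of a transition as P(origin) x P(destination), and the exact form of the proportional scaling, are assumptions based on the description):

```python
from collections import Counter

def transition_matrices(cluster_sequence):
    """Build observed, expected, and proportional transition probabilities
    (as dicts keyed by (origin, destination)) from an ordered list of labels."""
    states = sorted(set(cluster_sequence))
    transitions = Counter(zip(cluster_sequence, cluster_sequence[1:]))
    n_trans = sum(transitions.values())
    occ = Counter(cluster_sequence)
    total = len(cluster_sequence)
    # Expected P(a -> b) if transitions happened at random: P(a) * P(b)
    expected = {(a, b): (occ[a] / total) * (occ[b] / total)
                for a in states for b in states}
    observed = {(a, b): transitions[(a, b)] / n_trans
                for a in states for b in states}
    # Proportional difference: a change from 0.05 to 0.15 should count for more
    # than a change from 0.45 to 0.55, so scale the difference by the expectation
    proportional = {k: (observed[k] - expected[k]) / expected[k]
                    for k in observed if expected[k] > 0}
    return observed, expected, proportional
```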
-
General_ephys_notebooks:
`Tutorial_for_multirecording_spikeanalysis.ipynb`: This is the most important notebook in this repo
- There is lots of markdown, and there are comments explaining each step of the notebook
- Lines 1-8 are run at the beginning of almost all of my analyses to: import ephys data, import behavior data, create a dict from the behavior data, create a class object from the ephys data, and assign the behavior dict to the ephys class object
- The rest of the notebook doesn't provide an example of how to use the object's methods to analyze the data; instead, it explains the structure of the object and how some of the methods work
`Alert_Vs_Dispense.ipynb`: Separately looked at Cohort 2 Omission & Both_Rewarded recordings to determine whether neuronal responses differed between the first 5 seconds of the 10-second events (tone but no reward) and the last 5 seconds (after the reward dispensed). If neurons are responding to 'winning' during the 10 s event, but the mouse doesn't technically win until after the first 5 seconds, should we only be looking at the 2nd half of the event? Or are the neurons potentially deciding to win during the first 5 seconds, so we only need the first half? My conclusion: although the neurons responding to the first 5 seconds were regularly not the same ones responding to the 2nd 5 seconds, more neurons responded to the whole 10 seconds than to either half individually. So if your specific question is which neurons respond to 'winning' or 'losing', it may be most optimal to look at the whole 10 s window.
- ... Several more. Will update soon to explain the rest.
-
Move_edit_data_notebooks: Scripts for moving data files
`Delete_dot_phy.ipynb`: After manual curation of spikes with Phy, a `.phy` directory is created. This folder is incredibly large, with thousands of files and directories, and is just temporary storage for Phy. Phy states that these directories can be deleted and a new one will be made each time Phy is opened. Because this folder is redundant yet makes transferring data very difficult, this script goes through every recording and deletes the `.phy` folder.
`Move_RCE_Data.ipynb`: To use the `multirecording_spikeanalysis.py` script, all you need from each ephys recording are 3 files: `cluster_group.tsv`, `spike_clusters.npy`, `spike_times.npy`. This script moves just those 3 files plus their parent directories (the recording they came from), so you don't have to move the whole 5-8 GB directory for each recording.
-
Neuronal_classifying: Used the WaveMAP protocol to determine putative cell types (e.g. pyramidal vs interneuron)
- Source: Lee, K., Carr, N., Perliss, A., & Chandrasekaran, C. (2023). WaveMAP for identifying putative cell types from in vivo electrophysiology. STAR Protocols, 4(2), 102320. https://doi.org/10.1016/j.xpro.2023.102320
- Cool results! Uses the full spike-sorted data (~7 GB/recording) and essentially performs UMAP on the waveforms. The waveform is the only input (no spike rates or anything like that), yet it accurately and consistently groups fast-spiking units (putative interneurons) into a single, very separate group!
- Open the folder to view its own ReadMe for more information, but it's important to note that the data needed for this is very large and not stored in this repo. Also, because UMAP uses randomization, even with a seed set, the results differed slightly depending on the machine you ran it on, but the overall picture didn't change.
-
Newest_UMAP: The actual notebook and results of the WaveMAP protocol on this data
-
leo_poster: All of the scripts used to make the single-unit analysis figures for Leo's 2024 GRC poster, as well as the completed figures (although the labels/titles were edited in BioRender)
`Cohort2+3_Alone_Comp_Venn.ipynb`, `Cohort2+3_LinePlots.ipynb`, `Cohort2+3_PiePlots.ipynb`: These notebooks use all of Cohort 2 plus Alone Comp from Cohort 3 to make the Venn diagram, line plots, & pie plots
-
recordings:
- All of the processed ephys data. Pre-processed/raw data is ~7 GB per recording and stored elsewhere. These files store 20 kHz spike times for every unit in each recording. There are multiple subdirectories depending on the question you're asking. All of the recordings are under /from_cyborg/, but there are also copies in the other directories, like /all_non_novel/, which is what I typically used, since the analysis scripts currently can't handle recordings with more than 2 mice.
-
rubbish:
- Junk drawer. A place to store old/unused scripts, etc. that I didn't want to actually delete.