|
183 | 183 | }, |
184 | 184 | { |
185 | 185 | "cell_type": "code", |
186 | | - "execution_count": 2020, |
| 186 | + "execution_count": null, |
187 | 187 | "id": "cbb34563", |
188 | 188 | "metadata": {}, |
189 | 189 | "outputs": [ |
|
200 | 200 | ], |
201 | 201 | "source": [ |
202 | 202 | "# Install requirements for the databook\n", |
203 | | - "'''\n", |
204 | 203 | "try:\n", |
205 | 204 | " from databook_utils.dandi_utils import dandi_download_open\n", |
206 | 205 | "except:\n", |
207 | 206 | " !git clone https://github.com/AllenInstitute/openscope_databook.git\n", |
208 | 207 | " %cd openscope_databook\n", |
209 | | - " %pip install -e .\n", |
210 | | - "'''" |
| 208 | + " %pip install -e .\n" |
211 | 209 | ] |
212 | 210 | }, |
213 | 211 | { |
|
475 | 473 | "id": "d0970781", |
476 | 474 | "metadata": {}, |
477 | 475 | "source": [ |
478 | | - "Here we dont have the brain area information but we need it, so we need to do some preprocessing to extract brain area from the nwb object using the peak_channel_id metadata. Luckily, **Pynapple** stored the nwb object as well." |
| 476 | + "Here we do not have the brain area information, but we need it, so we have to do some preprocessing to extract the brain area from the nwb object using the peak_channel_id metadata. Luckily, **Pynapple** stored the nwb object as well."
479 | 477 | ] |
480 | 478 | }, |
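The channel-to-area lookup described above can be sketched with toy tables. This is not the notebook's actual preprocessing code: the channel ids and the `location` column values here are hypothetical stand-ins for the NWB electrodes metadata, used only to illustrate the join:

```python
import pandas as pd

# Hypothetical illustration: the electrodes table maps each channel id to a
# brain area ("location"), and each unit records the channel on which its
# waveform peaks ("peak_channel_id"). Values below are made up.
electrodes = pd.DataFrame(
    {"location": ["VISp", "VISp", "CA1"]},
    index=pd.Index([850, 851, 852], name="channel_id"),
)
units = pd.DataFrame({"peak_channel_id": [851, 852, 850, 851]})

# Look up each unit's brain area from its peak channel
units["brain_area"] = electrodes.loc[units["peak_channel_id"], "location"].values
print(units["brain_area"].tolist())  # ['VISp', 'CA1', 'VISp', 'VISp']
```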
481 | 479 | { |
|
587 | 585 | "# Extract flashes as an Interval Set object\n", |
588 | 586 | "flashes = data[\"flashes_presentations\"]\n", |
589 | 587 | "\n", |
590 | | - "# Remove unnecesary columns, similarly to above\n", |
| 588 | + "# Remove unnecessary columns, similarly to above\n", |
591 | 589 | "cols_to_keep = ['color']\n", |
592 | 590 | "restrict_cols(cols_to_keep, flashes)\n", |
593 | 591 | "\n", |
|
1867 | 1865 | "\n", |
1868 | 1866 | "\n", |
1869 | 1867 | "Construction of Hankel Matrix. Modified from {cite:t}`PillowCosyneTutorial` <span id=\"cite3\"></span><a href=\"#ref3\">[3]</a>.\n", |
1870 | | - ":::" |
1871 | | - ] |
1872 | | - }, |
1873 | | - { |
1874 | | - "cell_type": "markdown", |
1875 | | - "id": "736a1f3d", |
1876 | | - "metadata": {}, |
1877 | | - "source": [ |
| 1868 | + "\n", |
| 1869 | + "For an example of how to build a design matrix using the raw history as a predictor, see this [GLM notebook](../higher-order/glm.ipynb) or this [**NeMoS** Fit GLMs for neural coupling tutorial](https://nemos.readthedocs.io/en/latest/how_to_guide/raw_history_feature.html#raw-spike-history-as-a-feature).\n",
| 1870 | + ":::\n", |
| 1871 | + "\n", |
1878 | 1872 | "However, modeling each time lag with an independent parameter leads to a high-dimensional filter that is prone to overfitting: given that we are using a bin size of 0.005 s, we would end up with 50 lags, i.e., 50 parameters per flash color! A better idea is to do some dimensionality reduction on these predictors, by parametrizing them using basis functions. This will allow us to capture interesting non-linear effects with a relatively low-dimensional parametrization that preserves convexity. \n",
1879 | 1873 | "\n", |
1880 | 1874 | "The way you perform this dimensionality reduction should be carefully considered. Choosing the appropriate type of basis functions, deciding how many to include, and setting their parameters all depend on the specifics of your problem. It’s essential to reflect on which aspects of the stimulus history are worth retaining and how best to represent them. For instance, do you expect sharp transient responses right after stimulus onset? Or are you more interested in slower, sustained effects?" |
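The parameter savings can be illustrated with plain NumPy. This is a simplified sketch, not the **NeMoS** API: the basis here is a linearly spaced raised cosine (the notebook's log-stretched variant would space the centers differently), and the stimulus is a toy event indicator:

```python
import numpy as np

# Toy stimulus indicator, binned at 0.005 s (values are made up)
rng = np.random.default_rng(0)
stim = (rng.random(1000) < 0.02).astype(float)

n_lags = 50  # 50 bins * 0.005 s = 0.25 s of stimulus history

# Full lagged design matrix: one column (= one parameter) per lag
X_full = np.column_stack(
    [np.concatenate([np.zeros(k), stim[: len(stim) - k]]) for k in range(n_lags)]
)

# A small raised-cosine basis compresses the 50 lags into 5 parameters
n_basis = 5
centers = np.linspace(0, n_lags - 1, n_basis)
width = (n_lags - 1) / (n_basis - 1)
lags = np.arange(n_lags)
B = np.stack(
    [
        0.5 * (1 + np.cos(np.clip((lags - c) * np.pi / width, -np.pi, np.pi)))
        for c in centers
    ],
    axis=1,
)  # shape (n_lags, n_basis)

# Projecting the lagged predictors onto the basis shrinks the design matrix
X_small = X_full @ B
print(X_full.shape, X_small.shape)  # (1000, 50) (1000, 5)
```

Fitting weights on `X_small` estimates 5 coefficients per flash color instead of 50, while the filter itself (`B @ weights`) still spans the full 0.25 s window.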
|
4497 | 4491 | "\n", |
4498 | 4492 | "In particular:\n", |
4499 | 4493 | "\n", |
4500 | | - "- Be thorough when deciding which units to include or exclude from your analysis. Make sure your criteria are clear, justified, and reproducible. We decided to keep the 15% most responsive units, and used a normalized difference to assess 'responsiveness', but different metrics and alternatives should be considered. For example, instead of choosing the top 15% most responsive, you could plot the responsiveness of all neurons (or whatever metric you decide to use), and decide to keep those who fall between $n$ standard deviations from the mean.\n", |
4501 | | - "- Explore different ways to split your data. Here, we used train and test data, but you could also try train, validate and test - specially if you will be trying different models and tweaking parameters before finally assessing the performance. Furthermore, we chose to pick one every three flash presentations for the test set, but you could also pick a subset of the stimuli and the counts in other ways.\n", |
4502 | | - "- Cross-validate the regularizer strength for each neuron individually — using a fixed value across the population may lead to suboptimal fits.\n", |
4503 | | - "- Think carefully about and cross-validate the basis functions parameters — including the type of basis and the number of components. These choices can greatly influence the model’s performance, and it is important to remember that the basis of choice will force assumptions in your data, so it is key to be aware of those. There is a helpful [**NeMoS** notebook on the topic](https://nemos.readthedocs.io/en/latest/how_to_guide/plot_06_sklearn_pipeline_cv_demo.html) dedicated to tuning basis functions — we encourage you to check it out.\n", |
| 4494 | + "- Explore different ways to split your data. Here, we used train and test data, but you could also try train, validate, and test, especially if you will be trying different models and tweaking parameters before finally assessing the performance. Furthermore, different splitting strategies may be needed for different input statistics. For example, picking samples in a random uniform manner may be ideal for independent samples, but it is not recommended for time series (for which samples close in time are likely highly correlated).\n",
| 4495 | + "- Cross-validate the regularizer strength for each neuron individually, as using a fixed value across the population may lead to suboptimal fits. For example, the regularizer we used here does a reasonable job at capturing the activity of neurons that are strongly modulated by the flash (see units 1 and 5 in the PSTH of the Stimuli model). However, for neurons with weaker modulation (i.e., smaller changes in firing rate), the model tends to produce flattened predictions, possibly due to over-regularization (see units 4 or 3 in the PSTH of the Stimuli model).\n", |
| 4496 | + "- Think carefully about and cross-validate the basis function parameters, including the type of basis and the number of components. These choices can greatly influence the model’s performance, and it is important to remember that the basis of choice will impose assumptions on your data, so it is key to be aware of those. For example, the raised cosine log stretched basis assumes that the precision of the basis decreases with the distance from the event. This makes the basis great for modeling rapid changes of the firing rate just after an event, followed by a slow decay back to baseline. This may or may not be the case depending on the dynamics of the neuron you want to fit. There is a helpful [**NeMoS** notebook on the topic](https://nemos.readthedocs.io/en/latest/how_to_guide/plot_06_sklearn_pipeline_cv_demo.html) dedicated to tuning basis functions — we encourage you to check it out.\n",
4504 | 4497 | "- We made one specific improvement to our model, i.e. adding coupling filters - what do you think would be another reasonable improvement to add? (hint: {cite:t}`pillowSpatiotemporalCorrelationsVisual2008` <span id=\"cite1d\"></span><a href=\"#ref1\">[1d]</a>)" |
4505 | 4498 | ] |
4506 | 4499 | }, |
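The every-third-presentation split mentioned in the bullets above can be sketched as follows; the flash start times here are toy values standing in for the notebook's flashes IntervalSet:

```python
import numpy as np

# Toy flash start times (15 hypothetical presentations, 2 s apart)
flash_starts = np.arange(0.0, 30.0, 2.0)

# Hold out every third presentation for testing, keep the rest for training;
# alternating through time avoids putting all test flashes at the session end
test_mask = (np.arange(len(flash_starts)) % 3) == 2
train_flashes = flash_starts[~test_mask]
test_flashes = flash_starts[test_mask]
print(len(train_flashes), len(test_flashes))  # 10 5
```

A random uniform split would work here too if trials were independent, but the structured split keeps whole presentations (and their temporally correlated bins) entirely in one set.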
|
4538 | 4531 | "- [Introduction to GLM - CCN software workshop by the Flatiron Institute](https://flatironinstitute.github.io/neurorse-workshops/workshops/jan-2025/branch/main/full/day2/current_injection.html): for a step by step example of using GLMs to fit the activity of a single neuron in VISp under current injection.\n", |
4539 | 4532 | "- [Neuromatch Academy GLM tutorial](https://compneuro.neuromatch.io/tutorials/W1D3_GeneralizedLinearModels/student/W1D3_Tutorial1.html): for a bit more detailed explanation of the components of a GLM, slides and some coding exercises to ensure comprehension.\n", |
4540 | 4533 | "- [Jonathan Pillow's COSYNE tutorial](https://www.youtube.com/watch?v=NFeGW5ljUoI&t=4230s): for a longer tutorial of all of the components of a GLM, as well as different types of GLM besides LNP.\n",
4541 | | - "- [**NeMoS** fit head-direction population tutorial](https://nemos.readthedocs.io/en/latest/tutorials/plot_02_head_direction.html): For a step by step explanation of how to build the design matrix first as a result of convolving the features with the identity matrix, and then by using basis functions, alongside nice visualizations.\n", |
4542 | | - "- [Flatiron Institute Introduction to GLMs tutorial](https://flatironinstitute.github.io/neurorse-workshops/workshops/jan-2025/branch/main/full/day2/current_injection.html#fitting-the-model): For a detailed explanation, step by step, on how predictors look with and without basis functions, with nice visualizations as well.\n", |
4543 | | - "- [**NeMoS** notebook on composition of basis functions](https://nemos.readthedocs.io/en/latest/background/basis/plot_02_ND_basis_function.html): For a detailed explanation of the different operations that can be carried out using basis functions in 2 and more dimensions.\n", |
| 4534 | + "- [**NeMoS** Fit GLMs for neural coupling tutorial](https://nemos.readthedocs.io/en/latest/how_to_guide/raw_history_feature.html#raw-spike-history-as-a-feature): for a guide on how to build a design matrix using raw history as a predictor, in the context of setting up a fully coupled GLM to capture pairwise interaction between neurons.\n", |
| 4535 | + "- [**NeMoS** fit head-direction population tutorial](https://nemos.readthedocs.io/en/latest/tutorials/plot_02_head_direction.html): for a step by step explanation of how to build the design matrix first as a result of convolving the features with the identity matrix, and then by using basis functions, alongside nice visualizations.\n", |
| 4536 | + "- [Flatiron Institute Introduction to GLMs tutorial](https://flatironinstitute.github.io/neurorse-workshops/workshops/jan-2025/branch/main/full/day2/current_injection.html#fitting-the-model): for a detailed explanation, step by step, on how predictors look with and without basis functions, with nice visualizations as well.\n", |
| 4537 | + "- [**NeMoS** notebook on composition of basis functions](https://nemos.readthedocs.io/en/latest/background/basis/plot_02_ND_basis_function.html): for a detailed explanation of the different operations that can be carried out using basis functions in 2 and more dimensions.\n", |
4544 | 4538 | "- [Bishop, 2009](https://www.microsoft.com/en-us/research/wp-content/uploads/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf): Section 3.1 for a formal description of what basis functions are and some examples of them.\n", |
4545 | | - "- [**NeMoS** notebook on causal, anti-causal and acausal filters](https://nemos.readthedocs.io/en/latest/background/plot_03_1D_convolution.html#causal-anti-causal-and-acausal-filters): For more information on the convolution occurring with basis functions, and how you can tailor that to your needs.\n", |
4546 | | - "- [**NeMoS** notebook on conducting cross validation for bases](https://nemos.readthedocs.io/en/latest/how_to_guide/plot_06_sklearn_pipeline_cv_demo.html): For a detailed explanation of how to combine **NeMos** objects within a **scikit-learn** pipeline to select the number of bases and bases type using cross validation." |
| 4539 | + "- [**NeMoS** notebook on causal, anti-causal and acausal filters](https://nemos.readthedocs.io/en/latest/background/plot_03_1D_convolution.html#causal-anti-causal-and-acausal-filters): for more information on the convolution occurring with basis functions, and how you can tailor that to your needs.\n", |
| 4540 | + "- [**NeMoS** notebook on conducting cross validation for bases](https://nemos.readthedocs.io/en/latest/how_to_guide/plot_06_sklearn_pipeline_cv_demo.html): for a detailed explanation of how to combine **NeMoS** objects within a **scikit-learn** pipeline to select the number of bases and bases type using cross validation."
4547 | 4541 | ] |
4548 | 4542 | } |
4549 | 4543 | ], |
|