Commit cdc92f1

Merge pull request #440 from kabilar/dandihub
Update Dandihub instructions
2 parents: 938c783 + b7b132b


3 files changed: +5 −8 lines


docs/FAQ.md

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ The Openscope Databook is a great place to demonstrate the capabilities and some
 As described in [This Jupyter blog post](https://blog.jupyter.org/mybinder-org-reducing-capacity-c93ccfc6413f), Binder no longer has the support of Google, and therefore shows reduced performance. Launches may fail or take a long time. There is no working solution to this except trying to launch again. An alternative would be to launch Databook notebooks with Google Colab.
 
 **How can I store my work on the Databook and come back to it later?**\
-Launching the Databook with [Dandihub](https://hub.dandiarchive.org/) will allow your files to be stored persistently and contain all the Databook's notebooks together. Additionally, you can clone the [GitHub repo](https://github.com/AllenInstitute/openscope_databook) and run our files locally. These are both explained in further detail on the [front page](https://alleninstitute.github.io/openscope_databook/intro.html).
+You can fork the [GitHub repository](https://github.com/AllenInstitute/openscope_databook), and clone the repository to your local machine or [Dandihub](https://hub.dandiarchive.org). Running files locally and on Dandihub are explained in further detail on the [front page](https://alleninstitute.github.io/openscope_databook/intro.html). As you are working, be sure to commit your changes and push them to GitHub.
 
 **How do you recommend using the Databook?**\
 The Databook can be used to reproduce analysis on files, as a starting point for investigating a dataset, or as an educational resource to get more familiar with NWB files or particular kinds of data. In all of these cases, the code can be modified, copied, and interactively run to gain a better understanding of the data. For educational use, the databook may be run remotely with Thebe, Binder, or Google Colab as simple demonstrations. For more advanced usage and analysis, it may behoove you to download an individual notebook and run it locally.
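The fork-and-clone workflow recommended in the updated FAQ answer can be sketched with standard git commands; `your-username` and the commit message below are hypothetical placeholders, not part of the Databook's instructions:

```shell
# Fork the repository first in the GitHub web UI, then clone your fork
# ("your-username" is a hypothetical placeholder).
git clone https://github.com/your-username/openscope_databook.git
cd openscope_databook

# ...edit or run notebooks...

# Commit your changes and push them back to your fork on GitHub.
git add docs/
git commit -m "Update my analysis notebook"
git push origin main
```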

docs/intro.md

Lines changed: 3 additions & 6 deletions
@@ -67,7 +67,7 @@ aliases: Carter Peene--R. Carter Peene, colleenjg--Colleen J. Gillon, shailajaAk
 
 Reproducibility is a significant challenge in neuroscience, as analysis and visualization methods are often difficult to replicate due to a lack of accessible code, separation of code from published figures, or unavailability of code altogether. This issue may arise from the complex nature of neuroscience research, the use of diverse data formats and analysis techniques, and insufficient emphasis on open-source, collaborative practices. In addition, key neuroscience analyses are typically rewritten at the start of new scientific projects, slowing down the initiation of research efforts.
 
-Four key components are essential for reproducible analysis: accessible data, accessible computational resources, a reproducible environment, and usage documentation. The OpenScope Databook, provided by the Allen Institute's OpenScope Project, offers a solution to these challenges by facilitating the analysis and visualization of brain data, primarily using [NWB files](https://www.nwb.org/) and the [DANDI archive](https://dandiarchive.org/). Hosted on GitHub, the entire publication – including code, data access, text, references, and revisions from reviewers and contributors – is readily available for collaboration and version control, promoting transparency and collective knowledge growth. The OpenScope Databook addresses these components by leveraging a combination of open-source Python libraries, such as DANDI, [Binder](https://mybinder.org/), [Jupyter Book](https://jupyterbook.org/en/stable/intro.html), [Google Colab](https://colab.research.google.com/), LaTeX references, Python scripts, Git versioning, and scientific revision through approved pull requests. The entire publication can be recreated by running the code locally, on distributed servers such as Binder, [DandiHub](https://hub.dandiarchive.org/), or Google Colab, or on any host running Jupyter notebooks.
+Four key components are essential for reproducible analysis: accessible data, accessible computational resources, a reproducible environment, and usage documentation. The OpenScope Databook, provided by the Allen Institute's OpenScope Project, offers a solution to these challenges by facilitating the analysis and visualization of brain data, primarily using [NWB files](https://www.nwb.org/) and the [DANDI Archive](https://dandiarchive.org/). Hosted on GitHub, the entire publication – including code, data access, text, references, and revisions from reviewers and contributors – is readily available for collaboration and version control, promoting transparency and collective knowledge growth. The OpenScope Databook addresses these components by leveraging a combination of open-source Python libraries, such as DANDI, [Binder](https://mybinder.org/), [Jupyter Book](https://jupyterbook.org/en/stable/intro.html), [Google Colab](https://colab.research.google.com/), LaTeX references, Python scripts, Git versioning, and scientific revision through approved pull requests. The entire publication can be recreated by running the code locally, on distributed servers such as Binder, [DandiHub](https://hub.dandiarchive.org/), or Google Colab, or on any host running Jupyter notebooks.
 
 We cover several broadly used analyses across the community, providing a missing component for system neuroscience. Our key analyses are organized into chapters, including NWB basics such as downloading, streaming, and visualizing NWB files from data archives. We document essential analyses typically performed in all neuroscience laboratories, such as temporal alignment, alignment to sensory stimuli, and association with experimental metadata. We cover the two leading neuronal recording techniques: two-photon calcium imaging and electrophysiological recordings, and share example analyses of stimulus-averaged responses. Advanced first-order analyses include showing receptive fields, identifying optotagged units, current source density analysis, and cell matching across days.
 
@@ -87,10 +87,7 @@ Binder will automatically setup the environment with [repo2docker](https://githu
 [Thebe](https://github.com/executablebooks/thebe) uses Binder in the backend to prepare the environment and run the kernel. It allows users to run notebooks embedded directly within the Databook's web UI. It can be used by hovering over the `Launch` button in the top-right of a notebook and selecting `Live Code`. Thebe is a work-in-progress project and has room for improvement. It is also worth noting that, like Binder, starting the Jupyter Kernel can sometimes take many minutes.
 
 ### Dandihub
-[Dandihub](https://hub.dandiarchive.org/) is an instance of JupyterHub hosted by DANDI. Dandihub does not automatically reproduce the environment required for these notebooks, but importantly, Dandihub allows for persistent storage of your files, so you can leave your work and come back to it later. It can be used by hovering over the `Launch` button in the top-right of a notebook and selecting `JupyterHub`. In order to run notebooks on Dandihub, you must sign in with your GitHub account. To set up the correct environment on Dandihub, open a `terminal` tab, navigate to the directory `openscope_databook` and run the command
-```
-pip install -e .
-```
+[Dandihub](https://hub.dandiarchive.org/) is an instance of JupyterHub hosted by the [DANDI Archive](https://dandiarchive.org). It can be used by hovering over the `Launch` button in the top-right of a notebook and selecting `JupyterHub`. In order to run notebooks on Dandihub, you must sign in with your GitHub account, select the server size, select the `OpenScope` option from the `Image` dropdown menu, and navigate to the `openscope_databook/docs` directory which contains the OpenScope notebooks.
 
 ### Locally
 You can download an individual notebook by pressing the `Download` button in the top-right and selecting `.ipynb`. Alternatively, you can clone the repo to your machine and access the files there. The repo can be found by hovering over the the `GitHub` button in the top-right and selecting `repository`. When run locally, the environment can be replicated with our [requirements.txt](https://github.com/AllenInstitute/openscope_databook/blob/main/requirements.txt) file using the command
@@ -141,7 +138,7 @@ Reproducible Analysis requires four components;
 The Databook leverages a number of technologies to combine those components into a web-application.
 
 ### Data
-Data is accessed from The [DANDI archive](https://dandiarchive.org/) and downloaded via the [DANDI Python API](https://dandi.readthedocs.io/en/latest/modref/index.html) within notebooks. Most notebooks make use of publicly available datasets on DANDI, but for some notebooks, there is not yet sufficient publicly-available data to demonstrate our analysis. For these, it is encouraged to use your own NWB Files that are privately stored on DANDI.
+Data is accessed from the [DANDI Archive](https://dandiarchive.org/) and downloaded via the [DANDI Python API](https://dandi.readthedocs.io/en/latest/modref/index.html) within notebooks. Most notebooks make use of publicly available datasets on DANDI, but for some notebooks, there is not yet sufficient publicly-available data to demonstrate our analysis. For these, it is encouraged to use your own NWB Files that are privately stored on DANDI.
 
 ### Computation
 This project utilizes [Binder](https://mybinder.org/), as the host for the environment and the provider of computational resources. Conveniently, Binder has support for effectively replicating a computational environment from a GitHub Repo. Users of the Databook don't have to worry about managing the environment if they prefer to use our integrated Binder functionality. However, the Databook can be run locally or on other hosts. Details about the different ways to run this code can be found in the section [How Can I Use It?](Usage) below.
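The local-run instructions in the `Locally` context above can be sketched as a few shell commands; this is a minimal sketch assuming git and pip are on your PATH, and `pip install -r requirements.txt` is the standard invocation for a requirements file, not a command quoted from the Databook:

```shell
# Minimal local-setup sketch (assumes git and pip are available).
git clone https://github.com/AllenInstitute/openscope_databook.git
cd openscope_databook
pip install -r requirements.txt  # replicate the Databook environment
```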

docs/methods/environments.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 # Managing Environments on Multiple Platforms
 
-As briefly mentioned in [Jupyter/Jupyter Book](./jupyter_book.md), Jupyter Book allows users to launch notebooks in the Databook onto other computing platforms. Namely, [Dandihub](hub.dandiarchive.org), [Binder](mybinder.org), and [Google Colab](colab.research.google.com). In the case of Binder, clicking the launch button automatically produces a docker container with the necessary environment installed by using the files [apt.txt](https://github.com/AllenInstitute/openscope_databook/blob/main/apt.txt), [setup.py](https://github.com/AllenInstitute/openscope_databook/blob/main/setup.py), and [postBuild](https://github.com/AllenInstitute/openscope_databook/blob/main/postBuild) from the root directory. In the case of Dandihub, once a user has made an account, the environment should be persistent between runtimes, but it must still be setup once. As for Google Colab, the environment must be setup every runtime. If a notebook is being run locally, the environment may or may not be installed depending on the user.
+As briefly mentioned in [Jupyter/Jupyter Book](./jupyter_book.md), Jupyter Book allows users to launch notebooks in the Databook onto other computing platforms. Namely, [Dandihub](hub.dandiarchive.org), [Binder](mybinder.org), and [Google Colab](colab.research.google.com). In the case of Binder, clicking the launch button automatically produces a docker container with the necessary environment installed by using the files [apt.txt](https://github.com/AllenInstitute/openscope_databook/blob/main/apt.txt), [setup.py](https://github.com/AllenInstitute/openscope_databook/blob/main/setup.py), and [postBuild](https://github.com/AllenInstitute/openscope_databook/blob/main/postBuild) from the root directory. In the case of Dandihub, a user can select the `OpenScope` image from the dropdown menu. As for Google Colab, the environment must be setup every runtime. If a notebook is being run locally, the environment may or may not be installed depending on the user.
 
 The measure we employed to solve this is to have an *"Environment Setup"* cell at the top of every notebook. This cell checks to see if the `databook_utils` package exists. databook_utils consists of merely a few small Python modules which handle DANDI operations that are used in nearly every Notebook in the databook. It can be presumed that the presence of databook_utils is a reliable indicator of whether or not the proper environment of the Databook is installed in the local environment of a notebook. Therefore, if importing databook_utils succeeds, the notebook continues. If it fails, the databook is cloned from GitHub and the necessary environment is installed from within the Databook's root directory using the command `pip install -e .`. This installs all the necessary pip dependencies as well as the databook_utils package which is stored within the Databook at [/databook_utils](https://github.com/AllenInstitute/openscope_databook/tree/main/databook_utils). However, the Jupyter kernel does not update its environment with the local environment until it has been restarted. In the case that the environment was installed by the Environment Setup, the Jupyter kernel must be restarted by the user and run again.

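The *Environment Setup* logic described in that context paragraph can be sketched in Python; the function names here are illustrative, not the Databook's actual cell code:

```python
import importlib.util
import subprocess
import sys

REPO_URL = "https://github.com/AllenInstitute/openscope_databook"


def environment_ready(package: str = "databook_utils") -> bool:
    # The presence of databook_utils is treated as a proxy for the whole
    # Databook environment being installed, as the paragraph explains.
    return importlib.util.find_spec(package) is not None


def setup_environment() -> None:
    # If the proxy package is missing, clone the Databook and install it
    # (and its pip dependencies) in editable mode with `pip install -e .`.
    if environment_ready():
        return
    subprocess.run(["git", "clone", REPO_URL], check=True)
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "-e", "openscope_databook"],
        check=True,
    )
    # The running Jupyter kernel will not see the newly installed packages;
    # the user must restart the kernel, matching the caveat in the text.
```

Note that the kernel-restart comment mirrors the paragraph's caveat: packages installed mid-session are not visible until the Jupyter kernel is restarted.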