There are many ways to use Docker images. Here are common ones. Scroll to the bottom for instructions on linking your container to file systems (so you can get and store files).

## To run images in a JupyterHub with 'bring your own image'

If your JupyterHub has this option:

- Click on the 'Bring your own image' radio button at the bottom.
- Paste in the URL to your image (or any other image).
- You will find the URLs in the right nav bar under 'Packages'.
- Example: `ghcr.io/nmfs-opensci/jupyter-base-notebook:latest`

## Run with a JupyterHub

The images should work out of the box. Put the URL of the image wherever you would specify an image.

## Run with Docker

You can run the images on a virtual machine or your computer if you have Docker or Podman installed.

```
docker run -p 8888:8888 ghcr.io/nmfs-opensci/container-images/py-rocket-base:latest
```

On a Mac M2+, turn on Rosetta emulation in the Docker Desktop settings and run:

```
docker run --platform linux/amd64 -p 8888:8888 ghcr.io/nmfs-opensci/container-images/py-rocket-base:latest
```
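
The choice between the two commands above can be scripted. The following is a minimal Python sketch, not part of this repository, that assembles the `docker run` arguments and adds `--platform linux/amd64` when the host reports an ARM CPU. The helper name `docker_run_args` is hypothetical, and treating `arm64` or `aarch64` from `platform.machine()` as the sign of an Apple chip is an assumption.

```python
import platform
import shlex

IMAGE = "ghcr.io/nmfs-opensci/container-images/py-rocket-base:latest"


def docker_run_args(image=IMAGE, port=8888, machine=None):
    """Return the `docker run` argument list for this host (hypothetical helper)."""
    # Assumption: ARM hosts (e.g. Apple chips) report "arm64" or "aarch64".
    machine = (machine or platform.machine()).lower()
    args = ["docker", "run"]
    if machine in ("arm64", "aarch64"):
        # Assumption: the image targets amd64, as the --platform flag above suggests.
        args += ["--platform", "linux/amd64"]
    args += ["-p", f"{port}:{port}", image]
    return args


# Print the command so it can be copied into a terminal.
print(shlex.join(docker_run_args()))
```

Passing `machine="arm64"` explicitly shows what the command would look like on an Apple-chip Mac without having to run the script there.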

In the terminal output, look for the URL printed by the Jupyter server and paste it into a browser.
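
If the terminal output is long, the URL can be picked out programmatically. This is a rough Python sketch under stated assumptions: the sample log text is made up for illustration, and the regular expression assumes the server prints a local URL that carries a `token` query parameter. The function name `find_jupyter_url` is hypothetical.

```python
import re

# Assumption: the startup log contains a local URL with a ?token=... parameter.
URL_PATTERN = re.compile(r"https?://(?:127\.0\.0\.1|localhost):\d+/\S*\?token=\w+")


def find_jupyter_url(log_text):
    """Return the first tokenized Jupyter URL in the log text, or None."""
    match = URL_PATTERN.search(log_text)
    return match.group(0) if match else None


# Made-up sample log output, for illustration only.
sample_log = """
[I ServerApp] Jupyter Server is running at:
[I ServerApp]     http://127.0.0.1:8888/lab?token=abc123
"""
print(find_jupyter_url(sample_log))
```

The log text could be copied from the terminal or captured with `docker logs` for a running container.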

**Running geospatial R Docker images and working with netCDF files**

The GDAL netCDF driver needs some extra flags added to `docker run` to work correctly inside a Docker container. This doesn't affect Python as much, since `xarray` works with netCDF via different drivers, but the `terra` netCDF functions use GDAL drivers under the hood to open netCDF files. Without the flags, you'll get an error saying that it can't find the files. We ran into trouble when accessing cloud-hosted netCDF files; it may work fine if you download the files first.

```
docker run -p 8888:8888 --cap-add SYS_PTRACE --security-opt seccomp=unconfined ghcr.io/nmfs-opensci/container-images/py-rocket-geospatial:latest
```

Note that we had trouble getting this to work on a Mac with an Apple chip. You can test whether it is going to work by running this Python code and checking whether `DCAP_VIRTUALIO` is listed:

```
from osgeo import gdal

# Look up the netCDF driver and print its metadata keys
nc = gdal.GetDriverByName("netCDF")
print(nc.GetMetadata().keys())
```
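
The same check can be wrapped so that it gives a yes/no answer instead of a list of keys to scan by eye. This is a minimal sketch rather than a supported tool: the function names are hypothetical, and it only relies on the `gdal.GetDriverByName` and `GetMetadata` calls used above. It returns `None` when the GDAL Python bindings or the netCDF driver are not available.

```python
def has_virtual_io(metadata):
    """Return True if the driver metadata lists DCAP_VIRTUALIO."""
    return "DCAP_VIRTUALIO" in metadata


def netcdf_virtual_io_available():
    """Return True/False for the netCDF driver, or None if it cannot be checked."""
    try:
        from osgeo import gdal
    except ImportError:
        return None  # GDAL Python bindings are not installed here
    driver = gdal.GetDriverByName("netCDF")
    if driver is None:
        return None  # this GDAL build has no netCDF driver
    # Fall back to an empty dict in case no metadata is returned.
    return has_virtual_io(driver.GetMetadata() or {})


print(netcdf_virtual_io_available())
```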

## Run with Binder

Create a file called `Dockerfile` and put it in the base of your GitHub repository or in a folder called `binder` or `.binder`. Put the following line in that file (replacing the image URL to match your desired image):

```
FROM ghcr.io/nmfs-opensci/container-images/py-rocket-geospatial:latest
```

Then go to <https://mybinder.org> and paste in the URL of your GitHub repo, or go directly to the following URL:

```
https://mybinder.org/v2/gh/<username or org>/<reponame>
```
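
For sharing launch links, the URL pattern above can be filled in with a few lines of Python. This is an illustrative sketch only: `binder_url` is a hypothetical helper, and appending an optional branch, tag, or commit as a final path segment is an assumption about how you want to pin the repository version.

```python
from urllib.parse import quote

BINDER_BASE = "https://mybinder.org/v2/gh"


def binder_url(owner, repo, ref=None):
    """Build a mybinder.org launch URL for a GitHub repository."""
    parts = [BINDER_BASE, quote(owner, safe=""), quote(repo, safe="")]
    if ref:
        # Assumption: an optional branch, tag, or commit goes in a final path segment.
        parts.append(quote(ref, safe=""))
    return "/".join(parts)


print(binder_url("nmfs-opensci", "container-images"))
```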

## With Codespaces

See the folders in the `.devcontainer` folder and create a `.devcontainer/devcontainer.json` file in your own repo by copying one of the `devcontainer.json` files. They all use the same template with just the top lines changed. Note that the folder `.devcontainer/codespace` is also required. If you change the line that starts up Jupyter Lab (at the bottom of the `devcontainer.json` file), do not use port 8888 or else RStudio will not launch.

The Codespaces code is based on: <https://github.com/MichaelAkridge-NOAA/Open-Science-Codespaces>

## GitPod -- like Codespaces

Work in progress. The approach is similar to Codespaces.

## Run on Google Colab

TBD. This seems harder. See this [issue](https://github.com/nmfs-opensci/container-images/issues/14).

# Getting access to files

The container gives you a computing environment, but by design it is isolated from the file system of the machine that is running it. You will need a way to get your files in and out of the container and to save your work.

## Upload/Download files

Under the Files menu in Jupyter Lab or the Files tab in RStudio, you can upload and download files.

## Use a Git repository

Jupyter Lab and RStudio have Git GUIs. Use those or the command line to clone repos and push changes back to the repos.

```
cd ~
git clone <url to the repo>
```

## Connect to a bucket

If you are working with large data sets, you do not want to move them into your container (it is slow). Instead, create a bucket (like an S3 bucket) and connect to that. This is like having an external drive in the cloud.

Instructions to come.

## Mount a file system

You can mount a local file system and read and write directly from it. Here "local" means the machine that is running the container, which might be a virtual machine, a server, or your computer.

**On a JupyterHub**: The managers of the hub have most likely created persistent storage for you. If not, use Git, upload/download, or buckets.

**On your computer**: Add a flag to the `docker run` command to mount your local file system to the Docker container.

When you use `--volume` to bind-mount a file or directory, make sure the target path does not already exist in the Docker container. Do not bind to a directory like `/usr`, which would break the container (nothing harmful happens; it just won't work). Use something like `/home/jovyan/mydir`. `--volume` creates the endpoint for you, and it is always created as a directory.

In this example, `myproject_files` needs to exist in the directory where you are running `docker run`. If you get errors, try `ls` to make sure the directory is there.

```
docker run -p 8888:8888 --volume ./myproject_files:/home/jovyan/mydir ghcr.io/nmfs-opensci/container-images/py-rocket-base:latest
```

As you work in `mydir` in the container, those changes will appear in your computer's `myproject_files` directory. It is as if you are working on your own computer, but you are using the development environment of the Docker image.
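
A missing local directory is an easy mistake to make, so it can help to check for it before starting Docker. The sketch below is one way to do that in Python and is not part of this repository: `mount_command` is a hypothetical helper, the container path `/home/jovyan/mydir` is taken from the example above, and resolving the local directory to an absolute path is a design choice made here for portability.

```python
import shlex
import tempfile
from pathlib import Path

IMAGE = "ghcr.io/nmfs-opensci/container-images/py-rocket-base:latest"


def mount_command(local_dir, container_dir="/home/jovyan/mydir", image=IMAGE):
    """Return a `docker run` argument list that bind-mounts local_dir."""
    local_path = Path(local_dir).resolve()
    if not local_path.is_dir():
        # Catch the missing-directory case before Docker is started.
        raise FileNotFoundError(f"Local directory not found: {local_path}")
    return [
        "docker", "run", "-p", "8888:8888",
        "--volume", f"{local_path}:{container_dir}",
        image,
    ]


# Demonstrate with a temporary directory standing in for ./myproject_files.
with tempfile.TemporaryDirectory() as tmp:
    print(shlex.join(mount_command(tmp)))
```

The printed command can be pasted into a terminal, or the list can be handed to `subprocess.run` if you prefer to launch the container from Python.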

Mac users with Apple chips, add `--platform linux/amd64`:

```
docker run --platform linux/amd64 -p 8888:8888 --volume ./myproject_files:/home/jovyan/mydir ghcr.io/nmfs-opensci/container-images/py-rocket-base:latest
```

README.Rmd: 13 additions & 63 deletions
@@ -4,15 +4,13 @@ output: github_document
4
4
<!-- DO NOT EDIT. CREATED BY README.RMD. Knit that. -->
5
5
# NMFS Open Science Docker Stack
6
6
7
-
##THE DOCKER STACK IS IN ACTIVE DEVELOPMENT
7
+
### Beta release June 1, 2024.
8
8
9
-
### Beta release targeted for June 1, 2024.
10
-
11
-
These are a collection of container images that provide standardized environments for Python and R with Jupyter Lab, RStudio and VS Code IDEs. The images are built off the [Rocker](https://rocker-project.org/images/devcontainer/images.html), [Pangeo](https://github.com/pangeo-data/pangeo-docker-images) and Jupyter base images. This repo holds the (mostly) stable docker stack for specific pipelines used in Fisheries. Why use a container? The main reason is that geospatial, bioinformatics, and TMB/INLA environments can be hard to get working right. Using a Docker image means you use a stable environment. Watch this video from Yuvi Panda (Jupyter Project) [video](https://www.youtube.com/watch?v=qgLPpULvBbQ) and read about the Rocker Project in the R Project Journal [article](https://journal.r-project.org/archive/2017/RJ-2017-065/RJ-2017-065.pdf) by Carl Boettiger and Dirk Eddelbuettel.
9
+
This is a collection of container images that provide standardized environments for Python and R with Jupyter Lab, RStudio and VS Code IDEs. The images are built off the [Rocker](https://rocker-project.org/images/devcontainer/images.html), [Pangeo](https://github.com/pangeo-data/pangeo-docker-images) and [Jupyter](https://jupyter-docker-stacks.readthedocs.io/en/latest/) base images. This repo holds the stable Docker stack for specific pipelines used in Fisheries. The images are designed to work out of the box and identically in JupyterHubs, Codespaces, Binder, etc. Read the Design section below on what the NMFS Open Sci Docker Stack does. For use instructions, see [INSTRUCTIONS.md](https://nmfs-opensci/container-images/INSTRUCTIONS.md).
12
10
13
11
## Stable set of images
14
12
15
-
There are many other images in the `images` folder that are experimental in nature. There are also experimental images in the branches.
13
+
There are many other images in the `images` folder that are experimental in nature. *If you are looking for standard Python or R Docker images, go to the base Docker stacks linked above.*
16
14
17
15
```{r echo=FALSE}
18
16
source("parse_dockerfile.R")
@@ -27,7 +25,7 @@ table_line <- function(i){
27
25
branch <- system(paste0("git show-ref refs/heads/", i, " ignore.stdout = TRUE"))
28
26
binder_button <- ""
29
27
if(branch == 0) binder_button <- paste0("[](https://mybinder.org/v2/gh/nmfs-opensci/container-images/", i, ")")
30
-
cat("| [", i, "](https://github.com/nmfs-opensci/container-images/pkgs/container/container-images%2F", i, ") <br/>  | ", desc, " | [![Button GCS]](https://codespaces.new/nmfs-opensci/container-images?devcontainer_path=.devcontainer%2F", i, "%2Fdevcontainer.json) <br/> ", binder_button, " | [Dockerfile](https://github.com/nmfs-opensci/container-images/tree/main/images/", i, "/Dockerfile) <br> [directory](https://github.com/nmfs-opensci/container-images/tree/main/images/", i, ") |\n", sep="")
28
+
cat("| [", i, "](https://github.com/nmfs-opensci/container-images/pkgs/container/container-images%2F", i, ") <br/>  <br/>  | ", desc, " | [![Button GCS]](https://codespaces.new/nmfs-opensci/container-images?devcontainer_path=.devcontainer%2F", i, "%2Fdevcontainer.json) <br/> ", binder_button, " | [Dockerfile](https://github.com/nmfs-opensci/container-images/tree/main/images/", i, "/Dockerfile) <br> [directory](https://github.com/nmfs-opensci/container-images/tree/main/images/", i, ") |\n", sep="")
31
29
}
32
30
```
33
31
@@ -52,68 +50,20 @@ for(i in imgs) table_line(i)
52
50
53
51
## Design principles
54
52
55
-
- The images are designed to be deployable "out of the box" from JupyterHubs, Codespaces, GitPod, Colab, Binder, and on your computer via Docker or Podman. See instructions below. Each will spin up Jupyter Lab with JLab, RStudio and VS Code within a specific development environment.
56
-
- Python environment follows Pangeo images with micromamba installed as the solver and base and notebook environments. The Jupyter modules are installed in notebook environment and images will launch with the notebook activated, again following Pangeo design structure. Images that use Pangeo as base will have user jovyan and user home directory home/jovyan.
57
-
- R set-up follows Rocker's environment design with the exception that the user home directory is home/jovyan so it plays nice with JupyterHub deployments. The user is rstudio however.
58
-
- When an image contains both R and Python, the base image is rocker and micromamba is installed along with the Pangeo environment structure. RStudio will use the Python environment in the notebook environment when Python is used from within RStudio.
59
-
- However, they are not terribly light-weight (large). Use the original Jupyter, Pangeo or Rocker images if you are looking for simple lightweight data science images.
60
-
61
-
### Acknowledgements
62
-
63
-
The core stack is credited to the work of Luis Lopez (NASA) who developed the NASA Openscapes Python image used in countless workshops on cloud-computing with NASA Earth Data. Subsequently the NASA Openscapes mentor cloud-infrastructure Slack group and weekly co-work sessions plugged away at the problem of helping users 'fledge' off the Openscapes JupyterHub, which involved creating images that were more versitile. Carl Boettiger (UC Berkeley & Rocker Project) and Eli Holmes (NOAA Fisheries) took on different aspects of this. The GitHub Action tooling is curtesy of Carl. Yuvi Panda (Jupyter, 2i2c) was also very helpful in desiging the 'scaffolding' in the images that helps them be robust and versitile. The Codespaces and devcontainer code is based on Michael Akridge's [Open Science Codespaces](https://github.com/MichaelAkridge-NOAA/Open-Science-Codespaces) work. Individual images have different core developers: Tim Haverland (arcgis), Sunny Hospital (coastwatch), Luke Thompson (aomlomics).
64
-
65
-
## To run images in a JupyterHub with 'bring your image'
66
-
67
-
If your JupyterHub has this option:
53
+
The images are designed to be deployable "out of the box" from JupyterHubs, Codespaces, GitPod, Colab, Binder, and on your computer via Docker or Podman with no modification. See instructions below. Each will spin up Jupyter Lab with Jupyter Lab (and Notebook), RStudio and VS Code with the specific development environment.
68
54
69
-
-Click on the 'Bring your own image' radio button at bottom
70
-
-Paste in url to your image (or any other image)
71
-
-You will find the urls in the right nav bar under 'Packages'
-Python environment follows Pangeo images with micromamba installed as the solver and base and notebook environments. The Jupyter modules are installed in notebook conda environment and images will launch with the notebook environment activated, again following Pangeo design structure. Images that use Pangeo as base will have user jovyan and user home directory home/jovyan.
56
+
-Images with R ONLY follow Rocker's environment design with the exception that the user home directory is home/jovyan so it plays nice with JupyterHub deployments. The user is rstudio however.
57
+
-When an image contains both R and Python, the base image is rocker and micromamba is installed along with the Pangeo environment structure. RStudio will use the Python environment in the notebook conda environment when Python is used from within RStudio.
58
+
-These images are not terribly light-weight (they are large). Use the original Jupyter, Pangeo or Rocker images if you are looking for lightweight data science images.
73
59
74
-
## Run with a JupyterHub
75
-
76
-
Should work out of the box. Put the url to the image whereever you would use images.
77
-
78
-
## Run with docker
79
-
80
-
```
81
-
docker run -p 8888:8888 ghcr.io/nmfs-opensci/jupyter-base-notebook:latest
82
-
```
83
-
84
-
On a Mac M2+ with Rosetta emulation turned on in the Docker Desktop settings.
85
-
86
-
```
87
-
docker run --platform linux/amd64 -p 8888:8888 ghcr.io/nmfs-opensci/jupyter-base-notebook:latest
88
-
```
60
+
## Why use a container?
89
61
90
-
In the terminal look for something like and put that in a browser.
62
+
The main reason is that geospatial, bioinformatics, and TMB/INLA environments can be hard to get working right. Using a Docker image means you use a stable environment. Watch this video from Yuvi Panda (Jupyter Project) [video](https://www.youtube.com/watch?v=qgLPpULvBbQ) and read about the Rocker Project in the R Project Journal [article](https://journal.r-project.org/archive/2017/RJ-2017-065/RJ-2017-065.pdf) by Carl Boettiger and Dirk Eddelbuettel.
Should work out of the box. Copy the Dockerfile into a repo and put the Dockerfile in the base or in a folder called `binder`. Then put the url below in a browser. Note many of the Docker images are big and somewhat hairy to build. This might not work in binder.
See the folders in the `.devcontainer` folder. Note that the folder `.devcontainer/codespace` is required. Do not use port 8888 or else RStudio will not launch. See examples of how to create a button to launch a new codespace in the table above.
107
-
108
-
Based on: <https://github.com/MichaelAkridge-NOAA/Open-Science-Codespaces>
109
-
110
-
## GitPod -- like Codespaces
111
-
112
-
Still working to streamline this.
113
-
114
-
## Run on Colab
64
+
### Acknowledgements
115
65
116
-
TBD See this[issue](https://github.com/nmfs-opensci/container-images/issues/14)
66
+
The core stack is credited to the work of Luis Lopez (NASA) who developed the NASA Openscapes Python image used in countless workshops on cloud-computing with NASA Earth Data. Subsequently the NASA Openscapes mentor cloud-infrastructure Slack group and weekly co-work sessions plugged away at the problem of helping users 'fledge' off the Openscapes JupyterHub, which involved creating images that were more versatile. Carl Boettiger (UC Berkeley & Rocker Project) and Eli Holmes (NOAA Fisheries) took on different aspects of this. The GitHub Action tooling is courtesy of Carl. Yuvi Panda (Jupyter, 2i2c) was also very helpful in designing the 'scaffolding' in the images that helps them be robust and versatile. The Codespaces and devcontainer code is based on Michael Akridge's [Open Science Codespaces](https://github.com/MichaelAkridge-NOAA/Open-Science-Codespaces) work. Individual images have different core developers: Tim Haverland (arcgis), Sunny Hospital (coastwatch), Luke Thompson (aomlomics).