This document describes the high-level architecture of the OpenDataHub Notebooks repository. For AI agent-specific instructions, see AGENTS.md. For contributing guidelines, see CONTRIBUTING.md.
The repository builds container images for interactive data science workbenches: Jupyter notebooks, RStudio, and Code-Server (VS Code in the browser). These images run on OpenShift as part of OpenDataHub (ODH) and Red Hat OpenShift AI (RHOAI).
Images form a conceptual hierarchy. Each image is built by a standalone multi-stage
Dockerfile that pulls its parent as a FROM base image (not as a build-time dependency
in this repo):
Base image (external or base-images/)
└── jupyter/minimal ← Python, JupyterLab, basic packages
└── jupyter/datascience ← NumPy, Pandas, SciPy, scikit-learn
├── jupyter/pytorch ← PyTorch + CUDA/ROCm
├── jupyter/pytorch+llmcompressor ← PyTorch + LLM Compressor
├── jupyter/tensorflow ← TensorFlow + CUDA
├── jupyter/trustyai ← TrustyAI explainability
├── jupyter/rocm/pytorch ← PyTorch + ROCm
└── jupyter/rocm/tensorflow ← TensorFlow + ROCm
In ODH (OpenDataHub), base images are built from base-images/ in this repo.
In RHOAI (Red Hat OpenShift AI), base images come from the AIPCC pipeline instead.
Each image directory (e.g. jupyter/minimal/ubi9-python-3.12/) contains:
Dockerfile.*— one per variant (cpu, cuda, rocm, konflux.cpu, etc.)pyproject.toml— Python dependenciesuv.lock.d/pylock.*.toml— locked dependency files per variantbuild-args/— build argument configuration per variant
The runtimes/ directory mirrors the same flavor structure (minimal, datascience, pytorch, tensorflow, etc.) for Elyra pipeline execution images.
| Directory | Purpose |
|---|---|
jupyter/ |
Jupyter notebook image definitions, organized by flavor and accelerator |
runtimes/ |
Pipeline runtime images used by Elyra to execute notebook pipeline nodes |
codeserver/ |
Code-Server (VS Code in the browser) image definitions |
rstudio/ |
RStudio Server image definitions |
ci/ |
CI utility scripts — Makefile helpers, PR change detection, validation, cached build logic |
scripts/ |
Maintenance scripts — lockfile generation, CVE tracking, image analysis |
ntb/ |
Shared Python library — string utilities, assertions, constants used across CI and tests |
tests/ |
Test suite — unit tests, container integration tests (testcontainers), browser tests (Playwright) |
manifests/ |
Kubernetes ImageStream manifests for ODH (manifests/odh/) and RHOAI (manifests/rhoai/) |
base-images/ |
CUDA and ROCm GPU-accelerated base image definitions |
dependencies/ |
Shared dependency constraints (CVE pinning) and meta packages for common dependency groups |
examples/ |
Example JupyterLab notebooks for validating workbench functionality |
docs/adr/ |
Architecture Decision Records |
The Makefile orchestrates image builds. Each image has a make target:
make jupyter-minimal-ubi9-python-3.12 # build one image
make all-images # build everything
make test # run quick static tests (pytest + lint)The build system supports two modes:
- ODH mode (default):
KONFLUX=no, uses standard Dockerfiles - RHOAI/Konflux mode:
KONFLUX=yes, usesDockerfile.konflux.*variants with prefetched dependencies
| Layer | Location | What it tests | How to run |
|---|---|---|---|
| Unit tests | tests/, ntb/ |
CI scripts, utilities, doctests | make test |
| Container tests | tests/containers/ |
Image startup, package imports, CLI tools | pytest tests/containers --image=<img> |
| GPU tests | tests/containers/workbenches/ |
CUDA/ROCm library loading, GPU operations | Requires GPU hardware or fake GPU setup |
| Browser tests | tests/browser/ |
JupyterLab, Code-Server UI via Playwright | cd tests/browser && pnpm playwright test |
| OpenShift tests | tests/containers/ (marked @openshift) |
Full pod lifecycle on a real cluster | Requires OpenShift cluster |
The images built by this repo are also tested by other projects in the OpenDataHub ecosystem:
| Suite | Framework | What it tests |
|---|---|---|
| odh-dashboard | Cypress (TypeScript) | Workbench creation/deletion, image selection, status transitions, storage, and RBAC via the ODH dashboard UI |
| ods-ci | Robot Framework | Image spawning, GPU/CUDA validation, JupyterLab plugin consistency, Elyra pipelines, long-running stability, and specialized toolkit integration (OpenVINO, Intel AIKIT) |
| opendatahub-tests | Pytest (Python) | Kubernetes ImageStream health, Notebook CR spawning, Python package availability inside images, and container resource constraints |
The workbench images are not standalone — they integrate tightly with several ODH platform components.
The ODH Operator deploys workbench ImageStreams to the cluster using a kustomize pipeline:
manifests/*/base/params-latest.env (image digests, nudge-updated)
manifests/*/base/params.env (released version refs)
↓
kustomize configMapGenerator → ConfigMap "notebook-image-params"
↓
kustomize replacements (80+ entries) → *_PLACEHOLDER values in ImageStreams
↓
operator deploys to cluster → OpenShift imports images from registry
The operator maps RELATED_IMAGE_* environment variables to params.env keys
(see issue #2982 for simplification plans).
Each ImageStream carries two tags: the current version (N) and the previous release (N-1).
The ODH Dashboard discovers workbench images via
ImageStream annotations in manifests/*/base/. Key annotations include opendatahub.io/notebook-image-name,
opendatahub.io/notebook-image-order, opendatahub.io/recommended-accelerators, and
opendatahub.io/notebook-python-dependencies. When launching a workbench, the dashboard injects
the NOTEBOOK_ARGS environment variable with OAuth proxy and configuration settings.
The ODH Notebook Controller runs a mutating webhook that transforms Notebook CR pods at creation time:
- Resolves container image from ImageStream annotations
- Mounts CA certificate bundles at
/etc/pki/tls/custom-certs/ca-bundle.crt - Mounts pipeline runtime images ConfigMap at
/opt/app-root/pipeline-runtimes/(for Elyra) - Mounts DSPA connection secret at
/opt/app-root/runtimes/(for Elyra pipeline execution) - Injects kube-rbac-proxy sidecar for OAuth
The notebook controller's culler expects a Jupyter-compatible API at /api/kernels/ that reports
last_activity timestamps and execution_state (busy/idle). JupyterLab provides this natively.
Code-Server and RStudio do not have a Jupyter-compatible API, so this repo fakes it using a three-process stack per workbench container:
- nginx (port 80) — reverse proxy with custom JSON access logging for activity tracking
- httpd (port 8080) — Apache acting as a CGI gateway
- bash CGI scripts —
access.cgiimplements the/api/kernels/endpoint by either polling the IDE's heartbeat (Code-Server) or parsing nginx access logs (RStudio)
Key files:
codeserver/*/nginx/api/kernels/access.cgi— pollslocalhost:8888/codeserver/healthz, converts heartbeat to Jupyter kernel formatrstudio/*/nginx/api/kernels/access.cgi— parses nginx access logs, marks idle after 10 minutes of inactivitycodeserver/*/nginx/httpconf/http.conf— custom nginx log format producing JSON withlast_activityin ISO 8601 format
This architecture is fragile and is planned for replacement with a single Go reverse proxy that handles both traffic forwarding and activity tracking in one process.
Elyra (ODH fork) enables visual pipeline editing in JupyterLab. The integration chain:
- This repo builds runtime images (
runtimes/) and publishes ImageStreams withopendatahub.io/runtime-image: "true"label andopendatahub.io/runtime-image-metadataannotation containing Elyra runtime configuration - Notebook controller discovers runtime ImageStreams and creates a ConfigMap,
mounted at
/opt/app-root/pipeline-runtimes/inside workbench pods (notebook_runtime.go) - Notebook controller also creates a DSPA connection secret mounted at
/opt/app-root/runtimes/(notebook_dspa_secret.go) setup-elyra.sh(sourced at workbench startup) copies the mounted JSON configs into Elyra's metadata directories so pipelines can discover available runtime images and the Data Science Pipelines endpoint
Several security scanning config files are synced automatically from the central
opendatahub-io/security-config repository
by the security-config-sync[bot]:
| File | Purpose |
|---|---|
.coderabbit.yaml |
CodeRabbit review configuration (inherits org-wide settings) |
semgrep.yaml |
Semgrep static analysis rules (secrets detection, language-specific checks) |
.gitleaks.toml |
Gitleaks secret scanning configuration |
.gitleaksignore |
Gitleaks false-positive suppressions |
These files are protected by an org-level push ruleset — they cannot be modified directly
in this repo. Changes must go upstream to security-config. The yamllint config
(ci/yamllint-config.yaml) suppresses the document-start rule for these files since their
format is controlled externally.
- Python — CI scripts, tests, image dependency management
- Go —
scripts/buildinputs/tool that parses Dockerfiles to extract COPY/ADD dependencies - TypeScript — Browser tests (Playwright), Code-Server test models
- Bash — Build scripts, CI checks
- Makefile — Build orchestration