
Add AIPCC-based Containerfile for Spark operator#74

Merged
shruthis4 merged 1 commit into opendatahub-io:main from Vedant-Deshpande:RHOAIENG-54585-aipcc-containerfile
Apr 3, 2026

Conversation


@Vedant-Deshpande Vedant-Deshpande commented Mar 24, 2026

Purpose of this PR

Currently in RHOAI builds, the base image for the Spark operator is the upstream Spark container. This PR instead introduces a new multi-stage Containerfile built on a Red Hat AIPCC base image (quay.io/aipcc/base-images/cpu:3.4.0-1773957977). The resulting image contains the necessary Java runtime (OpenJDK 17), a PySpark 4.0.1 installation (providing spark-submit and all Spark jars), and the compiled operator binary.
Proposed changes:

  • Add new examples/openshift/Containerfile using AIPCC base image with a multi-stage build (Go builder + runtime)
  • Install Java 17 (OpenJDK headless), PySpark 4.0.1, tini, and OpenShift-required system packages (nss_wrapper, krb5-workstation, etc.)
  • Add UBI9 repo configuration since the AIPCC base image ships with no dnf repos
  • Implement a wrapper entrypoint that routes driver/executor commands to Spark's K8s entrypoint and all other commands to the operator's entrypoint, enabling the same image to serve as both the operator and the Spark runtime
  • Ensure OpenShift arbitrary UID compatibility (GID 0 permissions, nss_wrapper support)
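
The wrapper routing described above can be sketched as a small dispatch rule (a sketch only; the two entrypoint paths follow the description and may differ in the final image). A real wrapper would `exec` the chosen script; here the choice is returned so the logic can be exercised on its own:

```shell
# Sketch of the wrapper entrypoint's dispatch rule: Spark runtime roles go
# to Spark's Kubernetes entrypoint, everything else to the operator's.
route_command() {
  case "${1:-}" in
    driver|driver-py|driver-r|executor)
      echo /opt/entrypoint.sh        # Spark's Kubernetes entrypoint
      ;;
    *)
      echo /usr/bin/entrypoint.sh    # operator entrypoint
      ;;
  esac
}
```

In the image this collapses to a single case/exec block installed as the wrapper entrypoint, which is what lets one image serve both the operator and Spark runtime pods.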

Change Category

  • Bugfix (non-breaking change which fixes an issue)
  • Feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that could affect existing functionality)
  • Documentation update

Rationale

The upstream Spark operator image uses docker.io/library/spark:4.0.1 as its base, which is not suitable for RHOAI builds. A Red Hat-based image is needed for supportability, FIPS compliance, and compatibility with the AIPCC ecosystem.

Checklist

  • I have conducted a self-review of my own code.
  • I have updated documentation accordingly.
  • I have added tests that prove my changes are effective or that my feature works.
  • Existing unit tests pass locally with my changes.

Additional Notes

Manually tested by triggering workflow: https://github.com/opendatahub-io/spark-operator/actions/runs/23923611584

Summary by CodeRabbit

  • New Features
    • Added OpenShift-compatible container image build for the Spark Operator.
    • Supports configurable base images and targets Java 17 runtime.
    • Allows selecting PySpark version at build time.
    • Ensures files, directories and entrypoint run correctly with arbitrary UIDs/GIDs.
    • Prepares runtime layout, startup entrypoint, writable work and log directories, and webhook certificate paths.


coderabbitai bot commented Mar 24, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Adds a new OpenShift-targeted multi-stage Dockerfile at examples/openshift/Dockerfile.odh. Stage 1 builds a static Go spark-operator binary with cached Go module/build dirs and TARGETARCH; final runtime stage starts from configurable BASE_IMAGE, accepts SPARK_UID/SPARK_GID and PYSPARK_VERSION, creates or renames the spark user/group to match SPARK_UID, installs Java 17 and tini, installs PySpark via pip3 install pyspark==${PYSPARK_VERSION}, configures runtime directories and permissions (including OpenShift arbitrary UID/GID compatibility), copies the operator binary and entrypoint.sh, switches to USER ${SPARK_UID}:${SPARK_GID}, and sets ENTRYPOINT /usr/bin/entrypoint.sh.
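
Condensed, the stages described above read roughly as follows (a sketch only; stage names, the Go package path, and the user-setup details are illustrative, not the file's actual contents):

```dockerfile
# ARG used in a FROM line must be declared before the first stage
ARG BASE_IMAGE=quay.io/aipcc/base-images/cpu:3.4.0-1773957977

# Stage 1: build the operator binary (package path illustrative)
FROM registry.access.redhat.com/ubi9/go-toolset:1.25.7 AS builder
ARG TARGETARCH
WORKDIR /workspace
COPY . .
RUN CGO_ENABLED=0 GOARCH=${TARGETARCH} go build -o /spark-operator ./cmd

# Stage 2: runtime on the configurable base image
FROM ${BASE_IMAGE}
ARG SPARK_UID=185
ARG SPARK_GID=185
ARG PYSPARK_VERSION=4.0.1
RUN pip3 install pyspark==${PYSPARK_VERSION}
COPY --from=builder /spark-operator /usr/bin/spark-operator
COPY --chmod=0755 entrypoint.sh /usr/bin/entrypoint.sh
USER ${SPARK_UID}:${SPARK_GID}
ENTRYPOINT ["/usr/bin/entrypoint.sh"]
```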

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Security and Code Quality Issues

CWE-20 / CWE-426 — Unvalidated external input / supply-chain injection

  • Issue: PySpark is installed via pip3 install pyspark==${PYSPARK_VERSION} without package integrity checks or pinned hashes.
  • Action: Require pip hashes (--require-hashes) or install pinned, vetted wheels; validate/sanitize PYSPARK_VERSION and consider an allowlist for permitted versions.
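
The validation half of that action can be sketched in shell (the allowlist values are hypothetical; populate them with vetted releases):

```shell
# Sketch: reject a PYSPARK_VERSION build arg unless it is a plain X.Y.Z
# string on an allowlist of vetted releases (allowlist values hypothetical).
validate_pyspark_version() {
  local version="$1"
  case "$version" in
    ""|*[!0-9.]*) echo "malformed version: '$version'" >&2; return 1 ;;
  esac
  case "$version" in
    4.0.0|4.0.1) return 0 ;;   # hypothetical vetted releases
    *) echo "version not allowlisted: $version" >&2; return 1 ;;
  esac
}
```

A build script would call this before `pip3 install pyspark==${PYSPARK_VERSION}`; the hash-pinning half additionally needs a requirements file carrying per-release `--hash` entries so `pip install --require-hashes` can enforce integrity.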

CWE-276 / CWE-732 — Incorrect default permissions / Exposure of sensitive files

  • Issue: Broad chmod -R g=u applied to directories (including cert and config dirs) can grant excessive group write permissions.
  • Action: Apply least-privilege permissions only to specific runtime-writable dirs (e.g., /opt/spark/work-dir, /opt/spark/logs); keep certificate and sensitive dirs restrictive (700/750).
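
A least-privilege version of that step can be sketched as follows (paths follow the action above; the prefix parameter exists only so the sketch can be exercised outside an image build):

```shell
# Sketch: make only the runtime-writable state dirs group-writable and keep
# the webhook cert dir restrictive, instead of a broad `chmod -R g=u`.
apply_runtime_perms() {
  local prefix="$1"
  install -d -m 0775 "$prefix/opt/spark/work-dir" "$prefix/opt/spark/logs"
  install -d -m 0750 "$prefix/etc/k8s-webhook-server/serving-certs"
  # In the real (root) build step, additionally:
  #   chgrp -R 0 "$prefix/opt/spark/work-dir" "$prefix/opt/spark/logs"
  # Shipped jars/scripts under /opt/spark stay untouched and read-only.
}
```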

CWE-269 — Improper Access Control for Arbitrary UID containers

  • Issue: Using group-owner adjustments and making group perms mirror user perms to satisfy OpenShift arbitrary UIDs may expose files to unintended groups or leave ownership ambiguous.
  • Action: Set explicit ownership and minimal necessary group permissions for runtime-writable paths; avoid broad chgrp -R 0 and prefer targeted chown/chgrp operations at build time.

CWE-1104 / Supply-chain drift risk

  • Issue: dnf install and pip install steps lack pinned package versions or checksums (e.g., tini, system packages, PySpark).
  • Action: Pin package versions, use signed repositories, vendor or verify artifacts with checksums, and prefer reproducible package sources.

CWE-444 / Build determinism and platform mismatch

  • Issue: TARGETARCH is used for cross-arch build, but FROM lines and stages lack explicit --platform semantics making multi-arch reproducibility unclear.
  • Action: Use explicit --platform in FROM or enforce buildkit multi-platform builds; document and validate produced binary's target arch.
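
One conventional BuildKit pattern for making this explicit (a sketch; builder image and package path are illustrative):

```dockerfile
# Builder runs on the build host's platform; the Go toolchain cross-compiles
# to the requested target via TARGETOS/TARGETARCH supplied by buildx.
FROM --platform=$BUILDPLATFORM registry.access.redhat.com/ubi9/go-toolset:1.25.7 AS builder
ARG TARGETOS
ARG TARGETARCH
RUN CGO_ENABLED=0 GOOS=${TARGETOS} GOARCH=${TARGETARCH} \
    go build -o /spark-operator ./cmd
```

Verifying the produced binary's architecture in a later build step (e.g. with `go version -m` or `file`) then guards against silent mismatches.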

Actionable Dockerfile checks

  • Verify the Go builder produces a statically-linked binary (CGO disabled and correct linker flags).
  • Confirm Java 17 is properly installed and validated (java -version), with failure handling.
  • Ensure user switch and file ownership/permission changes occur in the correct order so non-root runtime user has required access without granting excessive privileges.
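
These checks can be enforced at build time so a bad image fails fast (a sketch; the binary path is illustrative):

```dockerfile
# Fail the build unless a Java 17 runtime is on PATH
RUN java -version 2>&1 | grep -q '17\.' \
    || { echo 'expected Java 17 runtime' >&2; exit 1; }
# A CGO-disabled Go binary should be static; ldd then reports
# "not a dynamic executable" and the grep below passes.
RUN ldd /usr/bin/spark-operator 2>&1 | grep -q 'not a dynamic executable' \
    || { echo 'spark-operator is not statically linked' >&2; exit 1; }
```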
🚥 Pre-merge checks | ✅ 1 | ❌ 1

❌ Failed checks (1 warning)

Title check: ⚠️ Warning
Explanation: Title mentions 'AIPCC-based Containerfile' but the actual file added is named 'Dockerfile.odh', and the PR objectives describe it as an 'OpenShift-focused' image, not AIPCC-based.
Resolution: Revise the title to accurately reflect the file being added and its primary purpose, such as 'Add OpenShift-focused Dockerfile for Spark operator' or 'Add Dockerfile.odh for Spark operator runtime'.
✅ Passed checks (1 passed)
Description check: ✅ Passed
Explanation: Check skipped - CodeRabbit's high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (1)
examples/openshift/Dockerfile.odh (1)

108-108: Redundant permission setting.

chmod g+w /opt/spark/work-dir here is overridden by chmod -R 775 /opt/spark/work-dir at line 131.

Remove redundant line
     ln -s ${PYSPARK_DIR} /opt/spark; \
-    mkdir -p /opt/spark/work-dir /opt/spark/logs; \
-    chmod g+w /opt/spark/work-dir; \
+    mkdir -p /opt/spark/work-dir /opt/spark/logs; \
     touch /opt/spark/RELEASE; \
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/openshift/Dockerfile.odh` at line 108, The Dockerfile contains a
redundant permission change: remove the standalone "chmod g+w
/opt/spark/work-dir" command since a later "chmod -R 775 /opt/spark/work-dir"
already sets the intended permissions recursively; simply delete the earlier
chmod g+w invocation so only the recursive chmod -R 775 on /opt/spark/work-dir
remains.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/openshift/Dockerfile.odh`:
- Around line 95-96: The Dockerfile currently downloads tini using the
TINI_VERSION variable and writes it to /usr/bin/tini without integrity checks;
modify the RUN step to also fetch or embed the known SHA256 for the specific
TINI_VERSION, verify the downloaded file's checksum (e.g., with sha256sum or by
comparing to the .sha256 from the release) before chmod +x and only install if
the checksum matches, failing the build on mismatch; reference the TINI_VERSION
variable and the /usr/bin/tini target when updating the RUN command so the
download, checksum verification, and installation are performed atomically.
- Around line 117-119: The Dockerfile currently downloads the Spark entrypoint
script via RUN curl ... -o /opt/entrypoint.sh (using PYSPARK_VERSION) without
integrity verification; either vendor the exact spark-entrypoint.sh (e.g.,
spark-entrypoint.sh from Spark 4.0.1) into the repository and update the
Dockerfile to COPY that file into /opt/entrypoint.sh and set executable
permissions, or if you must download at build time, add a known-good SHA256
(stored in the repo or as a build ARG) and validate the downloaded
/opt/entrypoint.sh against that checksum before setting permissions (failing the
build if the hash does not match) so the RUN curl ... command is followed by a
checksum verification step referencing /opt/entrypoint.sh and the stored
expected hash.
- Around line 69-83: The RUN heredoc that writes /etc/yum.repos.d/ubi9.repo (the
block starting with "RUN cat > /etc/yum.repos.d/ubi9.repo << 'EOF'") triggers
Hadolint DL1000 because heredocs in RUN require BuildKit; fix by switching to a
BuildKit-safe form: add the BuildKit syntax directive "#
syntax=docker/dockerfile:1" at the top of the Dockerfile if missing, then
replace the RUN heredoc with a COPY heredoc (or use the BuildKit-native heredoc
form) that embeds the repo content and escape the variable $basearch as
\$basearch so the literal $basearch is preserved in the Dockerfile; ensure the
target path remains /etc/yum.repos.d/ubi9.repo and the gpgkey/gpgcheck lines are
retained.

---

Nitpick comments:
In `@examples/openshift/Dockerfile.odh`:
- Line 108: The Dockerfile contains a redundant permission change: remove the
standalone "chmod g+w /opt/spark/work-dir" command since a later "chmod -R 775
/opt/spark/work-dir" already sets the intended permissions recursively; simply
delete the earlier chmod g+w invocation so only the recursive chmod -R 775 on
/opt/spark/work-dir remains.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 1b8f1a98-b12f-4376-9eb9-cbe30a87bc82

📥 Commits

Reviewing files that changed from the base of the PR and between b9b6a25 and 30c96f3.

📒 Files selected for processing (1)
  • examples/openshift/Dockerfile.odh

Comment on lines +95 to +96
RUN curl -fsSL https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini -o /usr/bin/tini && \
chmod +x /usr/bin/tini

@coderabbitai coderabbitai bot Mar 24, 2026


⚠️ Potential issue | 🟠 Major

CWE-494: tini downloaded without integrity verification.

The tini binary is fetched from GitHub without checksum validation. A compromised release or MITM attack could inject a malicious init process.

Proposed fix with SHA256 verification
-RUN curl -fsSL https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini -o /usr/bin/tini && \
-    chmod +x /usr/bin/tini
+ARG TINI_SHA256=12d20136605531b09a2c2dac02ccee85e1b874eb322ef6baf7561cd93f93c855
+RUN curl -fsSL https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini -o /usr/bin/tini && \
+    echo "${TINI_SHA256}  /usr/bin/tini" | sha256sum -c - && \
+    chmod +x /usr/bin/tini

Get the correct SHA256 from tini's GitHub releases page for v0.19.0.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
RUN curl -fsSL https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini -o /usr/bin/tini && \
chmod +x /usr/bin/tini
ARG TINI_SHA256=12d20136605531b09a2c2dac02ccee85e1b874eb322ef6baf7561cd93f93c855
RUN curl -fsSL https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini -o /usr/bin/tini && \
echo "${TINI_SHA256} /usr/bin/tini" | sha256sum -c - && \
chmod +x /usr/bin/tini
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/openshift/Dockerfile.odh` around lines 95 - 96, The Dockerfile
currently downloads tini using the TINI_VERSION variable and writes it to
/usr/bin/tini without integrity checks; modify the RUN step to also fetch or
embed the known SHA256 for the specific TINI_VERSION, verify the downloaded
file's checksum (e.g., with sha256sum or by comparing to the .sha256 from the
release) before chmod +x and only install if the checksum matches, failing the
build on mismatch; reference the TINI_VERSION variable and the /usr/bin/tini
target when updating the RUN command so the download, checksum verification, and
installation are performed atomically.


Good catch by CodeRabbit. The Dockerfile uses ARG TARGETARCH for multi-architecture support but downloads the generic tini binary, which defaults to amd64. This will fail on ARM64 systems. There are also issues like:

  1. No checksum verification for the tini binary download
  2. No checksum verification for the Spark entrypoint.sh download

These make the build vulnerable to a MITM attack.


@RobuRishabh, excellent points — there are actually three compounding problems here:


Issue 1 — Wrong tini binary for non-amd64 architectures (CWE-20)

The current code downloads the generic tini binary, which is the amd64 build. tini v0.19.0 ships separate architecture-specific binaries: tini-amd64, tini-arm64, etc. A build with --platform=linux/arm64 will silently install the wrong binary, causing runtime failures.

Issue 2 — No integrity check on tini (CWE-494)

The generic tini binary is downloaded without checksum verification. The official tini documentation recommends downloading both the binary and its .sha256sum companion file and verifying before installation.

Issue 3 — No integrity check on entrypoint.sh (CWE-494)

Per the AI summary, the Dockerfile also downloads Spark's Kubernetes entrypoint.sh from a remote URL without any hash verification — same MITM risk.


Proposed fix — all three issues combined

ARG TARGETARCH

# Map Docker TARGETARCH (amd64/arm64) to tini binary name
RUN TINI_ARCH="${TARGETARCH}" && \
    curl -fsSL \
        "https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini-${TINI_ARCH}" \
        -o /usr/bin/tini && \
    curl -fsSL \
        "https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini-${TINI_ARCH}.sha256sum" \
        -o /tmp/tini.sha256sum && \
    # Adjust the filename in the checksum file to match the install target
    sed -i "s|tini-${TINI_ARCH}|/usr/bin/tini|" /tmp/tini.sha256sum && \
    sha256sum -c /tmp/tini.sha256sum && \
    rm -f /tmp/tini.sha256sum && \
    chmod +x /usr/bin/tini

For entrypoint.sh, pin the download to a specific git commit SHA and verify its hash:

ARG SPARK_ENTRYPOINT_SHA256=<known-sha256-for-the-pinned-version>
RUN curl -fsSL \
        "https://raw.githubusercontent.com/apache/spark/<commit-sha>/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh" \
        -o /opt/entrypoint.sh && \
    echo "${SPARK_ENTRYPOINT_SHA256}  /opt/entrypoint.sh" | sha256sum -c - && \
    chmod +x /opt/entrypoint.sh

Note: Replace <known-sha256-for-the-pinned-version> and <commit-sha> with the values from the specific Spark 4.0.1 release tag. Using a raw GitHub URL pinned to a release tag (e.g., refs/tags/v4.0.1) rather than a branch/main also reduces drift risk.


chown -R spark:spark /opt/spark/work-dir /opt/spark/RELEASE

# Download Spark's Kubernetes entrypoint for driver/executor pods
RUN curl -fsSL https://raw.githubusercontent.com/apache/spark/v${PYSPARK_VERSION}/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh \

See the above comment: the lack of checksum verification for the Spark entrypoint.sh download makes it prone to a MITM attack. Tini runs as the root init process (PID 1) in every container, so it's very important that no one gets complete control of our container.


For the Spark entrypoint, consider either:

  1. Adding checksum verification
  2. Bundling the script in the repository instead of downloading from GitHub

@@ -0,0 +1,147 @@
# Containerfile for Spark Operator based on AIPCC base image
#
# Build: docker build -f examples/openshift/Containerfile -t <image> .

Instead of Containerfile, this should reference Dockerfile.odh:

Build: docker build -f examples/openshift/Dockerfile.odh -t <image> .


USER ${SPARK_UID}

ENTRYPOINT ["/usr/bin/wrapper-entrypoint.sh"]
(No newline at end of file)

Add a newline after this last line; POSIX defines a line as ending with a newline character, so text files should end with one.
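
A small check of this kind (assuming GNU coreutils) could guard it in CI:

```shell
# Sketch: succeed only if a non-empty file ends with a newline byte.
ends_with_newline() {
  [ -s "$1" ] && [ "$(tail -c 1 "$1" | wc -l)" -eq 1 ]
}
```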

# Build Arguments
################################################################################
ARG GO_BUILDER_IMAGE=registry.access.redhat.com/ubi9/go-toolset:1.25.7
ARG BASE_IMAGE=quay.io/aipcc/base-images/cpu:3.4.0-1773957977

Is this tag guaranteed to be immutable? If it's a rolling tag, builds could become non-reproducible. I am not sure about this.

Comment on lines +110 to +112
if [ -f ${PYSPARK_DIR}/kubernetes/dockerfiles/spark/decom.sh ]; then \
cp ${PYSPARK_DIR}/kubernetes/dockerfiles/spark/decom.sh /opt/decom.sh; \
chmod a+x /opt/decom.sh; \

We need error handling here; this silently skips if the file doesn't exist.
Is decom.sh required for Spark operator functionality? If yes, the build should fail when it's missing. If no, add a comment explaining why it's optional.
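
One way to make that failure mode explicit (a sketch; the function form exists only so the logic can be exercised outside the Dockerfile, and the required/optional default is an assumption for the author to settle):

```shell
# Sketch: copy decom.sh, failing loudly when it is required but missing,
# and logging a note when it is genuinely optional.
copy_decom() {
  local src="$1" dst="$2" required="${3:-1}"
  if [ -f "$src" ]; then
    cp "$src" "$dst" && chmod a+x "$dst"
  elif [ "$required" = 1 ]; then
    echo "ERROR: required decom.sh missing at $src" >&2
    return 1
  else
    echo "NOTE: decom.sh not present; graceful decommissioning disabled" >&2
  fi
}
```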

Comment on lines +141 to +143
RUN printf '#!/bin/bash\ncase "${1:-}" in\n driver|driver-py|driver-r|executor)\n exec /opt/entrypoint.sh "$@"\n ;;\n *)\n exec /usr/bin/entrypoint.sh "$@"\n ;;\nesac\n' \
> /usr/bin/wrapper-entrypoint.sh && \
chmod +x /usr/bin/wrapper-entrypoint.sh

Add shebang validation here, plus error handling like set -e. We could also create wrapper-entrypoint.sh as a separate file in the repository instead of generating it with printf.

Comment on lines +86 to +92
RUN dnf --setopt=install_weak_deps=0 --setopt=tsflags=nodocs install -y \
java-17-openjdk-headless \
procps-ng net-tools krb5-workstation nss_wrapper \
&& dnf clean all && rm -rf /var/cache/dnf/*

ENV JAVA_HOME=/usr/lib/jvm/jre-17-openjdk
ENV PATH="${JAVA_HOME}/bin:${PATH}"

Add Java installation verification here, e.g.:

  RUN java -version 2>&1 | grep "17\."

################################################################################
FROM ${BASE_IMAGE}

LABEL name="spark-operator" \

Are we going to add any kind of version label here as well, @alimaredia?

@alimaredia

@RobuRishabh yesterday, @Vedant-Deshpande and I talked and the following changes will be made:

  • use of a Red Hat Java image as the base image
  • Removal of the Spark installation entirely
  • tini will be installed via dnf install from EPEL.

I'm not going to give another round of review until he's got a new version of the Dockerfile with those changes

@Vedant-Deshpande Vedant-Deshpande force-pushed the RHOAIENG-54585-aipcc-containerfile branch 2 times, most recently from db07eee to 8fef643, on March 25, 2026 16:32

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/openshift/Dockerfile.odh`:
- Around line 51-72: The account-creation block hardcodes UID/GID 185 instead of
using the build arg SPARK_UID, causing mismatched ownership when SPARK_UID is
changed; update the conditional checks and commands (getent group/passwd,
groupmod, groupadd, usermod, useradd, chown) to reference the variable SPARK_UID
(and use it for both UID and GID where appropriate) so the group/user
creation/renaming and final chown use ${SPARK_UID} consistently and match the
runtime user selection.
- Around line 74-79: The Dockerfile currently copies system crypto and Java
config from the separate JDK stage via the COPY --from=jdk ... lines (notably
the COPY lines for /etc/crypto-policies, /etc/java, /etc/pki/java and related
/usr/lib/jvm), which risks crypto/TLS/FIPS divergence; remove the COPY
--from=jdk of /etc/crypto-policies (and avoid pulling system-wide /etc/* trees
from a different image lineage) and instead either install Java directly into
the runtime base image during build or switch the base to the supported Red Hat
Java runtime image so Java and system crypto policies come from the same image
lineage; update or delete the COPY --from=jdk /etc/crypto-policies (and related
/etc/* system copies) and adjust subsequent Dockerfile steps that assume those
files exist.
- Around line 97-115: The Dockerfile creates a symlink from the installed
pyspark package into /opt/spark (via PYSPARK_DIR -> /opt/spark) which causes
later recursive chmod/chgrp (chgrp -R 0 /opt/spark; chmod -R g=u /opt/spark) to
modify the package; instead copy the Spark distribution into /opt/spark rather
than symlinking (replace the ln -s ${PYSPARK_DIR} /opt/spark step with a safe
copy of only the distribution files), and limit recursive permission changes to
the state directories only (apply chown/chmod -R to /opt/spark/work-dir and
/opt/spark/logs instead of /opt/spark); adjust the related commands that set
SPARK_HOME, PYSPARK_DIR handling, chown -R spark:spark, chgrp -R and chmod -R
usages so no recursive operations touch the installed pyspark package.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: a45ab083-3889-4b1f-a30a-8caf2e96c228

📥 Commits

Reviewing files that changed from the base of the PR and between 30c96f3 and db07eee.

📒 Files selected for processing (2)
  • entrypoint.sh
  • examples/openshift/Dockerfile.odh


@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (5)
examples/openshift/Dockerfile.odh (5)

10-10: ⚠️ Potential issue | 🟠 Major

Pin JDK_IMAGE to an immutable reference.

registry.redhat.io/ubi9/openjdk-21:latest makes this stage non-reproducible. A later push to latest can silently change the Java payload copied into the runtime image without any Dockerfile change. Pin a concrete release and, ideally, its digest.

Suggested fix
-ARG JDK_IMAGE=registry.redhat.io/ubi9/openjdk-21:latest
+ARG JDK_IMAGE=registry.redhat.io/ubi9/openjdk-21:<release>@sha256:<digest>

As per coding guidelines, **/Dockerfile*: Use specific image tags (not :latest).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/openshift/Dockerfile.odh` at line 10, The build ARG JDK_IMAGE is
currently set to an unstable mutable tag
("registry.redhat.io/ubi9/openjdk-21:latest"); update the ARG JDK_IMAGE
declaration to reference a concrete, immutable image (use a specific release tag
or, preferably, the image digest form) so the runtime image is
reproducible—locate the ARG JDK_IMAGE in the Dockerfile.odh and replace the
:latest value with a stable version or `@sha256`:<digest>.

43-43: ⚠️ Potential issue | 🟠 Major

CWE-16: don't overlay Java security config from a different base image.

Line 43 builds the runtime from ${BASE_IMAGE}, then Lines 75-79 copy /etc/java, /etc/pki/java, and /etc/crypto-policies out of ${JDK_IMAGE}. That splices together two independent patch streams. After a base refresh, the final image can carry stale Java CAs or crypto policy relative to the OS. Make the Red Hat Java image the actual runtime base, or install Java from the runtime image's own repos instead of copying /etc/* trees.

As per coding guidelines, **: REVIEW PRIORITIES: 1. Security vulnerabilities 2. Architectural issues and anti-patterns.

Also applies to: 74-79

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/openshift/Dockerfile.odh` at line 43, The Dockerfile currently uses
FROM ${BASE_IMAGE} then later copies Java config trees from ${JDK_IMAGE}
(/etc/java, /etc/pki/java, /etc/crypto-policies), which mixes two different base
images and can leave stale CAs/crypto policies; fix by making the Red Hat Java
image the actual runtime base (use ${JDK_IMAGE} in FROM) or remove the COPY of
those /etc/* trees and instead install Java and crypto policy from the runtime
image's package repos (do not splice config from ${JDK_IMAGE} into
${BASE_IMAGE}); update references to BASE_IMAGE, JDK_IMAGE, and the COPY steps
that target /etc/java, /etc/pki/java, and /etc/crypto-policies accordingly.

120-125: ⚠️ Potential issue | 🔴 Critical

Critical: this entrypoint never dispatches to Spark runtime mode.

COPY --chmod=0755 entrypoint.sh /usr/bin/ installs the repository root entrypoint.sh, and entrypoint.sh, Lines 1-26 always end by execing /usr/bin/spark-operator. This Dockerfile also never copies a Spark Kubernetes entrypoint into the image. Any pod using this image as a Spark runtime will feed Spark args to the operator binary and fail. Copy a real wrapper here that dispatches Spark runtime commands to Spark's entrypoint and everything else to spark-operator, or split the operator and runtime images.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/openshift/Dockerfile.odh` around lines 120 - 125, The current COPY
of entrypoint.sh (COPY --chmod=0755 entrypoint.sh /usr/bin/) installs a script
that always execs /usr/bin/spark-operator, so any Spark runtime invocation ends
up being passed to the operator binary; fix this by replacing or augmenting the
deployed entrypoint: add a wrapper entrypoint that inspects the command/args
(e.g., check first arg or presence of "spark-submit"/"spark-submit" style flags)
and, for Spark runtime invocations, dispatches to the real Spark entrypoint (the
cluster Spark binary/launch script you include in the image) while forwarding
all other invocations to exec /usr/bin/spark-operator; alternatively split into
separate operator and runtime images so the Dockerfile no longer copies the
operator-only entrypoint.sh as the image ENTRYPOINT.

98-115: ⚠️ Potential issue | 🟠 Major

CWE-732: /opt/spark becomes group-writable because the symlink is followed.

ln -s ${PYSPARK_DIR} /opt/spark makes /opt/spark point at the installed pyspark package, then Lines 113-114 recurse through that link and grant GID 0 write access to shipped code. Any process sharing the OpenShift root group can tamper with Spark jars/scripts that later execute. Copy the distribution into /opt/spark and limit recursive permission changes to work-dir and logs.

Suggested fix
 RUN PYSPARK_DIR=$(python3 -c "import pyspark; import os; print(os.path.dirname(pyspark.__file__))") && \
-    ln -s ${PYSPARK_DIR} /opt/spark && \
+    mkdir -p /opt/spark && \
+    cp -a "${PYSPARK_DIR}/." /opt/spark/ && \
     mkdir -p /opt/spark/work-dir /opt/spark/logs && \
     chmod g+w /opt/spark/work-dir && \
     touch /opt/spark/RELEASE && \
-    chown -R spark:spark /opt/spark/work-dir /opt/spark/RELEASE
+    chown -R spark:spark /opt/spark/work-dir /opt/spark/logs /opt/spark/RELEASE
 
-RUN chgrp -R 0 /opt/spark /etc/k8s-webhook-server && \
-    chmod -R g=u /opt/spark /etc/k8s-webhook-server && \
+RUN chgrp -R 0 /opt/spark/work-dir /opt/spark/logs /etc/k8s-webhook-server && \
+    chmod -R g=u /opt/spark/work-dir /opt/spark/logs /etc/k8s-webhook-server && \
     chmod -R 775 /opt/spark/work-dir /opt/spark/logs
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/openshift/Dockerfile.odh` around lines 98 - 115, Replace the symlink
creation (ln -s ${PYSPARK_DIR} /opt/spark) with a safe copy of the pyspark
distribution into /opt/spark (e.g., cp -a ${PYSPARK_DIR} /opt/spark or rsync -a)
so you don't grant group write on the installed package; then restrict recursive
ownership/permission changes (chown -R, chgrp -R, chmod -R) to only
/opt/spark/work-dir and /opt/spark/logs and keep a non-recursive chown/chmod for
the /opt/spark top-level RELEASE file if needed (update the RUN block that
creates /opt/spark, /opt/spark/work-dir, /opt/spark/logs and the later OpenShift
permission block to operate only on work-dir/logs rather than following a
symlink into the pyspark package).
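The symlink-follow behavior this finding relies on can be reproduced in isolation. The sketch below is a standalone demo (the temp-directory layout is invented here, not taken from the image): a recursive `chmod` on a symlink given on the command line dereferences the link and mutates the target tree, which is how permission changes aimed at `/opt/spark` reach the installed pyspark package.

```shell
# Standalone demo: recursive chmod on a command-line symlink dereferences
# the link and changes the pointed-to tree. The tmp layout is invented.
tmp=$(mktemp -d)
mkdir -p "$tmp/pkg"
touch "$tmp/pkg/lib.py"
chmod 644 "$tmp/pkg/lib.py"       # start without group write
ln -s "$tmp/pkg" "$tmp/spark"     # /opt/spark-style symlink to the package

chmod -R g+w "$tmp/spark"         # operates through the symlink

perms=$(stat -c '%A' "$tmp/pkg/lib.py")
echo "group-write bit on target file: $(printf '%s' "$perms" | cut -c6)"
rm -rf "$tmp"
```

Copying the distribution instead of symlinking, as the suggested fix does, keeps later recursive permission changes from reaching shipped code.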

59-72: ⚠️ Potential issue | 🟠 Major

Use ${SPARK_UID} consistently during account creation.

Lines 59-69 hardcode 185, but Line 123 runs as ${SPARK_UID}. Any non-default build arg produces a runtime UID that does not own /home/spark and may not have matching passwd/group entries, which breaks the arbitrary-UID path this image is trying to support.

Suggested fix
-RUN if getent group 185 > /dev/null 2>&1; then \
-        EXISTING_GROUP=$(getent group 185 | cut -d: -f1); \
+RUN if getent group "${SPARK_UID}" > /dev/null 2>&1; then \
+        EXISTING_GROUP=$(getent group "${SPARK_UID}" | cut -d: -f1); \
         groupmod -n spark "$EXISTING_GROUP"; \
     else \
-        groupadd -g 185 spark; \
+        groupadd -g "${SPARK_UID}" spark; \
     fi && \
-    if getent passwd 185 > /dev/null 2>&1; then \
-        EXISTING_USER=$(getent passwd 185 | cut -d: -f1); \
+    if getent passwd "${SPARK_UID}" > /dev/null 2>&1; then \
+        EXISTING_USER=$(getent passwd "${SPARK_UID}" | cut -d: -f1); \
         usermod -l spark -d /home/spark "$EXISTING_USER"; \
     else \
-        useradd -u 185 -g spark -d /home/spark -s /bin/bash spark; \
+        useradd -u "${SPARK_UID}" -g spark -d /home/spark -s /bin/bash spark; \
     fi && \
     mkdir -p /home/spark && \
     chown spark:spark /home/spark

Also applies to: 123-123

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/openshift/Dockerfile.odh` around lines 59 - 72, The Dockerfile block
hardcodes UID/GID 185 which breaks arbitrary-UID builds; replace every literal
185 in the group/user checks and creation with the build arg variable
${SPARK_UID} (e.g. change getent group 185, getent passwd 185, groupadd -g 185,
useradd -u 185) and use the same variable when modifying names
(groupmod/usermod) so lookups and creations use ${SPARK_UID}; finally ensure
ownership of /home/spark is set to the numeric UID/GID (chown
"${SPARK_UID}:${SPARK_UID}" /home/spark) so a non-default runtime UID owns the
home directory.
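The lookup-or-create branching above can be exercised outside a container build by echoing the commands it would run instead of executing them. This is a sketch only; `plan_user_setup` and its yes/no flags are invented here for illustration, while the echoed commands mirror the suggested diff.

```shell
# Sketch: print the account-setup commands the Dockerfile block would run
# for a given SPARK_UID, instead of invoking groupadd/useradd for real.
# plan_user_setup and its have_group/have_user flags are hypothetical.
plan_user_setup() {
  local uid="$1" have_group="$2" have_user="$3"   # flags: yes or no
  if [ "$have_group" = yes ]; then
    echo "groupmod -n spark EXISTING_GROUP"
  else
    echo "groupadd -g $uid spark"
  fi
  if [ "$have_user" = yes ]; then
    echo "usermod -l spark -d /home/spark EXISTING_USER"
  else
    echo "useradd -u $uid -g spark -d /home/spark -s /bin/bash spark"
  fi
}

plan_user_setup 1000 no no
```

Running it with a non-default UID shows why the literal `185` breaks: every branch has to take the UID from the build arg for the created account to match the runtime `USER ${SPARK_UID}`.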
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@examples/openshift/Dockerfile.odh`:
- Line 10: The build ARG JDK_IMAGE is currently set to an unstable mutable tag
("registry.redhat.io/ubi9/openjdk-21:latest"); update the ARG JDK_IMAGE
declaration to reference a concrete, immutable image (use a specific release tag
or, preferably, the image digest form) so the runtime image is
reproducible—locate the ARG JDK_IMAGE in the Dockerfile.odh and replace the
:latest value with a stable version or `@sha256`:<digest>.
- Line 43: The Dockerfile currently uses FROM ${BASE_IMAGE} then later copies
Java config trees from ${JDK_IMAGE} (/etc/java, /etc/pki/java,
/etc/crypto-policies), which mixes two different base images and can leave stale
CAs/crypto policies; fix by making the Red Hat Java image the actual runtime
base (use ${JDK_IMAGE} in FROM) or remove the COPY of those /etc/* trees and
instead install Java and crypto policy from the runtime image's package repos
(do not splice config from ${JDK_IMAGE} into ${BASE_IMAGE}); update references
to BASE_IMAGE, JDK_IMAGE, and the COPY steps that target /etc/java,
/etc/pki/java, and /etc/crypto-policies accordingly.
- Around line 120-125: The current COPY of entrypoint.sh (COPY --chmod=0755
entrypoint.sh /usr/bin/) installs a script that always execs
/usr/bin/spark-operator, so any Spark runtime invocation ends up being passed to
the operator binary; fix this by replacing or augmenting the deployed
entrypoint: add a wrapper entrypoint that inspects the command/args (e.g., check
first arg or presence of "spark-submit"-style flags) and, for
Spark runtime invocations, dispatches to the real Spark entrypoint (the cluster
Spark binary/launch script you include in the image) while forwarding all other
invocations to exec /usr/bin/spark-operator; alternatively split into separate
operator and runtime images so the Dockerfile no longer copies the operator-only
entrypoint.sh as the image ENTRYPOINT.
- Around line 98-115: Replace the symlink creation (ln -s ${PYSPARK_DIR}
/opt/spark) with a safe copy of the pyspark distribution into /opt/spark (e.g.,
cp -a ${PYSPARK_DIR} /opt/spark or rsync -a) so you don't grant group write on
the installed package; then restrict recursive ownership/permission changes
(chown -R, chgrp -R, chmod -R) to only /opt/spark/work-dir and /opt/spark/logs
and keep a non-recursive chown/chmod for the /opt/spark top-level RELEASE file
if needed (update the RUN block that creates /opt/spark, /opt/spark/work-dir,
/opt/spark/logs and the later OpenShift permission block to operate only on
work-dir/logs rather than following a symlink into the pyspark package).
- Around line 59-72: The Dockerfile block hardcodes UID/GID 185 which breaks
arbitrary-UID builds; replace every literal 185 in the group/user checks and
creation with the build arg variable ${SPARK_UID} (e.g. change getent group 185,
getent passwd 185, groupadd -g 185, useradd -u 185) and use the same variable
when modifying names (groupmod/usermod) so lookups and creations use
${SPARK_UID}; finally ensure ownership of /home/spark is set to the numeric
UID/GID (chown "${SPARK_UID}:${SPARK_UID}" /home/spark) so a non-default runtime
UID owns the home directory.
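The wrapper-entrypoint dispatch requested above can be sketched as follows. This is a minimal illustration, not the PR's actual script: both paths are assumptions, and the real entrypoint would `exec` the target (as the comments note) rather than `echo` it.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper entrypoint: "driver"/"executor" invocations go to
# Spark's K8s entrypoint, everything else to the operator binary.
# SPARK_ENTRYPOINT and OPERATOR_BIN are assumed paths, not from the PR.
SPARK_ENTRYPOINT="${SPARK_HOME:-/opt/spark}/entrypoint.sh"
OPERATOR_BIN="/usr/bin/spark-operator"

route() {
  case "$1" in
    driver|executor)
      # real script: exec "$SPARK_ENTRYPOINT" "$@"
      echo "spark-runtime: $*"
      ;;
    *)
      # real script: exec "$OPERATOR_BIN" "$@"
      echo "operator: $*"
      ;;
  esac
}

route "$@"
```

Dispatching on the first argument matches how Spark's own Kubernetes entrypoint distinguishes driver and executor pods, so the same image can serve as both the operator and the Spark runtime.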

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: c257d66e-45c7-4206-8fd1-b0166e3facc6

📥 Commits

Reviewing files that changed from the base of the PR and between db07eee and 8fef643.

📒 Files selected for processing (1)
  • examples/openshift/Dockerfile.odh

@Vedant-Deshpande Vedant-Deshpande force-pushed the RHOAIENG-54585-aipcc-containerfile branch from 8fef643 to b5e9b69 Compare March 25, 2026 17:56

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
examples/openshift/Dockerfile.odh (1)

92-104: ⚠️ Potential issue | 🟠 Major

Spark runtime is still present; requested removal is not implemented.

Lines 92-104 still add PySpark/Spark artifacts. This conflicts with the stated requested change to remove Spark installation from this image.

Remediation (operator-only image path)
-ARG PYSPARK_VERSION=4.0.1
-ARG PYSPARK_INDEX_URL=https://packages.redhat.com/api/pypi/public-rhai/rhoai/3.4-EA2/cpu-ubi9/simple/
@@
-# Install PySpark from Red Hat's internal PyPI index
-RUN pip3 install --no-cache-dir \
-    --index-url ${PYSPARK_INDEX_URL} \
-    pyspark==${PYSPARK_VERSION}
-
-# Copy Spark distribution to /opt/spark
-ENV SPARK_HOME=/opt/spark
-RUN PYSPARK_DIR=$(python3 -c "import pyspark; import os; print(os.path.dirname(pyspark.__file__))") && \
-    cp -r "${PYSPARK_DIR}" /opt/spark && \
-    mkdir -p /opt/spark/work-dir /opt/spark/logs && \
-    touch /opt/spark/RELEASE && \
-    chown -R spark:spark /opt/spark/work-dir /opt/spark/logs /opt/spark/RELEASE
-
-ENV PATH="${PATH}:${SPARK_HOME}/bin:${SPARK_HOME}/sbin"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/openshift/Dockerfile.odh` around lines 92 - 104, The Dockerfile
still installs PySpark and copies the Spark distribution—remove the RUN pip3
install ... pyspark==${PYSPARK_VERSION} step, delete the block that computes
PYSPARK_DIR and copies to /opt/spark (the RUN that sets PYSPARK_DIR and cp -r
... /opt/spark and related mkdir/chown/touch), and drop the SPARK_HOME and PATH
environment lines (ENV SPARK_HOME=/opt/spark and ENV
PATH="${PATH}:${SPARK_HOME}/bin:${SPARK_HOME}/sbin") so no Spark artifacts
remain in the image; ensure any references to SPARK_HOME, PYSPARK_DIR, or
pyspark installation are removed or gated behind an operator-only build path.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/openshift/Dockerfile.odh`:
- Line 10: The ARG JDK_IMAGE currently uses a mutable tag
("registry.redhat.io/ubi9/openjdk-21:latest"); replace it with an immutable
digest form (e.g., "registry.redhat.io/ubi9/openjdk-21@sha256:<digest>") by
resolving the correct sha256 digest for the desired image version from the
registry and updating the ARG JDK_IMAGE value accordingly so builds always pull
a known immutable image.

---

Duplicate comments:
In `@examples/openshift/Dockerfile.odh`:
- Around line 92-104: The Dockerfile still installs PySpark and copies the Spark
distribution—remove the RUN pip3 install ... pyspark==${PYSPARK_VERSION} step,
delete the block that computes PYSPARK_DIR and copies to /opt/spark (the RUN
that sets PYSPARK_DIR and cp -r ... /opt/spark and related mkdir/chown/touch),
and drop the SPARK_HOME and PATH environment lines (ENV SPARK_HOME=/opt/spark
and ENV PATH="${PATH}:${SPARK_HOME}/bin:${SPARK_HOME}/sbin") so no Spark
artifacts remain in the image; ensure any references to SPARK_HOME, PYSPARK_DIR,
or pyspark installation are removed or gated behind an operator-only build path.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 0dbb02d5-f5f2-4ad1-8748-71d648b7aee5

📥 Commits

Reviewing files that changed from the base of the PR and between 8fef643 and b5e9b69.

📒 Files selected for processing (1)
  • examples/openshift/Dockerfile.odh

# Build Arguments
################################################################################
ARG GO_BUILDER_IMAGE=registry.access.redhat.com/ubi9/go-toolset:1.25.7
ARG JDK_IMAGE=registry.redhat.io/ubi9/openjdk-21:latest

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

fd "Dockerfile.odh" --type f

Repository: opendatahub-io/spark-operator

Length of output: 106


🏁 Script executed:

cat -n examples/openshift/Dockerfile.odh

Repository: opendatahub-io/spark-operator

Length of output: 5872


Pin JDK_IMAGE to an immutable digest instead of :latest.

Severity: Major. CWE-494 (download of code without integrity verification).
Exploit scenario: retagged openjdk-21:latest can silently change the runtime filesystem and toolchain, pulling unreviewed content into builds.

Remediation
-ARG JDK_IMAGE=registry.redhat.io/ubi9/openjdk-21:latest
+ARG JDK_IMAGE=registry.redhat.io/ubi9/openjdk-21@sha256:<approved_digest>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/openshift/Dockerfile.odh` at line 10, The ARG JDK_IMAGE currently
uses a mutable tag ("registry.redhat.io/ubi9/openjdk-21:latest"); replace it
with an immutable digest form (e.g.,
"registry.redhat.io/ubi9/openjdk-21@sha256:<digest>") by resolving the correct
sha256 digest for the desired image version from the registry and updating the
ARG JDK_IMAGE value accordingly so builds always pull a known immutable image.
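The pinning step can be scripted. In the sketch below, `pin_digest` and the sample digest are made up for illustration; the commented `skopeo inspect` invocation is one common way to resolve a real digest from the registry.

```shell
# Rewrite a mutable tag reference into immutable digest form.
# In practice the digest comes from the registry, e.g.:
#   skopeo inspect --format '{{.Digest}}' docker://registry.redhat.io/ubi9/openjdk-17:1.20
# pin_digest and the digest value below are placeholders for illustration.
pin_digest() {
  local ref="$1" digest="$2"
  printf '%s@%s\n' "${ref%:*}" "$digest"   # drop the :tag, append @sha256:...
}

pin_digest "registry.redhat.io/ubi9/openjdk-21:latest" "sha256:PLACEHOLDER"
```

Builds that reference the digest form always pull the same bytes, regardless of later retags of `:latest`.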

RUN PYSPARK_DIR=$(python3 -c "import pyspark; import os; print(os.path.dirname(pyspark.__file__))") && \
cp -r "${PYSPARK_DIR}" /opt/spark && \
mkdir -p /opt/spark/work-dir /opt/spark/logs && \
touch /opt/spark/RELEASE && \

Does this file need a value?

Author

I guess we can put something like
echo "Spark ${PYSPARK_VERSION} (PySpark)" > /opt/spark/RELEASE
to get the installed Spark version


# Install PySpark from Red Hat's internal PyPI index
RUN pip3 install --no-cache-dir \
--index-url ${PYSPARK_INDEX_URL} \

--index-url here shouldn't be needed since the AIPCC base image only uses the Red Hat python registry.

RUN dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm && \
dnf install -y tini && \
dnf clean all && rm -rf /var/cache/dnf/* && \
java -version 2>&1 | grep "21\." && \

Why is java -version being called here?

Author

It's just a sanity check to verify we got Java from the JDK; I can remove it to reduce the code.

# Copy operator entrypoint
COPY --chmod=0755 entrypoint.sh /usr/bin/

USER ${SPARK_UID}

Just to ensure we're keeping as close to the upstream docker file as possible, we'll want this to be:

USER ${SPARK_UID}:${SPARK_GID}

Take a look at:
https://github.com/kubeflow/spark-operator/blame/master/Dockerfile#L54

COPY --from=builder /workspace/bin/spark-operator /usr/bin/spark-operator

# Copy operator entrypoint
COPY --chmod=0755 entrypoint.sh /usr/bin/

There are no permissions set on the COPY upstream (https://github.com/kubeflow/spark-operator/blob/master/Dockerfile#L58). Do we need the 0755?

Author

Nope, it's redundant; the entrypoint already has this permission. Will remove this.

# Build Arguments
################################################################################
ARG GO_BUILDER_IMAGE=registry.access.redhat.com/ubi9/go-toolset:1.25.7
ARG JDK_IMAGE=registry.redhat.io/ubi9/openjdk-21:latest

To stay in tune with what the upstream Spark image is using, let's use JDK 17 instead of 21. Here's a container example: https://catalog.redhat.com/en/software/containers/ubi9/openjdk-17/61ee7c26ed74b2ffb22b07f6.

LABEL name="spark-operator" \
summary="Spark Operator for OpenShift" \
description="Kubeflow Spark Operator built on AIPCC base with Java 21 and PySpark 4.0.1" \

This would have to change to Java 17 per the comment above

@Vedant-Deshpande Vedant-Deshpande force-pushed the RHOAIENG-54585-aipcc-containerfile branch from b5e9b69 to 273f1ad Compare March 31, 2026 19:00
@Vedant-Deshpande Vedant-Deshpande requested a review from a team as a code owner March 31, 2026 19:00
@Vedant-Deshpande Vedant-Deshpande force-pushed the RHOAIENG-54585-aipcc-containerfile branch from 273f1ad to d2c42e8 Compare April 2, 2026 16:55
@shruthis4

@Vedant-Deshpande : Please add the manual run results to the PR. Did you get this qualified against ODH integration? We could point the odh-operator repo at your fork's SHA and trigger the tests.

- Multi-stage build using AIPCC 3.4 EA-2 base image
- Builds Go operator binary from Red Hat registry toolset
- Installs Java 17 via dnf from UBI9 repos (no cross-image copy)
- Installs PySpark 4.0.1 from Red Hat's internal PyPI registry
- Installs tini from EPEL
- OpenShift arbitrary UID compatibility
- User brings their own Spark Image for driver/executor pods
@Vedant-Deshpande Vedant-Deshpande force-pushed the RHOAIENG-54585-aipcc-containerfile branch from d2c42e8 to 9609ea7 Compare April 3, 2026 15:26
@shruthis4 shruthis4 merged commit e33209c into opendatahub-io:main Apr 3, 2026
16 checks passed
shruthis4 pushed a commit to shruthis4/spark-operator that referenced this pull request Apr 3, 2026