Skip to content

fix(automl,autorag): resolve MinIO storage access issues#7221

Merged
openshift-merge-bot[bot] merged 7 commits intoopendatahub-io:mainfrom
chrjones-rh:minio
Apr 14, 2026
Merged

fix(automl,autorag): resolve MinIO storage access issues#7221
openshift-merge-bot[bot] merged 7 commits intoopendatahub-io:mainfrom
chrjones-rh:minio

Conversation

@chrjones-rh
Copy link
Copy Markdown
Contributor

@chrjones-rh chrjones-rh commented Apr 13, 2026

https://redhat.atlassian.net/browse/RHOAIENG-57882

Description

The S3 client layer needs to support MinIO and other S3-compatible object stores that use self-signed or cluster-issued TLS certificates. Rather than skipping TLS verification entirely (as initially implemented), this PR uses the operator-mounted CA bundles to properly validate certificates.

How it works

The RHOAI operator already mounts CA bundles into the BFF pod via --bundle-paths:

  • /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem (system CAs)
  • /var/run/secrets/kubernetes.io/serviceaccount/ca.crt (cluster CA, includes ingress operator CA)
  • /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt (service serving CA)
  • /etc/pki/tls/certs/odh-ca-bundle.crt (ODH CA bundle)
  • /etc/pki/tls/certs/odh-trusted-ca-bundle.crt (ODH trusted CA bundle)

The S3 client uses this CA pool (RootCAs) to verify endpoint certificates, so self-signed MinIO certs are validated properly without skipping verification.

Key design decisions

Aspect Behavior Rationale
TLS verification Uses operator-provided CA bundles Proper cert validation instead of InsecureSkipVerify
HTTPS Always required Prevents credentials from being sent in cleartext
Private IPs Always allowed MinIO runs in-cluster on RFC-1918 service IPs
Blocked ranges Loopback, link-local, reserved SSRF protection for dangerous ranges
Dev mode Falls back to InsecureSkipVerify when no CA bundles provided Local development convenience

Why not env vars?

The initial implementation used S3_INSECURE_SKIP_VERIFY, S3_ALLOW_HTTP, and S3_ALLOW_INTERNAL_IPS env vars. These were removed because:

  1. The BFF runs as a sidecar container managed by the RHOAI operator -- end users cannot set env vars on it
  2. The operator already provides CA bundles that properly solve the self-signed cert problem
  3. Defaulting InsecureSkipVerify=true was a security concern compared to every other BFF (which defaults to false)

Files changed

  • packages/{automl,autorag}/bff/internal/integrations/s3/client_factory.go -- replaced InsecureSkipVerify, AllowHTTP, AllowInternalIPs with RootCAs *x509.CertPool
  • packages/{automl,autorag}/bff/internal/integrations/s3/client.go -- use CA pool for TLS config, HTTPS-only validation, always allow private IPs, dev-mode fallback
  • packages/{automl,autorag}/bff/internal/api/app.go -- pass rootCAs (from bundle-paths) to S3 client, removed env var reading
  • packages/{automl,autorag}/bff/internal/integrations/s3/client_test.go -- updated tests: private IPs now accepted, removed permissive mode tests
  • packages/{automl,autorag}/bff/Makefile -- added BUNDLE_PATHS variable for local development

How Has This Been Tested?

  • Verified TLS handshake succeeds against MinIO with cluster CA bundle (downloaded from cluster)
  • Verified dev-mode fallback works when no CA bundles are provided
  • All automl and autorag BFF Go tests pass (5 suites each)
  • All frontend lint and type-check pass
  • All contract tests pass (automl: 81/81, autorag: 105/105)

Test Impact

  • Private IP tests changed from rejection to acceptance (MinIO runs in-cluster)
  • Permissive mode tests removed (no longer configurable -- behavior is hardcoded)
  • SSRF protection tests for dangerous ranges (loopback, link-local, reserved) unchanged

Request review criteria:

Self checklist (all need to be checked):

  • The developer has manually tested the changes and verified that the changes work
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has added tests or explained why testing cannot be added
  • The code follows our Best Practices

If you have UI changes:

  • Included any necessary screenshots or gifs if it was a UI change.
  • Included tags to the UX team if it was a UI/UX change.

N/A -- backend-only change, no UI impact.

After the PR is posted and before it merges:

  • The developer has tested their solution on a cluster by using the image produced by the PR to main

Summary by CodeRabbit

  • New Features

    • New --bundle-paths option to supply custom CA bundles used for S3 TLS verification.
    • S3 endpoints on RFC-1918 private IPv4 ranges and IPv6 unique-local addresses are now accepted.
  • Documentation

    • Endpoint validation docs updated to note custom CA bundle support and permitted private/non-public IP ranges.

Add S3 client support for in-cluster MinIO and other S3-compatible
object stores. Three configurable options control security behavior,
all defaulting to permissive for MinIO compatibility:

- S3_ALLOW_HTTP=true: permit plain HTTP S3 endpoints
- S3_INSECURE_SKIP_VERIFY=true: skip TLS cert verification
- S3_ALLOW_INTERNAL_IPS=true: allow RFC-1918 private IPs

Customers can set any of these to "false" via environment variables
to enforce stricter security policies. Loopback, link-local, and
reserved IP ranges remain always blocked.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 13, 2026

📝 Walkthrough

Walkthrough

This PR adds an exported RootCAs *x509.CertPool field to S3ClientOptions, wires a compiled rootCAs into real S3 client factories in both automl and autorag app initializers, and updates the AWS HTTP client creation to use a cloned transport with TLSClientConfig.RootCAs (or, when DevMode and no RootCAs, InsecureSkipVerify=true) and MinVersion: TLS1.2 plus a 30s timeout. Endpoint/SSRF validation was changed to stop blocking RFC-1918 IPv4 and IPv6 ULA ranges (loopback/link-local and other reserved ranges remain blocked). Tests were updated to accept private and ULA endpoints. Makefiles for both packages gained an optional BUNDLE_PATHS variable and pass it to the apps when set.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Security Issues

  • CWE-918 (Server-Side Request Forgery): Removing RFC-1918 and IPv6 ULA from the blocked CIDRs allows HTTPS requests to private/internal IPs. Actionable: require explicit opt-in config to permit private addresses; add audit logging when enabled; add per-principal allowlists and deny-by-default gating.
  • CWE-295 (Improper Certificate Validation): Use of InsecureSkipVerify when DevMode is set weakens TLS guarantees. Actionable: remove or guard InsecureSkipVerify behind a build tag or force explicit, documented dev-only flag; never enable in production builds. Fail fast if DevMode is misconfigured.
  • CWE-311 / CWE-319 (Cleartext Transmission / Sensitive Data Exposure): Accepting private endpoints increases risk of exposing internal metadata or sensitive services even over HTTPS if trust is misconfigured. Actionable: enforce strict endpoint validation and require operator approval for custom CA bundles; log and alert accesses to internal IPs.
  • CWE-20 (Improper Input Validation): Loading CA bundles from arbitrary paths requires robust validation. Actionable: validate file existence, parse errors, and ensure pool is non-empty; fail startup on malformed bundles; surface clear error messages and telemetry.
  • Operational audit: changes alter runtime trust and SSRF policy. Actionable: emit immutable startup audit entries when custom RootCAs or private-IP acceptance is active; document required approvals and operational runbooks.

Architectural Issues

  • Duplicate plumbing: identical S3 CA wiring and endpoint logic appear in both automl and autorag. Actionable: extract shared S3 client construction, CA-bundle loading, and endpoint validation into a single internal package to centralize security defaults and reduce drift.
  • Inconsistent opt-in semantics: behavior depends on presence of RootCAs vs DevMode with differing security profiles. Actionable: centralize defaults and expose explicit, documented flags for (a) allow-private-ips, (b) custom-trust-store, (c) dev-insecure-skip-verify; ensure consistent semantics across services and add targeted tests for each combination.
  • Tests changed semantics without gating: unit tests now accept previously-blocked ranges. Actionable: add tests for both deny-by-default and explicit opt-in paths; add integration tests that validate bundle parsing and failure modes.
🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title 'fix(automl,autorag): resolve MinIO storage access issues' accurately summarizes the main change: enabling proper TLS certificate validation for MinIO via operator-mounted CA bundles.
Description check ✅ Passed The description is comprehensive and well-structured, covering the issue, solution rationale, design decisions, testing (verified TLS handshake, dev-mode fallback, all tests passing), and addressing all required checklist items.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@chrjones-rh chrjones-rh requested review from GAUNSD and nickmazzi and removed request for NickGagan April 13, 2026 20:24
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🧹 Nitpick comments (1)
packages/automl/bff/internal/integrations/s3/client_factory.go (1)

24-38: The exported default: true contract is not actually implemented.

S3ClientOptions{} still leaves all three booleans false because withDefaults() only fills transfer knobs. Any caller other than NewApp gets behavior that disagrees with these field docs. Either move the defaulting into construction with a tri-state representation, or remove the default: true wording from the exported type.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/automl/bff/internal/integrations/s3/client_factory.go` around lines
24 - 38, The docs claim S3ClientOptions boolean fields default to true but
S3ClientOptions{} leaves them false because withDefaults() only sets transfer
knobs; fix by initializing these defaults in construction: add a constructor
(e.g., NewS3ClientOptions) or modify withDefaults() to set InsecureSkipVerify,
AllowInternalIPs, and AllowHTTP to true when they are unset, and update all call
sites (including callers that instantiate S3ClientOptions directly) to use the
constructor or run withDefaults(); alternatively remove the "default: true" text
from the exported field comments if you prefer not to change behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/automl/bff/internal/api/app.go`:
- Around line 155-176: The current defaults enable insecure S3 behavior; change
the three flags so they default to false and require explicit opt-in via env
vars: update s3InsecureSkipVerify, s3AllowInternalIPs, and s3AllowHTTP to be
enabled only when their respective env var equals "true" (or parse a boolean
from the env) rather than being enabled unless set to "false"; ensure you update
any surrounding comment to reflect the new secure-by-default posture and that
callers (e.g., the S3 client construction code that reads these vars) continue
to use these variables.
- Around line 174-176: The current boolean parsing for s3InsecureSkipVerify,
s3AllowInternalIPs, and s3AllowHTTP uses != "false" which fails open; change
each to parse with strconv.ParseBool(strings.TrimSpace(os.Getenv(...))) and
treat parse errors as invalid (log and return an error or set the value to false
explicitly). Specifically, replace assignments to s3InsecureSkipVerify,
s3AllowInternalIPs, and s3AllowHTTP with code that calls strings.TrimSpace on
the env var, calls strconv.ParseBool, if err != nil then reject/handle the
invalid value (e.g., return startup error or log and set false), otherwise set
the boolean to the parsed value so the default is secure (false).

In `@packages/automl/bff/internal/integrations/s3/client.go`:
- Around line 102-110: When opts.InsecureSkipVerify is true the code assigns
cfg.HTTPClient a bare http.Client with a minimal http.Transport which lacks the
standard timeouts; update the branch that sets cfg.HTTPClient so it clones
http.DefaultTransport (preserving its Dial/TLS/Idle timeout settings) and
assigns that transport to the http.Client, and set http.Client.Timeout to 30
seconds to enforce an explicit timeout consistent with other BFF HTTP
integrations (adjusting TLSClientConfig.InsecureSkipVerify on the cloned
transport as currently done).

In `@packages/autorag/bff/internal/api/app.go`:
- Around line 179-186: The three S3 flags (s3InsecureSkipVerify,
s3AllowInternalIPs, s3AllowHTTP) are currently parsed as fail-open; replace the
current os.Getenv(...) != "false" logic with strconv.ParseBool for each
variable, default to false (opt-in) and abort startup on parse error so invalid
values fail fast; wire the parsed booleans into s3int.S3ClientOptions. Also
update the HTTP client created earlier (the external-service http.Client) to set
a sensible Timeout (e.g., 30s) and ensure TLS verification is not disabled by
default. Ensure error handling surfaces parse errors (return or log.Fatal)
rather than silently continuing with insecure defaults.

In `@packages/autorag/bff/internal/integrations/s3/client_factory.go`:
- Around line 21-35: The comment points out that S3ClientOptions boolean fields
(InsecureSkipVerify, AllowInternalIPs, AllowHTTP) default to false because
withDefaults() only sets numeric fields; update withDefaults() (or
NewRealClientFactory()) to explicitly set these three booleans to true when the
struct fields are zero-valued so the documented defaults are actually applied,
and ensure any callers that rely on zero-values (e.g., tests using
S3ClientOptions{}) will now receive the intended true defaults; reference the
S3ClientOptions type and the withDefaults()/NewRealClientFactory() functions
when making the change.

In `@packages/autorag/bff/internal/integrations/s3/client.go`:
- Around line 82-90: The custom S3 HTTP client currently builds a zero-value
http.Transport when opts.InsecureSkipVerify is true, losing
http.DefaultTransport features and leaving no http.Client.Timeout; fix by
cloning http.DefaultTransport (type-assert to *http.Transport and create a
shallow copy) then modify its TLSClientConfig to set InsecureSkipVerify and
MinVersion as needed, assign that transport to cfg.HTTPClient, and set a
sensible cfg.HTTPClient.Timeout (non-zero) to avoid hanging requests; update
references in the block that constructs cfg.HTTPClient and where
opts.InsecureSkipVerify is checked.

---

Nitpick comments:
In `@packages/automl/bff/internal/integrations/s3/client_factory.go`:
- Around line 24-38: The docs claim S3ClientOptions boolean fields default to
true but S3ClientOptions{} leaves them false because withDefaults() only sets
transfer knobs; fix by initializing these defaults in construction: add a
constructor (e.g., NewS3ClientOptions) or modify withDefaults() to set
InsecureSkipVerify, AllowInternalIPs, and AllowHTTP to true when they are unset,
and update all call sites (including callers that instantiate S3ClientOptions
directly) to use the constructor or run withDefaults(); alternatively remove the
"default: true" text from the exported field comments if you prefer not to
change behavior.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited), Organization UI (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: d1c0fd4b-1604-48c9-b79d-89152dd56ff5

📥 Commits

Reviewing files that changed from the base of the PR and between 38e3af3 and c294b19.

📒 Files selected for processing (6)
  • packages/automl/bff/internal/api/app.go
  • packages/automl/bff/internal/integrations/s3/client.go
  • packages/automl/bff/internal/integrations/s3/client_factory.go
  • packages/autorag/bff/internal/api/app.go
  • packages/autorag/bff/internal/integrations/s3/client.go
  • packages/autorag/bff/internal/integrations/s3/client_factory.go

Comment thread packages/automl/bff/internal/api/app.go Outdated
Comment thread packages/automl/bff/internal/api/app.go Outdated
Comment thread packages/autorag/bff/internal/api/app.go Outdated
Comment thread packages/autorag/bff/internal/integrations/s3/client_factory.go Outdated
Comment thread packages/autorag/bff/internal/integrations/s3/client.go Outdated
Comment thread packages/autorag/bff/internal/integrations/s3/client.go Outdated
@chrjones-rh
Copy link
Copy Markdown
Contributor Author

Previously blocked interactions with MinIO now work:
disconnected-ml-run

Note that the run itself is not successful because the automl image referenced in my local pipeline yaml does not match the image available on the disconnected cluster. This is a temporary issue due to the updated Pipeline yaml files earlier today. Once the latest build is available with both the UI and back-end changes, things will be in sync.

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 64.81%. Comparing base (38e3af3) to head (6b6e0ed).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #7221      +/-   ##
==========================================
+ Coverage   64.77%   64.81%   +0.04%     
==========================================
  Files        2447     2441       -6     
  Lines       76059    76004      -55     
  Branches    19172    19159      -13     
==========================================
- Hits        49265    49264       -1     
+ Misses      26794    26740      -54     

see 19 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 38e3af3...6b6e0ed. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@GAUNSD
Copy link
Copy Markdown
Contributor

GAUNSD commented Apr 13, 2026

AI code review 🤖 🔍

🤖: Claude Opus 4.6

Backend-only Go changes across 6 files in automl/autorag BFF packages, adding three env-var-configurable S3 client options (AllowHTTP, InsecureSkipVerify,
AllowInternalIPs) that all default to permissive (true) for in-cluster MinIO compatibility. The intent is sound and the code is well-commented, but there are
several issues ranging from a latent timeout bug to test coverage gaps. All CI checks pass, but the tests only exercise the restrictive path (S3ClientOptions{}
zero-values = all false) — the production-default permissive path (AllowHTTP=true, etc.) has zero test coverage. CodeRabbit already flagged the same core
issues I found independently, which increases my confidence in the findings.

Summary

# Severity Title Risk Files
1 🔴 High Bare http.Transport{} drops all default timeouts Production: goroutine leak on unresponsive S3 automl/.../client.go, autorag/.../client.go
2 🟡 Medium Env var parsing fails open on typos/case mismatch Operator thinks security is enabled but it isn't automl/.../app.go, autorag/.../app.go
3 🟡 Medium Zero test coverage for the permissive (production-default) code paths Can't verify the code that actually runs in prod automl/.../client_test.go, autorag/.../client_test.go
4 🔵 Low S3ClientOptions struct docs claim "default: true" but zero-value is false Misleading for future callers creating the struct directly automl/.../client_factory.go, autorag/.../client_factory.go
5 🔵 Low automl missing CGNAT range (100.64.0.0/10) in conditional block Pre-existing inconsistency with autorag; not a regression automl/.../client.go

Details

Item 1: 🔴 Bare `http.Transport{}` drops all default timeouts

🔴 Bare http.Transport{} drops all default timeouts

When InsecureSkipVerify=true (the production default), both packages create a custom http.Client with a bare &http.Transport{}:

cfg.HTTPClient = &http.Client{
    Transport: &http.Transport{
        TLSClientConfig: &tls.Config{
            InsecureSkipVerify: true,
            MinVersion:         tls.VersionTLS12,
        },
    },
}

A zero-value http.Transport loses everything from http.DefaultTransport:

  • DialContext timeout (30s dial timeout)
  • TLSHandshakeTimeout (10s)
  • IdleConnTimeout (90s)
  • MaxIdleConns / MaxIdleConnsPerHost
  • Proxy support
  • HTTP/2 support

And http.Client.Timeout is also unset (zero = no timeout).

Impact: If an S3 endpoint is unresponsive or blackholed, the TCP connect will block for the OS-level timeout (typically 2+ minutes) and the request will have no
overall deadline. Under load this causes goroutine leaks and potential resource exhaustion.

Fix: Clone http.DefaultTransport and override only the TLS config:

transport := http.DefaultTransport.(*http.Transport).Clone()
transport.TLSClientConfig = &tls.Config{
    InsecureSkipVerify: true,
    MinVersion:         tls.VersionTLS12,
}
cfg.HTTPClient = &http.Client{
    Transport: transport,
    Timeout:   30 * time.Second,
}

Affected files:

  • packages/automl/bff/internal/integrations/s3/client.go:102-110
  • packages/autorag/bff/internal/integrations/s3/client.go:82-90
Item 2: 🟡 Env var parsing fails open on typos and case mismatches

🟡 Env var parsing fails open on typos and case mismatches

All three security flags use os.Getenv("...") != "false":

s3InsecureSkipVerify := os.Getenv("S3_INSECURE_SKIP_VERIFY") != "false"
s3AllowInternalIPs := os.Getenv("S3_ALLOW_INTERNAL_IPS") != "false"
s3AllowHTTP := os.Getenv("S3_ALLOW_HTTP") != "false"

This means any value other than the exact lowercase string "false" enables the insecure behavior:

  • "False", "FALSE"permissive (silent misconfiguration)
  • "no", "0", "off"permissive (common boolean representations ignored)
  • "fals" (typo) → permissive (no error, no log)

For security-sensitive flags, the failure mode should be fail-closed (reject invalid values, or at minimum use strconv.ParseBool which handles case-insensitive
true/false/0/1).

Risk: An operator who sets S3_INSECURE_SKIP_VERIFY=FALSE believes they've hardened the deployment, but TLS verification remains disabled. This is a silent
security misconfiguration.

Fix: Use strconv.ParseBool with proper error handling, or at minimum strings.EqualFold(val, "false") for case-insensitive comparison.

Affected files:

  • packages/automl/bff/internal/api/app.go:174-176
  • packages/autorag/bff/internal/api/app.go:179-181
Item 3: 🟡 Zero test coverage for the permissive (production-default) code paths

🟡 Zero test coverage for the permissive (production-default) code paths

newTestClient() in both packages creates &RealS3Client{options: S3ClientOptions{DevMode: false}} — all three new boolean fields default to false. Every test
exercises the restrictive path.

No tests exist for:

  • HTTP endpoints being accepted when AllowHTTP=true
  • Private IPs (10.x, 172.16.x, 192.168.x) being accepted when AllowInternalIPs=true
  • The interaction between InsecureSkipVerify and the custom HTTP transport

The PR description states "No new tests required — existing validation tests cover the restrictive path." This is technically accurate but means the code path that
runs in production (all three flags = true) has zero verification.

Why this matters for code freeze: If there's a logic error in the conditional blocks (e.g., the if !c.options.AllowInternalIPs check is inverted), we won't catch
it until production.

Minimum tests needed:

func TestValidateAndNormalizeEndpoint_AcceptsHTTPWhenAllowed(t *testing.T) {
    c := &RealS3Client{options: S3ClientOptions{AllowHTTP: true}}
    result, err := c.validateAndNormalizeEndpoint("http://s3.amazonaws.com")
    assert.NoError(t, err)
    assert.Equal(t, "http://s3.amazonaws.com", result)
}

func TestValidateAndNormalizeEndpoint_AcceptsPrivateIPWhenAllowed(t *testing.T) {
    c := &RealS3Client{options: S3ClientOptions{AllowInternalIPs: true}}
    result, err := c.validateAndNormalizeEndpoint("https://10.0.0.1:9000")
    assert.NoError(t, err)
    assert.Equal(t, "https://10.0.0.1:9000", result)
}

func TestValidateAndNormalizeEndpoint_StillBlocksLoopbackWhenPermissive(t *testing.T) {
    c := &RealS3Client{options: S3ClientOptions{AllowInternalIPs: true, AllowHTTP: true}}
    _, err := c.validateAndNormalizeEndpoint("http://127.0.0.1:9000")
    assert.Error(t, err)
    assert.Contains(t, err.Error(), "loopback")
}
Item 4: 🔵 `S3ClientOptions` struct docs claim "default: true" but zero-value is `false`

🔵 S3ClientOptions struct docs claim "default: true" but zero-value is false

The struct field comments say:

// Controlled via S3_INSECURE_SKIP_VERIFY env var (default: true).
InsecureSkipVerify bool

But S3ClientOptions{} yields InsecureSkipVerify: false. The "default: true" behavior only exists because app.go explicitly sets them. withDefaults() only
handles numeric transfer tuning fields.

This isn't a bug today since app.go is the only production caller, but it creates a documentation contract mismatch that could mislead future callers.

Fix: Either update the comments to say "set to true by NewApp; struct zero-value is false" or add the boolean defaults to withDefaults().

Affected files:

  • packages/automl/bff/internal/integrations/s3/client_factory.go:24-38
  • packages/autorag/bff/internal/integrations/s3/client_factory.go:21-35
Item 5: 🔵 automl missing CGNAT range (`100.64.0.0/10`) in conditional block

🔵 automl missing CGNAT range (100.64.0.0/10) in conditional block

autorag's validateIPAddress conditionally blocks the Carrier-Grade NAT range (100.64.0.0/10, RFC 6598) when AllowInternalIPs=false. automl does not have this
range at all — not in the always-blocked list and not in the conditional list.

This is a pre-existing inconsistency (automl never had this range, even before this PR). The PR moved ranges from always-blocked to conditionally-blocked but
didn't harmonize the two implementations.

Impact: When AllowInternalIPs=false, automl allows connections to CGNAT addresses (100.64.x.x – 100.127.x.x) while autorag correctly blocks them. In practice
this is low-risk because CGNAT addresses are uncommon in Kubernetes clusters, but it's an SSRF surface difference.

Fix: Add {"100.64.0.0/10", "Carrier-Grade NAT range (RFC 6598)"} to automl's conditional block.

Affected file: packages/automl/bff/internal/integrations/s3/client.go (conditional blockedRanges append block)

Human code review 👤 🔍

I think items 1, 2, 3 should be covered before merging

…sive mode tests

- Clone http.DefaultTransport instead of bare http.Transport to preserve
  default timeouts and connection pooling. Add 30s client timeout.
- Add tests for the permissive (production-default) code paths:
  HTTP accepted when AllowHTTP=true, private IPs accepted when
  AllowInternalIPs=true, loopback/link-local still blocked when
  permissive, HTTP+private IP combination works.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@chrjones-rh
Copy link
Copy Markdown
Contributor Author

chrjones-rh commented Apr 13, 2026

I think items 1, 2, 3 should be covered before merging

One and three have been addressed, the second item is intentional to match the Pipelines UI behaviour:

The PR was fully refactored to ensure that TLS enforcement remains in place in production. Introduced validation for the self signed certs used with MinIO.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (2)
packages/autorag/bff/internal/integrations/s3/client_test.go (1)

181-192: Missing test coverage for CGNAT and IPv6 ULA in permissive mode.

The test covers RFC-1918 ranges but omits CGNAT (100.64.0.0/10) and IPv6 ULA (fc00::/7), both of which are conditionally allowed when AllowInternalIPs is true. Without coverage, a regression in those ranges would go undetected.

Add missing ranges
 func TestValidateAndNormalizeEndpoint_AcceptsPrivateIPWhenAllowed(t *testing.T) {
 	c := newPermissiveTestClient()
 	for _, endpoint := range []string{
 		"https://10.0.0.1:9000",
 		"https://172.16.0.1:9000",
 		"https://192.168.1.1:9000",
+		"https://100.64.0.1:9000",     // CGNAT (RFC 6598)
+		"https://[fc00::1]:9000",       // IPv6 ULA
 	} {
 		result, err := c.validateAndNormalizeEndpoint(endpoint)
 		assert.NoError(t, err, "should accept %s", endpoint)
 		assert.Equal(t, endpoint, result)
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/autorag/bff/internal/integrations/s3/client_test.go` around lines
181 - 192, The test TestValidateAndNormalizeEndpoint_AcceptsPrivateIPWhenAllowed
currently checks only RFC-1918 ranges; update it to also include a CGNAT IPv4
address (e.g. "https://100.64.0.1:9000") and an IPv6 ULA address (e.g.
"https://[fd00::1]:9000") in the endpoint list, using the same
newPermissiveTestClient() and calling c.validateAndNormalizeEndpoint(endpoint)
and asserting NoError and equality for each; ensure the IPv6 address is
bracketed and treated the same as the other endpoints so CGNAT and ULA
acceptance is covered.
packages/autorag/bff/internal/integrations/s3/client.go (1)

351-372: Verbose struct literal repetition.

The repeated anonymous struct literals are functional but noisy. A minor cleanup using a named type or slice literal would improve readability.

Optional cleanup
+	type blockedRange struct {
+		cidr        string
+		description string
+	}
+
 	// Private/internal ranges — blocked unless AllowInternalIPs is set
 	if !c.options.AllowInternalIPs {
-		blockedRanges = append(blockedRanges,
-			struct {
-				cidr        string
-				description string
-			}{"10.0.0.0/8", "RFC-1918 private range (10.0.0.0/8)"},
-			struct {
-				cidr        string
-				description string
-			}{"100.64.0.0/10", "Carrier-Grade NAT range (RFC 6598)"},
-			struct {
-				cidr        string
-				description string
-			}{"172.16.0.0/12", "RFC-1918 private range (172.16.0.0/12)"},
-			struct {
-				cidr        string
-				description string
-			}{"192.168.0.0/16", "RFC-1918 private range (192.168.0.0/16)"},
-			struct {
-				cidr        string
-				description string
-			}{"fc00::/7", "IPv6 unique local addresses"},
-		)
+		blockedRanges = append(blockedRanges, []blockedRange{
+			{"10.0.0.0/8", "RFC-1918 private range (10.0.0.0/8)"},
+			{"100.64.0.0/10", "Carrier-Grade NAT range (RFC 6598)"},
+			{"172.16.0.0/12", "RFC-1918 private range (172.16.0.0/12)"},
+			{"192.168.0.0/16", "RFC-1918 private range (192.168.0.0/16)"},
+			{"fc00::/7", "IPv6 unique local addresses"},
+		}...)
 	}

Note: This requires changing the blockedRanges declaration at line 337 to use the named type as well.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/autorag/bff/internal/integrations/s3/client.go` around lines 351 -
372, Replace the repeated anonymous struct literals by defining a named type
(e.g., type blockedRange struct { cidr, description string }) and change
blockedRanges to be a slice of that type (blockedRanges []blockedRange). Then
populate blockedRanges with a concise composite literal like []blockedRange{ {
"10.0.0.0/8", "RFC-1918 private range (10.0.0.0/8)" }, ... } instead of repeated
struct { ... }{...} entries; update any references to blockedRanges accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/automl/bff/internal/integrations/s3/client.go`:
- Around line 517-535: The strict SSRF blocking in client.go omits the shared
address space 100.64.0.0/10 when c.options.AllowInternalIPs is false, leaving
CGNAT reachable; update the blockedRanges append (the same block that adds
"10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "fc00::/7") to also include the
CIDR "100.64.0.0/10" with a descriptive label (e.g., "Carrier-grade NAT
(100.64.0.0/10)") so AllowInternalIPs enforcement in the code that builds/uses
blockedRanges covers shared address space as well.
- Around line 459-464: The current check allowing parsedURL.Scheme == "http"
when c.options.AllowHTTP is true is unsafe; change it to fail closed by
rejecting any non-"https" scheme by default and only permit "http" when an
explicit, clearly named dev-only flag is set (e.g. c.options.AllowHTTPDev) in
addition to c.options.AllowHTTP, and document that this is for local/dev use
only; also ensure the S3 HTTP client you build enforces TLS verification (do not
set InsecureSkipVerify) and always configures sensible timeouts on the HTTP
client/transport used to create the S3 client (set
Transport.TLSClientConfig.Verify and http.Client.Timeout) so that when you do
permit non-HTTPS for dev the code path is explicit and production paths remain
HTTPS-only (update checks around parsedURL.Scheme and usages of
c.options.AllowHTTP accordingly).
- Around line 102-113: The code currently sets opts.InsecureSkipVerify to true
and assigns transport.TLSClientConfig with InsecureSkipVerify in the S3 client
creation (see opts.InsecureSkipVerify, transport.TLSClientConfig,
cfg.HTTPClient), which disables TLS verification; change this to require valid
certificate verification by default: remove or disallow setting
InsecureSkipVerify=true, instead load a CA bundle when provided (e.g., from a
mounted ConfigMap/ENV like S3_ROOT_CA) and set TLSClientConfig.RootCAs
accordingly, and make any insecure-skip path explicit and rejected at
initialization if not intentionally enabled (validate S3_INSECURE_SKIP_VERIFY so
production default is secure); ensure the code that reads
S3_INSECURE_SKIP_VERIFY enforces false-by-default and returns an error if an
insecure mode is requested without explicit approval.

In `@packages/autorag/bff/internal/integrations/s3/client.go`:
- Line 82: The conditional is checking the raw opts instead of the
defaults-applied config; change the check to use c.options.InsecureSkipVerify
(after opts.withDefaults() which was stored on c.options) so the code reads the
resolved setting, i.e., replace uses of opts.InsecureSkipVerify with
c.options.InsecureSkipVerify in the relevant conditional around the
TLS/insecure-skip-verify logic in the client initialization (look for
opts.withDefaults(), c.options, and the InsecureSkipVerify check).

---

Nitpick comments:
In `@packages/autorag/bff/internal/integrations/s3/client_test.go`:
- Around line 181-192: The test
TestValidateAndNormalizeEndpoint_AcceptsPrivateIPWhenAllowed currently checks
only RFC-1918 ranges; update it to also include a CGNAT IPv4 address (e.g.
"https://100.64.0.1:9000") and an IPv6 ULA address (e.g.
"https://[fd00::1]:9000") in the endpoint list, using the same
newPermissiveTestClient() and calling c.validateAndNormalizeEndpoint(endpoint)
and asserting NoError and equality for each; ensure the IPv6 address is
bracketed and treated the same as the other endpoints so CGNAT and ULA
acceptance is covered.

In `@packages/autorag/bff/internal/integrations/s3/client.go`:
- Around line 351-372: Replace the repeated anonymous struct literals by
defining a named type (e.g., type blockedRange struct { cidr, description string
}) and change blockedRanges to be a slice of that type (blockedRanges
[]blockedRange). Then populate blockedRanges with a concise composite literal
like []blockedRange{ { "10.0.0.0/8", "RFC-1918 private range (10.0.0.0/8)" },
... } instead of repeated struct { ... }{...} entries; update any references to
blockedRanges accordingly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited), Organization UI (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: 6b7b72e2-f9ac-4b5f-9459-88eece33d8a2

📥 Commits

Reviewing files that changed from the base of the PR and between c294b19 and bc6713d.

📒 Files selected for processing (4)
  • packages/automl/bff/internal/integrations/s3/client.go
  • packages/automl/bff/internal/integrations/s3/client_test.go
  • packages/autorag/bff/internal/integrations/s3/client.go
  • packages/autorag/bff/internal/integrations/s3/client_test.go

Comment thread packages/automl/bff/internal/integrations/s3/client.go Outdated
Comment thread packages/automl/bff/internal/integrations/s3/client.go Outdated
Comment thread packages/automl/bff/internal/integrations/s3/client.go Outdated
Comment thread packages/autorag/bff/internal/integrations/s3/client.go Outdated
@chrjones-rh chrjones-rh changed the title fix(automl,autorag): add support for MinIO storage fix(automl,autorag): resolve MinIO storage access issues Apr 13, 2026
- Add CGNAT 100.64.0.0/10 to automl blocked ranges (was already in autorag)
- Use named blockedRange type instead of repeated anonymous struct literals
- Fix autorag to use c.options.InsecureSkipVerify (post-defaults) instead
  of raw opts
- Add CGNAT and IPv6 ULA test cases to permissive mode tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@chrjones-rh chrjones-rh marked this pull request as draft April 13, 2026 22:39
@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress This PR is in WIP state label Apr 13, 2026
…rification

Replace InsecureSkipVerify, AllowHTTP, and AllowInternalIPs env vars
(which were not settable in production) with proper CA bundle-based TLS
verification. The RHOAI operator already mounts cluster and custom CA
bundles into the BFF pod via --bundle-paths, so self-signed MinIO
certificates are validated against these bundles rather than skipped.

- Remove S3_INSECURE_SKIP_VERIFY, S3_ALLOW_HTTP, S3_ALLOW_INTERNAL_IPS
- Add RootCAs to S3ClientOptions, populated from operator-mounted bundles
- HTTPS always required (no plain HTTP)
- Private IPs always allowed (MinIO runs in-cluster)
- Dev-mode fallback: skip TLS verification when no CA bundles provided
- Add BUNDLE_PATHS support to Makefiles for local development

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@chrjones-rh chrjones-rh marked this pull request as ready for review April 13, 2026 23:58
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress This PR is in WIP state label Apr 13, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (2)
packages/autorag/bff/internal/integrations/s3/client.go (1)

347-366: ⚠️ Potential issue | 🔴 Critical

Re-block private/CGNAT/ULA targets or replace this with an explicit allowlist (CWE-918).

This change turns any user-controlled S3 endpoint into a cluster-internal SSRF primitive: 10/8, 172.16/12, 192.168/16, 100.64/10, and fc00::/7 are all reachable as long as they speak HTTPS. With the operator CA bundle mounted, many internal services will also present certs this client trusts.

Suggested fix
 	blockedRanges := []blockedRange{
+		{"10.0.0.0/8", "RFC-1918 private range (10.0.0.0/8)"},
+		{"172.16.0.0/12", "RFC-1918 private range (172.16.0.0/12)"},
+		{"192.168.0.0/16", "RFC-1918 private range (192.168.0.0/16)"},
+		{"100.64.0.0/10", "CGNAT range (100.64.0.0/10)"},
+		{"fc00::/7", "IPv6 unique local range (fc00::/7)"},
 		{"0.0.0.0/8", "reserved 'this network' range (RFC 1122)"},
 		{"169.254.0.0/16", "link-local range (169.254.0.0/16)"},
 		{"127.0.0.0/8", "loopback range (127.0.0.0/8)"},
 		{"240.0.0.0/4", "reserved for future use (RFC 1112)"},
 		{"::1/128", "IPv6 loopback"},
 		{"fe80::/10", "IPv6 link-local"},
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/autorag/bff/internal/integrations/s3/client.go` around lines 347 -
366, The validateIPAddress function on RealS3Client currently permits
RFC1918/CGNAT/ULA ranges, enabling SSRF; update the blockedRanges slice in
validateIPAddress to include 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16,
100.64.0.0/10 and the IPv6 unique local block fc00::/7 (in addition to the
existing loopback/link-local/reserved entries), or replace the permissive check
by implementing an explicit allowlist approach used by the S3 client connection
path (i.e., only permit known-good public CIDRs/hostnames before making
requests); ensure you modify validateIPAddress (RealS3Client) and any callers to
use the new blocked/allowlist logic consistently.
packages/automl/bff/internal/integrations/s3/client.go (1)

116-125: ⚠️ Potential issue | 🔴 Critical

Remove dev fallback that sets InsecureSkipVerify (CWE-295).

Line 124 disables certificate verification. If a developer runs this path on an untrusted network, S3 credentials/data are MITM-exposable.

Proposed fix
-	} else if c.options.DevMode {
-		// In dev mode without CA bundles, skip TLS verification so developers
-		// can test against clusters with self-signed certificates from their
-		// local machine. In production the operator always provides CA bundles
-		// via --bundle-paths, so this path is never reached.
-		slog.Warn("S3 TLS certificate verification disabled (dev mode, no CA bundles provided)")
-		transport := http.DefaultTransport.(*http.Transport).Clone()
-		transport.TLSClientConfig = &tls.Config{
-			InsecureSkipVerify: true, //nolint:gosec // dev-mode only fallback
-			MinVersion:         tls.VersionTLS12,
-		}
-		cfg.HTTPClient = &http.Client{
-			Transport: transport,
-			Timeout:   30 * time.Second,
-		}
+	} else if c.options.DevMode {
+		return nil, fmt.Errorf("dev mode requires trusted CA bundles via --bundle-paths; refusing to disable TLS verification")
 	}
#!/bin/bash
# Verify no insecure TLS bypass remains in this S3 client.
rg -n --type go -C2 'InsecureSkipVerify\s*:\s*true' packages/automl/bff/internal/integrations/s3/client.go

As per coding guidelines, **/*.go: "No InsecureSkipVerify in TLS configs (enables MITM attacks)".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/automl/bff/internal/integrations/s3/client.go` around lines 116 -
125, The dev-mode TLS bypass in the S3 client (the branch checking
c.options.DevMode that sets transport.TLSClientConfig.InsecureSkipVerify = true)
must be removed; instead, stop creating a transport with InsecureSkipVerify and
return an explicit error or require valid CA bundles when TLS verification
cannot be performed. Locate the branch that logs "S3 TLS certificate
verification disabled (dev mode, no CA bundles provided)" and replace it with
logic that fails fast (e.g., return an error from the S3 client constructor or
require c.options.BundlePaths) and ensure transport.TLSClientConfig is never set
with InsecureSkipVerify, keeping MinVersion TLS1.2 handling but relying on
proper CA configuration.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/automl/bff/Makefile`:
- Line 67: The Makefile run command leaves $(BUNDLE_PATHS) unquoted which can
allow word-splitting or shell injection; update the conditional in the go run
invocation so the flag emits --bundle-paths="$(BUNDLE_PATHS)" (preserving the
existing conditional that only adds the flag when BUNDLE_PATHS is set) to ensure
the variable is passed as a single quoted argument and prevent splitting or
metacharacter interpretation.

In `@packages/autorag/bff/internal/integrations/s3/client.go`:
- Around line 96-110: The dev-mode branch that sets
tls.Config{InsecureSkipVerify: true} must be removed and replaced with a
fail-closed path: when c.options.DevMode is true but no CA bundles are provided,
do not set InsecureSkipVerify or construct a permissive transport; instead
return an error (or log and exit) from the S3 client initialization so the
client creation fails and requires a valid CA bundle. Locate the
c.options.DevMode check and the code that sets transport.TLSClientConfig /
cfg.HTTPClient and change it to reject initialization (propagate an error from
the enclosing constructor/function) rather than disabling certificate
verification.

In `@packages/autorag/bff/Makefile`:
- Line 94: In the Makefile target where the go run invocation builds the flag
--bundle-paths=$(BUNDLE_PATHS), quote the shell variable so it isn't subject to
word-splitting or globbing (change the expansion to use "$(BUNDLE_PATHS)" in the
go run command); locate the line containing --bundle-paths=$(BUNDLE_PATHS) in
the Makefile and update it to pass the quoted BUNDLE_PATHS variable, preserving
the rest of the command and escaping as needed for the shell context.

---

Duplicate comments:
In `@packages/automl/bff/internal/integrations/s3/client.go`:
- Around line 116-125: The dev-mode TLS bypass in the S3 client (the branch
checking c.options.DevMode that sets
transport.TLSClientConfig.InsecureSkipVerify = true) must be removed; instead,
stop creating a transport with InsecureSkipVerify and return an explicit error
or require valid CA bundles when TLS verification cannot be performed. Locate
the branch that logs "S3 TLS certificate verification disabled (dev mode, no CA
bundles provided)" and replace it with logic that fails fast (e.g., return an
error from the S3 client constructor or require c.options.BundlePaths) and
ensure transport.TLSClientConfig is never set with InsecureSkipVerify, keeping
MinVersion TLS1.2 handling but relying on proper CA configuration.

In `@packages/autorag/bff/internal/integrations/s3/client.go`:
- Around line 347-366: The validateIPAddress function on RealS3Client currently
permits RFC1918/CGNAT/ULA ranges, enabling SSRF; update the blockedRanges slice
in validateIPAddress to include 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16,
100.64.0.0/10 and the IPv6 unique local block fc00::/7 (in addition to the
existing loopback/link-local/reserved entries), or replace the permissive check
by implementing an explicit allowlist approach used by the S3 client connection
path (i.e., only permit known-good public CIDRs/hostnames before making
requests); ensure you modify validateIPAddress (RealS3Client) and any callers to
use the new blocked/allowlist logic consistently.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited), Organization UI (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: fba82c4c-b40f-4362-a494-6ac5cfe26309

📥 Commits

Reviewing files that changed from the base of the PR and between bc6713d and 6afcfb3.

📒 Files selected for processing (10)
  • packages/automl/bff/Makefile
  • packages/automl/bff/internal/api/app.go
  • packages/automl/bff/internal/integrations/s3/client.go
  • packages/automl/bff/internal/integrations/s3/client_factory.go
  • packages/automl/bff/internal/integrations/s3/client_test.go
  • packages/autorag/bff/Makefile
  • packages/autorag/bff/internal/api/app.go
  • packages/autorag/bff/internal/integrations/s3/client.go
  • packages/autorag/bff/internal/integrations/s3/client_factory.go
  • packages/autorag/bff/internal/integrations/s3/client_test.go
✅ Files skipped from review due to trivial changes (1)
  • packages/automl/bff/internal/api/app.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • packages/autorag/bff/internal/integrations/s3/client_factory.go
  • packages/automl/bff/internal/integrations/s3/client_test.go

Comment thread packages/automl/bff/Makefile Outdated
Comment thread packages/autorag/bff/internal/integrations/s3/client.go
Comment thread packages/autorag/bff/Makefile Outdated
chrjones-rh and others added 2 commits April 13, 2026 20:11
…plitting

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@nickmazzi
Copy link
Copy Markdown
Contributor

AI assisted review

pr-review-7221.md

Overall Assessment

Approve with minor comments. This is a well-reasoned security improvement — moving from InsecureSkipVerify=true to proper CA-based verification is the right call. The code is well-documented, the test changes are consistent, and the design decisions (HTTPS-only, private IPs allowed, dev-mode fallback) are sound. A few items worth addressing below.

Suggestions

  1. Test the TLS transport path: Even a simple test that constructs a RealS3Client with a non-nil RootCAs and verifies the client was created successfully (without actually connecting) would add confidence:

    func TestNewRealS3Client_WithRootCAs(t *testing.T) {
        pool := x509.NewCertPool()
        _, err := NewRealS3Client(&S3Credentials{
            AccessKeyID: "a", SecretAccessKey: "b",
            Region: "us-east-1",
            EndpointURL: "https://s3.amazonaws.com",
        }, S3ClientOptions{RootCAs: pool})
        assert.NoError(t, err)
    }
  2. Document the SSRF allowlist change: The removal of RFC-1918 and Carrier-Grade NAT from the blocklist is a deliberate security tradeoff. A one-line comment in validateIPAddress noting why CGN (100.64/10) is also allowed (same rationale as RFC-1918: pod CIDRs) would help future reviewers.

  3. Guard the type assertion on http.DefaultTransport.(*http.Transport) as described above — low risk but defensive.

Test Validation

autorag

Frontend ✅

Test Suites: 38 passed, 38 total
Tests:       646 passed, 646 total
Snapshots:   0 total
Time:        14.561 s, estimated 30 s

BFF ✅

go fmt ./...
go vet ./...
ok      github.com/opendatahub-io/autorag-library/bff/cmd       (cached)
ok      github.com/opendatahub-io/autorag-library/bff/internal/api      (cached)
?       github.com/opendatahub-io/autorag-library/bff/internal/config   [no test files]
?       github.com/opendatahub-io/autorag-library/bff/internal/constants     [no test files]
?       github.com/opendatahub-io/autorag-library/bff/internal/helpers  [no test files]
?       github.com/opendatahub-io/autorag-library/bff/internal/integrations  [no test files]
ok      github.com/opendatahub-io/autorag-library/bff/internal/integrations/kubernetes        (cached)
?       github.com/opendatahub-io/autorag-library/bff/internal/integrations/kubernetes/k8mocks        [no test files]
?       github.com/opendatahub-io/autorag-library/bff/internal/integrations/kubernetes/k8smocks       [no test files]
?       github.com/opendatahub-io/autorag-library/bff/internal/integrations/llamastack        [no test files]
?       github.com/opendatahub-io/autorag-library/bff/internal/integrations/llamastack/lsmocks        [no test files]
?       github.com/opendatahub-io/autorag-library/bff/internal/integrations/pipelineserver    [no test files]
?       github.com/opendatahub-io/autorag-library/bff/internal/integrations/pipelineserver/psmocks    [no test files]
ok      github.com/opendatahub-io/autorag-library/bff/internal/integrations/s3(cached)
?       github.com/opendatahub-io/autorag-library/bff/internal/integrations/s3/s3mocks        [no test files]
?       github.com/opendatahub-io/autorag-library/bff/internal/mocks    [no test files]
?       github.com/opendatahub-io/autorag-library/bff/internal/models   [no test files]
?       github.com/opendatahub-io/autorag-library/bff/internal/pipelines     [no test files]
ok      github.com/opendatahub-io/autorag-library/bff/internal/repositories  (cached)

Contract ✅

Test Suites: 1 passed, 1 total
Tests:       105 passed, 105 total
Snapshots:   0 total
Time:        1.248 s, estimated 3 s
Test Suites: 1 passed, 1 total
Tests:       105 passed, 105 total

automl

Frontend ✅

Test Suites: 47 passed, 47 total
Tests:       760 passed, 760 total
Snapshots:   0 total
Time:        13.092 s, estimated 18 s

BFF ✅

go fmt ./...
go vet ./...
ok      github.com/opendatahub-io/automl-library/bff/cmd        (cached)
ok      github.com/opendatahub-io/automl-library/bff/internal/api       (cached)
?       github.com/opendatahub-io/automl-library/bff/internal/config    [no test files]
?       github.com/opendatahub-io/automl-library/bff/internal/constants [no test files]
?       github.com/opendatahub-io/automl-library/bff/internal/helpers   [no test files]
?       github.com/opendatahub-io/automl-library/bff/internal/integrations   [no test files]
ok      github.com/opendatahub-io/automl-library/bff/internal/integrations/kubernetes (cached)
?       github.com/opendatahub-io/automl-library/bff/internal/integrations/kubernetes/k8mocks [no test files]
?       github.com/opendatahub-io/automl-library/bff/internal/integrations/modelregistry      [no test files]
?       github.com/opendatahub-io/automl-library/bff/internal/integrations/pipelineserver     [no test files]
?       github.com/opendatahub-io/automl-library/bff/internal/integrations/pipelineserver/psmocks     [no test files]
ok      github.com/opendatahub-io/automl-library/bff/internal/integrations/s3(cached)
?       github.com/opendatahub-io/automl-library/bff/internal/integrations/s3/s3mocks [no test files]
?       github.com/opendatahub-io/automl-library/bff/internal/mocks     [no test files]
?       github.com/opendatahub-io/automl-library/bff/internal/models    [no test files]
?       github.com/opendatahub-io/automl-library/bff/internal/pipelines [no test files]
ok      github.com/opendatahub-io/automl-library/bff/internal/repositories   (cached)

Contract ✅

Test Suites: 1 passed, 1 total
Tests:       81 passed, 81 total
Snapshots:   0 total
Time:        1.114 s, estimated 2 s
Test Suites: 1 passed, 1 total
Tests:       81 passed, 81 total

Manual validation

worked in both autorag and ml

image image

… document SSRF allowlist

- Extract cloneDefaultTransport() to safely handle non-standard
  http.DefaultTransport replacements (e.g. in test environments)
- Add TestNewRealS3Client_WithRootCAs and TestNewRealS3Client_DevModeFallback
  to verify both TLS transport configuration paths
- Document CGN (100.64/10, RFC 6598) alongside RFC-1918 and IPv6 ULA
  in validateIPAddress doc comments as permitted ranges

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@nickmazzi
Copy link
Copy Markdown
Contributor

Addressed the 3 comments. Thanks @chrjones-rh

Test the TLS transport path — Added TestNewRealS3Client_WithRootCAs (verifies client creation with a custom CA pool) and TestNewRealS3Client_DevModeFallback (verifies client creation in dev mode without CA bundles) to both automl/client_test.go and autorag/client_test.go.

Document the SSRF allowlist change — Updated the validateIPAddress doc comment in both automl/client.go and autorag/client.go to explicitly list CGN (100.64/10, RFC 6598) alongside RFC-1918 and IPv6 ULA as permitted ranges.

Guard the type assertion — Extracted cloneDefaultTransport() in both packages. It uses a safe type assertion (ok check) on http.DefaultTransport and falls back to a fresh *http.Transport if it's been replaced with a non-standard implementation.

@nickmazzi
Copy link
Copy Markdown
Contributor

/lgtm
/approve

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Apr 14, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: nickmazzi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (1)
packages/autorag/bff/internal/integrations/s3/client.go (1)

96-105: ⚠️ Potential issue | 🟠 Major

Do not couple DevMode to InsecureSkipVerify.

This is still CWE-295: DEV_MODE=true plus missing or unreadable bundle paths drops certificate verification entirely. In a shared dev namespace or misconfigured CI job, a MITM can impersonate the S3 endpoint and tamper with object traffic. Fail closed here, or gate the bypass behind a separate explicit insecure opt-in instead of DevMode.

Suggested fix
 	} else if c.options.DevMode {
-		// In dev mode without CA bundles, skip TLS verification so developers
-		// can test against clusters with self-signed certificates from their
-		// local machine. In production the operator always provides CA bundles
-		// via --bundle-paths, so this path is never reached.
-		slog.Warn("S3 TLS certificate verification disabled (dev mode, no CA bundles provided)")
-		transport := cloneDefaultTransport()
-		transport.TLSClientConfig = &tls.Config{
-			InsecureSkipVerify: true, //nolint:gosec // dev-mode only fallback
-			MinVersion:         tls.VersionTLS12,
-		}
-		cfg.HTTPClient = &http.Client{
-			Transport: transport,
-			Timeout:   30 * time.Second,
-		}
+		return nil, fmt.Errorf(
+			"no CA bundles loaded for S3 endpoint %q; pass --bundle-paths in dev mode",
+			validatedEndpoint,
+		)
 	}

As per coding guidelines, **/*.go: "No InsecureSkipVerify in TLS configs (enables MITM attacks)".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/autorag/bff/internal/integrations/s3/client.go` around lines 96 -
105, The code currently ties c.options.DevMode to setting
transport.TLSClientConfig.InsecureSkipVerify=true in the S3 client creation path
(see c.options.DevMode, cloneDefaultTransport, TLSClientConfig and
InsecureSkipVerify), which must be removed; instead fail closed by returning an
error when CA bundle paths are missing/unreadable while not explicitly allowed,
and add a separate explicit insecure opt-in (e.g., c.options.AllowInsecureTLS or
similar) that must be true to set InsecureSkipVerify; update the conditional to
check AllowInsecureTLS before setting InsecureSkipVerify, otherwise
return/propagate an error and log a clear warning without disabling
verification.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/automl/bff/internal/integrations/s3/client_test.go`:
- Around line 32-53: Update the two tests for NewRealS3Client
(TestNewRealS3Client_WithRootCAs and TestNewRealS3Client_DevModeFallback) to
assert the constructed HTTP transport's TLS settings: after calling
NewRealS3Client with S3ClientOptions{RootCAs: pool}, extract the client's
transport/TLSClientConfig and assert TLSClientConfig.RootCAs is the same pool
and TLSClientConfig.InsecureSkipVerify is false; and after calling
NewRealS3Client with S3ClientOptions{DevMode: true}, assert the transport's
TLSClientConfig.InsecureSkipVerify is true (only in the DevMode path). Locate
the transport via the returned client (from NewRealS3Client) or its internal
http.Client/Transport to perform these explicit assertions.

In `@packages/autorag/bff/internal/integrations/s3/client_test.go`:
- Around line 31-37: The test uses a real hostname in S3Credentials.EndpointURL
which causes DNS resolution; update the test to use an IP literal for
EndpointURL (instead of "https://s3.amazonaws.com") so NewRealS3Client's
endpoint validation doesn't hit public DNS. Locate the test cases that construct
S3Credentials in packages/autorag/bff/internal/integrations/s3/client_test.go
(the calls to NewRealS3Client and the S3Credentials struct) and replace the
hostname-based URL with an accepted IP literal URL for both occurrences so the
constructor branches are exercised without external DNS.

In `@packages/autorag/bff/internal/integrations/s3/client.go`:
- Around line 92-95: The current code sets cfg.HTTPClient.Timeout which enforces
a full-request timeout and breaks long multipart transfers (see cfg.HTTPClient
and transfer calls like GetObject/UploadObject); remove the Client.Timeout
setting and instead set the equivalent deadline on the transport (e.g.,
transport.ResponseHeaderTimeout = 30 * time.Second) so only header read is
bounded, and rely on per-call contexts passed into the AWS SDK transfer manager
for operation-level deadlines; update the code that builds cfg.HTTPClient to
omit Timeout and ensure transport is configured with ResponseHeaderTimeout.

---

Duplicate comments:
In `@packages/autorag/bff/internal/integrations/s3/client.go`:
- Around line 96-105: The code currently ties c.options.DevMode to setting
transport.TLSClientConfig.InsecureSkipVerify=true in the S3 client creation path
(see c.options.DevMode, cloneDefaultTransport, TLSClientConfig and
InsecureSkipVerify), which must be removed; instead fail closed by returning an
error when CA bundle paths are missing/unreadable while not explicitly allowed,
and add a separate explicit insecure opt-in (e.g., c.options.AllowInsecureTLS or
similar) that must be true to set InsecureSkipVerify; update the conditional to
check AllowInsecureTLS before setting InsecureSkipVerify, otherwise
return/propagate an error and log a clear warning without disabling
verification.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited), Organization UI (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: ae4902cc-90ad-44a8-9bf9-163624a5c903

📥 Commits

Reviewing files that changed from the base of the PR and between 6afcfb3 and 6b6e0ed.

📒 Files selected for processing (6)
  • packages/automl/bff/Makefile
  • packages/automl/bff/internal/integrations/s3/client.go
  • packages/automl/bff/internal/integrations/s3/client_test.go
  • packages/autorag/bff/Makefile
  • packages/autorag/bff/internal/integrations/s3/client.go
  • packages/autorag/bff/internal/integrations/s3/client_test.go
✅ Files skipped from review due to trivial changes (1)
  • packages/automl/bff/Makefile
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/automl/bff/internal/integrations/s3/client.go

Comment on lines +32 to +53
func TestNewRealS3Client_WithRootCAs(t *testing.T) {
t.Parallel()
pool := x509.NewCertPool()
_, err := NewRealS3Client(&S3Credentials{
AccessKeyID: "a",
SecretAccessKey: "b",
Region: "us-east-1",
EndpointURL: "https://s3.amazonaws.com",
}, S3ClientOptions{RootCAs: pool})
assert.NoError(t, err)
}

func TestNewRealS3Client_DevModeFallback(t *testing.T) {
t.Parallel()
_, err := NewRealS3Client(&S3Credentials{
AccessKeyID: "a",
SecretAccessKey: "b",
Region: "us-east-1",
EndpointURL: "https://s3.amazonaws.com",
}, S3ClientOptions{DevMode: true})
assert.NoError(t, err)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Assert TLS wiring explicitly in constructor tests (security regression gap).

Severity: High (test blind spot on cert-validation path).
At Line 41 and Line 52, success-only assertions allow false positives: tests still pass if RootCAs is ignored or if InsecureSkipVerify leaks into non-dev paths. Exploit scenario: a regression could enable MITM acceptance on S3-compatible endpoints (CWE-295), while CI remains green.

Add assertions that verify the constructed transport/TLS config state:

  • RootCAs path: TLSClientConfig.RootCAs == pool and InsecureSkipVerify == false
  • DevMode fallback path (only when no bundles): InsecureSkipVerify == true

As per coding guidelines "HTTP clients to external services must set timeouts and use TLS verification."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/automl/bff/internal/integrations/s3/client_test.go` around lines 32
- 53, Update the two tests for NewRealS3Client (TestNewRealS3Client_WithRootCAs
and TestNewRealS3Client_DevModeFallback) to assert the constructed HTTP
transport's TLS settings: after calling NewRealS3Client with
S3ClientOptions{RootCAs: pool}, extract the client's transport/TLSClientConfig
and assert TLSClientConfig.RootCAs is the same pool and
TLSClientConfig.InsecureSkipVerify is false; and after calling NewRealS3Client
with S3ClientOptions{DevMode: true}, assert the transport's
TLSClientConfig.InsecureSkipVerify is true (only in the DevMode path). Locate
the transport via the returned client (from NewRealS3Client) or its internal
http.Client/Transport to perform these explicit assertions.

Comment on lines +31 to +37
_, err := NewRealS3Client(&S3Credentials{
AccessKeyID: "a",
SecretAccessKey: "b",
Region: "us-east-1",
EndpointURL: "https://s3.amazonaws.com",
}, S3ClientOptions{RootCAs: pool})
assert.NoError(t, err)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Keep these unit tests off public DNS.

NewRealS3Client resolves hostnames during endpoint validation, so https://s3.amazonaws.com makes these constructor tests depend on external DNS even though they only exercise local constructor branches. Use an accepted IP literal instead.

Suggested fix
 	_, err := NewRealS3Client(&S3Credentials{
 		AccessKeyID:     "a",
 		SecretAccessKey: "b",
 		Region:          "us-east-1",
-		EndpointURL:     "https://s3.amazonaws.com",
+		EndpointURL:     "https://10.0.0.1:9000",
 	}, S3ClientOptions{RootCAs: pool})
@@
 	_, err := NewRealS3Client(&S3Credentials{
 		AccessKeyID:     "a",
 		SecretAccessKey: "b",
 		Region:          "us-east-1",
-		EndpointURL:     "https://s3.amazonaws.com",
+		EndpointURL:     "https://10.0.0.1:9000",
 	}, S3ClientOptions{DevMode: true})

Also applies to: 42-48

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/autorag/bff/internal/integrations/s3/client_test.go` around lines 31
- 37, The test uses a real hostname in S3Credentials.EndpointURL which causes
DNS resolution; update the test to use an IP literal for EndpointURL (instead of
"https://s3.amazonaws.com") so NewRealS3Client's endpoint validation doesn't hit
public DNS. Locate the test cases that construct S3Credentials in
packages/autorag/bff/internal/integrations/s3/client_test.go (the calls to
NewRealS3Client and the S3Credentials struct) and replace the hostname-based URL
with an accepted IP literal URL for both occurrences so the constructor branches
are exercised without external DNS.

Comment on lines +92 to +95
cfg.HTTPClient = &http.Client{
Transport: transport,
Timeout: 30 * time.Second,
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n packages/autorag/bff/internal/integrations/s3/client.go | head -120

Repository: opendatahub-io/odh-dashboard

Length of output: 5209


🏁 Script executed:

cat -n packages/autorag/bff/internal/integrations/s3/client.go | sed -n '120,200p'

Repository: opendatahub-io/odh-dashboard

Length of output: 3673


Move timeout from http.Client to transport layer; let per-call context carry operation deadlines.

http.Client.Timeout applies to the full request lifetime including response-body reads. This kills multipart GetObject and UploadObject transfers legitimately exceeding 30s (lines 92–95, 107–110). Use transport.ResponseHeaderTimeout to bound header reads only; remove Client.Timeout and rely on the context deadlines passed through the AWS SDK transfer manager.

Suggested fix
 		transport := cloneDefaultTransport()
+		transport.ResponseHeaderTimeout = 30 * time.Second
 		transport.TLSClientConfig = &tls.Config{
 			RootCAs:    c.options.RootCAs,
 			MinVersion: tls.VersionTLS12,
 		}
 		cfg.HTTPClient = &http.Client{
 			Transport: transport,
-			Timeout:   30 * time.Second,
 		}

 		transport := cloneDefaultTransport()
+		transport.ResponseHeaderTimeout = 30 * time.Second
 		transport.TLSClientConfig = &tls.Config{
 			InsecureSkipVerify: true, //nolint:gosec // dev-mode only fallback
 			MinVersion:         tls.VersionTLS12,
 		}
 		cfg.HTTPClient = &http.Client{
 			Transport: transport,
-			Timeout:   30 * time.Second,
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/autorag/bff/internal/integrations/s3/client.go` around lines 92 -
95, The current code sets cfg.HTTPClient.Timeout which enforces a full-request
timeout and breaks long multipart transfers (see cfg.HTTPClient and transfer
calls like GetObject/UploadObject); remove the Client.Timeout setting and
instead set the equivalent deadline on the transport (e.g.,
transport.ResponseHeaderTimeout = 30 * time.Second) so only header read is
bounded, and rely on per-call contexts passed into the AWS SDK transfer manager
for operation-level deadlines; update the code that builds cfg.HTTPClient to
omit Timeout and ensure transport is configured with ResponseHeaderTimeout.

@openshift-merge-bot openshift-merge-bot Bot merged commit 1ffb866 into opendatahub-io:main Apr 14, 2026
53 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants