fix(security): validate embedding base URLs to prevent SSRF#1221
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request addresses a critical security vulnerability by introducing robust validation for user-configurable base URLs. By preventing malicious or misconfigured URLs from being used, it significantly mitigates the risk of SSRF attacks, protecting against potential credential leakage, access to internal services, and local file reads. The changes ensure that the application's network requests are restricted to safe and intended destinations.
Code Review
This pull request introduces an important security fix to validate base URLs and prevent SSRF attacks. The implementation in validate_base_url is a good step forward. However, I've identified a critical flaw where it fails to resolve hostnames, allowing for DNS-based SSRF bypasses. I've left a detailed comment with a suggested fix for this. Additionally, the validation for NEARAI_BASE_URL seems to be missing from this PR, which was mentioned as one of the vulnerable configurations. Please ensure all user-configurable base URLs are validated to fully mitigate the risk.
src/config/helpers.rs
Outdated
```rust
// For HTTPS, reject private/loopback/link-local/metadata IP literals.
if let Ok(ip) = host.parse::<IpAddr>() {
    let is_dangerous = match ip {
        IpAddr::V4(v4) => {
            v4.is_private()
                || v4.is_loopback()
                || v4.is_link_local()
                || v4.is_multicast()
                || v4.is_unspecified()
                || v4 == Ipv4Addr::new(169, 254, 169, 254)
                || (v4.octets()[0] == 100 && (v4.octets()[1] & 0xC0) == 64) // CGN
        }
        IpAddr::V6(v6) => {
            if let Some(v4) = v6.to_ipv4_mapped() {
                v4.is_private()
                    || v4.is_loopback()
                    || v4.is_link_local()
                    || v4 == Ipv4Addr::new(169, 254, 169, 254)
            } else {
                v6.is_loopback() || v6.is_unspecified()
            }
        }
    };
    if is_dangerous {
        return Err(ConfigError::InvalidValue {
            key: field_name.to_string(),
            message: format!(
                "HTTPS URL points to a private/internal IP '{}'. \
                 This is blocked to prevent SSRF attacks.",
                ip
            ),
        });
    }
}
```
The current validation has two main issues:

- Critical SSRF bypass: It only checks whether the host part of the URL is an IP literal corresponding to a private/internal address. It does not handle hostnames that resolve to such IPs. An attacker can bypass this check by using a domain name that points to an internal IP (e.g., `169.254.169.254`), leaving a critical SSRF vulnerability open.
- Incomplete IPv6 validation: The IPv6 validation is missing a check for Unique Local Addresses (ULA), which fall in the `fc00::/7` range. These are private network addresses and should also be blocked.

To fix this, you must resolve hostnames to their IP address(es) and perform validation on each resolved IP. The IPv6 ULA check should also be added. You will also need to add tests for this new logic, including cases where a hostname resolves to a blocked IP.
While this is a significant improvement, be aware that this still leaves a potential Time-of-check to time-of-use (TOCTOU) vulnerability. A fully robust solution would involve pinning the resolved IP for the subsequent HTTP request, but that may require more extensive changes.
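For illustration, the pinning idea can be sketched with only the standard library: resolve the host once, validate every address, and hand the caller a concrete `SocketAddr` to connect to so the hostname is never re-resolved. The names `is_dangerous` and `resolve_and_pin` are hypothetical, and the blocklist here is deliberately simplified relative to the full check in this PR.

```rust
use std::net::{IpAddr, SocketAddr, ToSocketAddrs};

// Simplified stand-in for the fuller private/loopback/metadata checks.
fn is_dangerous(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => {
            v4.is_private() || v4.is_loopback() || v4.is_link_local() || v4.is_unspecified()
        }
        IpAddr::V6(v6) => v6.is_loopback() || v6.is_unspecified(),
    }
}

/// Resolve a host once, validate every resolved address, and return a pinned
/// SocketAddr. The caller should connect to this exact address (e.g. via
/// TcpStream::connect) instead of re-resolving the hostname, which closes
/// the TOCTOU window between validation and connection.
fn resolve_and_pin(host: &str, port: u16) -> Result<SocketAddr, String> {
    let addrs: Vec<SocketAddr> = (host, port)
        .to_socket_addrs()
        .map_err(|e| format!("failed to resolve '{host}': {e}"))?
        .collect();
    if addrs.is_empty() {
        return Err(format!("'{host}' resolved to no addresses"));
    }
    for addr in &addrs {
        if is_dangerous(addr.ip()) {
            return Err(format!("'{host}' resolves to blocked IP {}", addr.ip()));
        }
    }
    Ok(addrs[0])
}

fn main() {
    // IP literals resolve without touching DNS; a public address is accepted...
    assert!(resolve_and_pin("1.1.1.1", 443).is_ok());
    // ...while loopback is rejected.
    assert!(resolve_and_pin("127.0.0.1", 443).is_err());
}
```

In a real HTTP client the pinned address would be injected into the connector (for example via a DNS override), which is the "more extensive change" alluded to above.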
Here's a suggested implementation that addresses both points:
```rust
// For HTTPS, reject private/loopback/link-local/metadata IP literals.
// This check must be performed on the resolved IP address(es) for hostnames,
// not just IP literals.
let check_ip = |ip: IpAddr, field_name: &str| -> Result<(), ConfigError> {
    let is_dangerous = match ip {
        IpAddr::V4(v4) => {
            v4.is_private()
                || v4.is_loopback()
                || v4.is_link_local()
                || v4.is_multicast()
                || v4.is_unspecified()
                || v4 == Ipv4Addr::new(169, 254, 169, 254)
                || (v4.octets()[0] == 100 && (v4.octets()[1] & 0xC0) == 64) // CGN
        }
        IpAddr::V6(v6) => {
            if let Some(v4) = v6.to_ipv4_mapped() {
                v4.is_private()
                    || v4.is_loopback()
                    || v4.is_link_local()
                    || v4 == Ipv4Addr::new(169, 254, 169, 254)
            } else {
                // Check for loopback, unspecified, and Unique Local Addresses (ULA, fc00::/7)
                v6.is_loopback() || v6.is_unspecified() || (v6.octets()[0] & 0xfe) == 0xfc
            }
        }
    };
    if is_dangerous {
        Err(ConfigError::InvalidValue {
            key: field_name.to_string(),
            message: format!(
                "URL host resolves to a private/internal IP '{}', which is blocked to prevent SSRF attacks.",
                ip
            ),
        })
    } else {
        Ok(())
    }
};
if let Ok(ip) = host.parse::<IpAddr>() {
    // The host is an IP literal.
    check_ip(ip, field_name)?;
} else {
    // The host is a domain name. Resolve it and check all resulting IPs.
    use std::net::ToSocketAddrs;
    let port = parsed.port().unwrap_or(443);
    let addrs = match (host, port).to_socket_addrs() {
        Ok(addrs) => addrs,
        Err(e) => {
            return Err(ConfigError::InvalidValue {
                key: field_name.to_string(),
                message: format!("failed to resolve host '{}': {}", host, e),
            });
        }
    };
    let resolved_ips: Vec<IpAddr> = addrs.map(|addr| addr.ip()).collect();
    if resolved_ips.is_empty() {
        return Err(ConfigError::InvalidValue {
            key: field_name.to_string(),
            message: format!("host '{}' did not resolve to any IP addresses", host),
        });
    }
    for ip in resolved_ips {
        check_ip(ip, field_name)?;
    }
}
```

References
- To prevent DNS-based SSRF vulnerabilities, resolve hostnames to IP addresses, validate all resolved IPs against restricted ranges, and pin the validated addresses for the connection to prevent TOCTOU race conditions.
Addressed in 57d300c. The key changes from Gemini's findings:

- DNS resolution now rejects on failure (fail-closed) -- hostnames that don't resolve at config time are rejected rather than silently allowed through. This prevents the scenario where a hostname resolves to a private IP later.
- DNS rebinding comment softened -- the comment now accurately describes this as defense-in-depth, acknowledging that full DNS rebinding protection would require IP pinning in the HTTP connector.
- IPv6 link-local and multicast now blocked -- the `is_dangerous_ip` closure now checks `fe80::/10` and `ff00::/8` for native IPv6, and also checks multicast/unspecified/CGN for IPv4-mapped IPv6 addresses.

All of these have corresponding regression tests.
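As a standalone illustration of those bit masks, a simplified version of the native-IPv6 checks could look like the following (the function name is a stand-in for the crate's `is_dangerous_ip` closure, not its real API):

```rust
use std::net::Ipv6Addr;

/// Sketch of the native-IPv6 checks described above. The masks are:
///   fe80::/10 (link-local): top 10 bits of the first segment == 0xfe80
///   ff00::/8  (multicast):  first octet == 0xff
///   fc00::/7  (ULA):        first octet & 0xfe == 0xfc
fn is_dangerous_v6(v6: Ipv6Addr) -> bool {
    v6.is_loopback()
        || v6.is_unspecified()
        || (v6.segments()[0] & 0xffc0) == 0xfe80 // link-local fe80::/10
        || v6.octets()[0] == 0xff                // multicast ff00::/8
        || (v6.octets()[0] & 0xfe) == 0xfc       // unique local fc00::/7
}

fn main() {
    assert!(is_dangerous_v6("fe80::1".parse().unwrap())); // link-local
    assert!(is_dangerous_v6("ff02::1".parse().unwrap())); // multicast
    assert!(is_dangerous_v6("fd00::1".parse().unwrap())); // ULA
    assert!(is_dangerous_v6("::1".parse().unwrap()));     // loopback
    assert!(!is_dangerous_v6("2001:4860:4860::8888".parse().unwrap())); // public
}
```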
ilblackdragon
left a comment
Actually let's fix gemini findings
Addressed all review feedback in 2db1db3:

Re: TOCTOU -- acknowledged in the code comment. Full mitigation (IP pinning for the actual request) would require plumbing through to the embedding providers, which is a larger refactor. The hostname-based check at config time is a significant improvement over zero validation.
All feedback addressed: DNS resolution check, ULA blocking, NEARAI_BASE_URL validation, and formatting fix. Ready for re-review @ilblackdragon.
ilblackdragon
left a comment
Review: fix(security): validate embedding base URLs to prevent SSRF
The good

The `validate_base_url()` function in `src/config/helpers.rs` is well-written. It correctly handles:

- Scheme validation (rejects `file://`, `ftp://`, etc.)
- HTTP-localhost-only restriction (prevents credential leakage over plaintext)
- Private/loopback/link-local/multicast/unspecified IPv4 blocking
- Cloud metadata endpoint (`169.254.169.254`)
- CGN range (`100.64.0.0/10`)
- IPv4-mapped IPv6 addresses (`::ffff:x.x.x.x`)
- ULA (`fc00::/7`)
- DNS resolution of hostnames with graceful fallthrough on DNS failure
The TOCTOU concern (DNS resolves to a safe IP at config time but could resolve differently at request time) is acknowledged and acceptable as defense-in-depth.
Main issue: incomplete coverage
Per .claude/rules/review-discipline.md: "Fix the pattern, not just the instance."
The PR wires validate_base_url() into only 3 of ~10 user-controlled base URLs that ultimately make HTTP requests. The unvalidated URLs carry the same SSRF risk.
Validated by this PR (3):

| Env var | Location |
|---|---|
| `OLLAMA_BASE_URL` | `src/config/embeddings.rs:83` (PR diff) |
| `EMBEDDING_BASE_URL` | `src/config/embeddings.rs:85` (PR diff) |
| `NEARAI_BASE_URL` | `src/config/llm.rs:100` (PR diff) |
NOT validated (~7+):

| Env var | Where it flows | Location |
|---|---|---|
| `OPENAI_BASE_URL` | `resolve_registry_provider()` -> `RegistryProviderConfig.base_url` | `src/config/llm.rs:263` via providers.json line 10 |
| `ANTHROPIC_BASE_URL` | same path | `src/config/llm.rs:263` via providers.json line 31 |
| `LLM_BASE_URL` | same path (openai_compatible fallback) | `src/config/llm.rs:233` / `src/config/llm.rs:263` |
| `MINIMAX_BASE_URL` | same path | `src/config/llm.rs:263` via providers.json line 394 |
| `CLOUDFLARE_BASE_URL` | same path | `src/config/llm.rs:263` via providers.json line 415 |
| `TRANSCRIPTION_BASE_URL` | `TranscriptionConfig::resolve()` -> `OpenAiWhisperProvider` | `src/config/transcription.rs:48` |
| `NEARAI_AUTH_URL` | `SessionConfig.auth_base_url` -> HTTP requests in `src/llm/session.rs:174` | `src/config/llm.rs:84` |
All five registry-based base_url_env vars flow through the same resolve_registry_provider() method (src/config/llm.rs:191-346), which resolves the URL at line 263 but never calls validate_base_url(). Additionally, any user-defined provider in ~/.ironclaw/providers.json can specify an arbitrary base_url_env, and those also flow through the same unvalidated path.
Suggested fix

- In `resolve_registry_provider()` (`src/config/llm.rs`): add a single `validate_base_url(&base_url, base_url_env.unwrap_or("LLM_BASE_URL"))?;` call after the base URL is resolved (after line 277), before returning. This covers all current and future registry providers in one shot.
- In `TranscriptionConfig::resolve()` (`src/config/transcription.rs`): add validation after line 48: `if let Some(ref url) = base_url { validate_base_url(url, "TRANSCRIPTION_BASE_URL")?; }`
- In `LlmConfig::resolve()` for `NEARAI_AUTH_URL` (`src/config/llm.rs:84`): validate the resolved `auth_base_url` before storing it in `SessionConfig`. This URL is used for HTTP requests in `src/llm/session.rs:174`.
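To make the shape of that chokepoint concrete, here is a hedged sketch with stand-in types (`String` errors instead of the real `ConfigError`, and a toy `validate_base_url`) showing where the single call would sit; the real signatures in `src/config/llm.rs` will differ:

```rust
// Toy stand-in: the real validate_base_url also blocks private IPs,
// resolves hostnames, etc.
fn validate_base_url(url: &str, field_name: &str) -> Result<(), String> {
    if url.starts_with("https://") || url.starts_with("http://localhost") {
        Ok(())
    } else {
        Err(format!("{field_name}: URL '{url}' not allowed"))
    }
}

/// Sketch of resolve_registry_provider(): one validation call after the base
/// URL is resolved covers every registry provider, current and future.
fn resolve_registry_provider(
    base_url: String,
    base_url_env: Option<&str>,
    canonical_id: &str,
) -> Result<String, String> {
    // Error label: the env var name when one exists, else "{provider}.base_url".
    let field = base_url_env
        .map(str::to_owned)
        .unwrap_or_else(|| format!("{canonical_id}.base_url"));
    validate_base_url(&base_url, &field)?;
    Ok(base_url)
}

fn main() {
    assert!(resolve_registry_provider(
        "https://api.example.com/v1".to_string(),
        Some("OPENAI_BASE_URL"),
        "openai"
    )
    .is_ok());
    assert!(resolve_registry_provider("file:///etc/passwd".to_string(), None, "openai").is_err());
}
```

Because every registry-based provider flows through this one function, adding the call here covers all of them, plus any user-defined providers in providers.json.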
Test gaps

The tests added in `src/config/helpers.rs` cover the primary cases well, but are missing a few important edge cases:

- IPv4-mapped IPv6: No test for `https://[::ffff:169.254.169.254]` or `https://[::ffff:10.0.0.1]` -- the code handles this (the `to_ipv4_mapped()` branch), but there's no regression test proving it.
- ULA addresses: No test for `https://[fd00::1]` -- the `fc00::/7` check in the code should catch this, but it's untested.
- CGN range: No test for `https://100.64.0.1` -- the CGN check (`v4.octets()[0] == 100 && (v4.octets()[1] & 0xC0) == 64`) is untested.
- Unspecified address: No test for `https://0.0.0.0` -- the `v4.is_unspecified()` check is untested.
- IPv6 loopback: No test for `https://[::1]` as an HTTPS URL (only tested as HTTP localhost, which takes the early-return path and never reaches the `is_dangerous_ip` closure).

These are all cases the code already handles, so the tests would just be confirming the existing logic -- but they guard against regressions if someone refactors the IP checking.
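The gap cases above can be pinned down with plain assertions; this sketch uses a simplified stand-in for the `is_dangerous_ip` closure rather than the crate's real API:

```rust
use std::net::{IpAddr, Ipv4Addr};

/// Simplified stand-in for the crate's is_dangerous_ip closure, covering the
/// review's gap cases: CGN, IPv4-mapped IPv6, ULA, unspecified, loopback.
fn is_dangerous_ip(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => {
            v4.is_private()
                || v4.is_loopback()
                || v4.is_link_local()
                || v4.is_unspecified()
                || v4 == Ipv4Addr::new(169, 254, 169, 254)
                || (v4.octets()[0] == 100 && (v4.octets()[1] & 0xC0) == 64) // CGN 100.64.0.0/10
        }
        IpAddr::V6(v6) => {
            if let Some(v4) = v6.to_ipv4_mapped() {
                // Recurse so ::ffff:x.x.x.x gets the full IPv4 treatment.
                is_dangerous_ip(IpAddr::V4(v4))
            } else {
                v6.is_loopback() || v6.is_unspecified() || (v6.octets()[0] & 0xfe) == 0xfc // ULA
            }
        }
    }
}

fn main() {
    assert!(is_dangerous_ip("100.64.0.1".parse().unwrap()));             // CGN
    assert!(is_dangerous_ip("0.0.0.0".parse().unwrap()));                // unspecified
    assert!(is_dangerous_ip("::ffff:169.254.169.254".parse().unwrap())); // mapped metadata IP
    assert!(is_dangerous_ip("fd00::1".parse().unwrap()));                // ULA
    assert!(is_dangerous_ip("::1".parse().unwrap()));                    // IPv6 loopback
    assert!(!is_dangerous_ip("8.8.8.8".parse().unwrap()));               // public
}
```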
Summary
The core validation function is solid. The ask is to wire it into the remaining base URL resolution paths -- particularly resolve_registry_provider() which is the single chokepoint for all registry-based LLM providers. That one change would cover OpenAI, Anthropic, Ollama-as-LLM-backend, MiniMax, Cloudflare, openai_compatible, and all future registry additions.
Thanks for the thorough review @ilblackdragon. Agreed on all points.

Incomplete coverage: will wire `validate_base_url()` into the remaining resolution paths. Test gaps: will add regression tests for the edge cases listed in the review.
DNS-based SSRF: The code already performs DNS resolution of hostnames and validates all resolved IPs, as your own review notes under "The good" ("DNS resolution of hostnames with graceful fallthrough on DNS failure"). The TOCTOU caveat is acknowledged as acceptable defense-in-depth. Gemini's suggestion essentially describes what the code already does.
Review feedback addressed

All items from @ilblackdragon's review have been resolved:
All 14 helpers tests pass, fmt clean, clippy zero warnings.
User-configurable base URLs (OLLAMA_BASE_URL, EMBEDDING_BASE_URL) were passed directly to reqwest with no validation, allowing SSRF attacks against cloud metadata endpoints, internal services, or file:// URIs.

Adds validate_base_url() that rejects:

- Non-HTTP(S) schemes (file://, ftp://)
- HTTP to non-localhost destinations (prevents credential leakage)
- HTTPS to private/loopback/link-local/metadata IPs (169.254.169.254, 10.x, 192.168.x, 172.16-31.x, CGN 100.64/10)
- IPv4-mapped IPv6 bypass attempts

Validation runs at config resolution time so bad URLs fail at startup.

Closes #1103

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…E_URL validation

Address review feedback:

- Resolve hostnames to IPs and check all resolved addresses against the blocklist (prevents DNS-based SSRF bypass where an attacker uses a domain pointing to 169.254.169.254)
- Add IPv6 Unique Local Address (fc00::/7) to the blocklist
- Validate NEARAI_BASE_URL in llm config (was missing -- especially dangerous since bearer tokens are forwarded to the configured URL)
- Allow DNS resolution failure gracefully (don't block startup when DNS is temporarily unavailable)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add validate_base_url() in resolve_registry_provider() covering all LLM providers (OpenAI, Anthropic, Ollama, openai_compatible, etc.)
- Add validate_base_url() for NEARAI_AUTH_URL in LlmConfig::resolve()
- Add validate_base_url() for TRANSCRIPTION_BASE_URL in TranscriptionConfig
- Add missing SSRF test cases: CGN range, IPv4-mapped IPv6, ULA IPv6, URLs with credentials, empty/invalid URLs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Force-pushed from b91f650 to 38172f1.
Pull request overview
Adds config-time validation for user-configurable “base URL” settings to mitigate SSRF risks when the server makes outbound HTTP requests (issue #1103).
Changes:

- Introduces `validate_base_url()` in `src/config/helpers.rs` with unit tests for allowed/blocked URL patterns.
- Applies base-URL validation to LLM-related URLs (`NEARAI_AUTH_URL`, `NEARAI_BASE_URL`, and generic `LLM_BASE_URL`), embeddings URLs (`OLLAMA_BASE_URL`, `EMBEDDING_BASE_URL`), and the transcription URL (`TRANSCRIPTION_BASE_URL`).
- Ensures invalid URLs fail early during config resolution (startup/config load time).
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| `src/config/helpers.rs` | Adds `validate_base_url()` and SSRF-focused tests. |
| `src/config/llm.rs` | Validates NearAI and generic provider base URLs during config resolution. |
| `src/config/embeddings.rs` | Validates Ollama and embedding provider base URLs during config resolution. |
| `src/config/transcription.rs` | Validates optional transcription base URL override during config resolution. |
src/config/helpers.rs
Outdated
```rust
        || v4.is_link_local()
        || v4 == Ipv4Addr::new(169, 254, 169, 254)
} else {
    v6.is_loopback() || v6.is_unspecified() || (v6.octets()[0] & 0xfe) == 0xfc // ULA (fc00::/7)
```
Fixed. The is_dangerous_ip closure now blocks IPv6 link-local (fe80::/10 via segments()[0] & 0xffc0 == 0xfe80) and multicast (ff00::/8 via octets()[0] == 0xff). Also extended the IPv4-mapped IPv6 branch to check multicast, unspecified, and CGN ranges for completeness.
Added regression tests: validate_base_url_rejects_ipv6_link_local and validate_base_url_rejects_ipv6_multicast.
```rust
// For HTTPS, reject private/loopback/link-local/metadata IPs.
// Check both IP literals and resolved hostnames to prevent DNS-based SSRF.
if let Ok(ip) = host.parse::<IpAddr>() {
```
Fixed. The comment now reads: "Also resolve hostnames and check the resulting IPs as defense-in-depth. NOTE: This does NOT fully prevent DNS rebinding attacks (the hostname could resolve to a different IP at request time). Full protection would require pinning the resolved IP in the HTTP client's connector. This validation catches the common case of misconfigured or malicious URLs."
src/config/helpers.rs
Outdated
```rust
if let Ok(addrs) = (host, port).to_socket_addrs() {
    for addr in addrs {
        if is_dangerous_ip(&addr.ip()) {
            return Err(ConfigError::InvalidValue {
                key: field_name.to_string(),
                message: format!(
                    "hostname '{}' resolves to private/internal IP '{}'. \
                     This is blocked to prevent SSRF attacks.",
                    host,
                    addr.ip()
                ),
            });
        }
    }
}
// If DNS resolution fails, allow it — the HTTP request will fail
// later with a clear connection error. We don't want to block
// startup when DNS is temporarily unavailable.
```
Fixed. DNS resolution failures now return an error instead of allowing through. The code now uses match on the result and returns ConfigError::InvalidValue with a message like "failed to resolve hostname '...': ... Base URLs must be resolvable at config time."
Added regression test: validate_base_url_rejects_dns_failure using .invalid TLD.
src/config/helpers.rs
Outdated
```rust
// Hostname — resolve and check all resulting IPs.
use std::net::ToSocketAddrs;
let port = parsed.port().unwrap_or(443);
if let Ok(addrs) = (host, port).to_socket_addrs() {
    for addr in addrs {
```
Acknowledged with an inline comment. The code now has this note:
NOTE:
to_socket_addrs()performs blocking DNS resolution. This is acceptable becausevalidate_base_urlruns at config-load time only, before the async runtime is fully driving I/O. If this ever moves to a hot path, wrap intokio::task::spawn_blockingor usetokio::net::lookup_host.
Since config resolution happens at startup before any I/O-heavy async work, the blocking call is acceptable. If this validation ever needs to run on a hot path, it should be migrated to async DNS.
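As a dependency-free illustration of moving that blocking resolution off the calling thread (tokio users would reach for `spawn_blocking` or `tokio::net::lookup_host` instead; the function name here is hypothetical):

```rust
use std::net::{SocketAddr, ToSocketAddrs};
use std::thread;

/// Stdlib-only analog of the tokio suggestion above: run the blocking
/// to_socket_addrs() call on a helper thread so the caller's thread is
/// never stalled on DNS. In an async codebase, tokio::task::spawn_blocking
/// or tokio::net::lookup_host would play this role.
fn resolve_off_thread(host: String, port: u16) -> std::io::Result<Vec<SocketAddr>> {
    thread::spawn(move || (host.as_str(), port).to_socket_addrs().map(|a| a.collect()))
        .join()
        .expect("resolver thread panicked")
}

fn main() {
    // IP literals parse without touching DNS, so this works offline.
    let addrs = resolve_off_thread("127.0.0.1".to_string(), 11434).unwrap();
    assert_eq!(addrs.len(), 1);
    assert!(addrs[0].ip().is_loopback());
}
```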
src/config/helpers.rs
Outdated
```rust
let parsed = reqwest::Url::parse(url).map_err(|e| ConfigError::InvalidValue {
    key: field_name.to_string(),
    message: format!("invalid URL '{}': {}", url, e),
```
Fixed. Added a redact_url() helper that strips userinfo (username:password) from URLs before including them in error messages. All error paths in validate_base_url now use the redacted URL.
Added regression tests: validate_base_url_credentials_redacted_in_errors verifies passwords don't appear in error output, and redact_url_strips_credentials / redact_url_no_credentials_unchanged test the helper directly.
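The PR's `redact_url()` presumably leans on the URL parser's userinfo accessors; as a rough stdlib-only sketch of the same idea (the name and string-based approach here are illustrative, not the crate's actual implementation):

```rust
/// Strip userinfo (user:password@) from a URL before it reaches logs or
/// error messages. A real implementation would use the url crate's
/// set_username("")/set_password(None); this sketch works on the raw string.
fn redact_userinfo(url: &str) -> String {
    // Find the end of the scheme ("://"); if absent, return unchanged.
    let Some(scheme_end) = url.find("://") else {
        return url.to_string();
    };
    let authority_start = scheme_end + 3;
    // The authority ends at the first '/', '?', or '#' after the scheme.
    let authority_end = url[authority_start..]
        .find(|c: char| c == '/' || c == '?' || c == '#')
        .map(|i| authority_start + i)
        .unwrap_or(url.len());
    // Userinfo, if present, is everything up to the last '@' in the authority.
    match url[authority_start..authority_end].rfind('@') {
        Some(at) => format!("{}{}", &url[..authority_start], &url[authority_start + at + 1..]),
        None => url.to_string(),
    }
}

fn main() {
    assert_eq!(
        redact_userinfo("https://user:pass@example.com/v1"),
        "https://example.com/v1"
    );
    // URLs without credentials pass through unchanged.
    assert_eq!(redact_userinfo("https://example.com/v1"), "https://example.com/v1");
}
```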
src/config/llm.rs
Outdated
```rust
let field = base_url_env.unwrap_or("LLM_BASE_URL");
validate_base_url(&base_url, field)?;
```
Fixed. When base_url_env is None, the field label now uses {canonical_id}.base_url (e.g., openai.base_url, groq.base_url) instead of the misleading LLM_BASE_URL. This correctly identifies the source of the problem when the URL came from a registry default or settings rather than an env var.
- Add IPv6 link-local (fe80::/10) and multicast (ff00::/8) blocking
- Complete IPv4-mapped IPv6 checks (multicast, unspecified, CGN)
- Fail-closed on DNS resolution failure instead of allowing through
- Soften DNS rebinding comment to accurately describe defense-in-depth
- Add comment about blocking DNS being acceptable at config-load time
- Redact userinfo from URLs in error messages to prevent credential leaks
- Fix field_name label to use `{provider}.base_url` when env var is None
- Update test URLs from example.com to real resolvable hostnames
- Add tests: unspecified address, IPv6 loopback HTTPS, link-local, multicast,
ULA fd prefix, DNS failure rejection, credential redaction
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Review feedback addressed (57d300c)

All review comments from ilblackdragon, Gemini, and Copilot have been addressed.

Coverage (ilblackdragon's main issue)

The previous commit (38172f1) already wired `validate_base_url()` into the remaining base URL resolution paths.
This commit addresses the remaining review feedback on the validation function itself. Security improvements
Code quality
New tests (10 added)
All 3199 lib tests pass, zero clippy warnings.
Summary
Fixes SSRF risk via user-configurable embedding base URLs (#1103).
The vulnerability: `OLLAMA_BASE_URL`, `EMBEDDING_BASE_URL`, and `NEARAI_BASE_URL` were passed directly to `reqwest` HTTP calls with zero validation. An attacker controlling these values (via env vars or `settings.json`) could force the server to make requests to:

- `http://169.254.169.254` -- AWS/GCP/Azure IMDS, leaking IAM credentials
- `http://10.x.x.x` -- internal services on private networks
- `file:///etc/passwd` -- local file reads

The fix: Adds `validate_base_url()` in `src/config/helpers.rs` that rejects:

- Non-HTTP(S) schemes (`file://`, `ftp://`)
- HTTP to non-localhost destinations
- HTTPS to private/loopback/link-local/metadata IPs
- IPv4-mapped IPv6 bypass attempts (`::ffff:169.254.169.254`)

Validation runs at config resolution time so bad URLs are rejected at startup, not at first use.
Test plan

- `cargo clippy --all --all-features` -- zero warnings
- `http://localhost:11434` for Ollama passes validation

Closes #1103