Summary
validate_url resolves the hostname via getaddrinfo and validates the IP, then safe_fetch (and the redirect handler) re-resolve through urllib. An attacker-controlled DNS server can return a public IP for the validation lookup and a private IP (e.g. 127.0.0.1, 169.254.169.254) for the actual connection — bypassing the metadata-endpoint guard.
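To make the race concrete, here is a minimal simulation of the check-then-use gap. Nothing in it comes from graphify: `make_rebinding_resolver` and `is_public` are hypothetical helpers, and the resolver is a stand-in for attacker-controlled DNS that changes its answer between the first and second lookup.

```python
import ipaddress
from itertools import chain, repeat


def make_rebinding_resolver(first_ip, later_ip):
    """Hypothetical stand-in for attacker DNS: first answer differs from later ones."""
    answers = chain([first_ip], repeat(later_ip))
    return lambda hostname: next(answers)


def is_public(ip):
    """Assumed validation rule: accept only globally routable addresses."""
    return ipaddress.ip_address(ip).is_global


resolve = make_rebinding_resolver("8.8.8.8", "169.254.169.254")

checked_ip = resolve("attacker.example")    # lookup made by the validation step
connected_ip = resolve("attacker.example")  # separate lookup made by the HTTP client

validation_passed = is_public(checked_ip)        # the check sees a public address
connection_is_public = is_public(connected_ip)   # the connection goes elsewhere
```

The validation result says nothing about the address the second lookup returns, which is the whole problem.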
Where
- graphify/security.py:51-62 (validate_url): getaddrinfo, validate, return.
- graphify/security.py:107-128 (safe_fetch): urlopen reopens the URL by hostname.
- graphify/security.py:74-76 (_NoFileRedirectHandler.redirect_request): re-runs validate_url on the redirect target, same TOCTOU.
Additionally, socket.gaierror is silently swallowed in validate_url (lines 61-62) — DNS failure means no IP check occurred, but the fetch still proceeds. Worth tightening that branch independently.
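A fail-closed version of that branch might look like the sketch below. It is not the current validate_url: `resolve_and_check` is a hypothetical name, and the `is_global` test is an assumed blocklist rule that may differ from what graphify actually checks.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def resolve_and_check(url):
    """Hypothetical lookup step: raise on DNS failure instead of continuing."""
    host = urlsplit(url).hostname
    if not host:
        raise ValueError(f"URL has no hostname: {url!r}")
    try:
        infos = socket.getaddrinfo(host, None, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # Fail closed: no resolution means no IP was ever checked.
        raise ValueError(f"could not resolve {host!r}") from exc

    addresses = []
    for info in infos:
        addr = info[4][0].split("%")[0]  # drop any IPv6 scope id
        # Assumed rule: reject anything that is not globally routable.
        if not ipaddress.ip_address(addr).is_global:
            raise ValueError(f"{host!r} resolves to blocked address {addr}")
        addresses.append(addr)
    return addresses
```

With this shape, a resolver error and a private address both surface as the same kind of rejection, so the caller cannot proceed to the fetch by accident.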
Impact
SECURITY.md claims metadata endpoints are blocked. The claim doesn't hold against a controlled-DNS attacker. Risk depends on context:
- Low for a solo operator pointing graphify at their own files.
- Higher if anyone wraps graphify add <url> as a service or runs it on URLs from untrusted input (e.g. enriching crawled documents).
Suggested fix shape
Connect to the resolved IP rather than the hostname:
- Call getaddrinfo once in validate_url and return the (sockaddr, original_host) pair.
- Open the connection against the IP and pass the original host as the Host: header / SNI.
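As a starting point for discussion, the two steps above could be sketched with the standard library's http.client. The class names and the `pinned_ip` parameter are hypothetical, and the overridden connect() skips extras the stock implementation handles (proxy tunnels, socket options), so treat this as a shape rather than a drop-in patch.

```python
import http.client
import socket
import ssl


class PinnedHTTPConnection(http.client.HTTPConnection):
    """Dial a pre-validated IP while keeping the original hostname for Host:."""

    def __init__(self, host, pinned_ip, **kwargs):
        super().__init__(host, **kwargs)
        self._pinned_ip = pinned_ip

    def connect(self):
        # Connect to the already-validated address; no second DNS lookup.
        self.sock = socket.create_connection(
            (self._pinned_ip, self.port), self.timeout
        )


class PinnedHTTPSConnection(http.client.HTTPSConnection):
    """Same idea for TLS: SNI and certificate checks use the original hostname."""

    def __init__(self, host, pinned_ip, *, context=None, **kwargs):
        context = context or ssl.create_default_context()
        super().__init__(host, context=context, **kwargs)
        self._pinned_ip = pinned_ip
        self._tls_context = context

    def connect(self):
        raw = socket.create_connection(
            (self._pinned_ip, self.port), self.timeout
        )
        # server_hostname drives both SNI and hostname verification.
        self.sock = self._tls_context.wrap_socket(
            raw, server_hostname=self.host
        )
```

Because http.client builds the Host: header from the `host` argument, the request still names the original hostname while the socket goes to the pinned address. Redirects would need the same treatment on every hop: resolve, validate, then pin.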
Equivalent recipes ship with most SSRF-hardening guides; happy to put up a PR if you'd like to discuss the shape first.
Related
This was found alongside the issues fixed in #589 (cache race + clone arg injection), but it needs design discussion so I'm filing separately.