fix: strip credentials from URLs to prevent leaking auth tokens #78
npow wants to merge 4 commits into
Mixed conda/pypi environments embed full URLs (including user:token@host credentials) into GCS datastore URIs, cache paths, subprocess args, and on-disk files. This happens because Python's urlparse().netloc preserves embedded credentials, but the code only intended to use the hostname.

Changes:
- Add strip_url_credentials() and _safe_netloc() helpers to utils.py
- Fix make_partial_cache_url to exclude credentials from cache keys
- Fix PackageSpecification.__init__ FAKEURL construction
- Fix conda_lock_resolver git+URL reconstruction
- Add --strip-auth to conda-lock invocation
- Fix pip_resolver to strip credentials from -i/--extra-index-url args
- Fix is_downloadable_url and is_external_url netloc checks
- Add tests for the new helper functions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
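The netloc behavior described in this commit message can be reproduced with the standard library alone. The snippet below is only an illustration; the URL is a made-up example and is not taken from the repository.

```python
from urllib.parse import urlparse

# Hypothetical URL with embedded credentials, for illustration only.
parsed = urlparse("https://user:token@host:8443/path")

# .netloc keeps everything between "//" and the path, credentials included.
print(parsed.netloc)

# .hostname and .port expose only the host and port.
print(parsed.hostname, parsed.port)

# The credentials are also available separately, which is why any string
# built from .netloc carries them along.
print(parsed.username, parsed.password)
```

Any path, cache key, or command-line argument assembled from parsed.netloc therefore contains user:token, while one assembled from parsed.hostname (plus parsed.port when present) does not.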
- Expand urlunparse tuple to separate lines (black)
- Fix relative import: from .utils -> from ..utils (resolvers/ -> conda/)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
    pypi_sources = sources.get("pypi", [])
    # The first source is always the index
    args.extend(["-i", pypi_sources[0]])
    # Strip credentials from URLs to avoid leaking them in /proc/<pid>/cmdline
I am less concerned about this; it may also break things depending on how this is passed in. I would be OK with "leaking" to the command line.
    url = urlparse(base_url)

    if is_real_url or url.netloc.split("/")[0] == FAKEURL_PATHCOMPONENT:
        # Use hostname (not netloc) to avoid leaking embedded credentials
why not use the _safe_netloc function?
    # already there.
    url_parse_result = urlparse(self._url)
    if not url_parse_result.netloc.startswith(FAKEURL_PATHCOMPONENT):
        # Use hostname (not netloc) to avoid leaking embedded credentials

    return pkg_format == self._url_format and not urlparse(
        self._url
    ).netloc.startswith(FAKEURL_PATHCOMPONENT)
    url_parsed = urlparse(self._url)
- Revert pip_resolver.py strip_url_credentials (maintainer OK with CLI credential exposure, stripping may break auth)
- Use _safe_netloc() helper instead of inline computation in env_descr.py at 3 call sites (make_partial_cache_url, PackageSpecification.__init__, is_downloadable_url)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
    # Use hostname (not netloc) to avoid leaking embedded credentials
    safe_netloc = url.hostname or ""
    if url.port:
why did you get rid of this?
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Superseded by #83 — reopened from a same-repo branch so GitHub Actions can publish test results (fork PRs hit |
Summary
Mixed conda/pypi environments embed full URLs (including user:token@host credentials) into GCS datastore URIs, cache paths, subprocess args, and on-disk files. This happens because Python's urlparse().netloc preserves embedded credentials, but the code only intended to use the hostname.

Root cause: urlparse("https://user:token@host/path").netloc returns "user:token@host". Any code using .netloc where .hostname was intended silently propagates credentials into storage paths, cache keys, command-line arguments, and reconstructed URLs.

Changes

New helpers (utils.py)
- strip_url_credentials(url): returns the URL with user:password@ removed, preserving scheme, host, port, path, query, fragment
- _safe_netloc(parsed): returns hostname:port without credentials from a urlparse result

Fixes (4 files, 8 sites)
- env_descr.py, make_partial_cache_url: replace url.netloc with _safe_netloc(url) in both branches
- env_descr.py, PackageSpecification.__init__: replace url_parse_result.netloc with _safe_netloc() in FAKEURL construction
- env_descr.py, is_downloadable_url: replace .netloc.startswith() with _safe_netloc().startswith()
- env_descr.py, is_external_url: replace urlparse(source).netloc with urlparse(source).hostname or ""
- conda_lock_resolver.py, git+URL reconstruction: replace parse.netloc with _safe_netloc(parse)
- conda_lock_resolver.py, conda-lock invocation: add --strip-auth flag
- pip_resolver.py, subprocess args: apply strip_url_credentials() before passing as -i/--extra-index-url

Tests
- Tests for the strip_url_credentials() and _safe_netloc() helpers

Cache migration note

Fixing make_partial_cache_url changes the cache key structure from user:token@host/path to host/path. This causes a one-time cache miss for environments that previously had credentials embedded in cache keys. Subsequent runs will populate the new (correct) cache paths.

Test plan
- conda-lock output does not contain credentials
- pip install subprocess args do not contain credentials

🤖 Generated with Claude Code