Skip to content

fix: strip credentials from URLs to prevent leaking auth tokens#78

Closed
npow wants to merge 4 commits into
Netflix:mainfrom
npow:fix/strip-url-credentials
Closed

fix: strip credentials from URLs to prevent leaking auth tokens#78
npow wants to merge 4 commits into
Netflix:mainfrom
npow:fix/strip-url-credentials

Conversation

@npow

@npow npow commented Apr 30, 2026

Copy link
Copy Markdown
Collaborator

Summary

Mixed conda/pypi environments embed full URLs (including user:token@host credentials) into GCS datastore URIs, cache paths, subprocess args, and on-disk files. This happens because Python's urlparse().netloc preserves embedded credentials, but the code only intended to use the hostname.

Root cause: urlparse("https://user:token@host/path").netloc returns "user:token@host" — any code using .netloc where .hostname was intended silently propagates credentials into storage paths, cache keys, command-line arguments, and reconstructed URLs.

Changes

New helpers (utils.py)

  • strip_url_credentials(url) — returns the URL with user:password@ removed, preserving scheme, host, port, path, query, fragment
  • _safe_netloc(parsed) — returns hostname:port without credentials from a urlparse result

Fixes (4 files, 8 sites)

File Fix Severity
env_descr.pymake_partial_cache_url Replace url.netloc with _safe_netloc(url) in both branches CRITICAL
env_descr.pyPackageSpecification.__init__ Replace url_parse_result.netloc with _safe_netloc() in FAKEURL construction CRITICAL
env_descr.pyis_downloadable_url Replace .netloc.startswith() with _safe_netloc().startswith() LOW
env_descr.pyis_external_url Replace urlparse(source).netloc with urlparse(source).hostname or "" LOW
conda_lock_resolver.py — git+URL reconstruction Replace parse.netloc with _safe_netloc(parse) HIGH
conda_lock_resolver.py — conda-lock invocation Add --strip-auth flag HIGH
pip_resolver.py — subprocess args Wrap pypi source URLs with strip_url_credentials() before passing as -i/--extra-index-url HIGH

Tests

  • 11 test cases covering strip_url_credentials() and _safe_netloc() helpers

Cache migration note

Fixing make_partial_cache_url changes cache key structure from user:token@host/path to host/path. This causes a one-time cache miss for environments that previously had credentials embedded in cache keys. Subsequent runs will populate the new (correct) cache paths.

Test plan

  • Verify existing conda/pypi environment resolution tests pass
  • Test mixed conda+pypi environment with authenticated private PyPI
  • Verify cache keys no longer contain credentials
  • Verify conda-lock output does not contain credentials
  • Verify pip install subprocess args do not contain credentials

🤖 Generated with Claude Code

Mixed conda/pypi environments embed full URLs (including user:token@host
credentials) into GCS datastore URIs, cache paths, subprocess args, and
on-disk files. This happens because Python's urlparse().netloc preserves
embedded credentials, but the code only intended to use the hostname.

Changes:
- Add strip_url_credentials() and _safe_netloc() helpers to utils.py
- Fix make_partial_cache_url to exclude credentials from cache keys
- Fix PackageSpecification.__init__ FAKEURL construction
- Fix conda_lock_resolver git+URL reconstruction
- Add --strip-auth to conda-lock invocation
- Fix pip_resolver to strip credentials from -i/--extra-index-url args
- Fix is_downloadable_url and is_external_url netloc checks
- Add tests for the new helper functions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@codecov

codecov Bot commented Apr 30, 2026

Copy link
Copy Markdown

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

Thanks for integrating Codecov - We've got you covered ☂️

- Expand urlunparse tuple to separate lines (black)
- Fix relative import: from .utils -> from ..utils (resolvers/ -> conda/)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@romain-intel romain-intel left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments.

pypi_sources = sources.get("pypi", [])
# The first source is always the index
args.extend(["-i", pypi_sources[0]])
# Strip credentials from URLs to avoid leaking them in /proc/<pid>/cmdline

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am less concerned about this -- plus it may break things depending on how this is passed in. I would be ok "leaking" to the commandline.

url = urlparse(base_url)

if is_real_url or url.netloc.split("/")[0] == FAKEURL_PATHCOMPONENT:
# Use hostname (not netloc) to avoid leaking embedded credentials

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not use the _safe_netloc function?

# already there.
url_parse_result = urlparse(self._url)
if not url_parse_result.netloc.startswith(FAKEURL_PATHCOMPONENT):
# Use hostname (not netloc) to avoid leaking embedded credentials

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

return pkg_format == self._url_format and not urlparse(
self._url
).netloc.startswith(FAKEURL_PATHCOMPONENT)
url_parsed = urlparse(self._url)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

- Revert pip_resolver.py strip_url_credentials (maintainer OK with
  CLI credential exposure, stripping may break auth)
- Use _safe_netloc() helper instead of inline computation in
  env_descr.py at 3 call sites (make_partial_cache_url,
  PackageSpecification.__init__, is_downloadable_url)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

# Use hostname (not netloc) to avoid leaking embedded credentials
safe_netloc = url.hostname or ""
if url.port:

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did you get rid of this?

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@npow

npow commented May 12, 2026

Copy link
Copy Markdown
Collaborator Author

Superseded by #83 — reopened from a same-repo branch so GitHub Actions can publish test results (fork PRs hit dorny/test-reporter HttpError due to read-only GITHUB_TOKEN). All review feedback from this PR has been carried over.

@npow npow closed this May 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants