
fix(pypi): apply full PEP 503 normalization to package names #35

Merged
oritwoen merged 1 commit into main from fix/pypi-pep503-normalization
Mar 13, 2026

Conversation

@oritwoen
Owner

Package name normalization only replaced underscores with hyphens, but PEP 503 specifies collapsing any run of [-_.] into a single hyphen. A package like zope.interface or my--package would pass through with dots and double hyphens intact, which breaks cache key consistency and can cause duplicate entries for the same package.

The regex in both purl.ts and pypi.ts now matches the reference implementation from PEP 503: re.sub(r"[-_.]+", "-", name).lower(). Dependency names coming out of parsePEP508 are normalized the same way, so requires_dist entries like zope.interface>=5.0 produce zope-interface instead of the raw dotted form.
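For readers who want to see the rule in TypeScript, here is a minimal sketch of the PEP 503 reference expression. The function name `normalizePyPIName` is hypothetical; the PR does not show the actual helper used in purl.ts or pypi.ts.

```typescript
// Sketch of PEP 503 name normalization.
// `normalizePyPIName` is an illustrative name, not the project's actual helper.
function normalizePyPIName(name: string): string {
  // Collapse any run of "-", "_" or "." into a single hyphen, then lowercase.
  return name.replace(/[-_.]+/g, "-").toLowerCase();
}

// Inputs taken from the cases named in this PR, plus one mixed-case spelling.
const samples = ["zope.interface", "my--package", "my_-_package", "My..Package"];
for (const sample of samples) {
  console.log(`${sample} -> ${normalizePyPIName(sample)}`);
}
```

The `g` flag matters here: without it, `String.prototype.replace` with a regex only rewrites the first separator run.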

Test plan

  • New test: dots, double-dots, double-hyphens, mixed separators (my_-_package) all collapse to single hyphen
  • Full test suite (320 tests) passes
  • tsc --noEmit clean

The old regex only replaced underscores with hyphens, but PEP 503
requires collapsing any run of [-_.] into a single hyphen. Packages
like `my.package` or `my--package` were not normalized correctly,
which could cause cache key mismatches and failed API lookups.

Fixed in both purl.ts (PURL parsing) and pypi.ts (registry calls +
dependency name output from parsePEP508).
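
The dependency side of the change can be sketched as follows. This is not the project's parsePEP508; it is a simplified, hypothetical extractor that only pulls the leading name out of a requires_dist entry (ignoring extras, version specifiers, and markers) and then applies the same PEP 503 rule.

```typescript
// Simplified sketch: extract and normalize the name from a requires_dist entry.
// `normalizedDependencyName` is a hypothetical helper, not the PR's parsePEP508.

// PEP 508 names start and end with a letter or digit and may contain "-", "_", "." inside.
const PEP508_NAME = /^\s*([A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?)/;

function normalizedDependencyName(requirement: string): string | null {
  const name = PEP508_NAME.exec(requirement)?.[1];
  if (name === undefined) {
    return null; // The entry does not start with a valid name.
  }
  // Same rule as for package names: collapse separator runs, then lowercase.
  return name.replace(/[-_.]+/g, "-").toLowerCase();
}

console.log(normalizedDependencyName("zope.interface>=5.0"));
console.log(normalizedDependencyName("my_pkg (>=1.0) ; extra == 'dev'"));
```

Normalizing at the point where the name leaves the parser means callers never see the raw dotted or underscored spelling.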
@coderabbitai
Contributor

coderabbitai bot commented Mar 13, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ff952d6b-1ae1-46ed-9530-3a24adab81a6

📥 Commits

Reviewing files that changed from the base of the PR and between 715626c and f6c734d.

📒 Files selected for processing (3)
  • src/core/purl.ts
  • src/registries/pypi.ts
  • test/unit/purl.test.ts
📜 Recent review details
🧰 Additional context used
📓 Path-based instructions (8)
src/**/*.ts

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.ts: Keep TypeScript imports with .ts extension style in source files
Do not parse PURLs outside src/core/purl.ts; call createFromPURL or parsePURL instead
Do not duplicate retry/backoff constants outside src/core/client.ts; centralize all retry logic
Do not hardcode cache TTL in random modules; use DEFAULT_TTL from src/cache/lockfile.ts
Respect Node.js runtime floor of >=22.6.0 and modern syntax assumptions

Use .ts import suffixes consistently

Files:

  • src/core/purl.ts
  • src/registries/pypi.ts
src/**/!(client).ts

📄 CodeRabbit inference engine (src/AGENTS.md)

Do not implement fetch/retry behavior outside src/core/client.ts

Files:

  • src/core/purl.ts
  • src/registries/pypi.ts
src/core/**/*.ts

📄 CodeRabbit inference engine (src/core/AGENTS.md)

src/core/**/*.ts: Throw typed errors (InvalidPURLError, NotFoundError, RateLimitError) instead of plain Error in core flows
VersionStatus and Scope are closed unions; keep adapter outputs inside allowed values

Files:

  • src/core/purl.ts
test/**/*.test.ts

📄 CodeRabbit inference engine (AGENTS.md)

Use Vitest globals with test/unit and test/e2e split for testing

test/**/*.test.ts: Test files should use *.test.ts naming convention
Vitest globals (describe, it, expect, vi) are enabled and should be used without imports in test files

Files:

  • test/unit/purl.test.ts
test/unit/**/*.test.ts

📄 CodeRabbit inference engine (test/AGENTS.md)

Unit tests should rely on mocks/spies rather than external dependencies

Files:

  • test/unit/purl.test.ts
test/unit/purl.test.ts

📄 CodeRabbit inference engine (test/AGENTS.md)

PURL contract tests should be located in test/unit/purl.test.ts for parse/build validation

Files:

  • test/unit/purl.test.ts
src/registries/**/*.ts

📄 CodeRabbit inference engine (AGENTS.md)

src/registries/**/*.ts: Do not bypass Client for direct fetch logic in registries; use the centralized HTTP client
Do not hardcode ecosystem registries using switch logic; use plugin-like registration factories instead

Register ecosystem adapters through factory and side-effect import hub when adding to src/registries/

Files:

  • src/registries/pypi.ts
src/registries/*.ts

📄 CodeRabbit inference engine (src/registries/AGENTS.md)

src/registries/*.ts: Each registry adapter in src/registries/ must expose ecosystem, fetchPackage, fetchVersions, fetchDependencies, fetchMaintainers, and urls exports
Convert source-specific fields into core Package/Version/Dependency/Maintainer shapes before returning from adapter methods
Map remote API failures to core error classes in registry adapters
Do not call fetch directly in registry adapters; use Client instead
Do not return raw upstream payloads through public methods in registry adapters
Keep adapter-to-adapter imports forbidden; each adapter internals must be self-contained
Keep per-registry API quirks isolated to that adapter file before normalizing to shared types

Files:

  • src/registries/pypi.ts
🧠 Learnings (11)
📓 Common learnings
Learnt from: CR
Repo: oritwoen/regxa PR: 0
File: src/commands/AGENTS.md:0-0
Timestamp: 2026-03-10T07:36:29.354Z
Learning: Applies to src/commands/**/*.ts : PURL input resolution and optional `pkg:` prefix normalization should be implemented in `src/commands/shared.ts`
📚 Learning: 2026-03-10T07:36:29.354Z
Learnt from: CR
Repo: oritwoen/regxa PR: 0
File: src/commands/AGENTS.md:0-0
Timestamp: 2026-03-10T07:36:29.354Z
Learning: Applies to src/commands/**/*.ts : PURL input resolution and optional `pkg:` prefix normalization should be implemented in `src/commands/shared.ts`

Applied to files:

  • src/core/purl.ts
  • test/unit/purl.test.ts
  • src/registries/pypi.ts
📚 Learning: 2026-03-10T07:36:12.605Z
Learnt from: CR
Repo: oritwoen/regxa PR: 0
File: src/AGENTS.md:0-0
Timestamp: 2026-03-10T07:36:12.605Z
Learning: Applies to src/helpers.ts : Wrap `createFromPURL` when adding convenience API in `src/helpers.ts` and preserve normalization path

Applied to files:

  • src/core/purl.ts
  • test/unit/purl.test.ts
📚 Learning: 2026-03-10T07:36:54.862Z
Learnt from: CR
Repo: oritwoen/regxa PR: 0
File: test/AGENTS.md:0-0
Timestamp: 2026-03-10T07:36:54.862Z
Learning: Applies to test/unit/purl.test.ts : PURL contract tests should be located in `test/unit/purl.test.ts` for parse/build validation

Applied to files:

  • src/core/purl.ts
  • test/unit/purl.test.ts
📚 Learning: 2026-03-10T07:36:03.586Z
Learnt from: CR
Repo: oritwoen/regxa PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-10T07:36:03.586Z
Learning: Applies to src/**/*.ts : Do not parse PURLs outside `src/core/purl.ts`; call `createFromPURL` or `parsePURL` instead

Applied to files:

  • src/core/purl.ts
  • test/unit/purl.test.ts
📚 Learning: 2026-03-10T07:36:38.679Z
Learnt from: CR
Repo: oritwoen/regxa PR: 0
File: src/core/AGENTS.md:0-0
Timestamp: 2026-03-10T07:36:38.679Z
Learning: No duplicate PURL parsing utilities outside purl.ts

Applied to files:

  • src/core/purl.ts
  • test/unit/purl.test.ts
📚 Learning: 2026-03-10T07:36:29.354Z
Learnt from: CR
Repo: oritwoen/regxa PR: 0
File: src/commands/AGENTS.md:0-0
Timestamp: 2026-03-10T07:36:29.354Z
Learning: Applies to src/commands/**/*.ts : Do not reimplement PURL parsing in individual command files; use the shared implementation from `src/commands/shared.ts`

Applied to files:

  • src/core/purl.ts
  • test/unit/purl.test.ts
📚 Learning: 2026-03-10T07:36:54.862Z
Learnt from: CR
Repo: oritwoen/regxa PR: 0
File: test/AGENTS.md:0-0
Timestamp: 2026-03-10T07:36:54.862Z
Learning: Applies to test/unit/{license,repository}.test.ts : Normalization tests should be located in `test/unit/license.test.ts` and `test/unit/repository.test.ts` for canonical output normalization

Applied to files:

  • src/core/purl.ts
  • test/unit/purl.test.ts
📚 Learning: 2026-03-10T07:36:12.605Z
Learnt from: CR
Repo: oritwoen/regxa PR: 0
File: src/AGENTS.md:0-0
Timestamp: 2026-03-10T07:36:12.605Z
Learning: Applies to src/{commands,helpers}/**/*.ts : Route all parsing logic through `createFromPURL` instead of duplicating in commands or helpers

Applied to files:

  • src/core/purl.ts
  • test/unit/purl.test.ts
📚 Learning: 2026-03-10T07:36:38.679Z
Learnt from: CR
Repo: oritwoen/regxa PR: 0
File: src/core/AGENTS.md:0-0
Timestamp: 2026-03-10T07:36:38.679Z
Learning: PURL parse and build behavior is the single source of truth for validation and normalization

Applied to files:

  • test/unit/purl.test.ts
📚 Learning: 2026-03-10T07:36:46.164Z
Learnt from: CR
Repo: oritwoen/regxa PR: 0
File: src/registries/AGENTS.md:0-0
Timestamp: 2026-03-10T07:36:46.164Z
Learning: Applies to src/registries/*.ts : Convert source-specific fields into core `Package`/`Version`/`Dependency`/`Maintainer` shapes before returning from adapter methods

Applied to files:

  • src/registries/pypi.ts
🧬 Code graph analysis (1)
test/unit/purl.test.ts (1)
src/core/purl.ts (1)
  • parsePURL (22-108)
🔇 Additional comments (3)
src/core/purl.ts (1)

100-100: PEP 503 normalization is correctly applied at parse time.

This prevents split identity paths like my.package vs my_package from diverging later in cache keys and lookups.

src/registries/pypi.ts (1)

228-228: Normalization is now consistent between package and dependency names.

Why this matters: dependency edges now land on the same canonical key format as package fetches, so zope.interface and zope_interface won’t fragment graph entries.

Also applies to: 292-292
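
To make the fragmentation concrete, the sketch below counts how many distinct keys a set of equivalent spellings produces under each rule. `legacyKey` is an assumption reconstructed from the PR description (underscores to hyphens only); whether the old code also lowercased is not stated, so the inputs are kept lowercase. All names are illustrative.

```typescript
// Assumed reconstruction of the old behaviour: only underscores become hyphens.
function legacyKey(name: string): string {
  return name.replace(/_/g, "-");
}

// PEP 503 rule: collapse runs of "-", "_", "." into one hyphen and lowercase.
function pep503Key(name: string): string {
  return name.replace(/[-_.]+/g, "-").toLowerCase();
}

// Number of distinct cache/graph keys a list of spellings would occupy.
function distinctKeys(names: string[], toKey: (name: string) => string): Set<string> {
  return new Set(names.map(toKey));
}

const spellings = ["zope.interface", "zope_interface", "zope-interface", "zope--interface"];
console.log(distinctKeys(spellings, legacyKey).size); // 3
console.log(distinctKeys(spellings, pep503Key).size); // 1
```

Under the assumed legacy rule the dotted and double-hyphen spellings each keep their own key, which is the duplicate-entry problem the PR describes.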

test/unit/purl.test.ts (1)

108-114: This test block hits the right failure paths for PEP 503 behavior.

It covers separator runs and dot handling explicitly, which are the exact cases that used to produce inconsistent canonical names.


📝 Walkthrough

Walkthrough

Core and registry layers now normalize PyPI package names per PEP 503 by collapsing any sequence of hyphens, underscores, or dots into a single hyphen. Previously only underscores were normalized. Tests verify the new behavior covers edge cases like consecutive separators.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Core normalization logic: src/core/purl.ts, src/registries/pypi.ts | Enhanced PyPI name normalization regex to collapse sequences of -, _, . into single hyphens instead of only handling underscores. Applied consistently across PURL parsing and dependency resolution. |
| Test coverage: test/unit/purl.test.ts | Added PEP 503 compliance test verifying that variant separators (dots, double-dashes, mixed underscores) normalize to expected forms (e.g., my.package → my-package). |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested reviewers

  • aeitwoen

Poem

Hyphens, dots, and underscores—
Once a wild and messy crew,
Now collapse to single dashes,
PEP 503 says we're through! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | Title accurately summarizes the main change: applying full PEP 503 normalization to PyPI package names instead of partial normalization. |
| Description check | ✅ Passed | Description clearly explains the problem (incomplete normalization causing cache key inconsistency), the solution (matching PEP 503 spec with regex), and includes test coverage details. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |


@coderabbitai coderabbitai bot requested a review from aeitwoen March 13, 2026 17:58
@codeant-ai

codeant-ai bot commented Mar 13, 2026

Sequence Diagram

This PR updates PyPI package normalization to collapse any run of dash, underscore, or dot into a single dash. The diagram shows how both PURL parsing and PyPI dependency handling now produce the same canonical package names, preventing duplicate keys and mismatched lookups.

sequenceDiagram
    participant Client
    participant PURLParser
    participant PyPIRegistry

    Client->>PURLParser: Parse PyPI purl name
    PURLParser->>PURLParser: Normalize with PEP 503 separator collapse
    PURLParser-->>Client: Return canonical package name

    Client->>PyPIRegistry: Build PyPI identifier from package name
    PyPIRegistry->>PyPIRegistry: Normalize with same PEP 503 rule
    PyPIRegistry-->>Client: Return canonical project URL and purl

    Client->>PyPIRegistry: Parse requires dist dependency entry
    PyPIRegistry->>PyPIRegistry: Normalize parsed dependency name
    PyPIRegistry-->>Client: Return canonical dependency metadata

Generated by CodeAnt AI


@cubic-dev-ai cubic-dev-ai bot left a comment

No issues found across 3 files

Confidence score: 5/5

  • Automated review surfaced no issues in the provided summaries.
  • No files require special attention.

Requires human review: Changes core package name normalization logic for PyPI, which may impact data consistency and requires human verification of its impact on existing indexed data and lookups.

Architecture diagram
sequenceDiagram
    participant C as Client/Scanner
    participant P as PURL Parser
    participant R as PyPI Registry
    participant DB as Cache/Storage

    Note over C,DB: Request Flow for PyPI Package "zope.interface"

    C->>P: parsePURL("pkg:pypi/zope.interface")
    P->>P: CHANGED: Apply full PEP 503 normalization<br/>(collapse runs of [-._] to single hyphen)
    P-->>C: ParsedPURL { name: "zope-interface" }

    C->>R: fetchMetadata("zope-interface")
    
    R->>R: normalizeName("zope-interface")
    R->>DB: Check cache for "zope-interface"
    
    alt Cache Miss
        R->>R: Fetch package JSON from PyPI
        loop For each dependency in requires_dist
            R->>R: NEW: normalizeName(depName) per PEP 503
        end
        R->>DB: Store metadata with normalized dependency names
    end

    DB-->>R: Metadata object
    R-->>C: Normalized Package + Dependencies

@oritwoen oritwoen self-assigned this Mar 13, 2026
@oritwoen oritwoen merged commit d060acf into main Mar 13, 2026
3 checks passed
@oritwoen oritwoen deleted the fix/pypi-pep503-normalization branch March 13, 2026 18:04