fix(person): normalize URL trailing slash to prevent urljoin path loss #282

Open
dtateks wants to merge 1 commit into joeyism:master from dtateks:fix/urljoin-trailing-slash

@dtateks dtateks commented Mar 4, 2026

Summary

  • urljoin() drops the username path segment when the input URL lacks a trailing slash, causing 404 errors on all detail page navigations
  • Fix: normalize linkedin_url to always end with / at the start of PersonScraper.scrape()

Problem

from urllib.parse import urljoin

# WITHOUT trailing slash — BROKEN
urljoin("https://linkedin.com/in/username", "details/patents/")
# → "https://linkedin.com/in/details/patents/"  ← username lost, 404!

# WITH trailing slash — CORRECT
urljoin("https://linkedin.com/in/username/", "details/patents/")
# → "https://linkedin.com/in/username/details/patents/"

This affects all fallback navigations in PersonScraper:

  • details/experience
  • details/education
  • details/interests/
  • details/{certifications,honors,patents,...}/ (8 accomplishment sections)
  • overlay/contact-info/
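The effect on the paths listed above can be reproduced with a short script. This is only an illustrative sketch: `RELATIVE_PATHS` and `join_all` are hypothetical names that are not part of PersonScraper, and `details/patents/` stands in for the eight accomplishment sections.

```python
from urllib.parse import urljoin

# Relative paths taken from the list above (hypothetical constant, not from the scraper).
RELATIVE_PATHS = [
    "details/experience",
    "details/education",
    "details/interests/",
    "details/patents/",
    "overlay/contact-info/",
]


def join_all(base_url):
    """Resolve every relative path against base_url with urljoin."""
    return [urljoin(base_url, rel) for rel in RELATIVE_PATHS]


# Without the trailing slash, "username" is treated as the last path
# segment and replaced; with it, the segment is kept.
broken = join_all("https://linkedin.com/in/username")
fixed = join_all("https://linkedin.com/in/username/")

for bad, good in zip(broken, fixed):
    print(f"{bad}  ->  {good}")
```

Every URL in `broken` loses the `username` segment, while every URL in `fixed` keeps it.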

Fix

4-line change at the top of scrape():

if not linkedin_url.endswith("/"):
    linkedin_url = linkedin_url + "/"

All downstream urljoin(base_url, ...) calls then produce correct URLs.
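One case the `endswith("/")` check may not cover is an input URL that carries a query string or fragment, where appending `/` to the whole string would not change the path. The sketch below shows one possible variant that normalizes only the path component. `ensure_trailing_slash` is a hypothetical helper, not code from this PR, and whether PersonScraper ever receives such URLs is an assumption.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def ensure_trailing_slash(url):
    """Hypothetical helper: make the *path* of url end with "/".

    Query string and fragment, if present, are left untouched.
    """
    parts = urlsplit(url)
    path = parts.path if parts.path.endswith("/") else parts.path + "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


# Plain profile URL: same result as the two-line check in this PR.
plain = ensure_trailing_slash("https://linkedin.com/in/username")

# URL with a query string: the slash goes on the path, before the "?".
with_query = ensure_trailing_slash("https://linkedin.com/in/username?trk=x")

print(plain)
print(with_query)
print(urljoin(with_query, "details/patents/"))
```

For URLs without a query or fragment this behaves the same as the change in this PR, so it is only worth considering if such inputs actually occur.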

Testing

  • All 11 unit tests pass
  • Manually verified URL generation for all affected paths

urljoin('https://linkedin.com/in/user', 'details/patents/') drops the
username segment because urljoin treats the last path component without
trailing slash as a file. This caused 404s on all detail page navigations
(accomplishments, education, interests, contacts) when the input URL
lacked a trailing slash.

Fix: ensure linkedin_url always ends with '/' at the start of scrape().