fix(person): normalize URL trailing slash to prevent urljoin path loss by dtateks · Pull Request #282 · joeyism/linkedin_scraper

dtateks · 2026-03-04T10:31:47Z

Summary

urljoin() drops the username path segment when the input URL lacks a trailing slash, causing 404 errors on all detail page navigations
Fix: normalize linkedin_url to always end with / at the start of PersonScraper.scrape()

Problem

from urllib.parse import urljoin

# WITHOUT trailing slash — BROKEN
urljoin("https://linkedin.com/in/username", "details/patents/")
# → "https://linkedin.com/in/details/patents/"  ← username lost, 404!

# WITH trailing slash — CORRECT
urljoin("https://linkedin.com/in/username/", "details/patents/")
# → "https://linkedin.com/in/username/details/patents/"
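
As a rough illustration of why this happens: relative resolution keeps everything in the base path up to and including the right-most "/", then appends the relative reference. The helper below, `base_directory`, is hypothetical and only mimics that one step for simple paths; it is not how urljoin is implemented internally.

```python
from urllib.parse import urlsplit


def base_directory(url: str) -> str:
    """Hypothetical helper: return the part of the URL path that a
    relative reference is resolved against, i.e. everything up to and
    including the right-most slash."""
    path = urlsplit(url).path
    return path[: path.rfind("/") + 1]


# Without a trailing slash, "username" is the last segment and is dropped.
print(base_directory("https://linkedin.com/in/username"))   # /in/
# With a trailing slash, "username" is part of the directory and is kept.
print(base_directory("https://linkedin.com/in/username/"))  # /in/username/
```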

This affects all fallback navigations in PersonScraper:

details/experience
details/education
details/interests/
details/{certifications,honors,patents,...}/ (8 accomplishment sections)
overlay/contact-info/

Fix

4-line change at the top of scrape():

if not linkedin_url.endswith("/"):
    linkedin_url = linkedin_url + "/"

All downstream urljoin(base_url, ...) calls then produce correct URLs.
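
A minimal, standalone sketch of the effect, assuming the normalization is factored into a helper. The name `normalize_profile_url` is hypothetical; the PR applies the check inline in scrape(). The relative paths are taken from the affected navigations listed under Problem.

```python
from urllib.parse import urljoin


def normalize_profile_url(linkedin_url: str) -> str:
    """Hypothetical helper: append a trailing slash if it is missing."""
    if not linkedin_url.endswith("/"):
        linkedin_url = linkedin_url + "/"
    return linkedin_url


base_url = normalize_profile_url("https://linkedin.com/in/username")

# After normalization, the username segment survives every join.
for relative in ("details/experience", "details/patents/", "overlay/contact-info/"):
    print(urljoin(base_url, relative))
```

The check is idempotent, so a URL that already ends with a slash passes through unchanged.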

Testing

All 11 unit tests pass
Manually verified URL generation for all affected paths

urljoin('https://linkedin.com/in/user', 'details/patents/') drops the username segment because urljoin treats the last path component without trailing slash as a file. This caused 404s on all detail page navigations (accomplishments, education, interests, contacts) when the input URL lacked a trailing slash. Fix: ensure linkedin_url always ends with '/' at the start of scrape().

dtateks wants to merge 1 commit into joeyism:master from dtateks:fix/urljoin-trailing-slash
