feat/patchright & persistent context #280

stickerdaniel wants to merge 7 commits into joeyism:master
Conversation
text_content() captures invisible React RSC serialized JSON that LinkedIn now embeds on every page, containing "try again later" as a preloaded error template. This causes false positive rate limit detection on every scrape. inner_text() returns only visible text, matching the pattern used throughout the rest of the codebase. Resolves: joeyism#277 See also: joeyism#275
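The difference can be sketched without a browser: a naive phrase match over the full DOM text (roughly what `text_content()` returns) trips on the hidden error template, while the same match over visible text only (roughly what `inner_text()` returns) does not. The sample strings and the `looks_rate_limited` helper below are illustrative, not the project's actual code.

```python
# Illustrative sketch: why matching on full DOM text misfires.
# LinkedIn embeds serialized React RSC payloads that are never rendered,
# but which contain preloaded error templates like "try again later".

VISIBLE_TEXT = "Jane Doe - Software Engineer at Example Corp"
HIDDEN_RSC_PAYLOAD = '{"errorTemplates":{"rateLimit":"Please try again later"}}'

def looks_rate_limited(text: str) -> bool:
    """Naive phrase check, similar in spirit to the scraper's detection."""
    return "try again later" in text.lower()

# text_content()-style extraction: visible text plus hidden payloads.
full_text = VISIBLE_TEXT + " " + HIDDEN_RSC_PAYLOAD
# inner_text()-style extraction: only rendered, visible text.
visible_text = VISIBLE_TEXT

assert looks_rate_limited(full_text) is True      # false positive
assert looks_rate_limited(visible_text) is False  # correct result
```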
Replace playwright with patchright (anti-detection fork) and rewrite BrowserManager to use launch_persistent_context(user_data_dir=...). BREAKING CHANGE: BrowserManager now requires user_data_dir parameter. save_session() and load_session() methods removed - state auto-persists.
This repo uses requirements.txt, not uv.
Pull request overview
This PR implements a major architectural change to improve LinkedIn session persistence and evade bot detection by:
- Switching from Playwright to Patchright - A drop-in replacement that removes automation signals (navigator.webdriver, HeadlessChrome UA, CDP leaks) to avoid detection
- Replacing manual session handling with persistent browser context - Instead of saving/loading cookies+localStorage to JSON files, the entire Chromium profile (cache, service workers, history, cookies, etc.) is now stored in a persistent directory
- Updating Python version requirement - Changed from >=3.8 to >=3.10 to align with existing code that uses modern type hint syntax (`list[...]`, `tuple[...]`)
Changes:
- Replaced all `playwright` imports with `patchright`
- Removed `save_session()` and `load_session()` methods from BrowserManager
- Replaced `browser.new_context()` with `launch_persistent_context()` that stores the full browser profile
- Added `user_data_dir` parameter to BrowserManager (defaults to `~/.linkedin_scraper/browser_data`)
- Improved exception handling with more specific error types and better logging
- Updated all tests to use `tmp_path` fixtures for isolated test environments
- Updated sample scripts to remove manual session loading
- Removed setup.py in favor of pyproject.toml
Reviewed changes
Copilot reviewed 22 out of 23 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
| requirements.txt | Replaced playwright>=1.40.0 with patchright>=1.40.0 |
| pyproject.toml | Updated dependency to patchright, bumped Python requirement to >=3.10, removed Python 3.8/3.9 classifiers |
| linkedin_scraper/__init__.py | Updated module docstring to reference Patchright |
| linkedin_scraper/core/browser.py | Complete rewrite to use persistent context; removed save_session/load_session/browser property |
| linkedin_scraper/core/auth.py | Updated all Playwright references to Patchright, fixed indentation in one docstring |
| linkedin_scraper/core/utils.py | Updated imports to patchright, improved exception handling specificity |
| linkedin_scraper/scrapers/*.py | Updated imports, improved error handling, removed redundant callback defaults |
| linkedin_scraper/models/person.py | Changed open_to_work from bool = False to Optional[bool] = None |
| tests/*.py | Added tmp_path fixtures, removed session save/load tests, updated fixtures for persistent context |
| samples/*.py | Removed manual session loading calls, updated documentation |
| setup.py | Removed entire file (migrated to pyproject.toml) |
| """ | ||
| Login to LinkedIn using li_at cookie. | ||
|
|
||
| Args: | ||
| page: Playwright page object | ||
| cookie_value: Value of li_at cookie | ||
| Args: | ||
| page: Patchright page object | ||
| cookie_value: Value of li_at cookie | ||
|
|
The docstring and Args section have inconsistent indentation. The opening triple quotes at line 192 uses a single space instead of proper indentation, and the Args section at line 195 has extra leading spaces. This should be corrected to maintain consistent code formatting.
@copilot open a new pull request to apply changes based on this feedback
```python
        cookies_before = await browser.context.cookies()
        assert isinstance(cookies_before, list)

    # Second session: same data dir should reuse the persistent profile
    async with BrowserManager(user_data_dir=data_dir, headless=True) as browser:
        assert browser.page is not None
        assert browser.context is not None
```
The test for persistent context doesn't actually verify that data persists between sessions. Consider enhancing this test to add a cookie or localStorage item in the first session, then verify it exists in the second session. For example, after navigating in the first session, you could add a test cookie via await browser.context.add_cookies([{"name": "test_cookie", "value": "test_value", "domain": ".example.com", "path": "/"}]), then in the second session, check that this cookie still exists.
Suggested change:

```python
        cookies_before = await browser.context.cookies()
        assert isinstance(cookies_before, list)
        await browser.context.add_cookies(
            [
                {
                    "name": "test_cookie",
                    "value": "test_value",
                    "domain": ".example.com",
                    "path": "/",
                }
            ]
        )
        cookies = await browser.context.cookies()
        assert any(
            c.get("name") == "test_cookie" and c.get("value") == "test_value"
            for c in cookies
        )

    # Second session: same data dir should reuse the persistent profile
    async with BrowserManager(user_data_dir=data_dir, headless=True) as browser:
        await browser.page.goto("https://www.example.com")
        cookies_after = await browser.context.cookies()
        assert any(
            c.get("name") == "test_cookie" and c.get("value") == "test_value"
            for c in cookies_after
        )
```
```python
logger = logging.getLogger(__name__)

_DEFAULT_USER_DATA_DIR = Path.home() / ".linkedin_scraper" / "browser_data"
```
When using the default user_data_dir, concurrent executions of the scraper will fail because Chromium locks the profile directory. If multiple browser instances are needed, users must provide different user_data_dir paths. Consider documenting this limitation or adding a check that provides a helpful error message when the directory is locked, rather than letting Chromium's generic error surface.
```diff
             if len(text) < 50:
                 return text
-        except:
+        except PlaywrightTimeoutError:
```
'except' clause does nothing but pass and there is no explanatory comment.
```diff
             if 'applicant' in text_lower or 'people clicked' in text_lower or 'applied' in text_lower:
                 return text
-        except:
+        except PlaywrightTimeoutError:
```
'except' clause does nothing but pass and there is no explanatory comment.
```diff
-        except Exception:
+        except RateLimitError:
+            raise
+        except PlaywrightTimeoutError:
```
'except' clause does nothing but pass and there is no explanatory comment.
```diff
             return True
-        except:
+        except PlaywrightTimeoutError:
             pass
```
'except' clause does nothing but pass and there is no explanatory comment.
Suggested change:

```diff
-            pass
+            logger.debug("Timeout while attempting to close modal; treating as no modal closed")
         return False
```
Playwright exports some LinkedIn cookies with .www.linkedin.com domain but Chromium stores them as .linkedin.com. On Linux, the mismatched domains prevent cookies from being sent with HTTP requests. - Normalize .www.linkedin.com → .linkedin.com at export time - Filter to LinkedIn-only cookies on export (skip third-party) - Only import auth cookies (li_at, li_rm) to avoid redirect loops from stale anti-bot/routing cookies - Clear existing cookies before import to remove undecryptable entries
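The export-time normalization and auth-only filtering described in this commit message could be sketched as follows (the helper names and sample cookies are illustrative, not the PR's actual code):

```python
# Sketch: normalize ".www.linkedin.com" to ".linkedin.com", drop third-party
# cookies at export time, and re-import only the auth cookies.

AUTH_COOKIES = {"li_at", "li_rm"}  # only auth cookies are re-imported

def normalize_exported_cookies(cookies: list[dict]) -> list[dict]:
    out = []
    for c in cookies:
        domain = c.get("domain", "")
        if not domain.endswith("linkedin.com"):
            continue  # skip third-party cookies on export
        if domain == ".www.linkedin.com":
            c = {**c, "domain": ".linkedin.com"}  # match Chromium's stored form
        out.append(c)
    return out

def importable_auth_cookies(cookies: list[dict]) -> list[dict]:
    # li_at / li_rm only, to avoid redirect loops from stale anti-bot cookies.
    return [c for c in normalize_exported_cookies(cookies) if c["name"] in AUTH_COOKIES]

raw = [
    {"name": "li_at", "value": "x", "domain": ".www.linkedin.com"},
    {"name": "bcookie", "value": "y", "domain": ".linkedin.com"},
    {"name": "tracker", "value": "z", "domain": ".ads.example.com"},
]
assert importable_auth_cookies(raw) == [
    {"name": "li_at", "value": "x", "domain": ".linkedin.com"}
]
```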
Implementation of the enhancements proposed in #279: switches from Playwright to Patchright and replaces the manual session handling with Playwright's persistent context.