
Updates to remediate 403 errors #10

Merged
mattebad merged 6 commits into main from playwright-tweaks on Mar 23, 2026
Conversation

@mattebad mattebad commented Feb 26, 2026

Owner

By default, we now also apply Playwright tweaks to help remediate troublesome 403 errors during headless usage.

@mattebad mattebad mentioned this pull request Feb 26, 2026
mattebad pushed a commit that referenced this pull request Feb 26, 2026
…ation extraction

Generalizes the cancelled-date skip to detect all non-posted payment
statuses (pending, scheduled, processing, cancelled) so the scraper
no longer tries to click into non-clickable rows.

Single-row parse failures are now non-fatal (warn + continue) instead
of aborting the entire run. If ALL rows fail, a RuntimeError is still
raised as a safety net.
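
A minimal sketch of the behavior this commit message describes, not the project's actual code. The names `NON_POSTED_STATUSES`, `is_non_posted`, and `parse_rows`, and the dict-shaped rows, are hypothetical; only the four statuses, the warn-and-continue behavior, and the final `RuntimeError` come from the message above.

```python
import logging
from typing import Any, Callable, Dict, Iterable, List

logger = logging.getLogger(__name__)

# Statuses listed in the commit message; the constant name is hypothetical.
NON_POSTED_STATUSES = {"pending", "scheduled", "processing", "cancelled"}


def is_non_posted(status_text: str) -> bool:
    """Return True when the status text marks a payment that has not posted."""
    normalized = status_text.strip().lower()
    return any(status in normalized for status in NON_POSTED_STATUSES)


def parse_rows(
    rows: Iterable[Dict[str, str]],
    parse_row: Callable[[Dict[str, str]], Any],
) -> List[Any]:
    """Parse posted rows, skipping non-posted ones and tolerating single failures."""
    parsed: List[Any] = []
    attempted = 0
    failures = 0
    for row in rows:
        if is_non_posted(row.get("status", "")):
            continue  # non-posted rows are never opened or parsed
        attempted += 1
        try:
            parsed.append(parse_row(row))
        except Exception as exc:  # one bad row should not abort the run
            failures += 1
            logger.warning("Skipping row that failed to parse: %s", exc)
    # Safety net: if every attempted row failed, something is structurally wrong.
    if attempted and failures == attempted:
        raise RuntimeError(f"All {attempted} payment rows failed to parse")
    return parsed
```

Counting only attempted rows keeps a page made up entirely of pending payments from being reported as a total failure.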

Closes #10

Made-with: Cursor
@mattebad mattebad changed the title from "Updates to remediate 403 errors along with pending payment error" to "Updates to remediate 403 errors" Feb 26, 2026
@mattebad mattebad closed this in 1905521 Feb 26, 2026
@mattebad mattebad reopened this Feb 26, 2026
@thundervoid


Thanks for taking a look at this! I tested the branch locally with a regular Python install (no Docker), and I’m still getting 403s when running headless. --headful mode works fine and is definitely more stable than before, so that’s a solid improvement.

I’m guessing this is more likely related to headless detection or browser fingerprinting.

@kevhardy

kevhardy commented Mar 4, 2026


Attempted again on this branch and the 403 still persists. Tried with --fresh-session as well. Requests are being made from a residential IP.

If I have time I will see if I can do more troubleshooting or find a workaround. Thank you for working on this, still works great with headful and syncs everything perfectly.

@mattebad

Owner Author

This is actually really helpful. I can test Python on my end and see if I can replicate sometime this week. My leading assumption is that something in the Python runtime is being picked up differently from the Docker runtime, which is either triggering anti-bot detection or something obvious.

UnexpectedFisting and others added 6 commits March 15, 2026 12:53
Improve portal sync reliability by detecting and skipping non-posted payment rows (pending/scheduled/processing/cancelled), adding safer extraction fallbacks, and expanding parser tests so dry-run output reflects true posted payments.

Made-with: Cursor
…untime

- Override User-Agent in browser context (eliminates HeadlessChrome marker)
- Expand init script: spoof navigator.plugins/mimeTypes, mask WebGL
  SwiftShader renderer, add chrome.app/chrome.csi, clean CDP leak globals
- Use --headless=new via Chromium args for less detectable headless mode
- Consolidate duplicated browser/context/hooks setup into shared helpers
- Add PortalAccessDeniedError + 403 detection/retry in _login flow
- Wire _launch_browser, _create_browser_context, _install_context_hooks
  into extract(), discover_loan_groups(), and browse_and_capture()
- Add unit tests for _looks_like_access_denied heuristic
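
The first three bullets above could look roughly like the sketch below. It is an illustration under stated assumptions, not the code in this PR: `INIT_SCRIPT`, `strip_headless_marker`, `chromium_launch_args`, and `open_hardened_page` are hypothetical names, the init script covers only a small illustrative subset of the listed spoofs (no WebGL masking or CDP cleanup), and pairing `headless=False` with the `--headless=new` argument is an assumed way to let the flag take effect that may behave differently across Playwright versions.

```python
from typing import List

# Illustrative subset of an init script; the values are placeholders, not the PR's.
INIT_SCRIPT = """
Object.defineProperty(navigator, 'plugins', { get: () => [1, 2, 3] });
window.chrome = window.chrome || {};
window.chrome.app = window.chrome.app || {};
window.chrome.csi = window.chrome.csi || (() => ({}));
"""


def strip_headless_marker(user_agent: str) -> str:
    """Replace the HeadlessChrome token so the UA matches a regular Chrome build."""
    return user_agent.replace("HeadlessChrome", "Chrome")


def chromium_launch_args() -> List[str]:
    """Chromium flags for the newer headless mode named in the commit message."""
    return ["--headless=new"]


def open_hardened_page(url: str, user_agent: str) -> str:
    """Open url in a context with the overrides applied and return the page title."""
    # Imported lazily so the pure helpers above work without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        # Assumption: headless=False lets the --headless=new argument decide the mode.
        browser = p.chromium.launch(headless=False, args=chromium_launch_args())
        try:
            context = browser.new_context(user_agent=strip_headless_marker(user_agent))
            # Runs before any page script in every page of this context.
            context.add_init_script(INIT_SCRIPT)
            page = context.new_page()
            page.goto(url)
            return page.title()
        finally:
            browser.close()
```

Keeping launch, context creation, and hook installation in separate helpers mirrors the consolidation bullet, so each entry point can reuse the same hardening.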

Addresses persistent HTTP 403 Access Denied for headless runs on native
Python/Windows reported in #9 / PR #10.
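
The 403 detection and retry mentioned in this commit might be sketched as follows. `PortalAccessDeniedError` is named in the commit message, but its body here is a guess; `looks_like_access_denied` stands in for the private `_looks_like_access_denied` heuristic, and `login_with_retry`, the matched phrases, and the retry parameters are all hypothetical.

```python
import time
from typing import Callable, Optional, Tuple


class PortalAccessDeniedError(RuntimeError):
    """Raised when the portal keeps answering with an access-denied response."""


def looks_like_access_denied(status: Optional[int], body_text: str) -> bool:
    """Guess whether a response is a 403 or an access-denied page served as 200."""
    if status == 403:
        return True
    lowered = body_text.lower()
    # Assumed marker phrases; the real heuristic may check different text.
    return "access denied" in lowered or "403 forbidden" in lowered


def login_with_retry(
    attempt_login: Callable[[], Tuple[Optional[int], str]],
    max_attempts: int = 3,
    delay_seconds: float = 2.0,
) -> None:
    """Call attempt_login until it is not denied, or raise after max_attempts."""
    for attempt in range(1, max_attempts + 1):
        status, body = attempt_login()
        if not looks_like_access_denied(status, body):
            return
        if attempt < max_attempts:
            time.sleep(delay_seconds)  # brief pause before the next attempt
    raise PortalAccessDeniedError(
        f"Portal denied access after {max_attempts} attempts"
    )
```

Raising a dedicated exception lets callers distinguish a blocked session from an ordinary login failure and, for example, suggest headful mode.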

Made-with: Cursor
@mattebad mattebad force-pushed the playwright-tweaks branch from 81d540a to 7bf704f Compare March 15, 2026 18:07
@mattebad

Owner Author

@thundervoid @kevhardy Would appreciate it if you could pull this branch and re-test headless on your setups. Made some more hardening updates.

@thundervoid


Just ran it in headless and it seems to have worked without fail. I will continue to test throughout the next week and report any issues. Thank you so much for this!

@mattebad

Owner Author

Ok awesome. I'll keep this open for about a week so you can both give it some time, and I can make tweaks as needed.

Just needed to add some more hardening for anti-bot measures so it's not crazy obvious on Python runtimes.

@kevhardy


Works great for me now. Runs headless on both my Windows and Linux machines. Thanks for figuring that out!

@mattebad mattebad merged commit 43b278a into main Mar 23, 2026
2 checks passed
@mattebad mattebad deleted the playwright-tweaks branch March 23, 2026 02:33

3 participants