Updates to remediate 403 errors #10
…ation extraction Generalizes the cancelled-date skip to detect all non-posted payment statuses (pending, scheduled, processing, cancelled) so the scraper no longer tries to click into non-clickable rows. Single-row parse failures are now non-fatal (warn + continue) instead of aborting the entire run. If ALL rows fail, a RuntimeError is still raised as a safety net. Closes #10 Made-with: Cursor
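The behavior this commit describes could be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code from the branch: `NON_POSTED_STATUSES`, `parse_row`, and `extract_posted_payments` are hypothetical names, and rows are assumed to be plain dicts rather than page elements.

```python
import logging

logger = logging.getLogger(__name__)

# Statuses named in the commit message; rows in these states are skipped.
NON_POSTED_STATUSES = {"pending", "scheduled", "processing", "cancelled"}


def parse_row(row: dict) -> dict:
    """Hypothetical single-row parser; raises on malformed rows."""
    return {"date": row["date"], "amount": float(row["amount"])}


def extract_posted_payments(rows: list[dict]) -> list[dict]:
    """Skip non-posted rows, tolerate single-row failures, fail if all rows fail."""
    parsed, attempted, failures = [], 0, 0
    for index, row in enumerate(rows):
        status = str(row.get("status", "")).strip().lower()
        if status in NON_POSTED_STATUSES:
            continue  # never try to open a non-posted row
        attempted += 1
        try:
            parsed.append(parse_row(row))
        except (KeyError, TypeError, ValueError) as exc:
            failures += 1
            logger.warning("Skipping row %d: %s", index, exc)  # warn + continue
    # Safety net: every attempted row failed, so something is badly wrong.
    if attempted and failures == attempted:
        raise RuntimeError("All payment rows failed to parse")
    return parsed
```

With this shape, one malformed row only produces a warning, while a page where every posted row is unparseable still stops the run.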
|
Thanks for taking a look at this! I tested the branch locally with a regular Python install (no Docker), and I’m still getting 403s when running headless. --headful mode works fine and is definitely more stable than before, so that’s a solid improvement. I’m guessing this is more likely related to headless detection or browser fingerprinting. |
|
Attempted again on this branch and 403 still persists. Tried with If I have time I will see if I can do more troubleshooting or find a workaround. Thank you for working on this, still works great with headful and syncs everything perfectly. |
|
This is actually really helpful. I can test with Python on my end and see if I can replicate it sometime this week. My leading assumption is that something in the Python runtime is being picked up differently from the Docker runtime, which is either triggering anti-bot detection or something equally obvious.
Improve portal sync reliability by detecting and skipping non-posted payment rows (pending/scheduled/processing/cancelled), adding safer extraction fallbacks, and expanding parser tests so dry-run output reflects true posted payments. Made-with: Cursor
…untime

- Override User-Agent in browser context (eliminates HeadlessChrome marker)
- Expand init script: spoof navigator.plugins/mimeTypes, mask WebGL SwiftShader renderer, add chrome.app/chrome.csi, clean CDP leak globals
- Use --headless=new via Chromium args for less detectable headless mode
- Consolidate duplicated browser/context/hooks setup into shared helpers
- Add PortalAccessDeniedError + 403 detection/retry in _login flow
- Wire _launch_browser, _create_browser_context, _install_context_hooks into extract(), discover_loan_groups(), and browse_and_capture()
- Add unit tests for _looks_like_access_denied heuristic

Addresses persistent HTTP 403 Access Denied for headless runs on native Python/Windows reported in #9 / PR #10.

Made-with: Cursor
81d540a to 7bf704f
|
@thundervoid @kevhardy Would appreciate if you could pull this branch and re-test headless on your setups. Made some more hardening updates. |
|
Just ran it in headless and it seems to have worked without fail. I will continue to test throughout the next week and report any issues. Thank you so much for this! |
|
Ok awesome. I'll keep this open for about a week so you can both give it some time, and I can make tweaks as needed. I just needed to add some more hardening against anti-bot measures so it's not crazy obvious on Python runtimes. |
|
Works great for me now. It runs headless on both my Windows and Linux machines. Thanks for figuring that out! |
By default, we now also apply Playwright tweaks to help remediate troublesome 403 errors during headless usage.
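As a rough idea of what such tweaks can look like with Playwright's Python API, here is a heavily abbreviated sketch. It is not the project's implementation: `DEFAULT_USER_AGENT`, `INIT_SCRIPT`, `build_launch_options`, and `open_hardened_context` are hypothetical names, the User-Agent version string is an arbitrary example, and launching headed while passing `--headless=new` is an assumption about how the flag is applied.

```python
# Desktop Chrome User-Agent without the "HeadlessChrome" token.
# The version number is an arbitrary example, not taken from the project.
DEFAULT_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)

# Tiny stand-in for the init script; the real one reportedly also spoofs
# plugins/mimeTypes, masks the WebGL renderer, and cleans CDP globals.
INIT_SCRIPT = """
window.chrome = window.chrome || {};
window.chrome.app = window.chrome.app || {};
"""


def build_launch_options(headless: bool) -> dict:
    """Build keyword arguments for chromium.launch()."""
    args = []
    if headless:
        # Assumption: ask Chromium for its newer headless mode via a flag
        # and tell Playwright not to add its own headless switch.
        args.append("--headless=new")
    return {"headless": False, "args": args}


def open_hardened_context(playwright, headless: bool = True):
    """Launch Chromium and return (browser, context) with the tweaks applied."""
    browser = playwright.chromium.launch(**build_launch_options(headless))
    context = browser.new_context(user_agent=DEFAULT_USER_AGENT)
    context.add_init_script(INIT_SCRIPT)  # runs before any page script
    return browser, context
```

A caller would pass in the object yielded by `sync_playwright()` from `playwright.sync_api`, then close the context and browser when the run finishes.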