
Commit d4535eb

UnexpectedFisting committed
fix(docker): harden runtime against portal 403s
Install a less-detectable browser path in Docker (Chrome on x86_64; Chromium fallback on ARM64), align browser-compat fingerprint to Linux, and add update+rebuild wrapper modes for scheduled runs.

Made-with: Cursor
1 parent 43b278a commit d4535eb

6 files changed

Lines changed: 144 additions & 13 deletions


Dockerfile

Lines changed: 10 additions & 0 deletions
@@ -4,11 +4,21 @@ WORKDIR /app

 COPY requirements.txt /app/requirements.txt
 RUN pip install --no-cache-dir -r /app/requirements.txt
+# Playwright "chrome" channel isn't supported on Linux ARM64.
+# - x86_64: install real Chrome channel (best for avoiding 403 fingerprinting)
+# - arm64: fall back to Playwright Chromium
+RUN set -e; arch="$(uname -m)"; \
+    if [ "$arch" = "x86_64" ]; then \
+      python -m playwright install chrome; \
+    else \
+      python -m playwright install chromium; \
+    fi

 COPY src /app/src

 ENV PYTHONUNBUFFERED=1
 ENV PYTHONPATH=/app/src
+ENV STUDENTAID_MONARCH_RUNTIME=docker

 ENTRYPOINT ["python", "-m", "studentaid_monarch_sync"]
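
The Dockerfile branch installs the Chrome channel only on x86_64, so the code that launches the browser presumably has to make the same choice. The launch code is not part of this diff. The following is a minimal sketch under that assumption: `pick_playwright_channel` is a hypothetical helper, and accepting `amd64` alongside `x86_64` is an addition the Dockerfile itself does not make.

```python
import platform
from typing import Optional


def pick_playwright_channel(machine: Optional[str] = None) -> Optional[str]:
    """Return "chrome" on x86_64, or None to use Playwright's bundled Chromium.

    Hypothetical helper: mirrors the `uname -m` branch in the Dockerfile.
    """
    arch = (machine or platform.machine()).strip().lower()
    # `uname -m` prints "x86_64" on Linux; some platforms report "amd64" instead.
    if arch in {"x86_64", "amd64"}:
        return "chrome"
    # ARM64 and everything else: no channel, so the bundled Chromium is used.
    return None
```

With Playwright's sync API the result could be passed as `p.chromium.launch(channel=pick_playwright_channel(), headless=True)`, since `channel` is optional there.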

README.md

Lines changed: 6 additions & 4 deletions
@@ -85,6 +85,7 @@ Pick **one** of the following and run it end-to-end. For most people (especially

 #### Runtime A: Docker (recommended) 🐳
 This repo includes a `docker-compose.yml` service that runs the sync as a **run-once** container.
+The Docker image now installs a real **Google Chrome** channel and uses a Linux-specific browser-compat profile, which reduces `HeadlessChrome`-style fingerprinting that can trigger portal `403 Access Denied` responses.

 ##### Docker Desktop (Windows/macOS)
 1. Install Docker Desktop and make sure `docker compose` works in a terminal.
@@ -122,7 +123,7 @@ To keep the scheduled command simple (and easy to update later), you can schedul
   - **Add arguments** (example):

 ```text
--NoProfile -File .\scripts\docker_sync.ps1 run --payments-since 2025-01-01
+-NoProfile -File .\scripts\docker_sync.ps1 update-run --payments-since 2025-01-01
 ```

   - **Start in**: `C:\path\to\repo` (the folder that contains `docker-compose.yml`)
@@ -131,14 +132,14 @@ To keep the scheduled command simple (and easy to update later), you can schedul
   - Create a LaunchAgent that runs:

 ```bash
-cd /path/to/repo && bash ./scripts/docker_sync.sh run --payments-since 2025-01-01
+cd /path/to/repo && bash ./scripts/docker_sync.sh update-run --payments-since 2025-01-01
 ```

 - **Linux (cron/systemd)**:
   - Use cron or a systemd timer to run the same command on your desired timeframe:

 ```bash
-cd /path/to/repo && bash ./scripts/docker_sync.sh run --payments-since 2025-01-01
+cd /path/to/repo && bash ./scripts/docker_sync.sh update-run --payments-since 2025-01-01
 ```

 ##### Unraid (NAS)
@@ -160,7 +161,7 @@ bash ./scripts/docker_sync.sh dry-run --payments-since 2025-01-01
 4. Schedule it (e.g., Unraid **User Scripts** plugin) with a daily command like:

 ```bash
-cd /path/to/repo && bash ./scripts/docker_sync.sh run --payments-since 2025-01-01
+cd /path/to/repo && bash ./scripts/docker_sync.sh update-run --payments-since 2025-01-01
 ```

 Keep `./data` persistent so sessions and the SQLite idempotency DB survive restarts.
@@ -340,6 +341,7 @@ See **Quick start → Runtime A: Docker (recommended)** for the Unraid schedulin
 <a id="403-access-denied"></a>
 - **HTTP 403 Access Denied / portal blocks the headless browser**
   - Some servicers (notably Nelnet) occasionally return a bare `HTTP 403 Access Denied` page to headless browsers that look like automation. The tool detects this and retries once with a fresh session automatically.
+  - Docker builds now install a real Chrome channel and use a Linux-aligned browser fingerprint. If you updated from an older image, rebuild it first with `docker compose build --no-cache`.
   - If it keeps failing, try the following in order:
     1. Run `sync --headful --manual-mfa` once to establish a fresh, trusted browser session stored under `data/servicer_storage_state_*.json`. Subsequent headless runs reuse that session.
     2. Add `--fresh-session` to force discarding any stale stored session before retrying.

scripts/docker_sync.ps1

Lines changed: 33 additions & 1 deletion
@@ -1,6 +1,6 @@
 Param(
   [Parameter(Position = 0)]
-  [ValidateSet("setup-accounts", "preflight", "dry-run", "run")]
+  [ValidateSet("setup-accounts", "preflight", "dry-run", "run", "update", "update-run", "update-dry-run")]
   [string]$Mode = "run",

   [Parameter(ValueFromRemainingArguments = $true)]
@@ -24,6 +24,24 @@ $Service = "studentaid-monarch-sync"

 New-Item -ItemType Directory -Force -Path (Join-Path $RepoDir "data") | Out-Null

+function Invoke-GitPull {
+  if (Get-Command git -ErrorAction SilentlyContinue) {
+    if (Test-Path (Join-Path $RepoDir ".git")) {
+      # Best-effort update; keep it safe/non-destructive.
+      git pull --ff-only
+    }
+  }
+}
+
+function Invoke-ComposeBuild {
+  $buildArgs = @("build", "--pull")
+  if ($env:NO_CACHE -eq "1") {
+    $buildArgs += "--no-cache"
+  }
+  $buildArgs += $Service
+  docker compose @buildArgs
+}
+
 switch ($Mode) {
   "setup-accounts" {
     docker compose run --rm --build $Service setup-monarch-accounts --apply @Args
@@ -37,6 +55,20 @@ switch ($Mode) {
   "run" {
     docker compose run --rm $Service sync @Args
   }
+  "update" {
+    Invoke-GitPull
+    Invoke-ComposeBuild
+  }
+  "update-run" {
+    Invoke-GitPull
+    Invoke-ComposeBuild
+    docker compose run --rm $Service sync @Args
+  }
+  "update-dry-run" {
+    Invoke-GitPull
+    Invoke-ComposeBuild
+    docker compose run --rm $Service sync --dry-run @Args
+  }
 }


scripts/docker_sync.sh

Lines changed: 36 additions & 1 deletion
@@ -8,6 +8,9 @@ set -euo pipefail
 # ./scripts/docker_sync.sh preflight
 # ./scripts/docker_sync.sh dry-run --payments-since 2025-01-01
 # ./scripts/docker_sync.sh run --payments-since 2025-01-01
+# ./scripts/docker_sync.sh update
+# ./scripts/docker_sync.sh update-run --payments-since 2025-01-01
+# ./scripts/docker_sync.sh update-dry-run --payments-since 2025-01-01

 REPO_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
 cd "${REPO_DIR}"
@@ -19,6 +22,24 @@ SERVICE="studentaid-monarch-sync"

 mkdir -p data

+_git_pull_ff_only() {
+  if command -v git >/dev/null 2>&1 && [ -d ".git" ]; then
+    # Best-effort update; keep it safe/non-destructive.
+    git pull --ff-only
+  fi
+}
+
+_compose_build() {
+  # Pull newer base image layers when available.
+  # Set NO_CACHE=1 to force a full rebuild.
+  local args=(build --pull)
+  if [ "${NO_CACHE:-}" = "1" ]; then
+    args+=(--no-cache)
+  fi
+  args+=("${SERVICE}")
+  docker compose "${args[@]}"
+}
+
 case "${MODE}" in
   setup-accounts)
     docker compose run --rm --build "${SERVICE}" setup-monarch-accounts --apply "$@"
@@ -32,9 +53,23 @@ case "${MODE}" in
   run)
     docker compose run --rm "${SERVICE}" sync "$@"
     ;;
+  update)
+    _git_pull_ff_only
+    _compose_build
+    ;;
+  update-run)
+    _git_pull_ff_only
+    _compose_build
+    docker compose run --rm "${SERVICE}" sync "$@"
+    ;;
+  update-dry-run)
+    _git_pull_ff_only
+    _compose_build
+    docker compose run --rm "${SERVICE}" sync --dry-run "$@"
+    ;;
   *)
     echo "Unknown mode: ${MODE}"
-    echo "Expected: setup-accounts | preflight | dry-run | run"
+    echo "Expected: setup-accounts | preflight | dry-run | run | update | update-run | update-dry-run"
     exit 2
     ;;
 esac
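
To make the ordering of the new `update-run` and `update-dry-run` modes easier to see, the sketch below lists the commands they issue as plain argument lists. It is an illustration only: `update_run_commands` is a hypothetical function, not part of the repository, and it does not model the wrappers' check that `git` and a `.git` directory exist before pulling.

```python
from typing import List, Sequence

SERVICE = "studentaid-monarch-sync"  # service name as used by the wrapper scripts


def update_run_commands(
    sync_args: Sequence[str] = (),
    *,
    no_cache: bool = False,
    dry_run: bool = False,
) -> List[List[str]]:
    """Return the commands of update-run / update-dry-run, in execution order."""
    build = ["docker", "compose", "build", "--pull"]
    if no_cache:  # corresponds to NO_CACHE=1 in the wrappers
        build.append("--no-cache")
    build.append(SERVICE)

    run = ["docker", "compose", "run", "--rm", SERVICE, "sync"]
    if dry_run:  # update-dry-run adds --dry-run before the user arguments
        run.append("--dry-run")
    run.extend(sync_args)

    # Pull first, then rebuild, then run the sync against the fresh image.
    return [["git", "pull", "--ff-only"], build, run]
```

Because the shell wrapper runs under `set -euo pipefail`, a failing `git pull --ff-only` stops it before the build step; the sketch only describes the successful path.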

src/studentaid_monarch_sync/portal/client.py

Lines changed: 45 additions & 7 deletions
@@ -2,6 +2,7 @@

 import logging
 import json
+import os
 import random
 import re
 import shutil
@@ -79,7 +80,19 @@ def __init__(
         "--disable-blink-features=AutomationControlled",
     ]

-    _BROWSER_COMPAT_INIT_SCRIPT = r"""
+    _WINDOWS_CHROME_USER_AGENT = (
+        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
+        "AppleWebKit/537.36 (KHTML, like Gecko) "
+        "Chrome/131.0.0.0 Safari/537.36"
+    )
+
+    _LINUX_CHROME_USER_AGENT = (
+        "Mozilla/5.0 (X11; Linux x86_64) "
+        "AppleWebKit/537.36 (KHTML, like Gecko) "
+        "Chrome/131.0.0.0 Safari/537.36"
+    )
+
+    _BROWSER_COMPAT_INIT_SCRIPT_TEMPLATE = r"""
     (() => {
       // Mask navigator.webdriver
       try { Object.defineProperty(navigator, 'webdriver', { get: () => undefined }); } catch (_) {}
@@ -91,6 +104,13 @@ def __init__(
         });
       } catch (_) {}

+      // Keep platform aligned with the chosen runtime/user-agent.
+      try {
+        Object.defineProperty(navigator, 'platform', {
+          get: () => __NAVIGATOR_PLATFORM__,
+        });
+      } catch (_) {}
+
       // Chrome runtime presence (headless Chrome often lacks this)
       try {
         if (!window.chrome) { window.chrome = {}; }
@@ -191,6 +211,28 @@ def __init__(
     })();
     """

+    def _browser_compat_runtime(self) -> str:
+        runtime = (os.getenv("STUDENTAID_MONARCH_RUNTIME") or "").strip().lower()
+        if runtime in {"docker", "native"}:
+            return runtime
+        return "docker" if Path("/.dockerenv").exists() else "native"
+
+    def _browser_compat_user_agent(self) -> str:
+        if self._browser_compat_runtime() == "docker":
+            return self._LINUX_CHROME_USER_AGENT
+        return self._WINDOWS_CHROME_USER_AGENT
+
+    def _browser_compat_platform(self) -> str:
+        if self._browser_compat_runtime() == "docker":
+            return "Linux x86_64"
+        return "Win32"
+
+    def _browser_compat_init_script(self) -> str:
+        return self._BROWSER_COMPAT_INIT_SCRIPT_TEMPLATE.replace(
+            "__NAVIGATOR_PLATFORM__",
+            json.dumps(self._browser_compat_platform()),
+        )
+
     def _launch_browser(self, p, *, headless: bool, slow_mo: int):
         """
         Launch Chromium with anti-detection defaults.
@@ -242,11 +284,7 @@ def _create_browser_context(self, browser, *, storage_state: Optional[str] = Non
         Create a browser context with realistic fingerprint settings.
         """
         ctx_kwargs: dict = {
-            "user_agent": (
-                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
-                "AppleWebKit/537.36 (KHTML, like Gecko) "
-                "Chrome/131.0.0.0 Safari/537.36"
-            ),
+            "user_agent": self._browser_compat_user_agent(),
             "color_scheme": "light",
             "viewport": {"width": 1920, "height": 1080},
             "locale": "en-US",
@@ -276,7 +314,7 @@ def _rewrite(route, request) -> None:
         except Exception:
             logger.debug("Failed to install dark-host rewrite route.", exc_info=True)

-        ctx.add_init_script(self._BROWSER_COMPAT_INIT_SCRIPT)
+        ctx.add_init_script(self._browser_compat_init_script())
         ctx.add_init_script(self._CONSENT_DISMISS_SCRIPT)

     def _human_delay(self, page: Page, min_ms: int = 80, max_ms: int = 250) -> None:
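
The runtime selection and placeholder substitution in this file can be tried outside the client class. The sketch below is a standalone approximation, not the class's actual code: `detect_runtime`, `render_init_script`, and the one-line `TEMPLATE` are hypothetical stand-ins, and the environment and `/.dockerenv` path are passed in as parameters so the logic can be exercised without a container.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional

# Shortened stand-in for the real init-script template (assumption).
TEMPLATE = (
    "Object.defineProperty(navigator, 'platform', "
    "{ get: () => __NAVIGATOR_PLATFORM__ });"
)


def detect_runtime(
    env: Optional[Mapping[str, str]] = None,
    dockerenv: Path = Path("/.dockerenv"),
) -> str:
    """Prefer an explicit STUDENTAID_MONARCH_RUNTIME, else probe for /.dockerenv."""
    source = os.environ if env is None else env
    runtime = (source.get("STUDENTAID_MONARCH_RUNTIME") or "").strip().lower()
    if runtime in {"docker", "native"}:
        return runtime
    # Unset or unrecognized value: fall back to the marker file Docker creates.
    return "docker" if dockerenv.exists() else "native"


def render_init_script(runtime: str) -> str:
    """Substitute the platform as a JSON string literal, so it is valid JavaScript."""
    platform_value = "Linux x86_64" if runtime == "docker" else "Win32"
    return TEMPLATE.replace("__NAVIGATOR_PLATFORM__", json.dumps(platform_value))
```

Using `json.dumps` for the substitution means the value arrives in the script already quoted and escaped, which is why the template's placeholder carries no quotes of its own.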

tests/test_parsing.py

Lines changed: 14 additions & 0 deletions
@@ -292,3 +292,17 @@ def test_looks_like_access_denied_ignores_normal_page_content() -> None:
     c = _client()
     page = _BodyOnlyPage("Welcome back. Payment Activity is ready.")
     assert c._looks_like_access_denied(page) is False
+
+
+def test_browser_compat_uses_linux_profile_for_docker_runtime(monkeypatch: pytest.MonkeyPatch) -> None:
+    monkeypatch.setenv("STUDENTAID_MONARCH_RUNTIME", "docker")
+    c = _client()
+    assert "Linux x86_64" in c._browser_compat_user_agent()
+    assert c._browser_compat_platform() == "Linux x86_64"
+
+
+def test_browser_compat_uses_native_profile_when_runtime_forced_native(monkeypatch: pytest.MonkeyPatch) -> None:
+    monkeypatch.setenv("STUDENTAID_MONARCH_RUNTIME", "native")
+    c = _client()
+    assert "Windows NT 10.0" in c._browser_compat_user_agent()
+    assert c._browser_compat_platform() == "Win32"
