feat(server): TRACKER_ALLOWED_ORIGINS origin allowlist for tracking #252
Draft
rbonestell wants to merge 10 commits into
… /tracker.js CORS
…ker.js as ACAO *

CORS cannot gate who loads a <script> or POSTs to the worker, so the per-origin ACAO echo on tracker.js enforced nothing. Move real allowlist enforcement to /collect, where data is actually recorded.

- tracker.js now returns Access-Control-Allow-Origin: * unconditionally (satisfies SRI/crossorigin, drops Vary:Origin cache fragmentation).
- /collect drops (200 gif, no datapoint write) hits whose host (h) or Origin/Referer headers aren't an allowed origin; opt-in via the env var. The analytics referrer param (r) is intentionally not validated.
- New app/lib/allowedOrigins.ts: parseAllowedOrigins / extractHost / isHostAllowed (exact + subdomain matching).

Hardening from review:
- "*" entry => allow-all (empty list) instead of silently blocking all.
- extractHost rejects userinfo (https://evil.com@pmux.io/), the opaque "null" origin, and hostless schemes; robust bare-host/port/IPv6 parse.

Adds 12 tests (lib + collect) incl. drop-path Last-Modified guard, bare-host e2e, Referer-only, empty-list, and "*" allow-all.
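The parsing and exact-plus-subdomain matching this commit message describes could look roughly like the sketch below. It reuses the helper names from the message, but the signatures and bodies are assumptions made for illustration and are not the code in app/lib/allowedOrigins.ts. The normalizeEntry helper is hypothetical, and treating any "*" entry as allow-all is one reading of the hardening note.

```typescript
// Hypothetical sketch: names mirror the commit message, but signatures and
// bodies are assumptions, not the PR's implementation.

// Reduce an entry such as "https://Example.com:443/" or "example.com"
// to a bare lowercase host. (Simplified: no special IPv6 handling.)
function normalizeEntry(entry: string): string {
    return entry
        .trim()
        .toLowerCase()
        .replace(/^[a-z][a-z0-9+.-]*:\/\//, "") // drop a leading scheme
        .replace(/[/?#].*$/, "") // drop path, query, fragment
        .replace(/:\d+$/, ""); // drop a trailing port
}

// Parse the comma-separated env var. An empty result means "allow all":
// that covers an unset value, an empty value, and (assumed here) any "*" entry.
function parseAllowedOrigins(raw: string | undefined): string[] {
    if (!raw) return [];
    const hosts = raw
        .split(",")
        .map(normalizeEntry)
        .filter((h) => h.length > 0);
    return hosts.includes("*") ? [] : hosts;
}

// Exact match, or a subdomain match on a dot boundary so that
// "notexample.com" does not match "example.com".
function isHostAllowed(host: string, allowed: string[]): boolean {
    if (allowed.length === 0) return true; // no enforcement configured
    const h = host.toLowerCase();
    return allowed.some((d) => h === d || h.endsWith("." + d));
}

const allowed = parseAllowedOrigins("example.com, https://pmux.io");
console.log(isHostAllowed("blog.example.com", allowed)); // true
console.log(isHostAllowed("notexample.com", allowed)); // false
```

Matching on "." + domain rather than a plain suffix is what keeps lookalike domains out while still accepting every subdomain.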
6d8b2e6 to
044290f
Compare
Overview
Adds an opt-in origin allowlist for tracking (TRACKER_ALLOWED_ORIGINS) so a Counterscale instance only records hits from sites you actually own, and inlines the tracker script into the worker bundle.
Note
This branch is stacked on #251, so the diff against main also shows that PR's changes (e.g. CF_DATASET_NAME and the CLI config). The PR-specific work is the origin allowlist and the tracker-script inlining, both in @counterscale/server.
Changes by Package
@counterscale/cli
No changes in this PR. (CLI/dataset changes visible in the diff belong to #251.)
@counterscale/server
Origin allowlist for tracking (TRACKER_ALLOWED_ORIGINS): set a comma-separated list of domains and the instance only records pageviews originating from those domains (and their subdomains):
- example.com matches example.com and all of its subdomains (blog.example.com, app.example.com, …), so you don't list subdomains separately, but it does not match sibling domains like notexample.com. Entries may be bare hosts or scheme-prefixed (https://example.com); both normalize to the host.
- Enforcement happens at /collect (where data is recorded). When the var is set, a hit whose reported host and Origin/Referer don't resolve to an allowed domain is dropped: the endpoint still returns the normal 1×1 gif (200) but writes no datapoint. The analytics referrer (the visitor's traffic source) is deliberately not validated.
- An unset or empty value (or a lone *) means no enforcement; all origins are recorded, preserving today's behavior.
- New app/lib/allowedOrigins.ts (parse / host-extract / match helpers) plus TRACKER_ALLOWED_ORIGINS wiring in the /collect handler and type definitions.
- tracker.js is now served with Access-Control-Allow-Origin: *. CORS can't gate who loads a <script>, so per-script origin restriction would enforce nothing; a wildcard also keeps Subresource Integrity (crossorigin) working and avoids fragmenting the CDN cache. Real enforcement lives at /collect.
Tracker script inlining: the tracker source is relocated under app/tracker/ and inlined into the worker via Vite's ?raw import (adding a Vite client type reference and removing the separate run_worker_first asset path), so /tracker.js is served straight from the bundle.
Tests: new unit + integration coverage for origin parsing/matching (exact, subdomain, sibling-rejection, userinfo, null origin, ports, IPv6, * allow-all), /collect enforcement (allow/drop, drop-path leaves Last-Modified unset, bare-host, Referer-only, opt-in), and the inlined tracker loader. Full server suite green.
@counterscale/tracker
No changes in this PR.
Other Changes
.gitignore: ignore the tracker output copied under app/tracker/ by the copytracker script.
Additional Notes
- Origin, Referer, and the reported host are client-supplied and can be forged by a non-browser client, so the allowlist stops accidental or casual cross-site recording from honest browsers, not a determined attacker. Data is partitioned by site ID, which limits the blast radius.
- Hardening includes rejecting userinfo-bearing values (e.g. https://evil.com@example.com/) and the opaque null origin, and treating a lone * as allow-all rather than silently blocking everything.
- TRACKER_ALLOWED_ORIGINS is unset by default; existing deployments are unaffected until it's set.
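To make the hardening rules above concrete, here is a minimal sketch of host extraction built on the standard WHATWG URL parser. It is an illustration under assumptions, not the PR's extractHost: the signature, the choice to prefix bare hosts with https:// before parsing, and the null return value for rejected input are all hypothetical.

```typescript
// Hypothetical sketch of the hardening rules; not the PR's extractHost.
// Returns a lowercase hostname, or null when the value should be rejected.
function extractHost(value: string | null | undefined): string | null {
    if (!value) return null;
    const v = value.trim();

    // Browsers send the literal string "null" for opaque origins.
    if (v === "" || v.toLowerCase() === "null") return null;

    // Assumption: a value without "scheme://" is a bare host, optionally
    // with a port, so give it a scheme the URL parser can work with.
    const candidate = /^[a-z][a-z0-9+.-]*:\/\//i.test(v) ? v : `https://${v}`;

    try {
        const url = new URL(candidate);
        // Reject userinfo such as https://evil.com@example.com/, where the
        // text before "@" is a username rather than the host.
        if (url.username !== "" || url.password !== "") return null;
        // Reject hostless URLs such as file:///tmp/x.
        if (url.hostname === "") return null;
        return url.hostname.toLowerCase();
    } catch {
        return null; // not parseable as a URL at all
    }
}

console.log(extractHost("https://blog.example.com/post")); // "blog.example.com"
console.log(extractHost("example.com:8080")); // "example.com"
console.log(extractHost("https://evil.com@example.com/")); // null
console.log(extractHost("null")); // null
```

Rejecting userinfo outright, instead of stripping it, avoids having to decide which side of the "@" the sender meant as the host.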