feat(server): TRACKER_ALLOWED_ORIGINS origin allowlist for tracking#252

Draft
rbonestell wants to merge 10 commits into benvinegar:main from rbonestell:feat/tracker-allowed-origins

@rbonestell rbonestell commented May 28, 2026

Overview

Adds an opt-in origin allowlist for tracking (TRACKER_ALLOWED_ORIGINS) so a Counterscale instance only records hits from sites you actually own, and inlines the tracker script into the worker bundle.

Note

This branch is stacked on #251, so the diff against main also shows that PR's changes (e.g. CF_DATASET_NAME and the CLI config). The PR-specific work is the origin allowlist and the tracker-script inlining, both in @counterscale/server.

Changes by Package

@counterscale/cli

No changes in this PR. (CLI/dataset changes visible in the diff belong to #251.)

@counterscale/server

Origin allowlist for tracking (TRACKER_ALLOWED_ORIGINS) — set a comma-separated list of domains and the instance only records pageviews originating from those domains (and their subdomains):

TRACKER_ALLOWED_ORIGINS=example.com,myblog.io,acme.dev
  • Matching is host-based and covers subdomains automatically. A single entry like example.com matches example.com and all of its subdomains (blog.example.com, app.example.com, …), so you don't list subdomains separately, but it does not match sibling domains like notexample.com. Entries may be bare hosts or scheme-prefixed (https://example.com); both normalize to the host.
  • Enforced at /collect (where data is recorded). When the var is set, a hit whose reported host and Origin/Referer don't resolve to an allowed domain is dropped — the endpoint still returns the normal 1×1 gif (200) but writes no datapoint. The analytics referrer (the visitor's traffic source) is deliberately not validated.
  • Opt-in. Unset (or *) means no enforcement; all origins are recorded, preserving today's behavior.
  • New app/lib/allowedOrigins.ts (parse / host-extract / match helpers) plus TRACKER_ALLOWED_ORIGINS wiring in the /collect handler and type definitions.
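
The matching rules in the list above can be illustrated with a minimal sketch. The helper names parseAllowedOrigins and isHostAllowed are the ones this PR mentions, but the bodies below are assumptions written for illustration, not the code in app/lib/allowedOrigins.ts. This simplified parser also does not handle IPv6 literals.

```typescript
// Illustrative sketch only: function names come from the PR, bodies are assumed.

/** Parse the env var into lowercase hosts. An empty list means "no enforcement". */
function parseAllowedOrigins(raw: string | undefined): string[] {
    if (!raw) return [];
    const entries = raw
        .split(",")
        .map((entry) => entry.trim().toLowerCase())
        .filter((entry) => entry.length > 0);
    // A "*" entry is treated as allow-all, the same as leaving the var unset.
    if (entries.includes("*")) return [];
    // Drop an optional scheme, then cut at the first port/path/query delimiter,
    // so "https://example.com" and "example.com" normalize to the same host.
    return entries
        .map((entry) => entry.replace(/^[a-z][a-z0-9+.-]*:\/\//, "").split(/[/:?#]/)[0] ?? "")
        .filter((host) => host.length > 0);
}

/** True when host equals an allowed domain or is a subdomain of one. */
function isHostAllowed(host: string, allowed: string[]): boolean {
    if (allowed.length === 0) return true; // opt-in: nothing configured, nothing enforced
    const h = host.toLowerCase();
    // The leading dot keeps sibling domains such as notexample.com from matching.
    return allowed.some((domain) => h === domain || h.endsWith("." + domain));
}

const allowed = parseAllowedOrigins("example.com, https://myblog.io,acme.dev");
console.log(isHostAllowed("blog.example.com", allowed)); // subdomain of an entry
console.log(isHostAllowed("notexample.com", allowed)); // sibling domain
```

Matching on a dot-prefixed suffix rather than a plain suffix is what separates subdomains from sibling domains.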

tracker.js is now served with Access-Control-Allow-Origin: *. CORS can't gate who loads a <script>, so per-script origin restriction would enforce nothing; a wildcard also keeps Subresource Integrity (crossorigin) working and avoids fragmenting the CDN cache. Real enforcement lives at /collect.

Tracker script inlining — the tracker source is relocated under app/tracker/ and inlined into the worker via Vite's ?raw import (adding a Vite client type reference and removing the separate run_worker_first asset path), so /tracker.js is served straight from the bundle.
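
As a rough sketch of the two paragraphs above, a handler serving the inlined tracker with a wildcard CORS header might look like the following. The function name serveTracker and the Content-Type value are assumptions for illustration. In the PR the source string would come from the Vite ?raw import, which is shown here as a plain parameter so the snippet runs on its own.

```typescript
// Illustrative sketch only: serveTracker and the Content-Type value are assumed.
// In the worker, `source` would be the tracker text inlined via Vite's ?raw import.
function serveTracker(source: string): Response {
    return new Response(source, {
        status: 200,
        headers: {
            "Content-Type": "application/javascript; charset=utf-8",
            // Wildcard rather than echoing the request Origin: no Vary: Origin is
            // needed, so caches hold a single copy of the script.
            "Access-Control-Allow-Origin": "*",
        },
    });
}

const res = serveTracker("console.log('tracker');");
console.log(res.headers.get("Access-Control-Allow-Origin"));
```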

Tests — new unit + integration coverage for origin parsing/matching (exact, subdomain, sibling-rejection, userinfo, null origin, ports, IPv6, * allow-all), /collect enforcement (allow/drop, drop-path leaves Last-Modified unset, bare-host, Referer-only, opt-in), and the inlined tracker loader. Full server suite green.

@counterscale/tracker

No changes in this PR.

Other Changes

  • .gitignore: ignore the tracker output copied under app/tracker/ by the copytracker script.

Additional Notes

  • Threat model. Enforcement is best-effort: Origin, Referer, and the reported host are client-supplied and can be forged by a non-browser client, so it stops accidental/casual cross-site recording and honest browsers, not a determined attacker. Data is partitioned by site ID, which limits the blast radius. Hardening includes rejecting userinfo-bearing values (e.g. https://evil.com@example.com/), the opaque null origin, and treating a lone * as allow-all rather than silently blocking everything.
  • Backward compatible. TRACKER_ALLOWED_ORIGINS is unset by default; existing deployments are unaffected until it's set.
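
The hardening cases named in the threat-model note (userinfo, the opaque null origin, hostless schemes) can be sketched with the standard URL parser. extractHost is the helper name this PR mentions. The body below is an assumed illustration, not the PR's implementation.

```typescript
// Illustrative sketch only: the function name comes from the PR, the body is assumed.

/** Return the lowercase host of an Origin/Referer/bare-host value, or null if unusable. */
function extractHost(value: string | null | undefined): string | null {
    if (!value) return null;
    const trimmed = value.trim();
    // Browsers send the literal string "null" for opaque origins.
    if (trimmed === "" || trimmed.toLowerCase() === "null") return null;
    // Bare hosts (example.com, example.com:8080) have no scheme; add one for parsing.
    const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(trimmed);
    const candidate = hasScheme ? trimmed : `https://${trimmed}`;
    let url: URL;
    try {
        url = new URL(candidate);
    } catch {
        return null; // not parseable as a URL
    }
    // Reject userinfo: in https://evil.com@example.com/ the "evil.com" part is only
    // a username, which makes the value look like a different site at a glance.
    if (url.username !== "" || url.password !== "") return null;
    // Hostless schemes such as file:/// carry no host to match against.
    if (url.hostname === "") return null;
    return url.hostname.toLowerCase();
}

console.log(extractHost("https://evil.com@example.com/")); // rejected: userinfo present
console.log(extractHost("example.com:8080")); // bare host with a port
```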

…ker.js as ACAO *

CORS cannot gate who loads a <script> or POSTs to the worker, so the
per-origin ACAO echo on tracker.js enforced nothing. Move real allowlist
enforcement to /collect, where data is actually recorded.

- tracker.js now returns Access-Control-Allow-Origin: * unconditionally
  (satisfies SRI/crossorigin, drops Vary:Origin cache fragmentation).
- /collect drops (200 gif, no datapoint write) hits whose host (h) or
  Origin/Referer headers aren't an allowed origin; opt-in via the env var.
  The analytics referrer param (r) is intentionally not validated.
- New app/lib/allowedOrigins.ts: parseAllowedOrigins / extractHost /
  isHostAllowed (exact + subdomain matching).

Hardening from review:
- "*" entry => allow-all (empty list) instead of silently blocking all.
- extractHost rejects userinfo (https://evil.com@pmux.io/), the opaque
  "null" origin, and hostless schemes; robust bare-host/port/IPv6 parse.

Adds 12 tests (lib + collect) incl. drop-path Last-Modified guard,
bare-host e2e, Referer-only, empty-list, and "*" allow-all.
@rbonestell rbonestell changed the title feat(server): Configurable Access-Control-Allow-Origin response header for tracker.js feat(server): TRACKER_ALLOWED_ORIGINS origin allowlist + configurable Analytics Engine dataset May 29, 2026
@rbonestell rbonestell changed the title feat(server): TRACKER_ALLOWED_ORIGINS origin allowlist + configurable Analytics Engine dataset feat(server): TRACKER_ALLOWED_ORIGINS origin allowlist for tracking May 29, 2026
@rbonestell rbonestell force-pushed the feat/tracker-allowed-origins branch from 6d8b2e6 to 044290f May 29, 2026 03:07