refactor(website): Normalize repo URL and error labels in analytics #1586
Previously the raw repository URL (potentially containing tokens, query strings, or private host names) and the full error message were sent verbatim as GA event labels. This narrows what reaches analytics to a stable, privacy-safe shape:

- `normalizeRepoLabel` maps URLs to `github:owner/repo`, `gist:id/...`, or `external:<host>` and drops query/hash components.
- `classifyError` collapses free-form error text into a bounded set of category codes (timeout, rate_limit, not_found, verification, etc.).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
📝 Walkthrough

This PR adds privacy-safe analytics labeling to prevent forwarding raw repository URLs and error strings in event tracking. It introduces utility functions to normalize repo identifiers and classify errors into bounded categories, then integrates them into pack-success and pack-error event tracking.

Changes: Privacy-safe analytics labeling
🎯 2 (Simple) | ⏱️ ~8 minutes | 🚥 Pre-merge checks: ✅ 5 passed
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@website/client/components/utils/analytics.ts`:
- Around line 70-72: The gist host check currently requires two path segments so
single-segment Gist URLs like https://gist.github.com/<gistId> fall through;
update the logic in website/client/components/utils/analytics.ts where host ===
'gist.github.com' and segments is used so that if segments.length >= 1 you
return a gist label, using `gist:${segments[0]}` for one-segment URLs and
`gist:${segments[0]}/${segments[1]}` when two segments exist (ensure you still
skip empty segments/trailing slashes). Locate the conditional that checks host
=== 'gist.github.com' and the `segments` array and adjust the condition and
returned string accordingly.
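The adjustment described above could look roughly like the following. This is a minimal sketch, not the code in `analytics.ts`: the helper name `gistLabel` and its signature are hypothetical, and it assumes the caller has already confirmed that the host is `gist.github.com`.

```typescript
// Hypothetical helper for illustration; the real analytics.ts may inline this logic.
// Builds a gist label from a URL pathname, accepting one or two path segments.
function gistLabel(pathname: string): string | null {
  // filter(Boolean) drops empty segments produced by leading, trailing, or doubled slashes.
  const segments = pathname.split('/').filter(Boolean);
  if (segments.length === 0) {
    return null; // bare https://gist.github.com/ has nothing to label
  }
  return segments.length >= 2
    ? `gist:${segments[0]}/${segments[1]}`
    : `gist:${segments[0]}`;
}
```

With this shape, `/<gistId>` and `/<user>/<gistId>/` both produce a `gist:` label, and anything past the second segment is ignored.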
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 9af2cbb5-1f5b-4ab4-b758-9f8de7f7e015
📒 Files selected for processing (1)
website/client/components/utils/analytics.ts
Code Review
This pull request introduces the `normalizeRepoLabel` and `classifyError` utility functions to ensure that repository identifiers and error messages are sanitized and categorized before being sent to Google Analytics, preventing the leakage of sensitive information. The `analyticsUtils` methods have been updated to use these functions. Feedback suggests improving the normalization logic by stripping trailing slashes from inputs to correctly handle GitHub shorthand, and by removing the `www.` prefix from hostnames to ensure consistent service classification.
Tighten `normalizeRepoLabel` so a handful of input shapes that previously fell through to `invalid` or `external:*` now map to the expected `github:*` / `gist:*` labels:

- Strip trailing slashes from the input so `owner/repo/` still hits the shorthand regex instead of being parsed as a URL.
- Drop a leading `www.` from the hostname so `www.github.com/...` classifies the same as the canonical host.
- Allow single-segment Gist URLs (`gist.github.com/<id>`) since GitHub serves them as a redirect to the user-prefixed form.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
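To make the three adjustments concrete, here is a rough sketch of how such a normalizer might be put together. It is an illustration under assumptions, not the PR's implementation: the function name `normalizeRepoLabelSketch`, the shorthand regex, and the `.git` suffix handling are all hypothetical, and only the label shapes (`github:owner/repo`, `gist:...`, `external:<host>`, `invalid`) come from the description above.

```typescript
// Illustrative sketch only; names and regexes are assumptions, not the code in analytics.ts.
function normalizeRepoLabelSketch(input: string): string {
  // Strip trailing slashes first so "owner/repo/" still matches the shorthand pattern.
  const trimmed = input.trim().replace(/\/+$/, '');
  if (trimmed === '') return 'invalid';

  // Assumed shorthand pattern: exactly two slash-separated name segments.
  if (/^[\w.-]+\/[\w.-]+$/.test(trimmed)) return `github:${trimmed}`;

  let url: URL;
  try {
    url = new URL(trimmed);
  } catch {
    return 'invalid'; // not parseable as an absolute URL
  }

  // Drop a leading "www." so www.github.com classifies like github.com.
  const host = url.hostname.toLowerCase().replace(/^www\./, '');
  if (host === '') return 'invalid';

  // Only the pathname is inspected, so query strings and hashes never reach the label.
  const segments = url.pathname.split('/').filter(Boolean);

  if (host === 'github.com' && segments.length >= 2) {
    return `github:${segments[0]}/${segments[1].replace(/\.git$/, '')}`;
  }
  if (host === 'gist.github.com' && segments.length >= 1) {
    // One segment for gist.github.com/<id>, two for the user-prefixed form.
    return `gist:${segments.slice(0, 2).join('/')}`;
  }
  return `external:${host}`;
}
```

In this sketch a private host such as `https://git.internal.example.com/team/proj?private_token=...` is reduced to `external:git.internal.example.com`, so neither the path nor the token appears in the label.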
Summary
- Add `normalizeRepoLabel`, which collapses input into `github:owner/repo`, `gist:id/...`, or `external:<host>` and drops query/hash components.
- Add `classifyError`, which maps free-form error strings into a bounded set of stable category codes (`timeout`, `rate_limit`, `not_found`, `verification`, `invalid_input`, `network`, `server`, `unknown`).
- Update `analyticsUtils.trackPackStart` / `trackPackSuccess` / `trackPackError` so GA never receives the raw repository URL or the verbatim error message.

Why
The previous implementation forwarded the raw input URL (which may carry tokens, query strings, or private host names) and the unredacted server error string as Google Analytics event labels. This shrinks what reaches GA to a privacy-safe, label-bounded shape.
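One way to bound error labels like this is an ordered list of pattern rules with an `unknown` fallback. The sketch below assumes that approach; the category names come from the summary above, but the function name `classifyErrorSketch` and every regex are hypothetical and are not taken from the PR.

```typescript
// Category codes as listed in the PR summary.
type ErrorCategory =
  | 'timeout'
  | 'rate_limit'
  | 'not_found'
  | 'verification'
  | 'invalid_input'
  | 'network'
  | 'server'
  | 'unknown';

// Assumed patterns for illustration; the first matching rule wins.
// None of the regexes use the global flag, so test() carries no state between calls.
const RULES: ReadonlyArray<readonly [RegExp, ErrorCategory]> = [
  [/timed? ?out/i, 'timeout'],
  [/rate limit|too many requests|\b429\b/i, 'rate_limit'],
  [/not found|\b404\b/i, 'not_found'],
  [/verif|captcha/i, 'verification'],
  [/invalid|malformed/i, 'invalid_input'],
  [/network|fetch failed|econnreset/i, 'network'],
  [/internal server|\b5\d\d\b/i, 'server'],
];

// Hypothetical stand-in for classifyError: returns a category code, never the message.
function classifyErrorSketch(message: string): ErrorCategory {
  for (const [pattern, category] of RULES) {
    if (pattern.test(message)) return category;
  }
  return 'unknown';
}
```

Because the return type is a closed union, the set of labels that can reach analytics is fixed at compile time regardless of what the server error string contains.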
Checklist
`node --run lint` (website/client)