refactor(website): Normalize repo URL and error labels in analytics #1586

Merged
yamadashy merged 2 commits into main from refactor/analytics-normalize-labels on May 21, 2026

Conversation

@yamadashy

Owner

Summary

  • Add normalizeRepoLabel which collapses input into github:owner/repo, gist:id/..., or external:<host> and drops query/hash components.
  • Add classifyError which maps free-form error strings into a bounded set of stable category codes (timeout, rate_limit, not_found, verification, invalid_input, network, server, unknown).
  • Use both helpers from analyticsUtils.trackPackStart / trackPackSuccess / trackPackError so GA never receives the raw repository URL or the verbatim error message.
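
To make the label shapes concrete, here is a minimal sketch of how a function like normalizeRepoLabel could behave. It is not the code from this PR: the function name `normalizeRepoLabelSketch`, the shorthand regex, and the two-segment rule for hosts are assumptions made for illustration. Only the output shapes (`github:owner/repo`, `gist:...`, `external:<host>`, `invalid`) come from this conversation.

```typescript
// Hypothetical sketch; the real helper lives in
// website/client/components/utils/analytics.ts and may differ.
function normalizeRepoLabelSketch(input: string): string {
  const trimmed = input.trim();

  // GitHub shorthand such as "owner/repo" (regex is an assumption).
  if (/^[\w.-]+\/[\w.-]+$/.test(trimmed)) {
    return `github:${trimmed}`;
  }

  let url: URL;
  try {
    url = new URL(trimmed);
  } catch {
    // Anything that is neither shorthand nor a parseable URL.
    return 'invalid';
  }

  const host = url.hostname.toLowerCase();
  // Only the path is inspected, so query strings, fragments and
  // embedded credentials never reach the label.
  const segments = url.pathname.split('/').filter(Boolean);

  if (host === 'github.com' && segments.length >= 2) {
    return `github:${segments[0]}/${segments[1]}`;
  }
  if (host === 'gist.github.com' && segments.length >= 2) {
    return `gist:${segments[0]}/${segments[1]}`;
  }
  // Unknown hosts are reduced to the host name only.
  return `external:${host}`;
}
```

Under these assumptions, `https://github.com/yamadashy/repomix?token=secret#readme` and the shorthand `yamadashy/repomix` both collapse to `github:yamadashy/repomix`.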

Why

The previous implementation forwarded the raw input URL (which may carry tokens, query strings, or private host names) and the unredacted server error string as Google Analytics event labels. This shrinks what reaches GA to a privacy-safe, label-bounded shape.

Checklist

  • Run node --run lint (website/client)

Previously the raw repository URL (potentially containing tokens, query
strings, or private host names) and the full error message were sent
verbatim as GA event labels. This narrows what reaches analytics to a
stable, privacy-safe shape:

- `normalizeRepoLabel` maps URLs to `github:owner/repo`, `gist:id/...`,
  or `external:<host>` and drops query/hash components.
- `classifyError` collapses free-form error text into a bounded set of
  category codes (timeout, rate_limit, not_found, verification, etc.).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
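
The error classification described in this commit message can be pictured as an ordered list of patterns where the first match wins. The sketch below is an illustration under that assumption: the category names come from this PR, but the regular expressions, the rule order, and the name `classifyErrorSketch` are hypothetical.

```typescript
// Category names are taken from the PR description.
type ErrorCategory =
  | 'timeout'
  | 'rate_limit'
  | 'not_found'
  | 'verification'
  | 'invalid_input'
  | 'network'
  | 'server'
  | 'unknown';

// Hypothetical rules: the patterns and their order are assumptions.
const RULES: ReadonlyArray<readonly [RegExp, ErrorCategory]> = [
  [/timed? ?out/i, 'timeout'],
  [/rate limit|too many requests/i, 'rate_limit'],
  [/not found/i, 'not_found'],
  [/verif/i, 'verification'],
  [/invalid|malformed/i, 'invalid_input'],
  [/network|failed to fetch/i, 'network'],
  [/internal server/i, 'server'],
];

function classifyErrorSketch(message: string): ErrorCategory {
  for (const [pattern, category] of RULES) {
    if (pattern.test(message)) {
      return category;
    }
  }
  // Anything unrecognized maps to a single bucket, which keeps the
  // set of possible labels bounded.
  return 'unknown';
}
```

Because the return type is a closed union, the number of distinct values that can reach analytics is fixed regardless of what the server puts in the error text.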
@coderabbitai

coderabbitai Bot commented May 21, 2026

Contributor

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b581446e-79c8-4b57-ba8f-5f5276bea89f

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This PR adds privacy-safe analytics labeling to prevent forwarding raw repository URLs and error strings in event tracking. It introduces utility functions to normalize repo identifiers and classify errors into bounded categories, then integrates them into pack-success and pack-error event tracking.

Changes

Privacy-safe analytics labeling

| Layer | File(s) | Summary |
| --- | --- | --- |
| Privacy-safe labeling contracts and implementation | website/client/components/utils/analytics.ts | normalizeRepoLabel() sanitizes GitHub shorthand (owner/repo) and URL forms to safe labels; ErrorCategory union defines stable error codes (timeout, rate_limit, not_found, verification, invalid_input, network, server, unknown); classifyError() maps free-form error text to category codes. |
| Event tracking integration | website/client/components/utils/analytics.ts | Pack-success events use normalized repo label for "files" and "chars" event labels; pack-error events combine normalized repo label with classified error category instead of raw repoUrl and error strings. |
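
As a rough picture of the integration step, the sketch below builds a pack-error event from derived values only. The event name `pack_error`, the `|` separator, and the function `buildPackErrorEventSketch` are assumptions for illustration; the PR only states that the normalized repo label and the error category are combined.

```typescript
// Hypothetical shapes; the helpers are passed in so the sketch stays
// self-contained and does not depend on the real analytics module.
type Normalizer = (input: string) => string;
type Classifier = (message: string) => string;

interface PackErrorEvent {
  action: string;
  label: string;
}

function buildPackErrorEventSketch(
  repoUrl: string,
  errorMessage: string,
  normalize: Normalizer,
  classify: Classifier,
): PackErrorEvent {
  // Only derived values go into the label; repoUrl and errorMessage
  // themselves are never copied into the event.
  return {
    action: 'pack_error', // assumed event name
    label: `${normalize(repoUrl)}|${classify(errorMessage)}`, // assumed separator
  };
}
```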

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately and concisely describes the main change: normalizing repo URLs and error labels in analytics for privacy protection. |
| Description check | ✅ Passed | The description covers the summary, rationale, and checklist sections; however, it is missing the npm test requirement and only partially fulfills the template's checklist items. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@github-actions

github-actions Bot commented May 21, 2026

Contributor

⚡ Performance Benchmark

Latest commit: cbc3fe8 fix(website): Address PR review feedback for analytics normalization
Status: ✅ Benchmark complete!
Ubuntu: 0.53s (±0.02s) → 0.53s (±0.07s) · -0.01s (-0.9%)
macOS: 0.54s (±0.10s) → 0.54s (±0.09s) · +0.00s (+0.6%)
Windows: 0.87s (±0.02s) → 0.87s (±0.02s) · +0.00s (+0.2%)
Details
  • Packing the repomix repository with node bin/repomix.cjs
  • Warmup: 2 runs (discarded), interleaved execution
  • Measurement: 20 runs / 30 on macOS (median ± IQR)
  • Workflow run
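
For readers unfamiliar with the "median ± IQR" notation used in these results, the following sketch shows one common way to compute both values from a list of run times. The linear-interpolation quantile and the names `quantile` and `summarizeRuns` are assumptions; the benchmark workflow may use a different quantile method.

```typescript
// Quantile with linear interpolation between neighbouring samples.
// `sorted` must be non-empty and sorted ascending.
function quantile(sorted: number[], p: number): number {
  const pos = (sorted.length - 1) * p;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

// Median and interquartile range (Q3 - Q1) of a set of run times.
function summarizeRuns(runs: number[]): { median: number; iqr: number } {
  if (runs.length === 0) {
    throw new Error('at least one run is required');
  }
  const sorted = [...runs].sort((a, b) => a - b);
  return {
    median: quantile(sorted, 0.5),
    iqr: quantile(sorted, 0.75) - quantile(sorted, 0.25),
  };
}
```

Median and IQR are less sensitive to a single slow run than mean and standard deviation, which suits noisy CI runners.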
History

2d74cc6 refactor(website): Normalize repo URL and error labels in analytics

Ubuntu: 0.62s (±0.01s) → 0.62s (±0.01s) · +0.00s (+0.0%)
macOS: 0.45s (±0.10s) → 0.43s (±0.10s) · -0.01s (-2.5%)
Windows: 0.86s (±0.05s) → 0.87s (±0.02s) · +0.01s (+0.8%)

@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented May 21, 2026


Deploying repomix with Cloudflare Pages

Latest commit: cbc3fe8
Status: ⚡️ Build in progress...


@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@website/client/components/utils/analytics.ts`:
- Around line 70-72: The gist host check currently requires two path segments so
single-segment Gist URLs like https://gist.github.com/<gistId> fall through;
update the logic in website/client/components/utils/analytics.ts where host ===
'gist.github.com' and segments is used so that if segments.length >= 1 you
return a gist label, using `gist:${segments[0]}` for one-segment URLs and
`gist:${segments[0]}/${segments[1]}` when two segments exist (ensure you still
skip empty segments/trailing slashes). Locate the conditional that checks host
=== 'gist.github.com' and the `segments` array and adjust the condition and
returned string accordingly.
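
The suggested Gist handling can be sketched in isolation as follows. The helper name `gistLabelSketch` and its `null` return for an empty path are assumptions; the label shapes follow the review comment above.

```typescript
// Hypothetical helper: derives a gist label from a URL pathname that is
// already known to belong to gist.github.com.
function gistLabelSketch(pathname: string): string | null {
  // filter(Boolean) drops empty segments from leading, trailing or
  // doubled slashes.
  const segments = pathname.split('/').filter(Boolean);
  if (segments.length === 0) {
    return null; // bare host, nothing to label
  }
  return segments.length >= 2
    ? `gist:${segments[0]}/${segments[1]}`
    : `gist:${segments[0]}`;
}
```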

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9af2cbb5-1f5b-4ab4-b758-9f8de7f7e015

📥 Commits

Reviewing files that changed from the base of the PR and between eb945d9 and 2d74cc6.

📒 Files selected for processing (1)
  • website/client/components/utils/analytics.ts

Comment thread website/client/components/utils/analytics.ts Outdated

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor


Code Review

This pull request introduces normalizeRepoLabel and classifyError utility functions to ensure that repository identifiers and error messages are sanitized and categorized before being sent to Google Analytics, preventing the leakage of sensitive information. The analyticsUtils methods have been updated to utilize these functions. Feedback suggests improving the normalization logic by stripping trailing slashes from inputs to correctly handle GitHub shorthand and removing the www. prefix from hostnames to ensure consistent service classification.

Comment thread website/client/components/utils/analytics.ts Outdated
Comment thread website/client/components/utils/analytics.ts Outdated
Tighten `normalizeRepoLabel` so a handful of input shapes that
previously fell through to `invalid` or `external:*` now map to the
expected `github:*` / `gist:*` labels:

- Strip trailing slashes from the input so `owner/repo/` still hits the
  shorthand regex instead of being parsed as a URL.
- Drop a leading `www.` from the hostname so `www.github.com/...`
  classifies the same as the canonical host.
- Allow single-segment Gist URLs (`gist.github.com/<id>`) since
  GitHub serves them as a redirect to the user-prefixed form.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
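
The first two adjustments in this commit message amount to small string clean-ups applied before classification. A minimal sketch, assuming hypothetical helper names `stripTrailingSlashes` and `canonicalHost` (the actual code may inline this logic):

```typescript
// Remove surrounding whitespace and any trailing slashes so that
// "owner/repo/" is treated like "owner/repo".
function stripTrailingSlashes(input: string): string {
  return input.trim().replace(/\/+$/, '');
}

// Lower-case the host and drop a leading "www." so that
// "www.github.com" is treated like "github.com".
function canonicalHost(hostname: string): string {
  return hostname.toLowerCase().replace(/^www\./, '');
}
```

Running these before the shorthand and host checks means fewer inputs fall through to the `invalid` or `external:*` labels.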
@yamadashy yamadashy merged commit d480700 into main May 21, 2026
28 of 29 checks passed
@yamadashy yamadashy deleted the refactor/analytics-normalize-labels branch May 21, 2026 15:21