
Ftr/users lcp#59

Open
jgoddard19 wants to merge 11 commits into main from ftr/users_lcp

Conversation

@jgoddard19
Contributor

🚀 Pull Request

🎯 PR Type

  • Feature: Adds new functionality or user-facing change.
  • Bug Fix: Solves a defect or incorrect behavior.
  • Refactor: Improves code structure/readability without changing external behavior.
  • Documentation: Updates to README, Wiki, or internal docs.
  • Chore: Non-code changes (e.g., CI configuration, dependency bumps).

1. Issue Tracking & Reference

  • Ticket Number(s): it's out there somewhere

2. Summary of Changes

LCP A/B Testing Implementation

Added two mutually exclusive LCP slowness scenarios for performance monitoring:

Backend Changes

  • Scenario Service: Added two A/B testing endpoints for LCP slowness
    • /api/ab-testing/lcp-slowness-percentage - Affects configurable % of all users (hash-based cohort assignment)
    • /api/ab-testing/lcp-slowness-cohort - Affects 11 hardcoded test users only
    • Updated scenario UI to display and toggle both scenarios
  • Accounts Service: Enhanced /browser-user endpoint to return lcp_delay_ms based on active scenario
    • Defined 11 test users (Alice through Smough) as slow cohort
    • Deterministic cohort assignment via MD5 hash for percentage-based scenario
    • Graceful fallback (0ms delay) when scenario service unavailable
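The percentage-based cohort logic described above can be sketched as follows. This is a minimal illustration, not code from the PR: the config keys mirror the description, but the function name and structure are hypothetical.

```python
import hashlib

def assign_lcp_delay(user_id: str, ab_config: dict) -> int:
    """Deterministically assign a user's LCP delay for the A/B test.

    Uses the hash-to-int-then-mod-100 pattern: the same user_id always
    lands in the same 0-99 bucket, so cohort membership is stable
    across requests.
    """
    if not ab_config.get("lcp_slowness_percentage_enabled"):
        return 0  # scenario off: graceful fallback, no delay
    percentage = ab_config.get("lcp_slowness_percentage", 0.0)
    # MD5 is used here only for bucketing, not security; see the CodeQL
    # findings in this PR, which suggest SHA-256 instead.
    user_hash = int(hashlib.md5(user_id.encode()).hexdigest(), 16)
    if (user_hash % 100) < percentage:
        return ab_config.get("lcp_slowness_percentage_delay_ms", 0)
    return 0
```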

Frontend Changes

  • Root Layout: Fetch and store lcp_delay_ms in sessionStorage, set New Relic custom attributes (lcp_delay_ms, lcp_treatment)
  • Dashboard: Apply delay by showing loading spinners for 3 seconds before rendering LCP elements (account balance cards)
    • Ensures actual LCP measurement captures the delay (not blocked by full-page spinner)

Testing

  • 10 tests in test_ab_testing_scenarios.py (cohort-based only)
    • Validates all 11 slow users get 3000ms delay
    • Validates random users get 0ms delay
    • Tests enable/disable/reset functionality
    • Tests deterministic cohort assignment
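A determinism check along the lines described can be sketched as below. The real tests live in test_ab_testing_scenarios.py; the `bucket` helper here is illustrative only.

```python
import hashlib

def bucket(user_id: str) -> int:
    """Map a user_id to a stable 0-99 bucket (hash-to-int mod 100)."""
    return int(hashlib.md5(user_id.encode()).hexdigest(), 16) % 100

def test_cohort_assignment_is_deterministic():
    # The same user must always land in the same bucket, so a user
    # never flips between slow and normal cohorts mid-experiment.
    for user_id in ["alice", "bob", "smough"]:
        assert bucket(user_id) == bucket(user_id)
        assert 0 <= bucket(user_id) < 100
```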

Monitoring

  • New Relic dashboards can now facet LCP metrics by lcp_treatment (slow/normal)
  • Enables A/B testing impact analysis on Core Web Vitals

3. Testing

Tested via the UI

sessionStorage.setItem('browserUserId', 'f5e8d1c6-2a9b-4c3e-8f1a-6e5b0d2c9f1a'); // Bob
sessionStorage.setItem('lcpDelayMs', '3000');
location.reload();

After setting this in the console, I validated the response:
[Screenshot 2026-02-18 at 3:56 PM]

Test suite run here with new tests included
https://github.com/newrelic/relibank/actions/runs/22162298427

4. Evidence

See the screenshots above.

Also checked with this query

  SELECT
    average(largestContentfulPaint) as 'Avg LCP (ms)',
    percentile(largestContentfulPaint, 50, 95, 99) as 'LCP Percentiles'
  FROM PageViewTiming
  WHERE appName = 'Relibank - Customer Portal'
    AND pageUrl LIKE '%/dashboard%'
    AND largestContentfulPaint > 0
  SINCE 3 hours ago
[Screenshot 2026-02-18 at 3:59 PM]

5. Review and Merge Checklist

Confirm the following prerequisites have been met by checking the boxes:

  • All blocking review comments from the latest round are resolved.
  • This branch has been rebased against the main branch and all merge conflicts are resolved.
  • The code includes sufficient comments, particularly in complex or non-obvious areas.
  • The code adheres to project coding standards (linting, style guides, best practices).
  • Relevant documentation and runbooks have been updated, if necessary.
  • My changes generate no new warnings or errors.
  • My commits are squashed into a single, descriptive commit.
  • My test results are uploaded as a text file, and new tests were created where required.

Review comment on lines +11 to +132 of .github/workflows/flow-lcp-ab-test.yml:
runs-on: ubuntu-latest
environment: events

steps:
  - name: Enable LCP slowness for 50% of users
    run: |
      echo "Enabling LCP slowness A/B test (50% of users, 3000ms delay)..."
      curl -X POST "${{ vars.BASE_URL }}/scenario-runner/api/ab-testing/lcp-slowness?enabled=true&percentage=50.0&delay_ms=3000" \
        --fail-with-body || {
          echo "Failed to enable LCP slowness scenario"
          exit 1
        }

  - name: Verify scenario is enabled
    run: |
      echo "Verifying LCP slowness scenario configuration..."
      response=$(curl -s "${{ vars.BASE_URL }}/scenario-runner/api/ab-testing/config")
      echo "Response: $response"

      enabled=$(echo $response | jq -r '.config.lcp_slowness_enabled')
      percentage=$(echo $response | jq -r '.config.lcp_slowness_percentage')
      delay=$(echo $response | jq -r '.config.lcp_slowness_delay_ms')

      if [ "$enabled" != "true" ]; then
        echo "ERROR: LCP slowness not enabled"
        exit 1
      fi

      echo "✓ LCP slowness enabled: $percentage% of users with ${delay}ms delay"

  - name: Create deployment marker (A/B test enabled)
    env:
      NEW_RELIC_API_KEY: ${{ secrets.NEW_RELIC_API_KEY }}
      NEW_RELIC_APP_ID: ${{ secrets.NEW_RELIC_BROWSER_APP_ID }}
    run: |
      echo "Creating deployment marker in New Relic..."
      curl -X POST "https://api.newrelic.com/v2/applications/${NEW_RELIC_APP_ID}/deployments.json" \
        -H "Api-Key: ${NEW_RELIC_API_KEY}" \
        -H "Content-Type: application/json" \
        -d '{
          "deployment": {
            "revision": "LCP-AB-Test-Enabled-50%",
            "description": "Feature flag: LCP slowness enabled for 50% of users (3000ms delay)",
            "user": "GitHub Actions",
            "changelog": "A/B test started to measure LCP impact"
          }
        }' || echo "Warning: Could not create deployment marker"

  - name: Wait for data collection (20 minutes)
    run: |
      echo "Collecting data for 20 minutes..."
      echo "Monitor LCP metrics at: https://one.newrelic.com/"
      echo ""
      echo "NRQL Query to see A/B test impact:"
      echo "SELECT percentile(largestContentfulPaint, 50, 75, 95) as 'LCP (ms)', count(*) as 'Sessions'"
      echo "FROM PageViewTiming"
      echo "WHERE appName = 'ReliBank Frontend' AND pageUrl LIKE '%/dashboard%'"
      echo "FACET custom.lcp_treatment TIMESERIES"
      echo ""
      sleep 1200 # 20 minutes

  - name: Disable LCP slowness
    run: |
      echo "Disabling LCP slowness A/B test..."
      curl -X POST "${{ vars.BASE_URL }}/scenario-runner/api/ab-testing/lcp-slowness?enabled=false" \
        --fail-with-body || {
          echo "Warning: Failed to disable LCP slowness scenario"
        }

  - name: Verify scenario is disabled
    run: |
      echo "Verifying LCP slowness scenario is disabled..."
      response=$(curl -s "${{ vars.BASE_URL }}/scenario-runner/api/ab-testing/config")
      enabled=$(echo $response | jq -r '.config.lcp_slowness_enabled')

      if [ "$enabled" == "true" ]; then
        echo "WARNING: LCP slowness still enabled!"
        exit 1
      fi

      echo "✓ LCP slowness disabled successfully"

  - name: Create rollback marker (A/B test disabled)
    env:
      NEW_RELIC_API_KEY: ${{ secrets.NEW_RELIC_API_KEY }}
      NEW_RELIC_APP_ID: ${{ secrets.NEW_RELIC_BROWSER_APP_ID }}
    run: |
      echo "Creating rollback marker in New Relic..."
      curl -X POST "https://api.newrelic.com/v2/applications/${NEW_RELIC_APP_ID}/deployments.json" \
        -H "Api-Key: ${NEW_RELIC_API_KEY}" \
        -H "Content-Type: application/json" \
        -d '{
          "deployment": {
            "revision": "LCP-AB-Test-Disabled",
            "description": "Rollback: LCP slowness disabled, all users back to normal",
            "user": "GitHub Actions",
            "changelog": "A/B test completed"
          }
        }' || echo "Warning: Could not create rollback marker"

  - name: Test Summary
    if: always()
    run: |
      echo "## LCP A/B Test Summary" >> $GITHUB_STEP_SUMMARY
      echo "" >> $GITHUB_STEP_SUMMARY
      echo "- **Duration**: 20 minutes" >> $GITHUB_STEP_SUMMARY
      echo "- **Cohort Split**: 50% slow / 50% normal" >> $GITHUB_STEP_SUMMARY
      echo "- **LCP Delay**: 3000ms for slow cohort" >> $GITHUB_STEP_SUMMARY
      echo "" >> $GITHUB_STEP_SUMMARY
      echo "### Analysis" >> $GITHUB_STEP_SUMMARY
      echo "Review LCP metrics in New Relic to compare:" >> $GITHUB_STEP_SUMMARY
      echo "- \`lcp_treatment = normal\` (control group)" >> $GITHUB_STEP_SUMMARY
      echo "- \`lcp_treatment = slow\` (test group with 3s delay)" >> $GITHUB_STEP_SUMMARY
      echo "" >> $GITHUB_STEP_SUMMARY
      echo "### NRQL Query" >> $GITHUB_STEP_SUMMARY
      echo '```sql' >> $GITHUB_STEP_SUMMARY
      echo 'SELECT percentile(largestContentfulPaint, 50, 75, 95) as "LCP (ms)", count(*) as "Sessions"' >> $GITHUB_STEP_SUMMARY
      echo 'FROM PageViewTiming' >> $GITHUB_STEP_SUMMARY
      echo "WHERE appName = 'ReliBank Frontend' AND pageUrl LIKE '%/dashboard%'" >> $GITHUB_STEP_SUMMARY
      echo 'FACET custom.lcp_treatment' >> $GITHUB_STEP_SUMMARY
      echo 'SINCE 30 minutes ago' >> $GITHUB_STEP_SUMMARY
      echo '```' >> $GITHUB_STEP_SUMMARY

Check warning — Code scanning / CodeQL: Workflow does not contain permissions (Medium)

Actions job or workflow does not limit the permissions of the GITHUB_TOKEN. Consider setting an explicit permissions block, using the following as a minimal starting point: {}

Copilot Autofix


In general, to fix this class of issue, you add a permissions block either at the root of the workflow (to apply to all jobs) or inside specific jobs, and set the minimal scopes necessary. For a workflow that only calls external APIs and writes to $GITHUB_STEP_SUMMARY, you typically only need contents: read (the minimal default for most workflows) and no write permissions.

For this specific workflow in .github/workflows/flow-lcp-ab-test.yml, the simplest and least invasive fix is to add a top-level permissions block right under the name: (and before on:). The job does not perform any GitHub write actions (no actions/checkout, no pushes, no issue/PR operations), so we can safely lock the token down to contents: read. No additional imports or steps are required; this change only affects token scopes. Concretely, in the region around lines 1–8, add:

permissions:
  contents: read

and keep the rest of the workflow unchanged.

Suggested changeset 1
.github/workflows/flow-lcp-ab-test.yml

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/.github/workflows/flow-lcp-ab-test.yml b/.github/workflows/flow-lcp-ab-test.yml
--- a/.github/workflows/flow-lcp-ab-test.yml
+++ b/.github/workflows/flow-lcp-ab-test.yml
@@ -1,5 +1,8 @@
 name: Flow - LCP A/B Test Scenario
 
+permissions:
+  contents: read
+
 on:
   schedule:
     # Run every 6 hours
EOF
if ab_config.get("lcp_slowness_percentage_enabled"):
    percentage = ab_config.get("lcp_slowness_percentage", 0.0)
    # Deterministically assign cohort based on user_id hash
    user_hash = int(hashlib.md5(browser_user_id.encode()).hexdigest(), 16)

Check failure — Code scanning / CodeQL: Use of a broken or weak cryptographic hashing algorithm on sensitive data (High)

Sensitive data (id) is used in a hashing algorithm (MD5) that is insecure.

Copilot Autofix


In general, the fix is to avoid MD5 (and other weak hashes like SHA‑1) for hashing identifiers or other sensitive data. For this specific use—deterministic bucketing of a user ID—you can replace MD5 with a stronger, standard hash from the SHA‑2 family such as SHA‑256. The numeric mapping logic (int(hash, 16) % 100) can remain the same, so functionality (deterministic but pseudo-random bucket assignment) is preserved.

Concretely, in accounts_service/accounts_service.py around line 526, change:

user_hash = int(hashlib.md5(browser_user_id.encode()).hexdigest(), 16)

to:

user_hash = int(hashlib.sha256(browser_user_id.encode()).hexdigest(), 16)

No additional imports are required since hashlib is already imported at the top of the file. This is the only necessary code change; all other logic (reading the header, using modulo 100, logging, and returning lcp_delay_ms) should remain as-is to avoid altering existing behavior apart from the internal distribution of users across buckets, which is acceptable for an A/B test.
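To illustrate, swapping the hash function changes only which bucket a given user lands in, not the determinism of the assignment. A quick sketch (not code from the PR; the `bucket` helper is hypothetical):

```python
import hashlib

def bucket(user_id: str, algo: str = "sha256") -> int:
    """Deterministic 0-99 bucket using the named hashlib algorithm."""
    digest = hashlib.new(algo, user_id.encode()).hexdigest()
    return int(digest, 16) % 100

# Both algorithms give stable buckets for the same input...
assert bucket("alice", "md5") == bucket("alice", "md5")
assert bucket("alice", "sha256") == bucket("alice", "sha256")
# ...but a given user may land in a different bucket after the swap,
# which is acceptable for an A/B test: only the internal distribution
# of users across buckets changes.
```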

Suggested changeset 1
accounts_service/accounts_service.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/accounts_service/accounts_service.py b/accounts_service/accounts_service.py
--- a/accounts_service/accounts_service.py
+++ b/accounts_service/accounts_service.py
@@ -522,8 +522,8 @@
                     # Check percentage-based scenario first
                     if ab_config.get("lcp_slowness_percentage_enabled"):
                         percentage = ab_config.get("lcp_slowness_percentage", 0.0)
-                        # Deterministically assign cohort based on user_id hash
-                        user_hash = int(hashlib.md5(browser_user_id.encode()).hexdigest(), 16)
+                        # Deterministically assign cohort based on user_id hash using a strong hash function
+                        user_hash = int(hashlib.sha256(browser_user_id.encode()).hexdigest(), 16)
                         if (user_hash % 100) < percentage:
                             lcp_delay_ms = ab_config.get("lcp_slowness_percentage_delay_ms", 0)
                             logging.info(f"[Browser User] User {browser_user_id} assigned to SLOW LCP cohort via PERCENTAGE ({lcp_delay_ms}ms delay)")
EOF
if ab_config.get("lcp_slowness_percentage_enabled"):
    percentage = ab_config.get("lcp_slowness_percentage", 0.0)
    # Deterministically assign cohort based on user_id hash
    user_hash = int(hashlib.md5(user_id.encode()).hexdigest(), 16)

Check failure — Code scanning / CodeQL: Use of a broken or weak cryptographic hashing algorithm on sensitive data (High)

Sensitive data (id) is used in a hashing algorithm (MD5) that is insecure.

Copilot Autofix


General fix: Replace the use of the broken MD5 hash with a strong, modern hash function. Since this hashing is not for passwords but for deterministic bucketing, using hashlib.sha256 (or any SHA-2/SHA-3 function) is appropriate and requires minimal code change.

Best way in this code: In accounts_service/accounts_service.py, at the line where user_hash is computed, change hashlib.md5(user_id.encode()).hexdigest() to hashlib.sha256(user_id.encode()).hexdigest(). This preserves the same overall “hash-to-int-then-mod-100” pattern and keeps functionality (deterministic mapping of each user_id to a 0–99 bucket) unchanged in spirit, while avoiding the insecure MD5 algorithm. No new imports are needed because hashlib is already imported at the top of the file.

Concrete change:

  • File: accounts_service/accounts_service.py
  • Around line 571, replace:
    • user_hash = int(hashlib.md5(user_id.encode()).hexdigest(), 16)
  • With:
    • user_hash = int(hashlib.sha256(user_id.encode()).hexdigest(), 16)

No additional methods or definitions are required; the standard library already provides hashlib.sha256.


Suggested changeset 1
accounts_service/accounts_service.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/accounts_service/accounts_service.py b/accounts_service/accounts_service.py
--- a/accounts_service/accounts_service.py
+++ b/accounts_service/accounts_service.py
@@ -567,8 +567,8 @@
             # Check percentage-based scenario first
             if ab_config.get("lcp_slowness_percentage_enabled"):
                 percentage = ab_config.get("lcp_slowness_percentage", 0.0)
-                # Deterministically assign cohort based on user_id hash
-                user_hash = int(hashlib.md5(user_id.encode()).hexdigest(), 16)
+                # Deterministically assign cohort based on user_id hash using a strong hash function
+                user_hash = int(hashlib.sha256(user_id.encode()).hexdigest(), 16)
                 if (user_hash % 100) < percentage:
                     lcp_delay_ms = ab_config.get("lcp_slowness_percentage_delay_ms", 0)
                     logging.info(f"[Browser User] User {user_id} assigned to SLOW LCP cohort via PERCENTAGE ({lcp_delay_ms}ms delay)")
EOF