Skip to content

Conversation

@Alexander-Cairns
Copy link
Contributor

@Alexander-Cairns Alexander-Cairns commented Oct 21, 2025

Fixes an issue where a bad user agent that is a substring of a good bot was
blocking the good bot. Fixed by verifying with regex and word boundaries.

Also fixed:

  • An issue where the blocked agent was not being reported properly
  • The reported time included the service's response time, now it only contains the request checking time.

@Alexander-Cairns Alexander-Cairns added the patch Backwards compatible bug fixes. label Oct 21, 2025
@coderabbitai
Copy link

coderabbitai bot commented Oct 21, 2025

Walkthrough

This PR refactors the shouldBlockAgent method to return granular information—a blocked boolean, the matched bad agent string, and a potential error—instead of a simple boolean. It introduces regex-based verification for candidate matches, improves error handling in ServeHTTP, adjusts logging to include quoted values, and updates tests accordingly.

Changes

Cohort / File(s) Summary
Core bot-blocker implementation
botblocker.go
Updated shouldBlockAgent signature to return (bool, string, error) with blocked status, matched agent name, and error. Added regex verification step for candidate agents. Adjusted ServeHTTP logging to quote IP/agent values, timer invocation on error and block paths, and error handling to return internal server error on agent-check failures.
Bot-blocker tests
botblocker_test.go
Updated test calls and assertions to handle new tri-state return value (blocked, blockedAgent, err). Added TestShouldAllowUserAgentSubstring to verify substring-match scenarios return blocked=false and blockedAgent="". Expanded test setup for error and matched-agent validation.

Sequence Diagram

sequenceDiagram
    participant Client
    participant ServeHTTP
    participant shouldBlockAgent
    participant Handler
    
    Client->>ServeHTTP: Request with User-Agent header
    
    Note over ServeHTTP: Extract IP & User-Agent
    
    ServeHTTP->>shouldBlockAgent: userAgent string
    
    alt Agent matches block-list
        shouldBlockAgent->>shouldBlockAgent: Regex verification
        shouldBlockAgent-->>ServeHTTP: (true, "matched-agent", nil)
        Note over ServeHTTP: Invoke timer on block
        Note over ServeHTTP: Log blocked request with quoted values
        ServeHTTP-->>Client: Block response
    else Agent does not match
        shouldBlockAgent-->>ServeHTTP: (false, "", nil)
        Note over ServeHTTP: Invoke timer before delegating
        ServeHTTP->>Handler: Delegate to next handler
        Handler-->>Client: Continue processing
    else Error during check
        shouldBlockAgent-->>ServeHTTP: (_, "", error)
        Note over ServeHTTP: Invoke timer on error
        ServeHTTP-->>Client: Internal Server Error (500)
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

The changes involve a method signature update affecting the public API contract, new error handling paths, regex verification logic, and corresponding test adaptations. While cohesive to a single feature area, the modifications span logic changes, error semantics, and logging adjustments requiring careful review of each concern.

Possibly related PRs

  • Fix: stalled response #9: Modifies ServeHTTP timing and logging flow, sharing overlapping code-level changes with this PR's timer invocation and logging adjustments.

Suggested reviewers

  • lutaylor

Poem

🐰 A bot's bad disguise now meets its demise,
With regex so keen and a triple-response rise—
Matched agents exposed with precision and care,
Errors handled gracefully, blocking is fair! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title "IT-692: Stop false positives on substrings" directly addresses the primary objective of the pull request, which is to prevent substring matches of bad user agents from incorrectly blocking legitimate bots. The PR's implementation—adding regex verification with word boundaries to the shouldBlockAgent method—is a technical means to solve this problem, and the title appropriately captures the problem statement rather than implementation details. The title is concise, specific, and would clearly communicate to a teammate reviewing the git history that this PR addresses false positive blocks caused by substring matching.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch IT-692

Warning

Review ran into problems

🔥 Problems

Errors were encountered while retrieving linked issues.

Errors (1)
  • JIRA integration encountered authorization issues. Please disconnect and reconnect the integration in the CodeRabbit UI.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (5)
botblocker.go (3)

202-207: Consider fail-open on UA-check error to avoid availability impact.

A malformed blocklist entry would currently yield 500 for requests. Prefer logging the error and allowing the request to proceed.

Optional diff:

-    if err != nil {
-        timer()
-        http.Error(rw, "internal error", http.StatusInternalServerError)
-        return
-    }
+    if err != nil {
+        log.Errorf("user-agent check failed: %v", err)
+        timer()
+        // Fail-open: continue the request pipeline
+        b.next.ServeHTTP(rw, req)
+        return
+    }

195-196: Tighten log message: grammar + quoting.

Minor cleanup: fix wording and use %q for quoting.

-        log.Infof("blocked request with from IP \"%v\"", remoteAddrPort.Addr())
+        log.Infof("blocked request from IP %q", remoteAddrPort.Addr().String())

208-211: Clarify UA log and use %q.

“contained” is misleading with boundary matching; prefer “matched”.

-    log.Infof("blocked request with user agent \"%v\" because it contained \"%v\"", agent, badAgent)
+    log.Infof("blocked request with user agent %q matched bad-agent %q", agent, badAgent)
botblocker_test.go (2)

168-170: Fix mismatched failure message.

The assertion checks for blockedAgent == badAgent, but the message says “want ""”.

-    if blockedAgent != badAgent {
-        t.Fatalf("botBlocker.shouldBlockAgent(%s) = %s; want \"\"", requestAgent, blockedAgent)
-    }
+    if blockedAgent != badAgent {
+        t.Fatalf("botBlocker.shouldBlockAgent(%s) blockedAgent=%q; want %q", requestAgent, blockedAgent, badAgent)
+    }

193-211: Good regression test for substring false positives.

This guards the original bug. Consider also adding a case where the blocklist entry contains regex metacharacters (e.g., curl/7.64) to ensure escaping works.

Example additional test:

func TestShouldBlockUserAgentWithRegexMeta(t *testing.T) {
    botBlocker := BotBlocker{
        userAgentBlockList: []string{"curl/7.64"},
    }
    ua := "curl/7.64.1"
    blocked, bad, err := botBlocker.shouldBlockAgent(ua)
    if err != nil {
        t.Fatalf("unexpected error: %v", err)
    }
    if blocked {
        t.Fatalf("should not block %q by %q with word boundaries", ua, bad)
    }
    ua2 := "curl/7.64"
    blocked, bad, err = botBlocker.shouldBlockAgent(ua2)
    if err != nil {
        t.Fatalf("unexpected error: %v", err)
    }
    if !blocked || bad != "curl/7.64" {
        t.Fatalf("expected block of %q by %q", ua2, bad)
    }
}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Jira integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between da43f03 and 80078ae.

📒 Files selected for processing (2)
  • botblocker.go (3 hunks)
  • botblocker_test.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
botblocker_test.go (1)
botblocker.go (1)
  • BotBlocker (33-40)
botblocker.go (1)
utils/log.go (1)
  • Infof (99-103)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: test (1.20)
🔇 Additional comments (4)
botblocker.go (3)

190-193: LGTM on timer coverage for error path.

Calling timer() before returning on parse error is correct.


215-216: LGTM on final timer before delegating.

Accurately measures only the blocker check, not downstream handler time.


234-241: Fix critical regex compilation error and correctness issue: (?:...) unsupported in Go RE2 and badAgent unescaped.

Verification confirms the issues:

  1. Go RE2 does not support (?:...) non-capturing groups — the pattern at line 234 will fail with a regex parse error at runtime, triggering 500s.
  2. badAgent is unescaped — if list entries contain regex metacharacters (., *, +, [, ], etc.), they will be interpreted as regex syntax instead of literal text, breaking the matching logic.
  3. Error format verb %e is invalid — should use %w for error wrapping.

Only one regex pattern exists in the file (line 234), and it has all these issues.

Apply this diff:

-            // verify with regex
-            pattern := fmt.Sprintf(`(?:\b)%s(?:\b)`, badAgent)
-            matched, err := regexp.Match(pattern, []byte(userAgent))
-            if err != nil {
-                return false, "", fmt.Errorf("failed to check user agent %s: %e", userAgent, err)
-            }
+            // verify with regex word boundaries; escape list entry to avoid regex meta
+            pattern := `\b` + regexp.QuoteMeta(badAgent) + `\b`
+            matched, err := regexp.MatchString(pattern, userAgent)
+            if err != nil {
+                return false, "", fmt.Errorf("failed to check user agent %q: %w", userAgent, err)
+            }

Likely an incorrect or invalid review comment.

botblocker_test.go (1)

181-191: Tests align with new API; assertions read well.

Covers allow-path and verifies empty badAgent.

@adamcbowman adamcbowman merged commit 0bebd73 into main Oct 29, 2025
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

patch Backwards compatible bug fixes.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants