IT-692: Stop false positives on substrings #10

Alexander-Cairns · 2025-10-21T17:56:24Z

Fixes an issue where a bad user agent that is a substring of a good bot was
blocking the good bot. Fixed by verifying with regex and word boundaries.

Also fixed:

An issue where the blocked agent was not being reported properly
The reported time included the service's response time; it now covers only the request-checking time.
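
For readers without the diff open, the substring fix described above can be sketched roughly as follows. This is a minimal illustration, not the code from `botblocker.go`: the free function form, the `blockList` parameter, and the agent names `badbot` and `notabadbotter` are assumptions made for the example. It assumes a cheap substring pass to find candidates, followed by a regex check that requires word boundaries around the list entry.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// shouldBlockAgent is a hypothetical stand-in for the method changed in this PR.
// It returns whether the user agent is blocked, which list entry matched, and
// any error hit while verifying a candidate.
func shouldBlockAgent(blockList []string, userAgent string) (bool, string, error) {
	for _, badAgent := range blockList {
		// First pass: skip entries that are not even a substring.
		if !strings.Contains(userAgent, badAgent) {
			continue
		}
		// Second pass: require word boundaries around the entry, with the
		// entry escaped so it is matched as literal text.
		pattern := `\b` + regexp.QuoteMeta(badAgent) + `\b`
		matched, err := regexp.MatchString(pattern, userAgent)
		if err != nil {
			return false, "", fmt.Errorf("failed to check user agent %q: %w", userAgent, err)
		}
		if matched {
			return true, badAgent, nil
		}
	}
	return false, "", nil
}
```

With a block list of `badbot`, a user agent of `notabadbotter/1.0` contains the entry as a substring but is not blocked, while `Mozilla/5.0 (compatible; badbot/2.1)` is blocked and reports `badbot` as the matched entry.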

coderabbitai · 2025-10-21T17:56:50Z

Walkthrough

This PR refactors the shouldBlockAgent method to return granular information—a blocked boolean, the matched bad agent string, and a potential error—instead of a simple boolean. It introduces regex-based verification for candidate matches, improves error handling in ServeHTTP, adjusts logging to include quoted values, and updates tests accordingly.

Changes

Cohort / File(s)	Summary
Core bot-blocker implementation `botblocker.go`	Updated `shouldBlockAgent` signature to return `(bool, string, error)` with blocked status, matched agent name, and error. Added regex verification step for candidate agents. Adjusted `ServeHTTP` logging to quote IP/agent values, timer invocation on error and block paths, and error handling to return internal server error on agent-check failures.
Bot-blocker tests `botblocker_test.go`	Updated test calls and assertions to handle new tri-state return value `(blocked, blockedAgent, err)`. Added `TestShouldAllowUserAgentSubstring` to verify substring-match scenarios return `blocked=false` and `blockedAgent=""`. Expanded test setup for error and matched-agent validation.

Sequence Diagram

sequenceDiagram
    participant Client
    participant ServeHTTP
    participant shouldBlockAgent
    participant Handler
    
    Client->>ServeHTTP: Request with User-Agent header
    
    Note over ServeHTTP: Extract IP & User-Agent
    
    ServeHTTP->>shouldBlockAgent: userAgent string
    
    alt Agent matches block-list
        shouldBlockAgent->>shouldBlockAgent: Regex verification
        shouldBlockAgent-->>ServeHTTP: (true, "matched-agent", nil)
        Note over ServeHTTP: Invoke timer on block
        Note over ServeHTTP: Log blocked request with quoted values
        ServeHTTP-->>Client: Block response
    else Agent does not match
        shouldBlockAgent-->>ServeHTTP: (false, "", nil)
        Note over ServeHTTP: Invoke timer before delegating
        ServeHTTP->>Handler: Delegate to next handler
        Handler-->>Client: Continue processing
    else Error during check
        shouldBlockAgent-->>ServeHTTP: (_, "", error)
        Note over ServeHTTP: Invoke timer on error
        ServeHTTP-->>Client: Internal Server Error (500)
    end
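
The three branches in the diagram can be sketched as a small middleware. This is an illustration under stated assumptions rather than the PR's implementation: the `blocker` type, its `check` and `observe` fields, the 403 status for the block response, and the `runDemo` helper are all hypothetical. The point it shows is that the timer fires on every path before the response is written or the request is delegated, so it measures only the check.

```go
package main

import (
	"errors"
	"log"
	"net/http"
	"net/http/httptest"
	"time"
)

// blocker is a hypothetical middleware; field names are illustrative.
type blocker struct {
	next    http.Handler
	check   func(userAgent string) (bool, string, error) // (blocked, matched entry, err)
	observe func(time.Duration)                          // receives the check time only
}

func (b *blocker) ServeHTTP(rw http.ResponseWriter, req *http.Request) {
	start := time.Now()
	timer := func() { b.observe(time.Since(start)) }

	agent := req.UserAgent()
	blocked, badAgent, err := b.check(agent)
	if err != nil {
		timer()
		http.Error(rw, "internal error", http.StatusInternalServerError)
		return
	}
	if blocked {
		timer()
		log.Printf("blocked request with user agent %q because it matched %q", agent, badAgent)
		// Assumption: the source only says "Block response"; 403 is a placeholder.
		http.Error(rw, "blocked", http.StatusForbidden)
		return
	}
	// Stop the timer before delegating so downstream time is not counted.
	timer()
	b.next.ServeHTTP(rw, req)
}

// runDemo sends one request per branch and returns the status codes
// together with how many times the timer callback fired.
func runDemo() (codes [3]int, timerCalls int) {
	next := http.HandlerFunc(func(rw http.ResponseWriter, _ *http.Request) {
		rw.WriteHeader(http.StatusOK)
	})
	b := &blocker{
		next:    next,
		observe: func(time.Duration) { timerCalls++ },
		check: func(ua string) (bool, string, error) {
			switch ua {
			case "badbot":
				return true, "badbot", nil
			case "boom":
				return false, "", errors.New("simulated check failure")
			}
			return false, "", nil
		},
	}
	for i, ua := range []string{"badbot", "boom", "goodbot"} {
		req := httptest.NewRequest(http.MethodGet, "/", nil)
		req.Header.Set("User-Agent", ua)
		rec := httptest.NewRecorder()
		b.ServeHTTP(rec, req)
		codes[i] = rec.Code
	}
	return codes, timerCalls
}
```

`runDemo` exercises the block, error, and allow paths in that order, and the timer callback fires once per request.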

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

The changes involve a method signature update affecting the public API contract, new error handling paths, regex verification logic, and corresponding test adaptations. While cohesive to a single feature area, the modifications span logic changes, error semantics, and logging adjustments requiring careful review of each concern.

Possibly related PRs

Fix: stalled response #9: Modifies ServeHTTP timing and logging flow, sharing overlapping code-level changes with this PR's timer invocation and logging adjustments.

Suggested reviewers

lutaylor

Poem

🐰 A bot's bad disguise now meets its demise,
With regex so keen and a triple-response rise—
Matched agents exposed with precision and care,
Errors handled gracefully, blocking is fair! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The title "IT-692: Stop false positives on substrings" directly addresses the primary objective of the pull request, which is to prevent substring matches of bad user agents from incorrectly blocking legitimate bots. The PR's implementation—adding regex verification with word boundaries to the `shouldBlockAgent` method—is a technical means to solve this problem, and the title appropriately captures the problem statement rather than implementation details. The title is concise, specific, and would clearly communicate to a teammate reviewing the git history that this PR addresses false positive blocks caused by substring matching.


Warning

Review ran into problems

🔥 Problems

Errors were encountered while retrieving linked issues.

Errors (1)

JIRA integration encountered authorization issues. Please disconnect and reconnect the integration in the CodeRabbit UI.


coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (5)

botblocker.go (3)

202-207: Consider fail-open on UA-check error to avoid availability impact.

A malformed blocklist entry would currently yield 500 for requests. Prefer logging the error and allowing the request to proceed.

Optional diff:
-    if err != nil {
-        timer()
-        http.Error(rw, "internal error", http.StatusInternalServerError)
-        return
-    }
+    if err != nil {
+        log.Errorf("user-agent check failed: %v", err)
+        timer()
+        // Fail-open: continue the request pipeline
+        b.next.ServeHTTP(rw, req)
+        return
+    }

195-196: Tighten log message: grammar + quoting.

Minor cleanup: fix wording and use %q for quoting.
-        log.Infof("blocked request with from IP \"%v\"", remoteAddrPort.Addr())
+        log.Infof("blocked request from IP %q", remoteAddrPort.Addr().String())

208-211: Clarify UA log and use %q.

“contained” is misleading with boundary matching; prefer “matched”.
-    log.Infof("blocked request with user agent \"%v\" because it contained \"%v\"", agent, badAgent)
+    log.Infof("blocked request with user agent %q matched bad-agent %q", agent, badAgent)

botblocker_test.go (2)

168-170: Fix mismatched failure message.

The assertion checks for blockedAgent == badAgent, but the message says “want ""”.

-    if blockedAgent != badAgent {
-        t.Fatalf("botBlocker.shouldBlockAgent(%s) = %s; want \"\"", requestAgent, blockedAgent)
-    }
+    if blockedAgent != badAgent {
+        t.Fatalf("botBlocker.shouldBlockAgent(%s) blockedAgent=%q; want %q", requestAgent, blockedAgent, badAgent)
+    }

193-211: Good regression test for substring false positives.

This guards the original bug. Consider also adding a case where the blocklist entry contains regex metacharacters (e.g., curl/7.64) to ensure escaping works.

Example additional test:

func TestShouldBlockUserAgentWithRegexMeta(t *testing.T) {
    botBlocker := BotBlocker{
        userAgentBlockList: []string{"curl/7.64"},
    }
    ua := "curl/7.64.1"
    blocked, bad, err := botBlocker.shouldBlockAgent(ua)
    if err != nil {
        t.Fatalf("unexpected error: %v", err)
    }
    if blocked {
        t.Fatalf("should not block %q by %q with word boundaries", ua, bad)
    }
    ua2 := "curl/7.64"
    blocked, bad, err = botBlocker.shouldBlockAgent(ua2)
    if err != nil {
        t.Fatalf("unexpected error: %v", err)
    }
    if !blocked || bad != "curl/7.64" {
        t.Fatalf("expected block of %q by %q", ua2, bad)
    }
}
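
Before adopting the example test, it may be worth checking how `\b` treats the `.` in `curl/7.64.1`. In Go's regexp syntax, `\b` is an ASCII word boundary and `.` is not a word character, so a boundary exists between `64` and `.1`. The sketch below assumes the verification pattern is built as `\b` + `regexp.QuoteMeta(entry)` + `\b`, as proposed elsewhere in this review; `matchesWithBoundaries` is a hypothetical helper, not a function from the PR.

```go
package main

import "regexp"

// matchesWithBoundaries reports whether entry appears in userAgent as a
// literal string delimited by ASCII word boundaries.
func matchesWithBoundaries(entry, userAgent string) (bool, error) {
	return regexp.MatchString(`\b`+regexp.QuoteMeta(entry)+`\b`, userAgent)
}
```

Under that assumption, `matchesWithBoundaries("curl/7.64", "curl/7.64.1")` reports a match, so the first assertion in the example test would fail. A user agent such as `curl/7.641` does not match, because `4` and `1` are both word characters and no boundary lies between them. If `curl/7.64.1` must not be blocked by `curl/7.64`, the boundary rule would need to be stricter than `\b`.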

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

Jira integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between da43f03 and 80078ae.

📒 Files selected for processing (2)

botblocker.go (3 hunks)
botblocker_test.go (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (2)

botblocker_test.go (1)

botblocker.go (1)

BotBlocker (33-40)

botblocker.go (1)

utils/log.go (1)

Infof (99-103)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: test (1.20)

🔇 Additional comments (4)

botblocker.go (3)

190-193: LGTM on timer coverage for error path.

Calling timer() before returning on parse error is correct.

215-216: LGTM on final timer before delegating.

Accurately measures only the blocker check, not downstream handler time.

234-241: Fix critical regex compilation error and correctness issue: (?:...) unsupported in Go RE2 and badAgent unescaped.

Verification confirms the issues:

Go RE2 does not support (?:...) non-capturing groups — the pattern at line 234 will fail with a regex parse error at runtime, triggering 500s.

badAgent is unescaped — if list entries contain regex metacharacters (., *, +, [, ], etc.), they will be interpreted as regex syntax instead of literal text, breaking the matching logic.

Error format verb %e is invalid — should use %w for error wrapping.

Only one regex pattern exists in the file (line 234), and it has all these issues.

Apply this diff:
-            // verify with regex
-            pattern := fmt.Sprintf(`(?:\b)%s(?:\b)`, badAgent)
-            matched, err := regexp.Match(pattern, []byte(userAgent))
-            if err != nil {
-                return false, "", fmt.Errorf("failed to check user agent %s: %e", userAgent, err)
-            }
+            // verify with regex word boundaries; escape list entry to avoid regex meta
+            pattern := `\b` + regexp.QuoteMeta(badAgent) + `\b`
+            matched, err := regexp.MatchString(pattern, userAgent)
+            if err != nil {
+                return false, "", fmt.Errorf("failed to check user agent %q: %w", userAgent, err)
+            }

Likely an incorrect or invalid review comment.
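
The individual claims in this comment can be checked separately. Go's regexp syntax lists `(?:re)` as a supported non-capturing group, so the original pattern should compile, which is consistent with the comment being marked as likely invalid. The escaping and `%w` points are independent of that. The sketch below is a hedged check of each point; `boundaryPattern`, `compiles`, `wrap`, `wrapKeepsCause`, and `errSentinel` are hypothetical names introduced for the example.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// boundaryPattern builds the pattern shape from the original code,
// optionally escaping the list entry first.
func boundaryPattern(entry string, escape bool) string {
	if escape {
		entry = regexp.QuoteMeta(entry)
	}
	return `(?:\b)` + entry + `(?:\b)`
}

// compiles reports whether Go's regexp package accepts the pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

// errSentinel stands in for whatever error the regex call might return.
var errSentinel = errors.New("sentinel")

// wrap uses %w so the original error stays reachable via errors.Is.
// (%e is a floating-point verb in fmt and does not wrap errors.)
func wrap(userAgent string, err error) error {
	return fmt.Errorf("failed to check user agent %q: %w", userAgent, err)
}

// wrapKeepsCause reports whether the wrapped error still matches its cause.
func wrapKeepsCause(cause error) bool {
	return errors.Is(wrap("example-agent", cause), cause)
}
```

An unescaped entry such as `bot(` makes the pattern fail to compile, while the escaped form compiles, so the `regexp.QuoteMeta` suggestion still looks worthwhile even though the `(?:...)` claim does not hold.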

botblocker_test.go (1)

181-191: Tests align with new API; assertions read well.

Covers allow-path and verifies empty badAgent.

Alexander-Cairns added 4 commits October 20, 2025 13:53

fix: log blocked agent

2be6c3a

fix: quote around variables in logs

ce923f1

fix: double check with regex word boundaries

93a0491

fix: only measure time to check and not web server responce

80078ae

Alexander-Cairns added the patch label (Backwards compatible bug fixes.) Oct 21, 2025

coderabbitai bot reviewed Oct 21, 2025


adamcbowman approved these changes Oct 22, 2025


adamcbowman merged commit 0bebd73 into main Oct 29, 2025
5 checks passed
