Conversation

@anitacaron
Collaborator

@anitacaron anitacaron commented Sep 13, 2025

Include a User-Agent header in the url_exists function to support proper redirection for RGD hosts.

Fixes #158

Summary by CodeRabbit

  • Bug Fixes
    • Improved reliability of external URL checks by identifying requests with a User-Agent (“OBO Dashboard”). This increases compatibility with sites that require a User-Agent, reducing false negatives and intermittent failures when validating links. No changes to public interfaces or user workflows.

@coderabbitai
Contributor

coderabbitai bot commented Sep 13, 2025

Walkthrough

A single change updates url_exists in util/lib.py to include a User-Agent header ("OBO Dashboard") in the requests.head call. Control flow and return behavior remain unchanged: True on HTTP 200, False on any other status or on an exception, with exceptions logged as errors.

Changes

| Cohort / File(s) | Summary of Changes |
| --- | --- |
| HTTP request header tweak: util/lib.py | Added headers={"User-Agent": "OBO Dashboard"} to requests.head in url_exists; no other logic or signatures changed. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title Check | ✅ Passed | The title "Add User-Agent header for URL requests" succinctly and accurately summarizes the primary change in util/lib.py: adding a User-Agent header to requests.head to address FP-008 failures for certain RGD-hosted URLs. |
| Linked Issues Check | ✅ Passed | The patch implements the fix described in issue #158 by adding headers={"User-Agent": "OBO Dashboard"} to the url_exists requests.head call while preserving allow_redirects and existing behavior. This directly addresses the linked issue's objective: stopping FP-008 failures caused by blank User-Agent rejections. |
| Out of Scope Changes Check | ✅ Passed | The changes are limited to util/lib.py (adding a User-Agent header in url_exists) with no modifications to public signatures or unrelated files, so there are no detectable out-of-scope changes. |


@anitacaron anitacaron self-assigned this Sep 13, 2025
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
util/lib.py (1)

572-579: Add timeouts, broaden success criteria, and fall back to GET when HEAD is unsupported.

Prevents hangs (addresses Ruff S113), avoids false negatives on non-200 successes, and handles servers that return 405/501 for HEAD.

Apply:

     try:
-        with requests.head(url, allow_redirects=True, headers={"User-Agent": "OBO Dashboard"}) as res:
-            return (res.status_code == 200)
-    except Exception as e:
-        # Any errors with connection will be considered
-        # as the URL not existing
-        logging.error(e, exc_info=True)
+        headers = {"User-Agent": "OBO Dashboard"}
+        with requests.head(url, allow_redirects=True, headers=headers, timeout=(5, 10)) as res:
+            # Fallback to GET when HEAD not allowed/implemented
+            if res.status_code in (405, 501):
+                with requests.get(url, allow_redirects=True, headers=headers, timeout=(5, 10), stream=True) as res2:
+                    return 200 <= res2.status_code < 400
+            return 200 <= res.status_code < 400
+    except requests.RequestException as e:
+        logging.error("url_exists(%s) failed: %s", url, e, exc_info=True)
     return False
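
Read as a whole function, the suggestion above could look roughly like the sketch below. It is not the code in util/lib.py: the `session` parameter is an assumption added here so the logic can be exercised without network access (in practice it would be a requests.Session), and the constant names are hypothetical. Because `requests.RequestException` derives from `OSError`, catching `OSError` covers the request failures the diff targets.

```python
import logging

USER_AGENT = "OBO Dashboard"   # value used in this PR
TIMEOUT = (5, 10)              # (connect, read) seconds, as in the suggestion
HEAD_UNSUPPORTED = (405, 501)  # statuses that trigger the GET fallback


def url_exists(session, url):
    """Return True if `url` answers with a 2xx or 3xx status.

    `session` is assumed to be a requests.Session, or any object with
    compatible head()/get() methods (hypothetical parameter, see lead-in).
    """
    headers = {"User-Agent": USER_AGENT}
    try:
        res = session.head(url, allow_redirects=True,
                           headers=headers, timeout=TIMEOUT)
        res.close()
        if res.status_code in HEAD_UNSUPPORTED:
            # stream=True defers the body download; only the status is needed
            res = session.get(url, allow_redirects=True, headers=headers,
                              timeout=TIMEOUT, stream=True)
            res.close()
        return 200 <= res.status_code < 400
    except OSError as e:
        # requests.RequestException subclasses OSError, so connection
        # errors and timeouts land here and count as "does not exist"
        logging.error("url_exists(%s) failed: %s", url, e, exc_info=True)
        return False
```

Typical use would be `url_exists(requests.Session(), some_url)`. Passing the session in also fits the reuse idea in the nitpick that follows.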
🧹 Nitpick comments (1)
util/lib.py (1)

568-579: Optional: make UA identifiable and reusable.

Consider a module-level constant or session whose User-Agent includes version/contact details, then reuse it everywhere.

Example:

import requests

# Identifiable User-Agent with a contact URL, defined once at module level
OBO_USER_AGENT = "OBO-Dashboard (+https://github.com/OBOFoundry/OBO-Dashboard)"
SESSION = requests.Session()
SESSION.headers.update({"User-Agent": OBO_USER_AGENT})
# Then: SESSION.head(...), SESSION.get(...)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d58e537 and fe18b29.

📒 Files selected for processing (1)
  • util/lib.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.12.2)
util/lib.py

573-573: Probable use of requests call without timeout

(S113)

🔇 Additional comments (2)
util/lib.py (2)

573-575: User-Agent addition is correct and aligned with the PR goal.

This should resolve the RGD redirect/blank-UA issue while retaining redirects.


270-287: Propagate User-Agent header and timeouts to HTTP calls

Add a User-Agent header and connect/read timeouts to requests.head in base_url_if_exists; apply the same pattern to urllib.request.urlopen wrappers (read_txt_from_url_as_lines, open_yaml_from_url).

-            ret = requests.head(ourl, allow_redirects=True)
+            ret = requests.head(ourl, allow_redirects=True, headers={"User-Agent": "OBO Dashboard"}, timeout=(5, 10))

Repo-wide search was inconclusive (ripgrep reported "No files were searched"); verify other requests.(head|get) occurrences and add the same headers/timeouts.
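
For the urllib.request.urlopen wrappers mentioned above, the same pattern might be sketched as follows. The bodies of read_txt_from_url_as_lines and open_yaml_from_url are not shown in this PR, so `build_request`, `fetch_lines`, and the 10-second timeout are hypothetical stand-ins rather than the project's code.

```python
import urllib.request

USER_AGENT = "OBO Dashboard"  # value used in this PR
TIMEOUT_SECONDS = 10          # assumption: not specified for urlopen in the review


def build_request(url):
    """Wrap `url` in a Request that carries the User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch_lines(url):
    """Hypothetical helper: fetch `url` and return its text as a list of lines."""
    # timeout keeps a stalled server from blocking the call indefinitely
    with urllib.request.urlopen(build_request(url), timeout=TIMEOUT_SECONDS) as resp:
        return resp.read().decode("utf-8").splitlines()
```

Without an explicit header, urllib sends its own default User-Agent (of the form Python-urllib/3.x), which some hosts may also reject, so building the Request explicitly keeps the behavior consistent with the requests.head change.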

@anitacaron anitacaron merged commit 64a7b03 into master Sep 13, 2025
2 checks passed
@anitacaron anitacaron deleted the anitacaron/issue158 branch September 15, 2025 10:21
Development

Successfully merging this pull request may close these issues.

FP-008 unexpectedly failing for several RGD-hosted ontologies

3 participants