
Conversation

@EddieSheehy commented Oct 6, 2025

Summary by CodeRabbit

  • New Features
    • Clearer logging for request attempts, connection issues, and unexpected errors with attempt counts.
    • Warnings when non-serializable settings are excluded, including a list of keys and guidance on suppressing via a configuration option.
  • Bug Fixes
    • Prevented crashes by skipping non-serializable data during requests and settings collection.
    • Early return on JSON serialization errors to avoid unnecessary retries and noisy failures.


coderabbitai bot commented Oct 6, 2025

Walkthrough

Adds logging and error-handling to SOPSRequest.post, including early return on JSON serialization errors and clearer retry logging. Updates get_settings to include only JSON‑serializable spider settings, logging excluded keys and introducing a helper for serializability checks.

Changes

Cohort / File(s) — Summary of Changes
  • HTTP request logging & retries (scrapeops_scrapy/core/api.py): Added a local logger; renamed the retry loop variable; logs JSON serialization errors and returns early; logs connection errors with attempt count; logs unexpected errors with attempt count; preserves retry behavior.
  • Settings serialization filter & logging (scrapeops_scrapy/core/model.py): get_settings now filters spider settings by JSON serializability via a new _is_serializable helper; collects and logs excluded keys; omits non-serializable settings; directs users to SCRAPEOPS_SETTINGS_EXCLUSION_LIST to suppress the warning.
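For context, suppressing the exclusion warning would look something like this in a project's settings.py. The setting name comes from the change summary above; the key names listed are purely illustrative:

```python
# settings.py — hypothetical example; the listed keys are illustrative only
SCRAPEOPS_SETTINGS_EXCLUSION_LIST = [
    'MY_CUSTOM_CALLBACK',   # e.g. a setting whose value is a function object
    'ITEM_PROCESSOR_HOOK',  # any value json.dumps() cannot handle
]
```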

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Caller
  participant SOPSRequest
  participant Logger
  participant HTTP as HTTP Client

  Caller->>SOPSRequest: post(url, payload, retries)
  rect rgba(230,245,255,0.6)
    note over SOPSRequest,Logger: New: Early JSON serialization validation/logging
    SOPSRequest->>SOPSRequest: json.dumps(payload)
    alt JSON not serializable
      SOPSRequest-->>Logger: error("JSON serialization failed")
      SOPSRequest-->>Caller: return None
    else JSON ok
      loop attempts
        SOPSRequest->>HTTP: POST
        alt Connection error
          SOPSRequest-->>Logger: warn("Connection error on attempt i")
        else Other unexpected error
          SOPSRequest-->>Logger: error("Unexpected error on attempt i")
        else Success
          HTTP-->>SOPSRequest: Response
          SOPSRequest-->>Caller: return Response
        end
      end
    end
  end
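For readers who prefer code to diagrams, here is a minimal Python sketch of the post() flow shown above. This is not the actual ScrapeOps implementation — the signature, retry count, and log messages are assumptions based on the diagram and change summary:

```python
import json
import logging

import requests

logger = logging.getLogger(__name__)

def post(url, payload, retries=3, timeout=30):
    # Early validation: if the payload cannot be serialized, no number of
    # retries will help, so log once and return immediately.
    try:
        body = json.dumps(payload)
    except (TypeError, ValueError) as serialization_error:
        logger.error(f"JSON serialization failed: {serialization_error}")
        return None

    for attempt in range(retries):
        try:
            response = requests.post(
                url,
                data=body,
                headers={"Content-Type": "application/json"},
                timeout=timeout,
            )
            return response
        except requests.exceptions.ConnectionError as e:
            logger.warning(f"Connection error on attempt {attempt + 1}: {e}")
        except Exception as e:
            logger.error(f"Unexpected error on attempt {attempt + 1}: {e}")
    return None
```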
sequenceDiagram
  autonumber
  participant Spider
  participant Model as get_settings
  participant Logger

  Spider->>Model: get_settings(spider)
  rect rgba(240,255,240,0.6)
    note over Model,Logger: New: Filter settings by JSON serializability
    Model->>Model: Iterate spider.settings
    alt value JSON-serializable
      Model->>Model: include in spider_settings
    else not serializable
      Model->>Model: collect key in exclusion list
    end
    opt any exclusions
      Model-->>Logger: warn("Excluded non-serializable settings: ...")
    end
    Model-->>Spider: spider_settings (filtered)
  end
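Likewise, a minimal sketch of the filtering step this diagram describes. The helper name _is_serializable comes from the change summary, while the surrounding function name and warning wording are assumptions:

```python
import json
import logging

logger = logging.getLogger(__name__)

def _is_serializable(value):
    # A value can go into the monitoring payload only if json.dumps accepts it.
    try:
        json.dumps(value)
    except (TypeError, ValueError):
        return False
    return True

def filter_spider_settings(settings):
    # Hypothetical stand-in for the filtering portion of get_settings().
    spider_settings = {}
    excluded_keys = []
    for key, value in settings.items():
        if _is_serializable(value):
            spider_settings[key] = value
        else:
            excluded_keys.append(key)
    if excluded_keys:
        logger.warning(
            "Excluded non-serializable settings: %s. Add them to "
            "SCRAPEOPS_SETTINGS_EXCLUSION_LIST to suppress this warning.",
            ", ".join(excluded_keys),
        )
    return spider_settings
```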

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 25.00%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
  • Title Check — ❓ Inconclusive: The provided title "Code to Review Customer Ticket" is generic and does not convey the main changes in the pull request, which introduce JSON serialization error handling, logging enhancements, and filtering of non-serializable settings. It is too vague for a teammate to quickly understand the purpose of this changeset. Resolution: update the title to concisely reflect the core changes, for example "Add JSON serialization error logging in SOPSRequest" or "Filter non-serializable spider settings with warnings."
✅ Passed checks (1 passed)
  • Description Check — ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
scrapeops_scrapy/core/api.py (1)

201-203: Prefer a module-level logger over per-call setup.

Initialize logger = logging.getLogger(__name__) once at module scope and reuse it to avoid repeated lookups and ease testing.

Example (outside this range):

# At top of module
import logging
logger = logging.getLogger(__name__)

Then remove these lines from post().

-        import logging
-        logger = logging.getLogger(__name__)
scrapeops_scrapy/core/model.py (2)

219-221: Move logger to module scope.

Define logger = logging.getLogger(__name__) at the top and reuse here to avoid per-call setup.

Example (outside this range):

# top of module
import logging
logger = logging.getLogger(__name__)

Then remove these lines from get_settings().

-        import logging
-        logger = logging.getLogger(__name__)

248-255: Minor: prefer try/except/else for clarity.

Return True in an else: block to make the happy path explicit.

-        try:
-            json.dumps(value)
-            return True
-        except (TypeError, ValueError):
-            return False
+        try:
+            json.dumps(value)
+        except (TypeError, ValueError):
+            return False
+        else:
+            return True
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8d38824 and 877f9e0.

📒 Files selected for processing (2)
  • scrapeops_scrapy/core/api.py (2 hunks)
  • scrapeops_scrapy/core/model.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
scrapeops_scrapy/core/api.py (2)
scrapeops_scrapy/normalizer/proxies.py (3)
  • ProxyNormalizer (11-197)
  • unknown_proxy_scheme (92-95)
  • get_proxy_scheme (85-89)
scrapeops_scrapy/exceptions.py (1)
  • ScrapeOpsAPIResponseError (13-16)
🪛 Ruff (0.13.3)
scrapeops_scrapy/core/model.py

253-253: Consider moving this statement to an else block

(TRY300)

scrapeops_scrapy/core/api.py

220-226: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


236-236: Do not catch blind exception: Exception

(BLE001)


237-237: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

🔇 Additional comments (2)
scrapeops_scrapy/core/model.py (1)

227-247: Good safeguard: filter to JSON‑serializable settings and warn once.

This prevents payload failures and guides users to the exclusion list. LGTM.

scrapeops_scrapy/core/api.py (1)

200-210: SOPSRequest.post never receives non-None files. All callsites either omit the files parameter or explicitly pass files=None, so the json+files combination cannot occur.

Comment on lines +201 to +203
import logging
logger = logging.getLogger(__name__)

⚠️ Potential issue | 🔴 Critical

Differentiate serialization vs response JSON decode errors; narrow exceptions and include tracebacks.

Catching (TypeError, ValueError) here also swallows response.json() decode failures, causing premature no-retry. Handle json.JSONDecodeError separately (retry), keep early-return only for payload serialization errors, add Timeout/RequestException, and use logger.exception for tracebacks.

Apply:

@@
-        import logging
-        logger = logging.getLogger(__name__)
+        import logging
+        import json
+        logger = logging.getLogger(__name__)
@@
-            except (TypeError, ValueError) as json_error:
-                # These are JSON serialization errors - don't retry, log and return immediately
-                logger.error(
-                    "ScrapeOps: Unable to send monitoring data due to non-serializable settings. "
-                    "Some of your Scrapy settings contain functions or other objects that cannot be "
-                    "converted to JSON format. To resolve this, add the problematic setting names "
-                    "to SCRAPEOPS_SETTINGS_EXCLUSION_LIST in your settings.py file. "
-                    f"Error details: {json_error}"
-                )
-                error = json_error
-                return None, str(error)  # Don't retry on serialization errors
+            except json.JSONDecodeError as parse_error:
+                # Invalid JSON in response from server → retry
+                logger.warning(f"ScrapeOps: Invalid JSON response (attempt {attempt+1}): {parse_error}")
+                error = parse_error
+                continue
+            except (TypeError, ValueError) as serialization_error:
+                # Payload JSON serialization error → don't retry
+                logger.exception(
+                    "ScrapeOps: Unable to send monitoring data due to non-serializable settings. "
+                    "Add the problematic setting names to SCRAPEOPS_SETTINGS_EXCLUSION_LIST in your settings.py. "
+                    f"Error details: {serialization_error}"
+                )
+                return None, str(serialization_error)
+            except requests.exceptions.Timeout as e:
+                logger.warning(f"ScrapeOps: Timeout (attempt {attempt+1}): {e}")
+                error = e
+                continue
             except requests.exceptions.ConnectionError as e:
                 logger.warning(f"ScrapeOps: Connection error (attempt {attempt+1}): {e}")
                 error = e
                 continue
             except ScrapeOpsAPIResponseError as e:
                 error = e
                 continue
-            except Exception as e:
-                logger.error(f"ScrapeOps: Unexpected error (attempt {attempt+1}): {e}")
-                error = e
-                continue
+            except requests.exceptions.RequestException as e:
+                logger.exception(f"ScrapeOps: Request error (attempt {attempt+1}): {e}")
+                error = e
+                continue

Also consider converting the error log at Line 237 to logger.exception for tracebacks (addressed above). Based on static analysis hints.

Also applies to: 218-239

🤖 Prompt for AI Agents
In scrapeops_scrapy/core/api.py around lines 201-203 and 218-239, narrow broad
exception handling so payload serialization errors (TypeError, ValueError)
remain an early-return, but handle response.json() failures
(json.JSONDecodeError) separately and treat them as retryable (do not
early-return); also add explicit catches for requests.Timeout and
requests.RequestException where appropriate and treat them as retryable. Replace
simple logger.error calls (notably around line 237) with logger.exception to
include tracebacks, and ensure json (for JSONDecodeError) and requests
exceptions are imported/used accordingly. Ensure each except block only handles
the intended error class and logs context+traceback via logger.exception where
debugging info is needed.

