Code to Review Customer Ticket #16
base: master
Conversation
Walkthrough

Adds logging and error handling to `SOPSRequest.post`, including an early return on JSON serialization errors and clearer retry logging. Updates `get_settings` to include only JSON-serializable spider settings, logging excluded keys and introducing a helper for serializability checks.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Caller
    participant SOPSRequest
    participant Logger
    participant HTTP as HTTP Client
    Caller->>SOPSRequest: post(url, payload, retries)
    rect rgba(230,245,255,0.6)
        note over SOPSRequest,Logger: New: Early JSON serialization validation/logging
        SOPSRequest->>SOPSRequest: json.dumps(payload)
        alt JSON not serializable
            SOPSRequest-->>Logger: error("JSON serialization failed")
            SOPSRequest-->>Caller: return None
        else JSON ok
            loop attempts
                SOPSRequest->>HTTP: POST
                alt Connection error
                    SOPSRequest-->>Logger: warn("Connection error on attempt i")
                else Other unexpected error
                    SOPSRequest-->>Logger: error("Unexpected error on attempt i")
                else Success
                    HTTP-->>SOPSRequest: Response
                    SOPSRequest-->>Caller: return Response
                end
            end
        end
    end
```
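The flow in the diagram above can be sketched in plain Python. Note this is a rough illustration, not the actual `SOPSRequest.post` implementation: the `send` callable and the function name are hypothetical stand-ins for the real HTTP client call.

```python
import json
import logging

logger = logging.getLogger(__name__)


def post_with_validation(url, payload, send, retries=3):
    """Sketch of the diagrammed flow: validate the payload's JSON
    serializability once up front, then retry only transient send errors."""
    try:
        body = json.dumps(payload)
    except (TypeError, ValueError) as err:
        # Serialization failures are not transient: log and return early.
        logger.error("JSON serialization failed: %s", err)
        return None
    for attempt in range(retries):
        try:
            return send(url, body)  # stand-in for the real HTTP POST
        except ConnectionError as err:
            logger.warning("Connection error on attempt %d: %s", attempt + 1, err)
        except Exception as err:
            logger.error("Unexpected error on attempt %d: %s", attempt + 1, err)
    return None
```

A payload containing, say, a lambda fails `json.dumps` and returns `None` without ever calling `send`, while connection errors are retried up to `retries` times.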
```mermaid
sequenceDiagram
    autonumber
    participant Spider
    participant Model as get_settings
    participant Logger
    Spider->>Model: get_settings(spider)
    rect rgba(240,255,240,0.6)
        note over Model,Logger: New: Filter settings by JSON serializability
        Model->>Model: Iterate spider.settings
        alt value JSON-serializable
            Model->>Model: include in spider_settings
        else not serializable
            Model->>Model: collect key in exclusion list
        end
        opt any exclusions
            Model-->>Logger: warn("Excluded non-serializable settings: ...")
        end
        Model-->>Spider: spider_settings (filtered)
    end
```
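A minimal Python sketch of this filtering step follows. The function names are illustrative only; the actual helper in `model.py` may differ in name and signature.

```python
import json
import logging

logger = logging.getLogger(__name__)


def is_json_serializable(value):
    # Mirror of the serializability check described above.
    try:
        json.dumps(value)
        return True
    except (TypeError, ValueError):
        return False


def filter_settings(settings):
    """Keep serializable values, collect excluded keys, and warn once
    if anything was dropped."""
    included, excluded = {}, []
    for key, value in settings.items():
        if is_json_serializable(value):
            included[key] = value
        else:
            excluded.append(key)
    if excluded:
        logger.warning("Excluded non-serializable settings: %s", excluded)
    return included
```

For example, a settings dict containing a function value would have that key dropped and reported in a single warning, while plain strings, numbers, and lists pass through unchanged.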
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
Actionable comments posted: 1
🧹 Nitpick comments (3)
scrapeops_scrapy/core/api.py (1)

201-203: Prefer a module-level logger over per-call setup.

Initialize `logger = logging.getLogger(__name__)` once at module scope and reuse it to avoid repeated lookups and ease testing. Example (outside this range):

```python
# At top of module
import logging

logger = logging.getLogger(__name__)
```

Then remove these lines from `post()`:

```diff
- import logging
- logger = logging.getLogger(__name__)
```

scrapeops_scrapy/core/model.py (2)
219-221: Move logger to module scope.

Define `logger = logging.getLogger(__name__)` at the top and reuse here to avoid per-call setup. Example (outside this range):

```python
# Top of module
import logging

logger = logging.getLogger(__name__)
```

Then remove these lines from `get_settings()`:

```diff
- import logging
- logger = logging.getLogger(__name__)
```
248-255: Minor: prefer try/except/else for clarity.

Return `True` in an `else:` block to make the happy path explicit.

```diff
- try:
-     json.dumps(value)
-     return True
- except (TypeError, ValueError):
-     return False
+ try:
+     json.dumps(value)
+ except (TypeError, ValueError):
+     return False
+ else:
+     return True
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)

- scrapeops_scrapy/core/api.py (2 hunks)
- scrapeops_scrapy/core/model.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
scrapeops_scrapy/core/api.py (2)
- scrapeops_scrapy/normalizer/proxies.py (3)
  - ProxyNormalizer (11-197)
  - unknown_proxy_scheme (92-95)
  - get_proxy_scheme (85-89)
- scrapeops_scrapy/exceptions.py (1)
  - ScrapeOpsAPIResponseError (13-16)
🪛 Ruff (0.13.3)
scrapeops_scrapy/core/model.py
253-253: Consider moving this statement to an `else` block (TRY300)

scrapeops_scrapy/core/api.py

220-226: Use `logging.exception` instead of `logging.error` (TRY400)

236-236: Do not catch blind exception: `Exception` (BLE001)

237-237: Use `logging.exception` instead of `logging.error` (TRY400)
🔇 Additional comments (2)
scrapeops_scrapy/core/model.py (1)
227-247: Good safeguard: filter to JSON-serializable settings and warn once.

This prevents payload failures and guides users to the exclusion list. LGTM.

scrapeops_scrapy/core/api.py (1)

200-210: `SOPSRequest.post` never receives non-None `files`.

All call sites either omit the `files` parameter or explicitly pass `files=None`, so the json+files combination cannot occur.
```python
import logging
logger = logging.getLogger(__name__)
```
Differentiate serialization vs response JSON decode errors; narrow exceptions and include tracebacks.
Catching (TypeError, ValueError) here also swallows response.json() decode failures, causing premature no-retry. Handle json.JSONDecodeError separately (retry), keep early-return only for payload serialization errors, add Timeout/RequestException, and use logger.exception for tracebacks.
Apply:

```diff
@@
- import logging
- logger = logging.getLogger(__name__)
+ import logging
+ import json
+ logger = logging.getLogger(__name__)
@@
- except (TypeError, ValueError) as json_error:
-     # These are JSON serialization errors - don't retry, log and return immediately
-     logger.error(
-         "ScrapeOps: Unable to send monitoring data due to non-serializable settings. "
-         "Some of your Scrapy settings contain functions or other objects that cannot be "
-         "converted to JSON format. To resolve this, add the problematic setting names "
-         "to SCRAPEOPS_SETTINGS_EXCLUSION_LIST in your settings.py file. "
-         f"Error details: {json_error}"
-     )
-     error = json_error
-     return None, str(error)  # Don't retry on serialization errors
+ except json.JSONDecodeError as parse_error:
+     # Invalid JSON in response from server → retry
+     logger.warning(f"ScrapeOps: Invalid JSON response (attempt {attempt+1}): {parse_error}")
+     error = parse_error
+     continue
+ except (TypeError, ValueError) as serialization_error:
+     # Payload JSON serialization error → don't retry
+     logger.exception(
+         "ScrapeOps: Unable to send monitoring data due to non-serializable settings. "
+         "Add the problematic setting names to SCRAPEOPS_SETTINGS_EXCLUSION_LIST in your settings.py. "
+         f"Error details: {serialization_error}"
+     )
+     return None, str(serialization_error)
+ except requests.exceptions.Timeout as e:
+     logger.warning(f"ScrapeOps: Timeout (attempt {attempt+1}): {e}")
+     error = e
+     continue
  except requests.exceptions.ConnectionError as e:
      logger.warning(f"ScrapeOps: Connection error (attempt {attempt+1}): {e}")
      error = e
      continue
  except ScrapeOpsAPIResponseError as e:
      error = e
      continue
- except Exception as e:
-     logger.error(f"ScrapeOps: Unexpected error (attempt {attempt+1}): {e}")
-     error = e
-     continue
+ except requests.exceptions.RequestException as e:
+     logger.exception(f"ScrapeOps: Request error (attempt {attempt+1}): {e}")
+     error = e
+     continue
```
Also consider converting the error log at line 237 to `logger.exception` for tracebacks (addressed above). Based on static analysis hints.
Also applies to: 218-239
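One subtlety behind this suggestion: `json.JSONDecodeError` is a subclass of `ValueError`, so handler ordering determines which category an error falls into. A minimal, self-contained illustration of that ordering (not the actual `post()` code; `classify` is a hypothetical helper):

```python
import json

# json.JSONDecodeError subclasses ValueError, so a handler for
# (TypeError, ValueError) placed first would also swallow decode errors.
assert issubclass(json.JSONDecodeError, ValueError)


def classify(fn):
    # Run fn and report which handler caught its error; the more
    # specific JSONDecodeError must come before the ValueError catch.
    try:
        fn()
    except json.JSONDecodeError:
        return "decode-error (retryable)"
    except (TypeError, ValueError):
        return "serialization-error (no retry)"
    return "ok"
```

Here `classify(lambda: json.loads("{bad"))` lands in the retryable decode branch, while `classify(lambda: json.dumps({"x": set()}))` lands in the no-retry serialization branch, which is exactly the distinction the suggested diff restores.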
🤖 Prompt for AI Agents
```
In scrapeops_scrapy/core/api.py around lines 201-203 and 218-239, narrow broad
exception handling so payload serialization errors (TypeError, ValueError)
remain an early-return, but handle response.json() failures
(json.JSONDecodeError) separately and treat them as retryable (do not
early-return); also add explicit catches for requests.Timeout and
requests.RequestException where appropriate and treat them as retryable. Replace
simple logger.error calls (notably around line 237) with logger.exception to
include tracebacks, and ensure json (for JSONDecodeError) and requests
exceptions are imported/used accordingly. Ensure each except block only handles
the intended error class and logs context+traceback via logger.exception where
debugging info is needed.
```