fix(reddit): prevent enrichment timeout from discarding all search results

AI · claude · AI · commit 6b33b0840605 · 2026-03-27T12:28:32.000+11:00
Bug: When ScrapeCreators comment enrichment was slow (fetching top
comments for 5 posts), the entire search_and_enrich() function could
exceed the caller's 90-second ThreadPoolExecutor timeout.  This raised
a TimeoutError in last30days.py's result-collection phase, which left
reddit_items as its initialized empty list [] — silently discarding
all search results even though the search phase had already completed
successfully (e.g. 116 posts found and deduped).

The user would see contradictory output:
  [Reddit] Final: 116 Reddit posts        ← logged during search
  [Reddit] Enriching comments for 5 posts  ← enrichment starts
  ✗ Error: Reddit search timed out after 90s  ← timeout fires
  ✓ Reddit Found 0 threads                 ← all results gone

Root cause: search_and_enrich() ran enrichment synchronously in the
same call stack as the search.  There was no timeout boundary between
the two phases, so a slow enrichment consumed the entire budget and
the caller's future.result(timeout=90) would fire before the function
could return.
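
A minimal reproduction of the failure mode, with timings scaled down.
The names slow_search_and_enrich() and collect() are hypothetical
stand-ins for search_and_enrich() and the result-collection code in
last30days.py, not the real implementations:

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def slow_search_and_enrich():
    """Hypothetical stand-in: a fast search followed by slow enrichment."""
    posts = ["post-1", "post-2", "post-3"]  # search phase finishes at once
    time.sleep(0.5)                         # enrichment overruns the budget
    return {"items": posts}


def collect(timeout):
    """Mimics the caller: the list stays empty if the future times out."""
    reddit_items = []  # initialized empty, as described above
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(slow_search_and_enrich)
    try:
        reddit_items = future.result(timeout=timeout)["items"]
    except FutureTimeout:
        print(f"Reddit search timed out after {timeout}s")
    finally:
        pool.shutdown(wait=False)  # do not block on the still-running worker
    return reddit_items


lost = collect(timeout=0.1)  # budget shorter than enrichment: results lost
kept = collect(timeout=2)    # budget long enough: results returned
```

With the short budget the posts already found never reach the caller,
because they only exist as the function's return value.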

Fix: Run enrich_with_comments() in a dedicated thread with its own
45-second timeout (leaving ~45s for the search phase within the
caller's 90s window).  If enrichment times out or raises:
  - Search results are preserved and returned WITHOUT comments
  - A descriptive log message is emitted
  - The caller's future.result() succeeds normally

The fix is backward-compatible:
  - New enrich_timeout param defaults to 45s (no caller changes needed)
  - enrich_timeout=0 skips enrichment entirely (useful for testing)
  - Successful enrichment behaves identically to before
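
The fix and the enrich_timeout semantics can be sketched as follows,
again with scaled-down timings.  enrich() and search_then_enrich() are
hypothetical stand-ins for enrich_with_comments() and
search_and_enrich(); the delay argument exists only to simulate a slow
API:

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def enrich(items, delay):
    """Hypothetical stand-in for enrich_with_comments()."""
    time.sleep(delay)
    return [dict(item, top_comments=["example comment"]) for item in items]


def search_then_enrich(items, delay, enrich_timeout=45):
    """Return enriched items, or the plain items on timeout or failure."""
    if not items or enrich_timeout <= 0:
        return items  # enrich_timeout=0 skips enrichment entirely
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(enrich, items, delay)
        return future.result(timeout=enrich_timeout)
    except FutureTimeout:
        print(f"enrichment timed out after {enrich_timeout}s; "
              f"returning {len(items)} posts without comments")
    except Exception as exc:
        print(f"enrichment failed ({type(exc).__name__}: {exc}); "
              f"returning {len(items)} posts without comments")
    finally:
        pool.shutdown(wait=False)  # never wait for a slow worker here
    return items


posts = [{"id": 1}, {"id": 2}]
enriched = search_then_enrich(posts, delay=0, enrich_timeout=1)
fallback = search_then_enrich(posts, delay=0.5, enrich_timeout=0.1)
skipped = search_then_enrich(posts, delay=0, enrich_timeout=0)
```

The sketch calls shutdown(wait=False) instead of using a `with` block
because leaving a `with ThreadPoolExecutor(...)` block waits for running
tasks to finish, which would hold the caller until the slow enrichment
completed.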

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
diff --git a/scripts/lib/reddit.py b/scripts/lib/reddit.py
@@ -584,27 +584,58 @@ def search_and_enrich(
     to_date: str,
     depth: str = "default",
     token: str = None,
+    enrich_timeout: int = 45,
 ) -> Dict[str, Any]:
     """Full Reddit pipeline: search + comment enrichment.
 
-    This is the convenience function that does everything.
+    Search results are always returned even if comment enrichment times
+    out or fails.  Previously, the enrichment phase ran synchronously
+    inside this function.  When fetching comments for the top posts was
+    slow (e.g. ScrapeCreators latency spikes), the entire function
+    could exceed the caller's 90-second future timeout, causing a
+    TimeoutError that discarded *all* search results — even though the
+    search itself had already completed successfully.
+
+    The fix runs enrichment in a dedicated thread with its own timeout
+    (default 45 s), leaving ~45 s of budget for the search phase within
+    the caller's 90-second window.  If enrichment times out or raises,
+    the search results are returned without comments rather than being
+    thrown away.
 
     Args:
         topic: Search topic
         from_date: Start date (YYYY-MM-DD)
         to_date: End date (YYYY-MM-DD)
         depth: 'quick', 'default', or 'deep'
         token: ScrapeCreators API key
+        enrich_timeout: Max seconds for the comment-enrichment phase
+            (default 45).  Set to 0 to skip enrichment entirely.
 
     Returns:
-        Dict with 'items' list. Items include top_comments and comment_insights.
+        Dict with 'items' list. Items include top_comments and
+        comment_insights when enrichment succeeds; plain search
+        results otherwise.
     """
     result = search_reddit(topic, from_date, to_date, depth, token)
     items = result.get("items", [])
 
-    if items and token:
-        items = enrich_with_comments(items, token, depth)
-        result["items"] = items
+    if items and token and enrich_timeout > 0:
+        from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
+        pool = ThreadPoolExecutor(max_workers=1)
+        try:
+            future = pool.submit(enrich_with_comments, items, token, depth)
+            pool.shutdown(wait=False)  # a `with` block would wait for the worker on exit
+            result["items"] = future.result(timeout=enrich_timeout)
+        except FutureTimeout:
+            _log(
+                f"Comment enrichment timed out after {enrich_timeout}s "
+                f"— returning {len(items)} posts without comments"
+            )
+        except Exception as e:
+            _log(
+                f"Comment enrichment failed ({type(e).__name__}: {e}) "
+                f"— returning {len(items)} posts without comments"
+            )
 
     return result