Commit 6b33b08

claude authored and committed
fix(reddit): prevent enrichment timeout from discarding all search results
Bug: When ScrapeCreators comment enrichment was slow (fetching top comments for 5 posts), the entire search_and_enrich() function could exceed the caller's 90-second ThreadPoolExecutor timeout. This raised a TimeoutError in last30days.py's result-collection phase, which left reddit_items as its initialized empty list [], silently discarding all search results even though the search phase had already completed successfully (e.g. 116 posts found and deduped).

The user would see contradictory output:

    [Reddit] Final: 116 Reddit posts             ← logged during search
    [Reddit] Enriching comments for 5 posts      ← enrichment starts
    ✗ Error: Reddit search timed out after 90s   ← timeout fires
    ✓ Reddit Found 0 threads                     ← all results gone

Root cause: search_and_enrich() ran enrichment synchronously in the same call stack as the search. There was no timeout boundary between the two phases, so a slow enrichment consumed the entire budget and the caller's future.result(timeout=90) would fire before the function could return.

Fix: Run enrich_with_comments() in a dedicated thread with its own 45-second timeout (leaving ~45s for the search phase within the caller's 90s window). If enrichment times out or raises:
- Search results are preserved and returned WITHOUT comments
- A descriptive log message is emitted
- The caller's future.result() succeeds normally

The fix is backward-compatible:
- New enrich_timeout param defaults to 45s (no caller changes needed)
- enrich_timeout=0 skips enrichment entirely (useful for testing)
- Successful enrichment behaves identically to before

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
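
The failure mode described in the commit message can be reproduced in miniature. This is a sketch only: `slow_pipeline` and `collect` are hypothetical stand-ins (they are not functions from last30days.py or reddit.py), and the 90-second budget is scaled down to fractions of a second so the example runs quickly.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def slow_pipeline(search_s, enrich_s):
    """Hypothetical stand-in for the old search_and_enrich():
    search and enrichment run back to back in a single call."""
    time.sleep(search_s)                      # search phase finishes quickly
    items = ["post-1", "post-2", "post-3"]    # results exist at this point
    time.sleep(enrich_s)                      # slow enrichment eats the budget
    return {"items": items}


def collect(timeout_s):
    """Hypothetical caller: mirrors the 'initialize to [] and overwrite on
    success' pattern the commit message attributes to the caller."""
    reddit_items = []
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(slow_pipeline, 0.05, 0.5)
    try:
        reddit_items = future.result(timeout=timeout_s)["items"]
    except FutureTimeout:
        pass  # reddit_items silently stays [] although the search succeeded
    finally:
        pool.shutdown(wait=False)  # do not block on the still-running worker
    return reddit_items


lost = collect(timeout_s=0.2)   # budget shorter than search + enrichment
kept = collect(timeout_s=2.0)   # budget long enough for both phases
```

With the short budget the caller ends up with an empty list even though the search phase produced three items, which is the contradictory output shown above.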
1 parent 4d6224f commit 6b33b08

1 file changed: +36 -5 lines changed

scripts/lib/reddit.py

Lines changed: 36 additions & 5 deletions
@@ -584,27 +584,58 @@ def search_and_enrich(
     to_date: str,
     depth: str = "default",
     token: str = None,
+    enrich_timeout: int = 45,
 ) -> Dict[str, Any]:
     """Full Reddit pipeline: search + comment enrichment.
 
-    This is the convenience function that does everything.
+    Search results are always returned even if comment enrichment times
+    out or fails. Previously, the enrichment phase ran synchronously
+    inside this function. When fetching comments for the top posts was
+    slow (e.g. ScrapeCreators latency spikes), the entire function
+    could exceed the caller's 90-second future timeout, causing a
+    TimeoutError that discarded *all* search results — even though the
+    search itself had already completed successfully.
+
+    The fix runs enrichment in a dedicated thread with its own timeout
+    (default 45 s), leaving ~45 s of budget for the search phase within
+    the caller's 90-second window. If enrichment times out or raises,
+    the search results are returned without comments rather than being
+    thrown away.
 
     Args:
         topic: Search topic
         from_date: Start date (YYYY-MM-DD)
         to_date: End date (YYYY-MM-DD)
         depth: 'quick', 'default', or 'deep'
         token: ScrapeCreators API key
+        enrich_timeout: Max seconds for the comment-enrichment phase
+            (default 45). Set to 0 to skip enrichment entirely.
 
     Returns:
-        Dict with 'items' list. Items include top_comments and comment_insights.
+        Dict with 'items' list. Items include top_comments and
+        comment_insights when enrichment succeeds; plain search
+        results otherwise.
     """
     result = search_reddit(topic, from_date, to_date, depth, token)
     items = result.get("items", [])
 
-    if items and token:
-        items = enrich_with_comments(items, token, depth)
-        result["items"] = items
+    if items and token and enrich_timeout > 0:
+        try:
+            from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
+            with ThreadPoolExecutor(max_workers=1) as pool:
+                future = pool.submit(enrich_with_comments, items, token, depth)
+                items = future.result(timeout=enrich_timeout)
+            result["items"] = items
+        except FutureTimeout:
+            _log(
+                f"Comment enrichment timed out after {enrich_timeout}s "
+                f"— returning {len(items)} posts without comments"
+            )
+        except Exception as e:
+            _log(
+                f"Comment enrichment failed ({type(e).__name__}: {e}) "
+                f"— returning {len(items)} posts without comments"
+            )
 
     return result
 
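
One caveat may be worth checking against the real module. Leaving a `with ThreadPoolExecutor(...)` block calls `shutdown(wait=True)`, so when `future.result(timeout=...)` raises inside the block, the `except` handler is only reached after the worker has finished. The sketch below contrasts that pattern with a variant that shuts the pool down without waiting. It is an illustration under assumptions, not the project's code: `enrich_in_with_block`, `enrich_with_deadline` and `slow_enrich` are hypothetical names, and the timings are scaled down from 45 s.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def slow_enrich(items):
    """Hypothetical stand-in for enrich_with_comments(): takes 0.5 s."""
    time.sleep(0.5)
    return [item + "+comments" for item in items]


def enrich_in_with_block(enrich, items, timeout_s):
    """Same shape as the diff: the executor is used as a context manager."""
    try:
        with ThreadPoolExecutor(max_workers=1) as pool:
            return pool.submit(enrich, items).result(timeout=timeout_s)
    except FutureTimeout:
        # Reached only after __exit__ has waited for the worker to finish.
        return items


def enrich_with_deadline(enrich, items, timeout_s):
    """Variant that returns at the deadline and keeps the un-enriched items."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(enrich, items)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        return items  # timed out: fall back to plain search results
    except Exception:
        return items  # enrichment raised: same fallback
    finally:
        pool.shutdown(wait=False)  # do not block on the running worker


start = time.monotonic()
blocked_out = enrich_in_with_block(slow_enrich, ["a", "b"], timeout_s=0.1)
blocked_elapsed = time.monotonic() - start   # roughly the full 0.5 s

start = time.monotonic()
fast_out = enrich_with_deadline(slow_enrich, ["a", "b"], timeout_s=0.1)
fast_elapsed = time.monotonic() - start      # close to the 0.1 s deadline

enriched = enrich_with_deadline(slow_enrich, ["a"], timeout_s=2.0)
```

Both variants fall back to the plain items, but only the second returns near the deadline. Even then the worker thread is not cancelled: it keeps running in the background until it finishes, and the interpreter waits for it at exit, so a timeout on the underlying HTTP calls would still be needed to bound total work.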