fix(Outreach): stale scraper_results.csv can be reused after scraper failure or timeout #256

@LocNguyenSGU

Summary

Outreach can reuse stale .mp/scraper_results.csv data from a previous run when the current scraper execution fails or times out without producing a fresh output file.

Because the code only checks whether the results file exists, an old file can be treated as if it belongs to the current run.

Current Behavior

Outreach.start() writes scraper output to a fixed path:

  • .mp/scraper_results.csv

After running the scraper, it only checks:

if not os.path.exists(output_path):
    ...

If that file already exists from a previous successful run, the code proceeds to read it and continues the email flow, even if the current scraper run failed or did not generate new output.

Expected Behavior

The outreach flow should only process scraper results created by the current run.

Possible fixes:

  • delete any previous results file before starting the scraper
  • verify the output file was created or modified after the current run started (for example, by comparing its modification time with the run's start time)
  • write each run to a unique temp file and only promote it on success
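The first and third options could be combined roughly as follows. This is a minimal sketch, not the project's code: `run_scraper_to_fresh_file`, the `command` callable, and the timeout value are hypothetical names and defaults chosen for illustration, and the real scraper invocation in Outreach.py may differ.

```python
import os
import subprocess
import tempfile


def run_scraper_to_fresh_file(command, final_path, timeout=300):
    """Run the scraper against a per-run temp file and promote it only on success.

    `command` is a hypothetical callable: it receives the temp output path and
    returns the argv list used to launch the scraper.
    Returns True if `final_path` now holds results from this run.
    """
    out_dir = os.path.dirname(os.path.abspath(final_path))
    os.makedirs(out_dir, exist_ok=True)

    # Remove results left over from a previous run so they cannot be reused.
    if os.path.exists(final_path):
        os.remove(final_path)

    # Reserve a unique name in the same directory, then let the scraper
    # create the file itself.
    fd, tmp_path = tempfile.mkstemp(prefix="scraper_", suffix=".csv", dir=out_dir)
    os.close(fd)
    os.remove(tmp_path)

    try:
        result = subprocess.run(command(tmp_path), timeout=timeout)
        ok = (
            result.returncode == 0
            and os.path.exists(tmp_path)
            and os.path.getsize(tmp_path) > 0
        )
    except (subprocess.TimeoutExpired, OSError):
        # Timeout or failure to launch: treat as "no results for this run".
        ok = False

    if ok:
        os.replace(tmp_path, final_path)  # promote the fresh file
        return True

    # Clean up any partial output from the failed run.
    if os.path.exists(tmp_path):
        os.remove(tmp_path)
    return False
```

With this shape, the caller would continue the email flow only when the function returns True, so a failed or timed-out run leaves no results file behind to be picked up later.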

Evidence

Relevant code paths:

  • src/cache.py
    • get_results_cache_path() always returns a fixed path: .mp/scraper_results.csv
  • src/classes/Outreach.py
    • runs scraper with -results "{output_path}"
    • only checks os.path.exists(output_path) before reading results
    • does not clear old output before the run
    • does not validate modification time or freshness of the file
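The missing freshness validation could look something like the sketch below. `is_fresh_output` and `started_at` are hypothetical names, not part of the existing code, and the comparison assumes the filesystem's timestamp resolution is fine enough that a file written during the run does not appear older than the recorded start time.

```python
import os
import time


def is_fresh_output(path, started_at):
    """Return True only if `path` exists, is non-empty, and was modified
    at or after `started_at` (a time.time() value taken before the run)."""
    try:
        stat = os.stat(path)
    except FileNotFoundError:
        return False
    return stat.st_size > 0 and stat.st_mtime >= started_at
```

A caller would record `started_at = time.time()` just before launching the scraper and replace the bare `os.path.exists(output_path)` check with `is_fresh_output(output_path, started_at)`.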

Reproduction

  1. Run Outreach successfully once so .mp/scraper_results.csv exists.
  2. Start Outreach again.
  3. Make the current scraper run fail or time out before writing a new results file.
  4. Observe that the app still finds .mp/scraper_results.csv and proceeds to process/send emails using old data.
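The steps above can be simulated without running the real scraper. The sketch below is an illustration only: it uses a temporary directory instead of .mp/, the CSV columns are made up, and the failed run is represented by a flag rather than an actual scraper process.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    output_path = os.path.join(workdir, "scraper_results.csv")

    # Step 1: a previous successful run left results behind
    # (hypothetical columns, for illustration only).
    with open(output_path, "w") as f:
        f.write("name,email\nOld Lead,old@example.com\n")

    # Steps 2-3: the current scraper run fails and writes nothing new.
    scraper_succeeded = False

    # Step 4: an existence-only check still passes, so the flow would
    # continue with data from the previous run.
    proceeds_with_stale_data = os.path.exists(output_path) and not scraper_succeeded
    print(proceeds_with_stale_data)
```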

Why this matters

This can cause outreach emails to be sent to stale leads from a previous run, which is especially risky because the user may believe the current scrape succeeded.

Duplicate Check

  • Open issues checked: no matching issue found for stale previous-run results reuse
  • Closed issues checked: no matching issue found
  • Recent PRs checked: no matching PR found
