Title: How to gracefully stop spider from CrawlerProcess after N consecutive timeouts?
Hi bro, how's it going? Thanks for the project! 👋
I’m using aio-scrapy with CrawlerProcess and I’m struggling to cleanly stop a spider when the target site starts timing out.
Goal
I’d like to stop the spider (and end the process) when there are 5 consecutive timeout errors (after retries), while running the spider via CrawlerProcess.
Environment
- OS: Windows (ProactorEventLoop)
- Python: 3.11 (Anaconda)
- aio-scrapy: (please fill in version)
- Also using: aiohttp, playwright (for login/credentials), but the problem seems limited to aio-scrapy’s crawler/engine.
What I’m doing
I have a spider that:
- sends POST requests to a single endpoint
- uses RETRY_TIMES and DOWNLOAD_TIMEOUT
- counts consecutive final failures in an errback:
class StockFetcherSpider(Spider):
    custom_settings = {
        "RETRY_TIMES": 1,
        "DOWNLOAD_TIMEOUT": 20,
        "CLOSE_SPIDER_ON_IDLE": True,
        "CONCURRENT_REQUESTS": 8,
    }

    def __init__(self, output_dir, product_codes, credentials, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._response_errors = 0
        self._max_response_errors = 5
        self._stop_flag = False
        self.product_codes = product_codes
        self.COOKIES = credentials["cookies"]
        self.REQUEST_TOKEN = credentials["request_token"]

    async def start_requests(self):
        for code in self.product_codes:
            if self._stop_flag:
                logger.warning("Stop flag set, not scheduling new requests.")
                break
            form_data = {...}
            yield FormRequest(
                url=self.URL,
                formdata=form_data,
                cookies=self.COOKIES,
                callback=self.parse,
                meta={"product_code": code},
                errback=self.errback_request,
            )

    async def errback_request(self, failure):
        self._response_errors += 1
        logger.warning(
            f"Final failure (after retries): {repr(failure)} "
            f"- {self._response_errors}/{self._max_response_errors}"
        )
        if self._response_errors >= self._max_response_errors:
            logger.critical("Too many consecutive failures.")
            # here I tried different ways to stop the spider

Runner (simplified):
process = CrawlerProcess()
process.crawl(
    StockFetcherSpider,
    output_dir=output_dir,
    product_codes=product_codes,
    credentials=creds,
)
process.start()

What I tried
- raise CloseSpider(...) in errback_request
from aioscrapy.exceptions import CloseSpider

if self._response_errors >= self._max_response_errors:
    raise CloseSpider("too_many_consecutive_failures")

Logs show:
Closing spider (too_many_consecutive_failures)
Dumping aioscrapy stats:
{
'finish_reason': 'too_many_consecutive_failures',
...
}
Spider closed (too_many_consecutive_failures)
But after that, the process ends with:
RuntimeError: Event loop stopped before Future completed.
Task was destroyed but it is pending!
So CloseSpider works in terms of stats, but shutdown is not clean.
- self.crawler.engine.close_spider(...) in errback_request
await self.crawler.engine.close_spider(self, "too_many_consecutive_failures")

- Without await:
  RuntimeWarning: coroutine 'ExecutionEngine.close_spider' was never awaited
- With await:
  eventually: AssertionError: assert self.spider is not None inside _spider_idle.
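For context, this is roughly how that call sits inside errback_request (simplified, same names as in the spider above):

# inside StockFetcherSpider (second attempt)
async def errback_request(self, failure):
    self._response_errors += 1
    if self._response_errors >= self._max_response_errors:
        # ask the engine to close the spider directly
        await self.crawler.engine.close_spider(self, "too_many_consecutive_failures")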
- Soft stop with _stop_flag + CLOSE_SPIDER_ON_IDLE = True
- In errback_request, when limit is reached: self._stop_flag = True
- In start_requests, check flag and break
- This stops scheduling new requests, but with many timeouts I can still see the same kind of shutdown noise (RuntimeError: Event loop stopped before Future completed), and sometimes the process doesn’t seem to exit cleanly.
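To make the soft-stop variant concrete, here is a minimal sketch of what I mean (simplified, reusing only the attributes already shown in the spider above; the idea is that with CLOSE_SPIDER_ON_IDLE = True the spider should then close once it goes idle):

# inside StockFetcherSpider (soft-stop variant)
async def errback_request(self, failure):
    self._response_errors += 1
    if self._response_errors >= self._max_response_errors:
        logger.critical("Too many consecutive failures, setting stop flag.")
        self._stop_flag = True  # soft stop: only prevents scheduling new requests

async def start_requests(self):
    for code in self.product_codes:
        if self._stop_flag:
            logger.warning("Stop flag set, not scheduling new requests.")
            break
        # ... build form_data and yield FormRequest as in the full spider above ...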
Questions
- What is the recommended way to stop a spider (running under CrawlerProcess) after N consecutive download timeouts?
- Is the RuntimeError: Event loop stopped before Future completed + "Task was destroyed but it is pending" expected when using CloseSpider or engine.close_spider with CrawlerProcess?
Any guidance or a minimal example showing the “correct” pattern for stopping a spider after N consecutive timeouts with CrawlerProcess would be really helpful. 🙏
I look forward to your reply.
Best regards.