1_crawl_site.py error #22

@dnielsen

When I run:

python 1_crawl_site.py --url https://techbeatconference.com/TechBeat/techbeat/

I get the following error:

% python 1_crawl_site.py --url https://techbeatconference.com
⚙️ Crawling with URL: https://techbeatconference.com, Max Downloads: 500, Max Depth: 5
✅ Cleared crawl directory: workspace/crawled
⚙️ Starting web crawl...
17:35:00 DEBUG - Received configuration: {'urls': ['https://techbeatconference.com'], 'depth': 5, 'downloads': 500, 'folder': 'workspace/crawled'} at "/opt/anaconda3/envs/allycat-1/lib/python3.11/site-packages/dpk_web2parquet/transform.py:46"
DEBUG:dpk_web2parquet.transform:Received configuration: {'urls': ['https://techbeatconference.com'], 'depth': 5, 'downloads': 500, 'folder': 'workspace/crawled'}
WARNING:scrapy.core.spidermw:Middleware dpk_connector.core.middlewares.ConnectorDownloadedStats doesn't support asynchronous spider output, this is deprecated and will stop working in a future version of Scrapy. The middleware should be updated to support it. Please see https://docs.scrapy.org/en/latest/topics/coroutines.html#for-middleware-users for more information.
17:35:02 DEBUG - url: https://techbeatconference.com/TechBeat/techbeat/, filename: techbeatconference_com_TechBeat-techbeat_text.html, content_type: text/html; charset=UTF-8 at "/opt/anaconda3/envs/allycat-1/lib/python3.11/site-packages/dpk_web2parquet/transform.py:71"
DEBUG:dpk_web2parquet.transform:url: https://techbeatconference.com/TechBeat/techbeat/, filename: techbeatconference_com_TechBeat-techbeat_text.html, content_type: text/html; charset=UTF-8
ERROR:scrapy.core.scraper:Error processing ConnectorItem(dropped=True, downloaded=False, system_request=False, sitemap=False)
Traceback (most recent call last):
File "/opt/anaconda3/envs/allycat-1/lib/python3.11/site-packages/scrapy/core/scraper.py", line 388, in start_itemproc
output = await maybe_deferred_to_future(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/anaconda3/envs/allycat-1/lib/python3.11/site-packages/twisted/internet/defer.py", line 1092, in _runCallbacks
current.result = callback( # type: ignore[misc]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/anaconda3/envs/allycat-1/lib/python3.11/site-packages/scrapy/utils/defer.py", line 407, in f
return deferred_from_coro(coro_f(*coro_args, **coro_kwargs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/anaconda3/envs/allycat-1/lib/python3.11/site-packages/dpk_connector/core/pipelines.py", line 28, in process_item
raise DropItem
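
For readers unfamiliar with the last frame: the traceback ends in a pipeline `process_item` that raises `DropItem` for an item logged with `dropped=True`. The `dpk_connector` source is not shown here, so the following is only a hypothetical sketch of that pattern. `DropPipeline` is an invented name, the `ConnectorItem` fields are copied from the log line, and the local `DropItem` class is a stand-in for `scrapy.exceptions.DropItem` so the snippet runs without Scrapy installed.

```python
from dataclasses import dataclass


class DropItem(Exception):
    """Stand-in for scrapy.exceptions.DropItem (assumption: lets the sketch run without Scrapy)."""


@dataclass
class ConnectorItem:
    # Field names taken from the log line above; the real class may differ.
    dropped: bool = False
    downloaded: bool = False
    system_request: bool = False
    sitemap: bool = False


class DropPipeline:
    """Hypothetical pipeline that discards items flagged as dropped."""

    def process_item(self, item, spider=None):
        if item.dropped:
            # Same statement as the final traceback frame.
            raise DropItem
        return item


if __name__ == "__main__":
    pipeline = DropPipeline()
    kept = pipeline.process_item(ConnectorItem(downloaded=True))
    print("kept:", kept)
    try:
        pipeline.process_item(ConnectorItem(dropped=True))
    except DropItem:
        print("item flagged dropped=True was rejected by the pipeline")
```

In stock Scrapy, raising `DropItem` from `process_item` is the documented way to discard an item, so this frame may reflect a deliberately skipped item rather than a crash. That reading is an assumption; the log above does not settle it either way.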

Labels

bug (Something isn't working)
