How to implement Scrapy Splash in Virtual Machine

How do I run Scrapy Splash on a Linux virtual machine? I have a Lua script that sends keys to a site to log in and then scrapes it.

I have installed Docker, but I cannot get the scraper to work because it won't connect to the server.

Are there any simple steps I can follow to get this working on a VM? For example, what should I install, and what should I do before running `scrapy crawl spider`?

As for Docker, I ran the following in admin mode:
```
docker run -p 8050:8050 scrapinghub/splash --max-timeout 3600
```
However, this runs in the foreground and I'd like it to run in the background. I cannot figure out how; I have tried:

```
docker run -d 8050:8050 scrapinghub/splash --max-timeout 3600
```

But I just get the error:
```
Unable to find image '8050:8050' locally
```
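The error above is consistent with Docker reading `8050:8050` as the image name, because the `-p` flag was dropped when `-d` was added. A minimal sketch, assuming both flags are wanted together, builds the command as an argument list so that each flag keeps its value. The helper name `splash_run_command` and its parameters are hypothetical and only serve the illustration.

```python
def splash_run_command(detach=True, host_port=8050, max_timeout=3600):
    """Build the argument list for starting the Splash container.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    cmd = ["docker", "run"]
    if detach:
        # -d runs the container in the background (detached mode)
        cmd.append("-d")
    # -p must stay in front of the port mapping; without it Docker
    # treats "8050:8050" as the image name
    cmd += ["-p", f"{host_port}:8050"]
    # Image name first, then the arguments passed on to Splash itself
    cmd += ["scrapinghub/splash", "--max-timeout", str(max_timeout)]
    return cmd


if __name__ == "__main__":
    # Prints: docker run -d -p 8050:8050 scrapinghub/splash --max-timeout 3600
    print(" ".join(splash_run_command()))
```

The printed command can be pasted into a shell, or the list can be passed to `subprocess.run(..., check=True)`.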

I believe this may solve my issue, or perhaps not, in which case I need some further installations. Please let me know! I really need expert guidance to figure this out.

I opened a second instance while Docker was running in the first one.
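One way to confirm from the second session that something is listening on the Splash port is a plain TCP check. This is a rough sketch, assuming Splash is published on `localhost:8050` as in the `docker run` command above; the function name `is_port_open` is hypothetical.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable)
        # when nothing accepts the connection
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumed address: the port mapping used in the docker command above
    print("Splash port reachable:", is_port_open("localhost", 8050))
```

A `True` result only shows that the port accepts connections; it does not prove that the Lua script itself will run correctly.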

I get the following error when running the scrapy crawler:
```
2022-02-16 02:55:26 [scrapy_splash.middleware] WARNING: Bad request to Splash: {'error': 400, 'type': 'ScriptError', 'description': 'Error happened while executing Lua script', 'info': {'type': 'JS_ERROR', 'js_error_type': 'TypeError', 'js_error_message': 'null is not an object (evaluating \'document.querySelector("button:nth-child(2)").getClientRects\')', 'js_error': 'TypeError: null is not an object (evaluating \'document.querySelector("button:nth-child(2)").getClientRects\')', 'message': '[string "..."]:12: error during JS function call: \'TypeError: null is not an object (evaluating \\\'document.querySelector("button:nth-child(2)").getClientRects\\\')\'', 'source': '[string "..."]', 'line_number': 12, 'error': 'error during JS function call: \'TypeError: null is not an object (evaluating \\\'document.querySelector("button:nth-child(2)").getClientRects\\\')\''}}
2022-02-16 02:55:26 [scrapy.core.engine] DEBUG: Crawled (400) <GET http://instagram.com/ via http://localhost:8050/execute> (referer: None)
2022-02-16 02:55:26 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <400 http://instagram.com/>: HTTP status code is not handled or not allowed
```
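The log shows that Splash was reached and that `document.querySelector("button:nth-child(2)")` returned `null` at line 12 of the Lua script, so the button was not in the page at that moment. One possible cause, which is only an assumption since the script is not shown here, is that the page had not finished rendering or served a different layout on the VM. The sketch below polls for the element before using it. The names `WAIT_FOR_ELEMENT_LUA` and `build_splash_args`, and the extra arguments `selector`, `attempts`, and `delay`, are hypothetical.

```python
# Lua source for Splash's /execute endpoint. It reads the selector and the
# polling settings from the request arguments instead of hard-coding them.
WAIT_FOR_ELEMENT_LUA = """
function main(splash, args)
  assert(splash:go(args.url))
  local element = nil
  for _ = 1, args.attempts do
    -- splash:select returns nil while the element is not in the DOM
    element = splash:select(args.selector)
    if element then break end
    splash:wait(args.delay)
  end
  if not element then
    error("element not found: " .. args.selector)
  end
  return {html = splash:html()}
end
"""


def build_splash_args(selector, attempts=20, delay=0.5):
    """Return the args dict to send along with the Splash request."""
    return {
        "lua_source": WAIT_FOR_ELEMENT_LUA,
        "selector": selector,
        "attempts": attempts,
        "delay": delay,
    }
```

In a spider this could be passed as `SplashRequest(url, self.parse, endpoint="execute", args=build_splash_args("button:nth-child(2)"))`. If the element never appears, the script fails with an explicit "element not found" message, which may be easier to diagnose than the JavaScript `TypeError`.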

The scraper works perfectly fine on my Mac, so there must be an installation that I am missing somewhere on the VM.
