This software is not authorized by Google and doesn't follow Google's robots.txt. Scraping without Google's explicit written permission is a violation of their terms and conditions on scraping and can potentially lead to a lawsuit.
Local Environment
- NodeJS (https://nodejs.org/de/)
NPM-Packages
- Puppeteer (https://www.npmjs.com/package/puppeteer)
- Minimist (https://www.npmjs.com/package/minimist)
- Download the latest project release, extract it and (if desired) move the folder to your home directory
- Check whether Node and NPM are already installed. Open Terminal and
  - type node -v to check the NodeJS version number (and whether it is installed)
  - type npm -v to check the NPM version number (and whether it is installed)
  - if they are not installed, install Homebrew (from https://brew.sh/index_de; Mac) and then NodeJS with
    brew update && brew install node
- In Terminal, move to the project folder (type cd folder/ if you named the project folder "folder")
- Install the required NPM packages: type npm install in Terminal
Type npm run scraper -- --help for help (or read on).
Run the script with one of the following commands:
npm run scraper -- --clicks=[0-2/max] --kw=[...] --lang=[de/en] (--output=csv)
node get_paas.js --clicks=[0-2/max] --kw=[...] --lang=[de/en] (--output=csv)
Arguments
- --clicks=[0-2/max] : how often to click on new questions [0-2/max] (be patient when using max, ~3min)
- --kw=[...] : keyword (search term), or "keywords" for batch mode (reads keywords line by line from keywords.txt)
- --lang=[de/en] : language of the Google search [de/en]
- --output=csv : (optional) export the list of questions
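The argument rules above can be sketched as a small validation step. This is an illustration, not the code in get_paas.js: the function name normalizeArgs, the returned object shape, and the defaults for omitted arguments (clicks 0, lang de) are assumptions. It expects an object of the kind minimist produces from process.argv.

```javascript
// Sketch only: validates an already-parsed argument object.
// With minimist, that object would typically come from:
//   const argv = require('minimist')(process.argv.slice(2));
// normalizeArgs and its return shape are hypothetical names.
function normalizeArgs(argv) {
  // minimist turns "--clicks=0" into the number 0 and "--clicks=max" into a
  // string, so compare on the string form. Missing clicks -> 0 (assumption).
  const clicksRaw = String(argv.clicks === undefined ? 0 : argv.clicks);
  if (!['0', '1', '2', 'max'].includes(clicksRaw)) {
    throw new Error('--clicks must be 0, 1, 2 or max');
  }

  const keyword = String(argv.kw === undefined ? '' : argv.kw).trim();
  if (keyword === '') {
    throw new Error('--kw is required');
  }

  // Missing lang -> de, mirroring the "(default/de)" search URL.
  const lang = String(argv.lang === undefined ? 'de' : argv.lang);
  if (!['de', 'en'].includes(lang)) {
    throw new Error('--lang must be de or en');
  }

  if (argv.output !== undefined && argv.output !== 'csv') {
    throw new Error('--output only supports csv');
  }

  return {
    clicks: clicksRaw === 'max' ? 'max' : Number(clicksRaw),
    keyword,
    batch: keyword === 'keywords', // "keywords" switches to batch mode
    lang,
    csv: argv.output === 'csv',
  };
}
```

Validating once up front keeps the scraping code free of repeated checks and gives a clear error before the browser is started.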
Examples
npm run scraper -- --clicks=max --kw=firefox --output=csv --lang=en
npm run scraper -- --clicks=0 --kw=angela+merkel --lang=de
npm run scraper -- --clicks=0 --output=csv --kw=keywords --lang=en (batch mode)
node get_paas.js --clicks=max --kw=firefox --output=csv --lang=en
node get_paas.js --clicks=0 --kw=angela+merkel --lang=de
node get_paas.js --clicks=0 --output=csv --kw=keywords --lang=en (batch mode)
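For batch mode, the keywords are read line by line from keywords.txt. A minimal sketch of that step follows. The helper names parseKeywordLines and loadKeywords are hypothetical, and trimming whitespace and skipping blank lines are assumptions about how the file should be treated.

```javascript
const fs = require('fs');

// Turn the raw contents of a keyword file into a clean list:
// one keyword per line, surrounding whitespace trimmed, blank lines skipped.
function parseKeywordLines(text) {
  return text
    .split(/\r?\n/) // accept both Unix and Windows line endings
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// Read the keywords from disk. The default path mirrors the keywords.txt
// mentioned in the README; its location relative to the script is an assumption.
function loadKeywords(path = 'keywords.txt') {
  return parseKeywordLines(fs.readFileSync(path, 'utf8'));
}
```

Keeping the parsing separate from the file read means the line handling can be checked with a plain string, without touching the file system.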
What happens here
- The browser goes to https://www.google.com/search?hl=de&gl=DE&ie=utf-8&oe=utf-8&no_sw_cr=1&pws=0&q=[KEYWORD] (default/de)
- If clicks is set to 0, the initially found questions are returned
- If clicks is set > 0, the sets of questions that appear (after clicks) are clicked N times (first set = 4 (initial) questions)
- All questions are extracted from the SERP after clicking is done
- The result is output to the CLI and to a CSV file (if the csv argument is given)
- If something breaks or errors occur during runtime, please ask Philipp at hello@jpigla.de.
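The first and last steps of the list above (building the search URL and writing the CSV) can be sketched without a browser. This is a hedged illustration, not the script's actual code. The names buildSearchUrl, questionsToCsv and LOCALES are hypothetical, the en locale parameters (hl=en, gl=US) are a guess because only the de URL is documented, and the two-column CSV layout is an assumption.

```javascript
// Locale parameters per --lang value. The de entry is taken from the URL
// shown in the README; the en entry (hl=en, gl=US) is an assumption.
const LOCALES = {
  de: { hl: 'de', gl: 'DE' },
  en: { hl: 'en', gl: 'US' },
};

// Build the search URL for one keyword.
function buildSearchUrl(keyword, lang = 'de') {
  const locale = LOCALES[lang];
  if (!locale) {
    throw new Error(`Unsupported language: ${lang}`);
  }
  // Keep "+" as the word separator (as in --kw=angela+merkel) and
  // percent-encode everything else in each word.
  const q = keyword.split('+').map(encodeURIComponent).join('+');
  return 'https://www.google.com/search'
    + `?hl=${locale.hl}&gl=${locale.gl}`
    + '&ie=utf-8&oe=utf-8&no_sw_cr=1&pws=0'
    + `&q=${q}`;
}

// Serialize extracted questions as CSV text with a header row.
// Every field is quoted and embedded double quotes are doubled.
function questionsToCsv(keyword, questions) {
  const escape = (value) => `"${String(value).replace(/"/g, '""')}"`;
  const rows = questions.map((q) => [keyword, q].map(escape).join(','));
  return ['"keyword","question"', ...rows].join('\n') + '\n';
}
```

The resulting string could then be written with fs.writeFileSync; the clicking and extraction in between depend on Google's current page markup and are left out here.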
Version 1.1 (15.10.2019)
- Add npm script
- Optimize performance
- Add --help argument
- Add --lang (language) argument [de/en]
- Edit readme
Version 1.0 (07.10.2019)
- Initial upload
- Working version
All assets and code are under the GPL v3 License unless specified otherwise.
