Minimal Node crawler boilerplate with modern ES6 features built in (e.g. Promises in requests, import/export syntax), plus cheerio and express
- Start building your own crawler within seconds
- Give you a minimalist skeleton and modern ES6 features that are not currently supported out of the box in Node
Just clone the repo, install the dependencies (yarn install), write your crawler and run yarn start, voilà!
- yarn start - serves the app on localhost in watch mode
- yarn run build - builds the project; the output directory is /dist
Just a straightforward example to help you understand the usage of some of the tools in this project
import requestPromise from "request-promise-native";
import cheerio from "cheerio";
import express from "express";
const app = express();
app.get("/", async (req, res) => {
  const $ = await requestPromise("https://path-to-website.com/", {
    transform: body => cheerio.load(body),
  });
  const header = $("h1").text();
  // ...do the rest of your crawling...
  // send whatever you'd like to the browser
  res.send(header);
});
app.listen(3000);
- TypeScript is here just to get modern ES6 features in Node, like import/export
- cheerio - jQuery-like selectors for Node
- request-promise-native - use Promises in Node requests
- express - watch (and interact with) whatever you expect in the browser rather than in the CLI
- nodemon - runs the server in watch mode (i.e. restarts it each time the code changes)
- It would be nice to add a script to run tests
- If you use fs, consider fs-extra, which lets you use Promises in filesystem methods instead of callbacks
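
To illustrate the Promise-based filesystem style mentioned above, here is a minimal sketch that saves crawled values to disk and reads them back with async/await. It assumes Node's built-in fs.promises as a stand-in so that it runs without extra dependencies; fs-extra's methods also return Promises when no callback is passed, so the same pattern should carry over. The helper names saveAndReload and demo and the file name headers.json are hypothetical and not part of this boilerplate.

```typescript
import { promises as fs } from "fs";
import * as os from "os";
import * as path from "path";

// Hypothetical helper: write crawled values as JSON, then read them back.
// Every filesystem call returns a Promise, so no callbacks are needed.
export async function saveAndReload(file: string, data: string[]): Promise<string[]> {
  // Create the parent directory if it does not exist yet.
  await fs.mkdir(path.dirname(file), { recursive: true });
  await fs.writeFile(file, JSON.stringify(data), "utf8");
  const raw = await fs.readFile(file, "utf8");
  return JSON.parse(raw) as string[];
}

// Hypothetical demo: stores a single header in a fresh temporary directory.
export async function demo(): Promise<string[]> {
  const dir = await fs.mkdtemp(path.join(os.tmpdir(), "crawler-"));
  return saveAndReload(path.join(dir, "out", "headers.json"), ["Example header"]);
}
```

Inside an express route, a helper like this could be awaited right before res.send, in the same way the request is awaited in the example earlier in this README.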
MIT
Thanks for using this boilerplate! 🙏 @eliranlevi