Renamed to ascrape.
Nodejs module for extracting web page content using Cheerio.
Turn any web page into a clean view. This module is based on luin's readability project.
npm install readability-js
read(html [, options], callback)
Where
- html url or html code.
- options is an optional options object
- callback is the callback to run - callback(error, article, meta)
var read = require('readability-js');
read('http://howtonode.org/really-simple-file-uploads', function(err, article, meta) {
  // Main Article
  console.log(article.content.text());
  // Title
  console.log(article.title);
  // Article HTML Source Code
  console.log(article.content.html());
});
NB If the page has been marked with charset other than utf-8, it will be converted automatically. Charsets such as GBK, GB2312 is also supported.
readability-js will pass the options to request directly. See request lib to view all available options.
readability-js has 2 additional options:
- 
onlyArticleBody (Boolean) - get only article body or all main content; 
- 
preprocess - which should be a function to check or modify downloaded source before passing it to readability. 
read(url, {
  preprocess: function(source, response, contentType, callback) {
    if (source.length > maxBodySize) {
      return callback(new Error('too big'));
    }
    callback(null, source);
  }, function(err, article, response) {
    //...
  });
- 
content - The article content of the web page. Return false if failed. Is a Cheerio object. 
- 
title - The article title of the web page. It's may not same to the text in the <title>tag.
- 
excerpt - The article description from any description, og:description or twitter:description <meta>
