justmarkup/html-posts-to-markdown

html posts to markdown

A Node.js tool that uses Puppeteer to extract HTML posts from web pages, convert them to Markdown and save them.
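The three steps in the description can be outlined as a small pipeline: collect the links to the posts, extract each post, then convert it to Markdown and save it. This is a hypothetical sketch, not the code in index.js. The names exportPosts, getPostLinks, getPost, toMarkdown and savePost are illustrative. The step functions are passed in as parameters, so a real run could back them with Puppeteer and an HTML-to-Markdown converter, and a test can stub them.

```javascript
// Hypothetical outline of the extract -> convert -> save pipeline.
// The step functions are injected so they can be backed by Puppeteer
// (or stubbed in tests); none of these names come from index.js.
async function exportPosts(steps, options) {
  const { getPostLinks, getPost, toMarkdown, savePost } = steps;

  // 1. Collect the links to every post from the entry page.
  const links = await getPostLinks(options.url, options.postSelector);

  const saved = [];
  // Process posts one after another to keep the load on the site low.
  for (const link of links) {
    // 2. Read the title and the HTML of the content wrapper.
    const { title, html } = await getPost(
      link,
      options.titleSelector,
      options.contentSelector
    );
    // 3. Convert the HTML to Markdown and save the result.
    const markdown = await toMarkdown(html);
    saved.push(await savePost(options.dir, title, markdown));
  }
  return saved;
}
```

With Puppeteer, getPostLinks could open the entry page with page.goto(url) and collect the link targets with page.$$eval(postSelector, (links) => links.map((a) => a.href)). Whether index.js works this way is an assumption.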

Disclaimer

I haven't tested this thoroughly and many features are missing, but it works for my use case.

Setup

  1. Clone the repository
  2. Run npm install

Usage

node index.js --url="https://justmarkup.com" --postSelector=".main .article h2 a" --titleSelector=".article h1" --contentSelector=".article .entry-content" --dir="/posts/"
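The command above passes every option as --name=value. The shell removes the quotes, so Node.js sees, for example, --url=https://justmarkup.com in process.argv. The following is a minimal, dependency-free sketch of reading such arguments and falling back to the defaults listed under Options. The names parseOptions and DEFAULTS are hypothetical, and index.js may parse its arguments differently, for example with an argument-parsing library.

```javascript
// Defaults copied from the Options table in this README.
const DEFAULTS = {
  url: "https://justmarkup.com",
  postSelector: ".main .article h2 a",
  titleSelector: ".article h1",
  contentSelector: ".article .entry-content",
  dir: "/posts/",
};

// Hypothetical parser for arguments of the form --name=value.
// Everything after the first "=" is the value, so URLs with query
// strings survive. Unknown names are rejected to catch typos.
function parseOptions(argv) {
  const options = { ...DEFAULTS };
  for (const arg of argv) {
    const match = /^--([^=]+)=(.*)$/.exec(arg);
    if (!match) {
      throw new Error(`Expected --name=value, got: ${arg}`);
    }
    const [, name, value] = match;
    if (!Object.prototype.hasOwnProperty.call(DEFAULTS, name)) {
      throw new Error(`Unknown option: --${name}`);
    }
    options[name] = value;
  }
  return options;
}
```

Call it as parseOptions(process.argv.slice(2)) to skip the node executable and script path at the start of process.argv. Rejecting unknown names is a design choice of this sketch. A typo such as --titelSelector then fails loudly instead of silently using the default.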

Options

Option             Default                    Description
--url              https://justmarkup.com     The entry page containing links to the posts
--postSelector     .main .article h2 a        The selector for all the links to your posts
--titleSelector    .article h1                The selector for the title of your post
--contentSelector  .article .entry-content    The selector for the content wrapper of your post
--dir              /posts/                    The directory where the posts should be saved
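The README does not say how the saved files are named. One common approach, used here purely as an assumption, is to derive a file name from the post title and join it with --dir. The helpers slugify and postFilePath are hypothetical and not part of the tool.

```javascript
const path = require("node:path");

// Hypothetical naming scheme: turn a post title into a lowercase,
// dash-separated file name.
function slugify(title) {
  return title
    .normalize("NFKD")                // split accented letters into base + mark
    .replace(/[\u0300-\u036f]/g, "")  // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")      // collapse other characters into dashes
    .replace(/^-+|-+$/g, "");         // trim leading and trailing dashes
}

// Build the path of the Markdown file for a post inside --dir.
function postFilePath(dir, title) {
  const slug = slugify(title) || "untitled";
  return path.join(dir, `${slug}.md`);
}
```

For example, postFilePath("/posts/", "Hello, World!") returns "/posts/hello-world.md" on a POSIX system. Titles that reduce to nothing fall back to "untitled". Two posts with the same title would map to the same file, so a fuller version might append a suffix.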

About

Save your online HTML posts as Markdown using Puppeteer.
