Hellomouse Apps Site Queue

A fun microservice for scraping content from websites for Hellomouse Apps.

Features:

Download webpages as HTML (with assets such as CSS, videos, and images embedded as base64), PDF, or WebP (screenshot)
Special handling for certain websites. Currently supported:
- Twitter / X: Tweets are downloaded as HTML + attached media (images, videos)
- Reddit: Posts and comments are downloaded with any attached assets
- SoundCloud: Songs are downloaded with metadata (HTML + audio)
- Newgrounds: Songs are downloaded with metadata (HTML + audio)
- Imgur: Albums and galleries are downloaded with all images and metadata (HTML + images / videos)
- YouTube: Videos are downloaded
- Pixiv: Albums are downloaded
- Bilibili: Videos are downloaded

Built With

Setup

Install dependencies

npm install

Set up the config. You will need a running PostgreSQL database as well as the hellomouse-apps-api server (run the server first to generate the required tables).

An example config, config.js.example, is in the root directory. Copy it and rename the copy to config.js. Here are the properties:

export const dbUser = 'hellomouse_board';  // PostgreSQL user
export const dbIp = '127.0.0.1';           // PostgreSQL server address
export const dbPort = 5433;                // PostgreSQL server port
export const dbPassword = 'my password';   // PostgreSQL password
export const dbName = 'hellomouse_board';  // PostgreSQL database name

export const fileDir = './saves';          // Base path for all stored files; web files generally go under <fileDir>/site_downloads/file.ext
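
As a rough illustration of how the database settings fit together, the sketch below assembles them into a PostgreSQL connection URI. The helper name buildConnectionUri is hypothetical, and the service itself may well connect in a different way; the values are inlined (rather than imported from config.js) only so the snippet runs on its own.

```javascript
// Values mirror the example config; in the real project they would come from config.js.
const config = {
    dbUser: 'hellomouse_board',
    dbIp: '127.0.0.1',
    dbPort: 5433,
    dbPassword: 'my password',
    dbName: 'hellomouse_board'
};

// Hypothetical helper (not part of the project): builds a postgres:// URI.
// User, password and database name are percent-encoded so that characters
// such as spaces or '@' do not break the URI.
function buildConnectionUri({ dbUser, dbIp, dbPort, dbPassword, dbName }) {
    const user = encodeURIComponent(dbUser);
    const password = encodeURIComponent(dbPassword);
    const database = encodeURIComponent(dbName);
    return `postgres://${user}:${password}@${dbIp}:${dbPort}/${database}`;
}

console.log(buildConnectionUri(config));
```

Note that the space in the example password is encoded as %20 in the resulting URI.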

To set up yt-dlp (optional), place your browser cookies in secret/yt-cookies.txt for downloading YouTube videos and in secret/bilibili-cookies.txt for downloading Bilibili videos.
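
For orientation, here is a minimal sketch of how a cookie file could be passed to yt-dlp. The --cookies option is a real yt-dlp flag and expects a Netscape-format cookie file. The helper names cookieFileFor and ytDlpArgs, the hostname matching, and the placeholder URLs are assumptions for illustration; how this service actually invokes yt-dlp is not documented here.

```javascript
// Hypothetical helper: picks the cookie file for a URL based on its hostname.
function cookieFileFor(url) {
    const host = new URL(url).hostname;
    if (host.endsWith('bilibili.com')) return 'secret/bilibili-cookies.txt';
    if (host.endsWith('youtube.com') || host === 'youtu.be') return 'secret/yt-cookies.txt';
    return null; // no cookie file for other sites
}

// Hypothetical helper: builds the yt-dlp argument list, adding --cookies
// only when a cookie file applies to the URL.
function ytDlpArgs(url) {
    const args = [];
    const cookieFile = cookieFileFor(url);
    if (cookieFile) args.push('--cookies', cookieFile);
    args.push(url);
    return args;
}

console.log(ytDlpArgs('https://www.youtube.com/watch?v=VIDEO_ID'));
```

The resulting array could then be handed to a process launcher such as Node's child_process.spawn with 'yt-dlp' as the command.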

To set up Pixiv cookies (optional, for bypassing rate limiting and age restrictions), export your browser cookies as a JS array of objects like [{ name: ... }] and save the result in secret/pixiv-cookies.txt.
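
Before starting the server, it can help to sanity-check the exported cookie file. The sketch below assumes the file contents are valid JSON and that every entry carries at least a string name; the helper name parseCookieArray and the value field in the sample are illustrative assumptions, not the project's actual format requirements.

```javascript
// Hypothetical validator: parses the cookie text and checks that it is an
// array of objects, each with a string "name" property.
function parseCookieArray(text) {
    const cookies = JSON.parse(text); // throws SyntaxError on invalid JSON
    if (!Array.isArray(cookies)) {
        throw new TypeError('Expected an array of cookie objects');
    }
    for (const cookie of cookies) {
        if (cookie === null || typeof cookie !== 'object' || typeof cookie.name !== 'string') {
            throw new TypeError('Each cookie must be an object with a string "name"');
        }
    }
    return cookies;
}

// Sample input; the "value" field is an assumption about the export format.
const sample = '[{ "name": "example_cookie", "value": "abc" }]';
console.log(parseCookieArray(sample).length);
```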

Run the server:

node index.js
