To run this project, you will need to add the following environment variables to
your .env file:
- App configs:
  - LOG_LEVEL: Log level.
  - LOG_FILE_PATH: (Optional) File path to save logs. Defaults to scraper.log.
Example:
# .env
LOG_LEVEL=info
You can also check out the file .env.example to see all required environment
variables.
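As a rough illustration of how these two variables could be consumed, here is a minimal sketch. The `loadLogConfig` helper and the `LogConfig` type are hypothetical and not part of this project's code, and treating a missing LOG_LEVEL as an error is an assumption based on LOG_FILE_PATH being the only variable marked optional.

```typescript
// Hypothetical helper for illustration only; the project's real config
// loading may look different.
interface LogConfig {
  logLevel: string;
  logFilePath: string;
}

function loadLogConfig(env: Record<string, string | undefined>): LogConfig {
  const logLevel = env.LOG_LEVEL;
  if (!logLevel) {
    // Assumption: LOG_LEVEL is required because it is not marked optional.
    throw new Error("LOG_LEVEL is required; add it to your .env file");
  }
  return {
    logLevel,
    // LOG_FILE_PATH is optional and falls back to the documented default.
    logFilePath: env.LOG_FILE_PATH || "scraper.log",
  };
}
```

In a Node.js process this could be called as `loadLogConfig(process.env)`, assuming the .env file has already been loaded into the process environment.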
This project uses pnpm as its package manager:
npm install --global pnpm
Clone the project:
git clone https://github.com/v-bible/gallica-scraper.git
Go to the project directory:
cd gallica-scraper
Install dependencies:
pnpm install
Build the project:
pnpm build
USAGE
gallica-scraper [--outDir value] [--toPdf] <args>...
gallica-scraper --help
gallica-scraper --version
Digital Gallica Library Scraper
FLAGS
[--outDir] Output directory. Default to "./output/<document-name>"
[--toPdf/--noToPdf] Convert downloaded images to a single PDF file
-h --help Print help information and exit
-v --version Print version information and exit
ARGUMENTS
args... List of document urls to scrape from Gallica (e.g., "https://gallica.bnf.fr/ark:/12148/bpt6k42278868", "https://gallica.bnf.fr/ark:/12148/bpt6k42472912")
Example:
pnpm build && ./dist/cli.js --outDir ./my-output --toPdf https://gallica.bnf.fr/ark:/12148/bpt6k42472912
Contributions are always welcome!
Please read the contribution guidelines.
Please read the Code of Conduct.
This project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License.
See the LICENSE.md file for full details.
Duong Vinh - @duckymomo20012 - tienvinh.duong4@gmail.com
Project Link: https://github.com/v-bible/gallica-scraper.
