The PostgreSQL database will (currently) need around 150 GB total.
brew install libpq
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
Install the Diesel CLI with:
cargo install diesel_cli --no-default-features --features postgres
- Download and install PostgreSQL as appropriate for your system.
- Configure the port to 5432 (default)
Next, we need to initialize the database. Run:
pushd postgres_db
diesel setup
popd
To scrape GHSA (GitHub Security Advisories), you need a GitHub personal access token (PAT) with the read:packages scope. You can create one here.
Then, you need to set your API token in .secret.env:
echo "export GITHUB_TOKEN=<TYPE API TOKEN HERE>" >> .secret.env
You may choose to either enable and set up InfluxDB for logging, or disable it.
If you want to enable InfluxDB, you need to set ENABLE_INFLUX_DB_LOGGING=true and configure INFLUX_DB_URL, INFLUX_DB_ORG, INFLUX_DB_BUCKET in .env.
Then you need to set your API token in .secret.env:
echo "export INFLUX_DB_TOKEN=<TYPE API TOKEN HERE>" >> .secret.env
If you don't want to use InfluxDB logging, disable it in .env by setting ENABLE_INFLUX_DB_LOGGING=false.
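If InfluxDB logging is enabled, it can help to confirm that all four settings are present before starting anything. The following is a minimal sketch of such a check; `missing_vars` is a hypothetical helper and not part of this repo, and the commented-out usage assumes that both .env and .secret.env can be sourced by a POSIX shell, which may not hold for your .env.

```shell
#!/bin/sh
# Sketch: report which of the named environment variables are unset or empty.
# missing_vars is a hypothetical helper, not part of this repo.
missing_vars() {
    missing=""
    for name in "$@"; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            missing="$missing $name"
        fi
    done
    # Print the collected names without the leading space.
    echo "${missing# }"
}

# Example (commented out): load both files, then check the InfluxDB settings.
# Assumes .env and .secret.env are valid shell files.
# . ./.env; . ./.secret.env
# missing_vars INFLUX_DB_URL INFLUX_DB_ORG INFLUX_DB_BUCKET INFLUX_DB_TOKEN
```

An empty result means every listed variable has a value; otherwise the output names the ones still to be set.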
First, make sure telegraf has been started at least once:
systemctl start telegraf
Then, edit it:
systemctl edit telegraf
with the following contents:
[Service]
EnvironmentFile=<PATH TO YOUR REPO CLONE>/.env
EnvironmentFile=<PATH TO YOUR REPO CLONE>/.secret.env
Finally, restart Telegraf:
sudo systemctl daemon-reload
pushd services; ./restart_telegraf.sh; popd
The NPM Changes Follower script continually fetches changes from NPM and inserts them into the change_log Postgres table.
It resumes from the most recently fetched change, so it can be restarted safely after a crash or server reboot.
TODO: The NPM Changes Follower currently quits if it goes long enough without receiving changes, so you should run it in a loop. This is not automated yet.
To run the NPM Changes Follower, from this directory run:
cargo run --release --bin changes_fetcher
This builds and runs the NPM Changes Follower in release mode. Once the follower catches up to the present day (roughly 12-48 hours), there should be about 3 million rows (~100 GB).
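Because the follower can exit on its own after a long quiet period, one way to keep it alive is a small restart loop. The following is a minimal sketch, not something shipped with this repo: `keep_running` is a hypothetical helper, and the 30-second pause in the example is an arbitrary choice.

```shell
#!/bin/sh
# Sketch: rerun a command each time it exits, pausing between runs.
# keep_running is a hypothetical helper, not part of this repo.
# Usage: keep_running <max_runs> <delay_seconds> <command> [args...]
# A max_runs of 0 means "restart forever".
keep_running() {
    max_runs=$1
    delay=$2
    shift 2
    runs=0
    while [ "$max_runs" -eq 0 ] || [ "$runs" -lt "$max_runs" ]; do
        "$@"
        status=$?
        runs=$((runs + 1))
        echo "command exited with status $status (run $runs)" >&2
        sleep "$delay"
    done
}

# Example (commented out so that sourcing this file does not start the loop):
# keep_running 0 30 cargo run --release --bin changes_fetcher
```

Since the follower picks up from the most recently fetched change, restarting it this way should not lose or duplicate work. A process supervisor such as systemd would be a more robust alternative to a shell loop.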
Unlike the NPM Changes Follower, the Download Queuer does not run continually. Instead, upon each execution,
the Download Queuer will scan the change_log table (populated by the NPM Changes Follower),
and insert into the download_tasks table any tarball URLs that haven't already been added.
Running this for the first time with a fully-populated change_log table will take around 12 hours.
After that, the run time depends on how often you run it, but it should be fast.
Running it about every 10 minutes should be fine.
TODO: Running the Download Queuer on a schedule is not automated yet!
To run the Download Queuer, from this directory run:
cargo run --release --bin download_queuer
After all present-day tarballs have been inserted into the download_tasks table, there should be about 25 million rows (~28 GB).
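If you schedule the Download Queuer yourself (for example from cron), a short interval combined with a long first run means a new run could start while the previous one is still going. The following is a minimal sketch of a guard against that, assuming overlapping runs are undesirable; `run_exclusive` and the lock path in the example are hypothetical and not part of this repo.

```shell
#!/bin/sh
# Sketch: skip a scheduled run if the previous one is still going.
# run_exclusive and the lock path below are hypothetical, not part of this repo.
# Usage: run_exclusive <lock_dir> <command> [args...]
run_exclusive() {
    lock_dir=$1
    shift
    # mkdir is atomic: it fails if the directory already exists.
    if ! mkdir "$lock_dir" 2>/dev/null; then
        echo "previous run still in progress, skipping" >&2
        return 0
    fi
    "$@"
    status=$?
    # Release the lock and pass the command's exit status through.
    rmdir "$lock_dir"
    return "$status"
}

# Example (commented out so that sourcing this file runs nothing):
# run_exclusive /tmp/download_queuer.lock cargo run --release --bin download_queuer
```

Note that if a run is killed before it reaches `rmdir`, the lock directory is left behind and has to be removed by hand before the next run will start.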
The repo for dependencies.science is separate; please see this repo.