Since #8, the scrape task ignores the archive.org links. That is fine, since those links are historical and we are also tracking scrape/archive/pdfs as a backup.
Note: "archive" here does not mean the Internet Archive; archive/pdfs is a copy of output/pdfs.
However, that means that someone building the task/repo for the first time will not have the historical records in the processed output from the repo.
I think that's actually fine, since Zac and team are more focused on the DPA reports than the older OCC ones anyway, but we should think about possible workarounds just in case.
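
One possible workaround, as a rough sketch only: on a first build, seed the output directory with whatever PDFs are in the archive copy but not yet in the output. The function name `seed_output_from_archive` is hypothetical, nothing like it exists in the repo as far as I know, and it assumes the historical records are plain `.pdf` files sitting directly in the archive directory.

```python
import shutil
from pathlib import Path


def seed_output_from_archive(archive_dir, output_dir):
    """Copy PDFs that exist in archive_dir but are missing from output_dir.

    Hypothetical helper: the name and the flat *.pdf layout are assumptions.
    Returns the sorted file names that were copied.
    """
    archive_dir, output_dir = Path(archive_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for pdf in sorted(archive_dir.glob("*.pdf")):
        target = output_dir / pdf.name
        if not target.exists():  # never overwrite a freshly scraped file
            shutil.copy2(pdf, target)  # copy2 also preserves timestamps
            copied.append(pdf.name)
    return copied
```

Something like `seed_output_from_archive("scrape/archive/pdfs", "scrape/output/pdfs")` could run before the scrape step, assuming output/pdfs lives next to archive/pdfs under scrape/. Since existing files are skipped, running it again would be a no-op.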