This is a Python app for finding out whether files on a drive have already been ingested. It consists of two files:
Which, with the env variables: CHECKSUM_DB_NAME, CHECKSUM_TABLE_NAME and CSV_FILE_WITH_CHECKSUMS:
- Takes a CSV with the headings:
  - FILEREF
  - FIXITYVALUE
  - ALGORITHMNAME
- Creates a SQLite Table
- Converts each CSV row into a SQLite row
- Creates an index on the fixity value
This script is only necessary if you have only the CSV version of the DB. Otherwise, skip to holding_verification.py with the existing DB, or generate a new DB with the headings mentioned in step 1.
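
The CSV-to-SQLite steps above could look roughly like the following. This is a minimal sketch, not the project's actual script: the function name `build_checksum_db`, the column types and the index name are assumptions made for illustration, and only the three headings come from the description above.

```python
import csv
import sqlite3


def build_checksum_db(csv_path, db_path, table_name):
    """Load FILEREF, FIXITYVALUE and ALGORITHMNAME rows from a CSV into SQLite.

    Hypothetical helper: the name, TEXT column types and index name are assumptions.
    Returns the number of rows inserted.
    """
    # Table names cannot be bound as SQL parameters, so check before interpolating.
    if not table_name.isidentifier():
        raise ValueError(f"unsafe table name: {table_name!r}")

    # Read every CSV row, keeping only the three expected headings.
    with open(csv_path, newline="", encoding="utf-8") as handle:
        rows = [
            (row["FILEREF"], row["FIXITYVALUE"], row["ALGORITHMNAME"])
            for row in csv.DictReader(handle)
        ]

    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                f"CREATE TABLE IF NOT EXISTS {table_name} "
                "(FILEREF TEXT, FIXITYVALUE TEXT, ALGORITHMNAME TEXT)"
            )
            conn.executemany(f"INSERT INTO {table_name} VALUES (?, ?, ?)", rows)
            # The index makes lookups by fixity value fast.
            conn.execute(
                f"CREATE INDEX IF NOT EXISTS idx_{table_name}_fixity "
                f"ON {table_name} (FIXITYVALUE)"
            )
    finally:
        conn.close()
    return len(rows)
```

In practice the three arguments would presumably be read from the CSV_FILE_WITH_CHECKSUMS, CHECKSUM_DB_NAME and CHECKSUM_TABLE_NAME environment variables, for example via `os.environ`.
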
Which, with the env variables: CHECKSUM_DB_NAME, CHECKSUM_TABLE_NAME and CSV_FILE_WITH_CHECKSUMS:
- Allows you to select 1 or more files or a folder, via GUI or CLI
- Opens each file and generates a checksum hash (fixity value) on the content
- Looks for that checksum hash in the DB
- If not found, it will generate a checksum hash using another algorithm and look again; if that is not found either, it will try a third algorithm
  - At most, it will generate 3 hashes (SHA256, SHA1 and MD5) and then give up
  - If a file was found, the next file's checksum hash is generated first with the algorithm that matched the preceding file
- If found, it will return the file reference(s), fixity value and algorithm name associated with the checksum from the DB
- It will write the information obtained from the DB as well as the path, file size and a True or False value for whether the checksum was found
- Just because a checksum was matched doesn't necessarily mean the file that is ingested had the same name
- Files that encountered errors are printed at the end but will look normal in the CSV
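
The lookup-with-fallback behaviour described above can be sketched as follows. This is an illustration under stated assumptions rather than the code in holding_verification.py: the function names, the result dictionary keys and the starting algorithm (SHA256) are hypothetical, and the sketch assumes the DB stores lowercase hex digests.

```python
import hashlib
import os
import sqlite3

# The three algorithms named above. The labels are assumed, not taken from the real DB.
ALGORITHMS = {"SHA256": hashlib.sha256, "SHA1": hashlib.sha1, "MD5": hashlib.md5}


def file_checksum(path, algorithm, chunk_size=1024 * 1024):
    """Hash the file content in chunks so large files are not loaded into memory."""
    digest = ALGORITHMS[algorithm]()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()  # lowercase hex; assumed to match the DB's format


def find_in_db(conn, table_name, path, first_algorithm="SHA256"):
    """Try up to three algorithms; return (algorithm, rows), or (None, []) if none match."""
    if not table_name.isidentifier():
        raise ValueError(f"unsafe table name: {table_name!r}")
    # Start with the preferred algorithm, then fall back to the remaining two.
    order = [first_algorithm] + [a for a in ALGORITHMS if a != first_algorithm]
    for algorithm in order:
        checksum = file_checksum(path, algorithm)
        rows = conn.execute(
            f"SELECT FILEREF, FIXITYVALUE, ALGORITHMNAME FROM {table_name} "
            "WHERE FIXITYVALUE = ?",
            (checksum,),
        ).fetchall()
        if rows:
            return algorithm, rows
    return None, []


def verify_files(conn, table_name, paths):
    """Check each file, reusing the last algorithm that produced a match."""
    results = []
    algorithm = "SHA256"
    for path in paths:
        matched, rows = find_in_db(conn, table_name, path, algorithm)
        if matched:
            algorithm = matched  # the next file starts with the algorithm that just worked
        results.append(
            {"path": path, "size": os.path.getsize(path), "found": bool(rows), "rows": rows}
        )
    return results
```

Reusing the last matching algorithm is a cheap optimisation when files in one folder were ingested together: most of them then need only one hash pass instead of up to three.
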