Auto‐update suffix array to newest UniProtKB version
To generate a new version of the Unipept suffix array (based on a new version of the UniProtKB database), you can either manually perform all steps described in the guides in the wikis of the unipept-database, unipept-index and unipept-api repositories, or start the update_uniprot.sh script in this repository on one of the API servers to automatically go through the whole pipeline.
The update_uniprot.sh script has two modes: update mode generates the suffix array and all associated files from scratch, while clone mode copies all files from another server that has already generated a new suffix array.
Depending on the selected mode, the script requires a different set of parameters.
Important: Remember to start this script in a screen session, since it can take quite some time to finish!
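As a rough illustration of the advice above, the comments below show a typical screen workflow, followed by a small guard that checks whether the current shell is running inside a screen session. The session name `unipept-update` and the function `in_screen_session` are hypothetical and not part of update_uniprot.sh. The check relies on the `STY` environment variable, which screen sets inside its sessions.

```shell
#!/usr/bin/env bash
# Typical screen workflow (the session name "unipept-update" is an arbitrary example):
#   screen -S unipept-update      # start a named session
#   ./update_uniprot.sh update    # run the script inside that session
#   (press Ctrl-a, then d, to detach; the script keeps running)
#   screen -r unipept-update      # reattach later to check progress

# Hypothetical guard, not part of update_uniprot.sh:
# screen exports STY inside its sessions, so an empty STY means "not in screen".
in_screen_session() {
    [ -n "${STY:-}" ]
}

if in_screen_session; then
    echo "Running inside screen session: $STY"
else
    echo "Warning: not inside a screen session" >&2
fi
```

A guard like this could be run before launching the script to avoid losing a long run when the SSH connection drops.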
Usage: ./update_uniprot.sh <mode> [OPTIONS]
In update mode, the whole suffix array construction pipeline is run from start to finish.
The script clones all required repositories, builds the required files, and sets up the MariaDB database required by unipept-database.
The only step that still needs to be done manually is starting the unipept-api executable so that end users can actually query the new index.
Required parameters: none.
Optional parameters:
- --scratch-dir: Directory where temporary repositories and executables will be stored (default: ~).
- --output-dir: Directory where the final output files will be stored (default: /mnt/data).
- --help: Show the help message and exit.
- --database-sources: Comma-separated list of database sources (in UniProtKB) that should be downloaded and processed (default: swissprot,trembl).
All default values provided by the script are already the ones we need.
./update_uniprot.sh update --scratch-dir "$HOME" --output-dir "/mnt/data" --database-sources "swissprot,trembl"
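How the documented defaults combine with explicitly passed options can be sketched as follows. This is only an illustration: the function `parse_update_options` and its variable names are hypothetical, and the real update_uniprot.sh may parse its options differently. The sketch also shows one way to split the comma-separated `--database-sources` value into individual sources.

```shell
#!/usr/bin/env bash
# Sketch only: the real update_uniprot.sh may handle its options differently.
parse_update_options() {
    # Defaults as documented for update mode
    scratch_dir="$HOME"
    output_dir="/mnt/data"
    database_sources="swissprot,trembl"

    while [ "$#" -gt 0 ]; do
        # Every option handled here expects exactly one value
        if [ "$#" -lt 2 ]; then
            echo "Missing value for $1" >&2
            return 1
        fi
        case "$1" in
            --scratch-dir)      scratch_dir="$2" ;;
            --output-dir)       output_dir="$2" ;;
            --database-sources) database_sources="$2" ;;
            *) echo "Unknown option: $1" >&2; return 1 ;;
        esac
        shift 2
    done
}

# Override only the output directory; the other options keep their defaults
parse_update_options --output-dir "/mnt/ssd"
echo "scratch=$scratch_dir output=$output_dir sources=$database_sources"

# Split the comma-separated sources into a bash array
IFS=',' read -r -a sources <<< "$database_sources"
printf 'source: %s\n' "${sources[@]}"
```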
In clone mode, the script assumes that the suffix array has already been constructed on another machine and needs to be transferred to this one. You therefore don't need to wait for the whole suffix array to be constructed on the new machine; it can simply be cloned.
Required parameters:
- --local-ssh-key: Path to the private key on this machine used to communicate with the remote server.
- --remote-address: Address of the remote server from which the suffix array should be cloned.
Optional parameters:
- --scratch-dir: Directory where temporary repositories and executables will be stored (default: ~).
- --output-dir: Directory where the final output files will be stored (default: /mnt/data).
- --help: Show the help message and exit.
- --remote-user: Username on the remote server (default: unipept).
- --remote-port: Port on the remote server that SCP connects to (default: 4840).
- --remote-output-dir: Directory on the remote server that stores the suffix array and all related files (default: /mnt/data).
./update_uniprot.sh clone --scratch-dir "$HOME" --output-dir "/mnt/ssd" --local-ssh-key "$HOME/.ssh/id_github_tibvdm" --remote-address "rick.ugent.be" --remote-port "4840" --remote-output-dir "/mnt/data"
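To make the role of the clone-mode parameters more concrete, the sketch below prints an scp command assembled from them. The function `build_scp_command` is hypothetical, and whether update_uniprot.sh issues exactly this command is an assumption. The only things taken as given are the standard scp flags `-i` (identity file), `-P` (port) and `-r` (recursive copy).

```shell
#!/usr/bin/env bash
# Sketch only: prints (does not run) an scp command built from clone-mode parameters.
# Whether update_uniprot.sh uses this exact invocation is an assumption.
build_scp_command() {
    local key="$1" user="$2" host="$3" port="$4" remote_dir="$5" local_dir="$6"
    # -i: private key file, -P: remote port, -r: copy directories recursively.
    # Note: this form copies the remote directory itself into local_dir.
    echo "scp -i $key -P $port -r $user@$host:$remote_dir $local_dir"
}

# Values taken from the example above, with the documented default remote user
build_scp_command "$HOME/.ssh/id_github_tibvdm" unipept rick.ugent.be 4840 /mnt/data /mnt/ssd
```

Printing the command first, rather than running it, makes it easy to check the key path, port and directories before starting a long transfer.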