Updated BioSQL wiki page to reflect the latest version #136

Open
wants to merge 15 commits into
base: master
from
76 changes: 45 additions & 31 deletions wiki/BioSQL.md
@@ -46,38 +46,53 @@ Installing Required Software
You will need to install some database software plus the associated
python library so that Biopython can "talk" to the database. In this
example we'll talk about the most common choice, MySQL. How you do this
will also depend on your operating system, for example on a Debian or
Ubuntu Linux machine try this:
will also depend on your operating system.
For example on a **Debian or Ubuntu Linux** machine try this:

``` bash
sudo apt-get install mysql-common mysql-server python-mysqldb
```

It will also be important to have perl (to run some of the setup
scripts). Again, on a Debian or Ubuntu Linux machine try this:
scripts). Again, on a **Debian or Ubuntu Linux** machine try this:

``` bash
sudo apt-get install perl
```

You may find perl is already installed.

For Windows users, see [BioSQL on Windows](BioSQL_Windows "wikilink").
For **Windows** users, see [BioSQL on Windows](BioSQL_Windows "wikilink").

For **Cygwin** users, use [apt-cyg](https://github.com/transcode-open/apt-cyg) to install packages **mysql** and **mysql-server**,
``` bash
apt-cyg install mysql mysql-server
```
and to install the driver **mysql-connector** use [pip](https://pypi.org/project/pip/),

``` bash
pip install mysql-connector
```

Downloading the BioSQL Schema & Scripts
---------------------------------------

Once the software is installed, your next task is to setup a database
and import the BioSQL schema (i.e. setup the relevant tables within the
database). See [BioSQL downloads](http://www.biosql.org/wiki/Downloads)
Once the software is installed, your next task is to set up a database,
import the BioSQL schema (i.e. set up the relevant tables within the
database) and finally populate the database.

In order to do so, files from the **biosql** project need to be obtained:

* Either from [BioSQL downloads](http://www.biosql.org/wiki/Downloads)
-- you'll need to unzip the archive.

Alternatively to get the very latest BioSQL, check out their git
repository. Or, navigate to the relevant schema file for your database
and download just that, e.g.
[biosqldb-mysql.sql](https://raw.github.com/biosql/biosql/master/sql/biosqldb-mysql.sql)
for MySQL. You will also want the NCBI Taxonomy loading perl script,
[load\_ncbi\_taxonomy.pl](https://raw.github.com/biosql/biosql/master/scripts/load_ncbi_taxonomy.pl).
* Or to get the **very latest** files, check out (or export) the relevant git
repository at (https://github.com/biosql/biosql.git)

``` bash
svn export https://github.com/biosql/biosql.git/trunk biosql
Member

Again, avoid recommending the legacy tool svn here - we want any potential contributors to use git so:

git clone https://github.com/biosql/biosql.git
cd biosql

Or, for a simple snapshot:

wget https://github.com/biosql/biosql/archive/master.tar.gz
tar -zxvf master.tar.gz
cd biosql-master/

```
The names of the two files that are needed are the following:
1. biosqldb-mysql.sql -- the BioSQL schema -- found inside the **sql** subdirectory
2. load_ncbi_taxonomy.pl -- the Perl script to populate the database -- found inside the **scripts** subdirectory
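
Before moving on, it can help to confirm that both files are where the later steps expect them. The following is a minimal Python sketch, not part of BioSQL or Biopython: the helper name `find_biosql_files` and its `checkout` argument are hypothetical, and it assumes the layout described above (`sql/biosqldb-mysql.sql` and `scripts/load_ncbi_taxonomy.pl`).

```python
from pathlib import Path

# Relative locations of the two files, as described in the list above.
REQUIRED_FILES = {
    "schema": Path("sql") / "biosqldb-mysql.sql",
    "taxonomy_loader": Path("scripts") / "load_ncbi_taxonomy.pl",
}


def find_biosql_files(checkout):
    """Return the paths of the schema and the taxonomy script inside checkout.

    `checkout` is the top-level directory of the unpacked or cloned biosql
    project. Raises FileNotFoundError naming any file that is missing.
    """
    root = Path(checkout)
    found = {name: root / rel for name, rel in REQUIRED_FILES.items()}
    missing = [str(path) for path in found.values() if not path.is_file()]
    if missing:
        raise FileNotFoundError("missing BioSQL files: " + ", ".join(missing))
    return found
```

Calling `find_biosql_files("biosql")`, where `biosql` is the directory created by the download step, returns a dictionary with the two paths or fails early with the names of the missing files.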

Creating the empty database
---------------------------
@@ -92,8 +107,7 @@ mysqladmin -u root create bioseqdb
```

We can then tell MySQL to load the BioSQL schema we downloaded above.
Change to the scripts subdirectory from the unzipped BioSQL download,
then:
Change to the **sql** subdirectory (see above) and then:
Member

Use double back-ticks for directory names, ``sql``, rather than double asterisks for bold.


``` bash
mysql -u root bioseqdb < biosqldb-mysql.sql
@@ -176,11 +190,11 @@ psql biosqldb < biosqldb-pg.sql

Run *psql* and type *\\d <ENTER>* to see all the entities created.

NCBI Taxonomy
-------------
Populate the database with NCBI Taxonomy
----------------------------------------

The BioSQL package includes a perl script under
scripts/load\_ncbi\_taxonomy.pl to download and update the taxonomy
The BioSQL package includes a perl script under the
**scripts** subdirectory named **load\_ncbi\_taxonomy.pl** that downloads and updates the taxonomy
Member

Use double back-ticks for file names, ``load_ncbi_taxonomy.pl``, rather than double asterisks for bold.

tables. The script should be able to download the files it needs from
the [NCBI taxonomy FTP site](ftp://ftp.ncbi.nih.gov/pub/taxonomy/)
automatically.
@@ -191,8 +205,7 @@ trying to load sequences into the database. This isn't so important with
Biopython 1.49 onwards, where you can instead opt to have the
information needed downloaded as needed from Entrez.
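
If the taxonomy is loaded with the perl script as part of an automated setup, the invocation can be assembled as an argument list rather than typed by hand. This is a minimal sketch that assumes only the options used in this page's example command (`--dbname`, `--driver`, `--dbuser`, `--download`); the helper `taxonomy_command` is hypothetical and not part of BioSQL.

```python
import shlex


def taxonomy_command(dbname, driver="mysql", dbuser="root", download=True):
    """Build the argument list for load_ncbi_taxonomy.pl.

    Only the four options that appear in this page's example are used.
    The --download option is added only when download is True.
    """
    args = [
        "./load_ncbi_taxonomy.pl",
        "--dbname", dbname,
        "--driver", driver,
        "--dbuser", dbuser,
    ]
    if download:
        args += ["--download", "true"]
    return args


# Render the list as a single shell command line for inspection or logging.
command_line = shlex.join(taxonomy_command("bioseqdb"))
```

The list form can be handed to `subprocess.run` from within the scripts subdirectory, which avoids quoting problems if a database name or user contains unusual characters.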

To update the NCBI taxonomy, change to the scripts subdirectory from the
unzipped BioSQL download, then:
To update the NCBI taxonomy, change to the **scripts** subdirectory (see above) and then:
Member

scripts


``` bash
./load_ncbi_taxonomy.pl --dbname bioseqdb --driver mysql --dbuser root --download true
@@ -248,19 +261,20 @@ Running the unit tests
----------------------

Because there are so many ways you could have set up your BioSQL
database, you have to tell the unit test a few bits of information by
editing the file Tests/setup\_BioSQL.py and filling in the following
database, you have to tell the unit test a few bits of information.
If you installed Biopython using pip then the relevant [**Tests**](https://github.com/biopython/biopython/tree/master/Tests)
folder will not have been installed. In that case, you can
check it out (or export it) using:

```bash
svn export https://github.com/biopython/biopython/trunk/Tests
Member

Please don't recommend using svn here. If you want to avoid needing git, we can just download a tar-ball or zip file from GitHub using e.g.

wget https://github.com/biopython/biopython/archive/biopython-172.zip
unzip biopython-172.zip

Or,

wget https://github.com/biopython/biopython/archive/biopython-172.tar.gz
tar -zxvf biopython-172.tar.gz

Also this lets us recommend getting the tests to match the version of Biopython installed, which should prevent test failures from a version mis-match (although that would be rare with BioSQL as this code is fairly static these days).

Member

Again, please avoid svn export and use git or a plain download from GitHub.

```
Inside *Tests*, copy the file *biosql.ini.sample* to *biosql.ini* and edit it by filling in the following
Member

Use double back-ticks (``Tests``, ``biosql.ini.sample`` and ``biosql.ini``), not single asterisks for italics.

fields:

``` python
DBDRIVER = 'MySQLdb'
DBTYPE = 'mysql'
```

and a little lower down,

``` python
DBHOST = 'localhost'
DBUSER = 'root'
DBPASSWD = ''
TESTDB = 'biosql_test'