lukasmueller edited this page Sep 14, 2010 · 17 revisions

Introduction to Bioinformatics

The rise of data-driven biology

For more than a hundred years, biology was mostly a descriptive science. This changed fundamentally with the rise of molecular biology in the 1960s. Once the importance of biological sequences (protein and DNA) was recognized, effort was directed towards establishing protocols to determine them. Interestingly, proteins were sequenced first, even though they are more complex (built from 20 different amino acids, versus 4 bases in DNA). Fred Sanger presented the sequence of insulin in 1955, for which he received the Nobel Prize in 1958. DNA sequencing had to wait until the mid-1970s: Sanger published a first DNA sequencing method in 1975, followed in 1977 by his dideoxy chain termination method. In the same year, Maxam and Gilbert demonstrated a chemical method, which is no longer in use today. By 1977, Sanger had sequenced the genome of the phage φX174 (the first completely sequenced DNA genome, and all done by hand!). Sanger and Gilbert shared the 1980 Nobel Prize in Chemistry for their sequencing work with Paul Berg, who was honored for his work on recombinant DNA.

Sequencing technology

Sanger sequencing

Sequencing technology evolved rapidly after Sanger’s initial description of the dideoxy chain termination method. Radiolabeling was replaced with fluorescent dyes to make the protocol more user-friendly. Gels were replaced with capillaries, and machines were built with up to 384 capillaries for fast parallel sequencing.

The ABI 3730xl was one of the most advanced machines of that generation and the workhorse of many sequencing centers, such as the Sanger Center near Cambridge, England, and the Kazusa Center in Japan. This instrument could deliver up to 2 million base pairs per day.
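The chain termination principle behind these instruments can be illustrated with a minimal Python sketch. It is a toy model, not how any instrument's software works, and it assumes that every possible stop position is represented by one fragment ending in a dye-labeled dideoxynucleotide, and that the capillary separates fragments purely by length. Under those assumptions, reading the dye colors from the shortest to the longest fragment spells out the newly synthesized strand. The function names `terminated_fragments` and `read_capillary` and the example template are hypothetical.

```python
import random

# Watson-Crick pairing used to build the newly synthesized strand.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def terminated_fragments(template):
    """Return one (length, terminal_base) pair per possible stop position.

    Assumes `template` is written in the order the polymerase reads it,
    so position i of the new strand pairs with position i of the template.
    Each fragment ends in a dye-labeled ddNTP complementary to the
    template base at which synthesis stopped.
    """
    return [(i + 1, COMPLEMENT[base]) for i, base in enumerate(template)]


def read_capillary(fragments):
    """Mimic electrophoresis: the shortest fragments pass the detector first."""
    ordered = sorted(fragments)  # tuples sort by their first element, the length
    return "".join(terminal_base for _length, terminal_base in ordered)


if __name__ == "__main__":
    fragments = terminated_fragments("TACGGATCCA")
    random.shuffle(fragments)  # the reaction yields fragments in no particular order
    print(read_capillary(fragments))  # ATGCCTAGGT, the complement of the template
```

Because the sort is by length alone, the order in which fragments come out of the reaction does not matter; only the size separation and the color of each terminating base carry the sequence information.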

“Next-generation” sequencing

However, the throughput of the chain termination method is limited, because every read needs its own capillary (or gel lane). Therefore, alternatives based on other chemistries were developed. An important development was the pyrosequencing method, in which a brief flash of light is emitted whenever a nucleotide is incorporated; this flash can be captured with a camera. If sequencing is done with DNA immobilized on a large chip, massively parallel sequencing becomes possible. This is the basis for 454 sequencing. Solexa sequencing also reads bases during synthesis on a chip, but it detects fluorescently labeled reversible terminators instead of light flashes.
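The pyrosequencing principle can be sketched in the same spirit. In this hypothetical Python model, nucleotides are washed over the immobilized DNA one type at a time in a fixed flow order (`TACG` is an assumption chosen for illustration). When the flowed nucleotide matches the next base or bases of the growing strand, all of them are incorporated in one step, and the light signal is taken to be exactly the number of bases added. Real signals are noisy and only roughly proportional to that number, which is why long runs of the same base (homopolymers) are hard to call. The input is the strand being synthesized, so no complementing is needed; `flowgram` and `decode` are illustrative names, not part of any real toolkit.

```python
# Assumed order in which nucleotide types are washed over the chip.
FLOW_ORDER = "TACG"


def flowgram(synthesized, cycles):
    """Simulate pyrosequencing signals as (nucleotide, signal) pairs.

    In each flow, every consecutive matching base at the current position
    is incorporated, and the signal equals how many were added (an
    idealization: real light intensity is only roughly proportional).
    """
    signals = []
    pos = 0
    for _ in range(cycles):
        for nucleotide in FLOW_ORDER:
            count = 0
            while pos < len(synthesized) and synthesized[pos] == nucleotide:
                count += 1
                pos += 1
            signals.append((nucleotide, count))
    return signals


def decode(signals):
    """Rebuild the sequence by repeating each flowed base by its signal."""
    return "".join(nucleotide * count for nucleotide, count in signals)


if __name__ == "__main__":
    signals = flowgram("TTACGGGA", cycles=2)
    print(signals)
    # [('T', 2), ('A', 1), ('C', 1), ('G', 3), ('T', 0), ('A', 1), ('C', 0), ('G', 0)]
    print(decode(signals))  # TTACGGGA
```

Note how the run `GGG` produces a single flow with signal 3 rather than three separate events; misjudging that one intensity is exactly the homopolymer error mentioned above.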

454 sequencing

Solexa sequencing

SOLiD sequencing

Pacific Biosciences

Advances in computer technology

Typical problems in computational biology and bioinformatics

Computational Biology and Bioinformatics

Why Perl?

Text handling

BioPerl

CPAN

Recommended books

Learning Perl. O’Reilly.

Developing Bioinformatics Computer Skills. Gibas & Jambeck, O’Reilly.

Mastering Perl for Bioinformatics. Tisdall, O’Reilly.
