Monthly Archives: May 2012

What is Bioinformatics?

The term “bioinformatics” has been defined in many different ways since its first use more than 20 years ago. However, the interdisciplinary application of computers to biological data has always been part of the definition. Here is a small collection of (short) answers to the question “What is bioinformatics?”:

“Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.” – NIH Biomedical Information Science and Technology Initiative (2000)

“Bioinformatics is the application of computer technology to the management of biological information. Computers are used to gather, store, analyze and integrate biological and genetic information which can then be applied to gene-based drug discovery and development.” – (2001?)

“Bioinformatics is the application of statistics and computer science to the field of molecular biology. It includes computational biology, algorithm development, statistics techniques, data modeling and visualization.” – Owen White (2010)

“Bioinformatics is a science where we integrate computer science, genetics and genomics.” – Atul Butte (2010)

“Bioinformatics is the application of computer science and information technology to the field of biology and medicine.” – (2012)

For a more detailed definition of the term bioinformatics, take a look at the list provided by the International Society for Computational Biology (ISCB) or the Bioinformatics FAQ at

Profiling / Benchmarking Perl code

There is an easy way to measure the performance of every part of your Perl code – it’s called NYTProf.

If you don’t have it yet, install the profiling module Devel::NYTProf:

sudo perl -MCPAN -e 'install Devel::NYTProf'

Then run your Perl script with an additional call to the profiler:

perl -d:NYTProf input.file

The -d switch starts Perl’s debug mode; -d:NYTProf is shorthand for -MDevel::NYTProf, i.e. it loads the module Devel::NYTProf before running your Perl script and uses it as the debugger.

The profiler produces the file nytprof.out. Please note that the profiler will add some additional processing time to your script.

The last step is to generate the HTML output that will show you all the results of the profiler (including the time spent on each line of code executed while running the script).

nytprofhtml -o nytprof -f nytprof.out

The -o option sets the output directory where all the HTML files will be written, and -f sets the input file name (useful if you want to compare multiple runs; otherwise it can be omitted, as it defaults to nytprof.out).

Now open the index.html in the output directory and start improving your code!

Sourceforge repository (CVS or SVN)

The Lab Sourceforge repository should be used for keeping and sharing all code. Consider it an offsite backup. If you ever create a piece of code and want to get it back at any point in the future, this is the place to store it. It doesn’t matter how good or bad the code is: put it in the repository as soon as you start working on it, and synchronize often.

Here is how to access the code:

Start by creating a Sourceforge account, then email Rob and ask him to add you to the project’s users.

Set up CVS access via ssh:

export CVS_RSH=ssh

Now decide which modules you want to download; we have several that you can access, and you can browse them online before downloading them.

Download what you want. For example, to check out the bioinformatics module:

cvs -z3 co -P bioinformatics

(note: there should be no spaces after the :’s)

Edit the code at will, then check in your changes as follows.

First, make sure you are up to date. From within the root directory (bioinformatics in the above example):

cvs update -Ad

Then commit the changes, writing a log message as you go:

cvs commit -m 'adding new methods'

Obviously, replace username with the username you created.
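The update-then-commit cycle above is easy to forget mid-project, so one option is to keep it in a tiny helper script. A minimal sketch (cvs-sync.sh is a made-up name; it assumes you run it from the module’s root directory, e.g. bioinformatics):

```shell
# Hypothetical helper: update first, then commit with a message argument.
cat > cvs-sync.sh <<'EOF'
#!/bin/sh
# Sync a CVS working copy: pull in remote changes, then commit local ones.
set -e
msg=${1:?usage: cvs-sync.sh "commit message"}

export CVS_RSH=ssh     # tunnel over ssh, as set up earlier
cvs update -Ad         # merge in everyone else's changes first
cvs commit -m "$msg"   # then publish yours, logging the message
EOF
chmod +x cvs-sync.sh
```

Updating before committing matters because CVS will refuse to commit a file that is out of date with the repository; doing both in one script keeps the habit automatic.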