CS 6293 Final homework

Pick ONE paper from the reading list and write a short summary (0.5-1 page, single-column, single-spaced, font size 10-12). The summary needs to address the issues listed in the advice section below.

Besides the summary, you can also give a presentation on April 29th (7-10 minutes). The presentation is optional, but if you give one, you will be graded on both the presentation and the summary.

It is okay to read the same paper in groups, but if you do, each person needs to write their own summary independently (!!!)


Possible papers to consider:

I suggest that you follow the references in one of the chapters we discussed in class (it does not need to be the one that you presented), but you may use other references. Here are some papers that I picked, roughly grouped into six categories.


Genomic sequences and cancer

Text Mining



Network-based disease biology


Network and gene functions

To browse for papers from bioinformatics-related conferences

To browse for papers from bioinformatics-related journals

To search for papers

Advice on how to read and present a paper (Adapted from this web page)

When you present a paper in this course (or elsewhere), your goal is to get your audience to appreciate the contribution that the paper makes to scientific knowledge. Generally, you need to explain the following three things about the paper to do that. It often makes sense to present each point in order, but it is more important to focus on the essence of the contribution than it is to follow any particular format.

  1. What is the problem the paper is trying to address? You should both define the problem and explain its broader significance. In addressing this question, you want to consider things like: What is the biological nature of the problem? Is it reconstructing evolutionary history, or identifying genes relevant to the prognosis or treatment of a disease? Why is that important? What is the contribution of the paper to furthering our understanding of the biology? Then you may want to talk about the computational nature of the problem. How was the biological problem reformulated into a computational problem? Is that the main contribution (it often is)? Are there aspects of the computational problem that are particularly interesting? Is a previous (or obvious) computational formulation too slow or not accurate enough? If so, what kind of improvement in the computational approach would be important, and why? Or is this a comparison of alternative approaches? If so, why were those approaches selected and not others? How are they to be compared?
  2. What were the methods used in the paper? Often, this is where you have to spend the most time in your presentation, since new methods are the essence of most bioinformatics publications. You want to carefully explain exactly what was done. It may require a very close reading of the paper to figure this out; often important facts are buried in seeming asides. When you are working on this part of your presentation, imagine you were trying to replicate the work. What would you need to know?
  3. What were the results reported? Ideally, it would be straightforward to compare the results presented with the problem statement, but it is not always that easy. Discuss the evaluation method(s) as well as the results. It is often interesting to consider how the authors chose to evaluate their contribution: Was it fair? Was it indicative of "real world" performance?

Try to identify where the main contribution of the paper is. For example, some papers define interesting new problems, but apply relatively straightforward methods to address them. For a paper like that, focus on work on related problems, and how the new problem statement differs from them. Are there better approaches developed for related problems that can be applied to the new problem? Some papers present a new approach to a well-studied problem. For those papers, carefully compare the new method to other approaches people have taken to the problem. Also, in that situation, the choice of the evaluation method (used to compare the new approach to existing methods) is an important place to focus.

Look for unstated assumptions made in the paper, and try to make them explicit. For example, does a paper on finding cis-regulatory elements from sequence and gene expression data assume that the elements are independent of each other? That the position of the element with respect to the start of transcription is unimportant? Reading alternative approaches to the same problem will make it easier for you to identify these assumptions.

After you have communicated these facts about the paper, you can discuss the aspects you thought were most important or interesting. Is this a method that belongs in your "bioinformatics toolkit"? Can it be applied to related problems straightforwardly, or is it highly specialized? Was there something particularly impressive about the method, the evaluation, the translation of the problem into computational terms, etc.?

In general, bioinformatics papers have an "engineering" flavor that fits well into this problem / method / results paradigm. However, some papers have more of a "basic science" flavor, where a particular claim is being made, and evidence is presented to support that claim. Providing evidence for a claim is closely related to testing a particular hypothesis. If you feel that this better fits the paper you are presenting, then rather than using the problem / method / results paradigm, you can explain it in terms of claims and evidence.