CS 5263 Final project page

Projects should be done in small groups of 2-3 members. Groups combining people from different backgrounds are particularly encouraged. Feel free to use the class email list (5263@cs.utsa.edu) to brainstorm project ideas or to find partners. You have two choices for the project:

I'd suggest that each group send me a paragraph or drop by to tell me who's in the group, describe your topic, the initial papers, and the test data (if applicable). Maybe I can give you some pointers.

Before the beginning of the final week, each group will need to hand in a paper (approximately 5-10 pages) describing the project. Your paper must also clearly describe the contribution of each member of the group.

Each group will also need to give a 10-minute presentation, scheduled on the final exam day: 12/17. You can choose to turn in your project and do the presentation before the final week if you wish.

Timeline (all submissions via Blackboard):

Possible topics:

You are encouraged to choose your own topics and impress me with your creative project ideas. Here are a few of mine to get you started.

  1. ChIP-seq data analysis: perform a literature review of ChIP-seq peak-calling algorithms and tools, and evaluate their performance on real TF-binding data.
  2. ChIP-seq and ChIP-chip data: perform motif finding on ChIP-chip and ChIP-seq data for common TFs and compare the results.
  3. Next-generation sequencing data analysis: perform a literature review of algorithms for NGS read alignment and evaluate their performance on real/simulated data.
  4. Motif finding: design an efficient algorithm for the (15,4)-motif challenge problem or for ChIP-seq based motif finding.
  5. Clustering: compare several gene expression clustering methods on several large-scale data sets and report the results.
  6. Classification: perform disease classification by combining multiple types of data.
  7. RNA secondary structure: perform a literature review of algorithms for RNA secondary structure prediction.
  8. Network construction and comparison:
  9. To be continued...
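If you are considering topic 4, here is a rough sketch of how you might generate test data for it. It assumes the usual statement of the (15,4)-motif challenge problem, which this page does not spell out: a motif of length 15 is planted once in each of 20 random DNA sequences of length 600, with exactly 4 substitutions per planted copy. The function names (`planted_instance`, `best_match`) and the default parameters are my own illustrative choices, not part of any standard tool.

```python
import random

ALPHABET = "ACGT"


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def mutate(motif, d, rng):
    """Return a copy of motif with exactly d positions substituted."""
    chars = list(motif)
    for p in rng.sample(range(len(motif)), d):
        # Pick a different base so the position really changes.
        chars[p] = rng.choice([c for c in ALPHABET if c != chars[p]])
    return "".join(chars)


def planted_instance(n_seqs=20, seq_len=600, l=15, d=4, seed=0):
    """Build a planted (l, d)-motif instance (assumed problem setup).

    Returns the hidden motif, the sequences, and the planted positions.
    """
    rng = random.Random(seed)
    motif = "".join(rng.choice(ALPHABET) for _ in range(l))
    seqs, positions = [], []
    for _ in range(n_seqs):
        background = [rng.choice(ALPHABET) for _ in range(seq_len)]
        start = rng.randrange(seq_len - l + 1)
        # Overwrite a window of the background with a mutated motif copy.
        background[start:start + l] = mutate(motif, d, rng)
        seqs.append("".join(background))
        positions.append(start)
    return motif, seqs, positions


def best_match(seq, motif):
    """Smallest Hamming distance between motif and any window of seq."""
    l = len(motif)
    return min(hamming(seq[i:i + l], motif) for i in range(len(seq) - l + 1))


if __name__ == "__main__":
    motif, seqs, positions = planted_instance()
    print("hidden motif:", motif)
    print("worst best-match distance:", max(best_match(s, motif) for s in seqs))
```

Because the hidden motif and planted positions are returned along with the sequences, you can score any motif finder you design against known ground truth before moving on to real ChIP-seq data.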

To search for papers

To browse for papers from bioinformatics-related conferences

To browse for papers from bioinformatics-related journals

Advice on how to read and present a paper (Adapted from this web page)

When you present a paper in this course (or elsewhere), your goal is to get your audience to appreciate the contribution that the paper makes to scientific knowledge. Generally, you need to explain the following three things about the paper to do that. It often makes sense to present each point in order, but it is more important to focus on the essence of the contribution than it is to follow any particular format.

  1. What is the problem the paper is trying to address? You should both define the problem and explain its broader significance. In addressing this question, you want to consider things like: What is the biological nature of the problem? Is it reconstructing evolutionary history, or identifying genes relevant to the prognosis or treatment of a disease? Why is that important? What is the contribution of the paper to furthering our understanding of the biology? Then you may want to talk about the computational nature of the problem. How was the biological problem reformulated into a computational problem? Is that the main contribution (it often is)? Are there aspects of the computational problem that are particularly interesting? Is a previous (or obvious) computational formulation too slow or not accurate enough? If so, what kind of improvement in the computational approach would be important, and why? Or is this a comparison of alternative approaches? If so, why were those approaches selected and not others? How are they to be compared?
  2. What were the methods used in the paper? Often, this is where you have to spend the most time in your presentation, since new methods are the essence of most bioinformatics publications. You want to carefully explain exactly what was done. It may require a very close reading of the paper to figure this out; often important facts are buried in seeming asides. When you are working on this part of your presentation, imagine you were trying to replicate the work. What would you need to know?
  3. What were the results reported? Ideally, it would be straightforward to compare the results presented with the problem statement, but it is not always that easy. Discuss the evaluation method(s) as well as the results. It is often interesting to consider how the authors chose to evaluate their contribution: Was it fair? Was it indicative of "real world" performance?

Try to identify where the main contribution of the paper is. For example, some papers define interesting new problems, but apply relatively straightforward methods to address them. For a paper like that, focus on work on related problems, and how the new problem statement differs from them. Are there better approaches developed for related problems that can be applied to the new problems? Some papers present a new approach to a well-studied problem. For those papers, carefully compare the new method to other approaches people have taken to the problem. Also, in that situation, the choice of the evaluation method (used to compare the new approach to existing methods) is an important place to focus.

Look for unstated assumptions made in the paper, and try to make them explicit. For example, does a paper on finding cis-regulatory elements from sequence and gene expression data assume that the elements are independent of each other? That the position of the element with respect to the start of transcription is unimportant? Reading alternative approaches to the same problem will make it easier for you to identify these assumptions.

After you have communicated these facts about the paper, you can discuss the aspects you thought were most important or interesting. Is this a method that belongs in your "bioinformatics toolkit"? Can it be applied to related problems straightforwardly, or is it highly specialized? Was there something particularly impressive about the method, the evaluation, the translation of the problem into computational terms, etc.?

In general, bioinformatics papers have an "engineering" flavor that fits well into this problem / method / results paradigm. However, some papers have more of a "basic science" flavor, where a particular claim is being made, and evidence is presented to support that claim. Providing evidence for a claim is closely related to testing a particular hypothesis. If you feel that this better fits the paper you are presenting, then rather than using the problem / method / results paradigm, you can explain it in terms of claims and evidence.