CSE 5095: Research Topics in Bioinformatics - Spring 2012

Yufeng Wu
235 ITEB, ywu@engr.uconn.edu
Office hours:
ITEB 235, Wednesday 10 am-12 pm and 2-4:30 pm, or by appointment.

This course covers selected research topics in bioinformatics and computational
biology. I will mainly focus on recent developments in bioinformatics algorithms:
string algorithms and their application in next-generation sequencing data analysis,
coalescent theory and population genetics, and complex evolutionary models.
The goal is to present some state-of-the-art computational aspects of bioinformatics
and inspire students to pursue research in this fast-developing field.

This course is lecture-based. Students are required to read about and present a research
subject in algorithmic bioinformatics. Each student should also carry out an empirical
study by implementing one or more bioinformatics algorithms.

In particular, the planned subjects are:

1) String matching algorithms and applications in next-generation sequencing
data analysis. New developments in suffix trees and suffix arrays. Burrows-Wheeler transform.
Genome assembly. Read mapping. Structural variation detection.

2) Coalescent theory. Basic models in coalescent theory. Probabilistic computation
on coalescent models. Applications of coalescent theory in genetics.

3) Recombination. Recombination models. Lower bounds. Ancestral recombination graphs
and related algorithms. Other recombination models.

4) Complex evolutionary models. Subtree prune and regraft and related algorithms.
Phylogenetic network models.

Prerequisites. Essentially no biology background is assumed. The most relevant
background is a graduate course on algorithms, but a serious student who has only had
an undergraduate algorithms course, or a smart, mathematically mature student who has
had neither, might also be able to follow the course.

Textbook: No textbook is required. The following books are useful for this course.

1. Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology,
by Dan Gusfield, 1997. An excellent introduction for readers with a computer science
background.
2. Coalescent Theory: An Introduction, by John Wakeley, 2008. This is a good introduction
to coalescent theory. I have asked the library to place this book on reserve.

Homework. I have not yet decided whether to assign written homework. The current
plan is that each student will write up lecture notes for the papers and topics presented in lectures.
In addition, each student needs to work on one problem related to the lectures (which mostly
involves reading papers related to what is taught in class).

Presentation. Each student needs to select a particular subject in algorithmic bioinformatics
to present to the class. The student should contact the instructor about the subject first.
I prefer that the presentation provide some general background and also cover some interesting
technical aspects.

Projects. Each student should carry out an empirical study. Often this means that the student
implements a bioinformatics algorithm and tests its performance. I prefer that each student
work on his/her own project, but exceptions can be made. Alternatively, a student can
choose to conduct a more theoretical investigation (e.g. designing a faster algorithm for
some bioinformatics problem).
Again, each project needs to be approved by the instructor first.

Exams. I do not plan to give exams in this course, although this may change depending on how the course progresses.

Grading. Grades will be based on: lecture notes (20%), lecture problems (15%), project (40%),
and presentation (25%).