CSE 300 Algorithms in Bioinformatics - Fall 2007

Yufeng Wu

Please read this carefully, since I will not cover much of it in class.

This course covers important algorithmic results in bioinformatics
and computational biology. I will cover both established topics
(e.g., exact string matching, sequence alignment, and phylogeny) and
recent developments in areas such as population genetics and systems biology.
The goal is to present an overall picture of the algorithmic aspects
of bioinformatics and to inspire students to pursue research in this
fast-developing field.

This course is lecture-based. Homeworks will be assigned on major
subjects covered in the class. Students are required to read and
present a research paper in algorithmic bioinformatics. There will
be a (possibly take-home) final exam. There is no required programming,
although some optional programming projects are possible.

In particular, the planned subjects are:

1) Exact string matching. New developments in suffix trees and suffix arrays.
We will review the basics of suffix trees and suffix arrays, and
then look at some recent work on suffix arrays and
their use in bioinformatics.

2) Sequence analysis. This includes space-efficient pairwise alignment,
multiple sequence alignment with provable properties, ideas behind the
popular bioinformatics tool BLAST, and the latest developments on improving it.

3) Phylogeny. Various classical phylogenetic methods: ultrametric trees,
additive trees, and perfect phylogeny. On perfect phylogeny, we will
cover both classic binary perfect phylogeny and multi-state perfect
phylogeny. We will also cover phylogenetic methods that are widely used
today, including parsimony, the Neighbor Joining algorithm, and maximum
likelihood (if time permits).

4) Genome rearrangement. We will sample some results on this interesting
subject, on which seminal results have been obtained.

5) Population genetics. This includes haplotype inference and reconstruction of
networks with recombination.  These topics are what I am currently working on.

6) Other subjects, including biological networks and related
algorithmic problems, gene regulation, and structural bioinformatics.

Prerequisites. As for background, essentially no biology is assumed.
The most relevant background is a graduate course on algorithms,
but a serious student who has only had an undergraduate algorithms course,
or a smart, mathematically mature student who has had neither, might also
be able to follow the course.

Textbook: No textbook is required. The following books are useful for this course.

1. Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology
by Dan Gusfield, 1997.
An excellent introduction for people with a computer science background.
2. Inferring Phylogenies by Joseph Felsenstein, 2003. A nice general treatment of phylogenetics.
3. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids
by Richard Durbin, Sean R. Eddy, Anders Krogh, and Graeme Mitchison, 1999.
This book is widely used as the textbook for teaching statistical aspects of bioinformatics.
We will use it from time to time.

The following book might also be of interest.

1. Phylogenetics (Oxford Lecture Series in Mathematics and Its Applications, 24)
by Charles Semple and Mike Steel, 2003.
Contains interesting material on phylogenetics. Very mathematical.

Homeworks. Homeworks should be typeset in LaTeX. Late homeworks will not
be accepted. Please acknowledge the source of any ideas. You may share
ideas with someone else as long as you acknowledge them.
If you work with one or more people on a writeup, then
you should turn in a single writeup.

Projects. You must read one or more papers on a chosen topic in computational
biology and bioinformatics, understand the material, and then write a short (perhaps 2-4 pages)
document on it. Your goal is not to repeat what the author(s) said. Instead,
I would like to see some interesting or semi-interesting ideas (observations, extensions,
etc.). If you prefer, you can instead do a small research project of your own on anything
you find interesting in computational biology. In either case, you are free
to choose the subject, but make sure to email me to get permission for the papers/subjects.

Exams. There are no real sit-down exams for this course. There will be a take-home final
exam, which is more like a comprehensive homework problem set. There will also be a
25-minute discussion with you in my office. The subject is likely to be the project you did.
Do not worry, this is not an exam, just a chance for me to see what students have learned
in the course and what interests they have.

Grading. This is a non-required graduate course. I expect that you registered for it because
you are interested in it and want to learn something about bioinformatics.
I am required to assign a grade. The grade will be based on
homework, the project report and discussion, and the take-home final.