CSE 300: Algorithms in Bioinformatics
Fall 2007

Instructor: Yufeng Wu

Office Hours: Monday and Wednesday 10-11pm, or by appointment.


Course Description. See the Syllabus.

LaTeX. Please typeset your homework solutions and project report. As a graduate student, you should learn LaTeX, since writing will be an important part of your work. If you have no experience with LaTeX, you may want to start with a sample LaTeX file and the sample PDF output.


Computational problems related to protein-protein interaction networks.

A little protein folding.
Sections 1 and 2 of the RECOMB'05 paper (Link).
Path matching section of the JCB'07 paper (JCB, subscription required).

Approximation folding algorithm on HP model by Hart and Istrail (Link).

Introduction to biological networks. Network motifs. Boolean network for modeling regulatory network.

Reconstructing regulatory network using perturbation experiments. Network annotation problem.
Read the PNAS paper (Link) if you want to learn more about the network motifs I briefly described in class.

Read the PSB'04 paper on reconstructing chain functions (Link).
We also briefly covered the basic idea in this ISMB'07 paper (Link).

Sorting by reversal.

Continue SBR. Breakpoint analysis.
The paper by Anne Bergeron on simplifying the SBR algorithm (Link).

The paper on breakpoint analysis (Link). Read pages 7-9.
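
As a small illustration of the breakpoint idea used in sorting by reversals, here is a minimal Python sketch. It assumes an unsigned permutation of 1..n compared against the identity, which may differ from the exact setting used in class or in the papers above. The function names are my own. Since one reversal can remove at most two breakpoints, half the breakpoint count (rounded up) is a lower bound on the reversal distance.

```python
def count_breakpoints(perm):
    """Count breakpoints of an unsigned permutation of 1..n.

    The permutation is framed with 0 and n+1; a breakpoint is an adjacent
    pair whose values do not differ by exactly 1.
    """
    extended = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for x, y in zip(extended, extended[1:]) if abs(x - y) != 1)


def reversal_distance_lower_bound(perm):
    """Each reversal removes at most two breakpoints, so ceil(b / 2) is a lower bound."""
    b = count_breakpoints(perm)
    return (b + 1) // 2


if __name__ == "__main__":
    print(count_breakpoints([3, 1, 2, 4]))
    print(reversal_distance_lower_bound([3, 1, 2, 4]))
```

The bound is generally not tight; it is only meant to show how breakpoints are counted.
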
Root-unknown case of PPH. A short topic on population structure. Introducing recombination.

Recombination lower bound computation: HK bound and the haplotype bound. Start of genome rearrangement.
See this paper for the population structure problem (PDF). Focus on Sections 2 and 2.1. There are many sources of information on recombination, such as the Wikipedia article (Link).

Read Sections 1 and 2 of the Song et al. paper (Link) for ideas on the haplotype bound. Also see the slides for the paper (Link).
Start your project now.
Population Genetics. Hardy-Weinberg equilibrium. Introduction to coalescent theory.

Haplotyping as perfect phylogeny (PPH).
Refer to UCONN EEB 348's class notes (Link) for HW equilibrium. Also refer to EEB 348's class notes (Link) for a tour of coalescent theory.
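
For a quick feel for Hardy-Weinberg equilibrium, the following Python sketch computes the expected genotype frequencies at a single locus with two alleles. The allele labels `A` and `a` and the function name are assumptions made for illustration. With allele frequency p for `A` and q = 1 - p for `a`, the expected frequencies are p^2, 2pq, and q^2.

```python
def hardy_weinberg(p):
    """Expected genotype frequencies for allele frequency p of allele A.

    Assumes one locus with two alleles (A and a) under Hardy-Weinberg equilibrium.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


if __name__ == "__main__":
    print(hardy_weinberg(0.5))
```
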

Gusfield's paper on PPH (Link). For the reduction to graph realization, see Gusfield's notes (Link).

Multi-state perfect phylogeny (continued). Parsimony: Fitch algorithm, Sankoff algorithm and heuristic tree search.

Statistical property of parsimony: justification of parsimony and inconsistency.
See page 473 of Gusfield's book for the Fitch algorithm. See the first page of these notes (Link) for the Sankoff algorithm.

I highly recommend chapter 9 of Felsenstein's book, "Inferring Phylogenies". If you do not have access to it, here are class notes (Link) posted at the University of Texas that give a concrete example of the inconsistency of parsimony.
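
The bottom-up pass of the Fitch algorithm can be sketched in a few lines of Python. This is a minimal sketch under my own assumptions: the tree is rooted and binary, it is written as nested 2-tuples, each leaf is the observed state of a single character, and all substitutions cost 1. It returns the candidate state set at the root and the parsimony score; the top-down pass that assigns states to internal nodes is omitted.

```python
def fitch(tree):
    """Return (candidate_states, parsimony_cost) for a rooted binary tree.

    A leaf is any non-tuple value (its observed state); an internal node is
    a 2-tuple (left_subtree, right_subtree).
    """
    if not isinstance(tree, tuple):
        return {tree}, 0
    left_states, left_cost = fitch(tree[0])
    right_states, right_cost = fitch(tree[1])
    common = left_states & right_states
    if common:
        # Children agree on at least one state: no extra change is needed here.
        return common, left_cost + right_cost
    # Children disagree: take the union and charge one substitution.
    return left_states | right_states, left_cost + right_cost + 1


if __name__ == "__main__":
    states, cost = fitch((("A", "C"), ("A", "G")))
    print(states, cost)
```
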

Splits-equivalence theorem. Phylogenetic applications to human evolution.

Multi-state perfect phylogeny.
Gusfield's writeup on splits-equivalence theorem (PDF).

My notes on multi-state perfect phylogeny (PDF).
Another reference is the survey on perfect phylogeny by Fernandez-Baca (Link).
The two original papers on this algorithm are listed in the references section below.
Homework 3 (PDF) is out. Due 10/31 in class.
A short description of UPGMA. Neighbor Joining algorithms and proof of consistency.

Introduction to parsimony. Binary perfect phylogeny.
David Bryant's paper (Link) on consistency of NJ. Read up to page 7.

Read Gusfield's book from p. 458 to p. 463.
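
For binary characters with the root unknown, a well-known test for the existence of a perfect phylogeny is the four-gamete condition: no pair of columns may contain all four of the combinations 00, 01, 10, 11. The Python sketch below checks this by brute force over column pairs. The function name and the list-of-rows input format are assumptions for illustration; this is not the linear-time algorithm from the book.

```python
from itertools import combinations


def passes_four_gamete_test(matrix):
    """Return True if no pair of columns of a 0/1 matrix shows all four gametes.

    matrix is a list of equal-length rows, each a sequence of 0/1 values.
    """
    n_cols = len(matrix[0]) if matrix else 0
    for c1, c2 in combinations(range(n_cols), 2):
        gametes = {(row[c1], row[c2]) for row in matrix}
        if len(gametes) == 4:
            return False
    return True


if __name__ == "__main__":
    print(passes_four_gamete_test([[1, 1, 0], [1, 0, 0], [0, 0, 1]]))
    print(passes_four_gamete_test([[0, 0], [0, 1], [1, 0], [1, 1]]))
```
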

Algorithms for constructing ultrametric trees. Introduction to additive trees.

Reduction of the additive tree problem to the ultrametric tree problem. Fitting the branch lengths of a fixed-topology tree using the least-squares method.
Gusfield's notes on a simple algorithm for ultrametric trees (PDF).

Read Gusfield's book, p. 466-468. If you do not have the book, the library has it on reserve. I could not find a good reference (except Felsenstein's book) on least-squares methods, but the first several lecture slides by Felsenstein might be useful (Link).

Multiple sequence alignment. A short review.

Phylogeny. Counting of bifurcated and multifurcated trees. Ultrametric trees.
The RECOMB 2005 paper (Link) on MSA on a tree with a polynomial-time solvable formulation.

See here (Link) for an explanation of the counting of multifurcated trees.
Read Ch. 17 of Gusfield's book (p. 447-456) if you have it. Or see the preprint by Gusfield on ultrametric and additive trees (PDF). Note: if you have the book, you do not have to print the notes, since they are essentially the same as Ch. 17.1 and 17.2 of the book.
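
To see how fast the number of bifurcating trees grows, here is a small Python sketch of the standard counts for n labeled leaves: (2n-3)!! rooted binary trees and (2n-5)!! unrooted binary trees, where !! is the product of odd numbers. The function names are my own; multifurcated trees are not covered by this sketch.

```python
def odd_double_factorial(k):
    """Product k * (k - 2) * ... * 3 * 1 for odd k; returns 1 when k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def count_rooted_binary_trees(n):
    """Number of rooted binary trees with n labeled leaves: (2n - 3)!!."""
    if n < 1:
        raise ValueError("need at least one leaf")
    return odd_double_factorial(2 * n - 3)


def count_unrooted_binary_trees(n):
    """Number of unrooted binary trees with n labeled leaves: (2n - 5)!!."""
    if n < 1:
        raise ValueError("need at least one leaf")
    return odd_double_factorial(2 * n - 5)


if __name__ == "__main__":
    for n in (3, 4, 5, 10):
        print(n, count_rooted_binary_trees(n), count_unrooted_binary_trees(n))
```
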
Homework 2 (PDF) is out. Due 10/3 in class.
Blast, PatternHunter and seeding.

Multiple sequence alignment: sum-of-pairs scoring, dynamic programming and branch and bound, 2-approximation algorithm.
My explanation of the algorithm related to seeding described in class (PDF).
The original PatternHunter paper (Link).
The algorithm for computing the probability of a seed hitting a region can be found on pages 9-10 of Keich et al. (Link). This algorithm is slightly different from what we covered in class, but very similar.

Read Gusfield's book (1997), Section 14.6 (pages 343-350). Gusfield's book should be on reserve in the library.
The class notes by Ron Shamir (Link) contain pretty much what we discussed today. Read up to page 7.

Pairwise sequence alignment. The space-saving dynamic programming algorithm. A (very) short introduction to approximate pattern matching.

Four Russians Algorithms for the edit distance.
An introduction to edit distance by Gusfield (PDF).
The space-saving algorithm explained by Gusfield (PDF).
Four Russians Algorithm explained by Gusfield (PDF).
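
As a reminder of the basic recurrence, here is a minimal Python sketch of unit-cost edit distance that keeps only two rows of the dynamic programming table, so it uses space linear in the shorter string. This is only the distance computation; recovering an optimal alignment in linear space needs the divide-and-conquer step described in the notes above, which is not shown. The function name is my own.

```python
def edit_distance(a, b):
    """Unit-cost edit distance (insert, delete, substitute) using two DP rows."""
    if len(b) > len(a):
        a, b = b, a  # keep the row length equal to the shorter string
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
```
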

Application of suffix trees in bioinformatics. These include whole genome alignment and tandem repeat detection. Tandem repeat algorithm writeup by Gusfield (PDF).
The MUMmer paper (PDF).
Homework 1 (PDF) is out. Due: 9/17 in class.
Exact string matching.
Suffix tree and suffix array algorithms.
Topics: concepts of suffix tree and suffix array, conversion between them, a linear-time algorithm to build the longest common prefix (LCP) array, and the elegant linear-time algorithm to build a suffix array directly.
Another simple string matching algorithm: the Z-algorithm.
An introduction to suffix trees by Dan Gusfield (PS).
A book chapter on suffix trees and suffix arrays by S. Aluru (Link).
The original paper on the linear-time suffix array algorithm (PDF).
My own short description of the LCP algorithm (PDF).
Explanation of the linear-time algorithm for LCP array construction by Dan Gusfield (PDF).
Explanation of the Z-algorithm by Gusfield (PS).
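
For reference, the Z-algorithm fits in a few lines of Python. The sketch below computes Z-values in linear time and then uses them for exact matching by scanning the string pattern + separator + text. The function names and the default separator are assumptions for illustration; the separator must not occur in either string.

```python
def z_array(s):
    """z[i] = length of the longest substring starting at i that matches a prefix of s."""
    n = len(s)
    z = [0] * n
    if n == 0:
        return z
    z[0] = n
    left, right = 0, 0  # [left, right) is the rightmost prefix match found so far
    for i in range(1, n):
        if i < right:
            # Reuse the value computed for the mirrored position inside the box.
            z[i] = min(right - i, z[i - left])
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        if i + z[i] > right:
            left, right = i, i + z[i]
    return z


def find_occurrences(pattern, text, sep="\x00"):
    """Start positions of pattern in text; sep is assumed absent from both strings."""
    if not pattern:
        return []
    m = len(pattern)
    z = z_array(pattern + sep + text)
    return [i - m - 1 for i in range(m + 1, len(z)) if z[i] >= m]


if __name__ == "__main__":
    print(z_array("aabxaab"))
    print(find_occurrences("aba", "ababa"))
```
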

Other useful readings

Here are more readings you may find useful.