BME 4800 and CSE 3800/5800:  Bioinformatics
Fall 2009

Instructor: Yufeng Wu

Lecture: Tuesday and Thursday 3:30-4:45 pm, BRON 124.

Office Hour: Tuesday and Thursday 2-3 pm, or by appointment.


Course Description. See the Syllabus.

LaTeX. I would appreciate it if you typeset your homework solutions. You are required to typeset the project report. LaTeX is a nice tool to learn. If you have no experience with LaTeX, you may want to start with a sample LaTeX file and the sample PDF output.


RNA structure: why is structure important? Dynamic programming algorithm for RNA structure prediction.

Inconsistency of maximum parsimony. Introduction to RNA.
Sect. 10.1 and 10.2 (p. 265-273)

Sect. 8.6. Sect. 10.1 (p.262-265)

Maximum likelihood (cont.). Compare phylogenetic methods: when is maximum parsimony justified?

Maximum likelihood inference of phylogeny.

Sect. 8.4 (p.206-207),  Sect. 8.6 (not fully covered)

Sect. 8.3.

Probabilistic models of evolution.

Compatibility and perfect phylogeny.

Sect. 8.1, 8.2.

Notes by Gusfield (PDF).
Parsimony: Fitch and Sankoff algorithms, branch and bound.

Neighbor joining: why does it find the right tree?

Sect. 7.4.

Sect. 7.3.  The proof I presented in class is based on this paper.
Project 2
Ultrametric trees and additive trees. Algorithms for inference when the data is perfect.

Phylogeny: introduction and counting.

Chap. 7: p.166-170. If you have Gusfield's book, you may also read Sect. 17.1, 17.2 and 17.4.1.

Chap. 7: p. 161-165.

MSA with profile HMM. Star alignment approximation. A little on progressive alignment.

Discussion of project 1. MSA: branch and bound.
Chap. 6: p. 145-157.
See Gusfield's book, Sect. 14.6.2, if you have it. Otherwise, you can read the paper by Gusfield.

Chap. 6: p.143.
Profile HMM (cont.). MSA: scoring and dynamic programming.

Profile HMM

Chap. 5: sect. 5.5 and 5.7. Chap. 6:  p.135-143.

Chap. 5: sect. 5.1-5.3.

Pairwise alignment with HMM.

EM and Baum-Welch. More on HMM.
Chap. 4.

Sect. 3.4-3.5: p. 69-73.  Sect. 11.6.
New test data
Note: read the README-new file carefully.
10/8: HMM parameter estimation: Baum-Welch algorithm.

10/6: Algorithms for HMM: Viterbi, Forward/Backward. Numerical issues.

Sect. 3.3. Also p. 312-313.

Sect. 3.2 (p.56-62) and Sect. 3.6.

10/1: Markov models. Hidden Markov models: what is hidden?

9/29:  Significance of alignment scores

Sect. 3.1-3.2 (p. 47-55)

Sect. 2.7 (also Sect. 11.1)

9/24: Linear-space sequence alignment. A little bit on BLAST.

9/22:  MUM revisited. Repeated matches. More complex gap penalty models.

The space-saving algorithm explained by Gusfield (PDF). Also read Sect. 2.6.

p. 25-26, and Sect. 2.4.

Project 1
Test data
9/17: Local alignment: the Smith-Waterman algorithm and the expected score of random matches. Overlap matches.

9/15:  Pairwise sequence alignment.  Statistical  justification of the scoring model.

Sect. 2.3: p.22-p.25, p.27.

Sect. 2.1 - 2.3 (up to p. 22)

9/10: Searching for a pattern efficiently in a suffix array. Two applications of suffix trees. A little bit on MUMs.

9/8: Algorithm for building a suffix tree. Application in text compression.
My notes on suffix trees and suffix arrays (PDF), updated with a section on pattern search in a suffix array.
My notes on two applications we discussed today (PDF).

If you want, you can also read
the original paper on linear-time suffix array algorithm (PDF).
Updated 9/15 to fix an off-by-one error in Problem 3.
9/3: Suffix tree and suffix array algorithms.

9/1: Introduction to bioinformatics. Exact string matching: a simple linear-time method.
An introduction to suffix tree by Dan Gusfield (PDF).
A simple introduction to suffix arrays (link).
Explanation of the linear-time algorithm for LCP array construction by Dan Gusfield (PDF). Note that it gives an argument for the claim I made but did not prove.

Explanation of the Z-algorithm by Gusfield (PDF).
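
For reference while reading the Z-algorithm notes, here is a minimal Python sketch of linear-time exact string matching with Z-values. It is not taken from Gusfield's notes or from the lecture; the function names `z_array` and `find_occurrences` are made up for illustration, and the sketch assumes the separator character does not occur in the pattern or in the text.

```python
def z_array(s):
    """Return z, where z[i] is the length of the longest substring
    starting at position i that matches a prefix of s."""
    n = len(s)
    z = [0] * n
    if n == 0:
        return z
    z[0] = n
    left, right = 0, 0  # s[left:right] is the rightmost prefix match found so far
    for i in range(1, n):
        if i < right:
            # Position i lies inside a known match: reuse the earlier value.
            z[i] = min(right - i, z[i - left])
        # Extend the match by explicit character comparisons.
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        if i + z[i] > right:
            left, right = i, i + z[i]
    return z


def find_occurrences(pattern, text, sep="\x00"):
    """Return all start positions of pattern in text.

    Assumption: sep occurs in neither pattern nor text.
    """
    if not pattern:
        return []
    m = len(pattern)
    combined = pattern + sep + text
    z = z_array(combined)
    # A Z-value of m at position i means the pattern starts at i - m - 1 in text.
    return [i - m - 1 for i in range(m + 1, len(combined)) if z[i] >= m]


print(find_occurrences("aba", "abababa"))
```

Each character comparison in the inner loop either ends the current position or moves `right` forward, which is why the total work is linear in the combined length of pattern and text.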