BME 4800 and CSE 3800/5800:  Bioinformatics
Fall 2009


Instructor: Yufeng Wu

Lecture: Tuesday and Thursday 3:30-4:45 pm, BRON 124.

Office Hours: Tuesday and Thursday 2-3 pm, or by appointment.

Announcements.


Course Description. See the Syllabus.


LaTeX. I would appreciate it if you could typeset your homework solutions. You are required to typeset the project report. LaTeX is a nice tool to learn. If you have no experience with LaTeX, you may want to start with a sample LaTeX file and the sample PDF output.


Schedule.

Week
Topics
References
Assignments
12
Probabilistic models of evolution.

Compatibility and perfect phylogeny.

Sect. 8.1, 8.2.

Notes by Gusfield (PDF).
HW4
11
Parsimony: Fitch and Sankoff algorithms, branch and bound.

Neighbor Joining: why does it find the right tree?

Sect. 7.4.


Sect. 7.3.  The proof I presented in class is based on this paper.
Project 2
10
Ultrametric trees and additive trees. Algorithms for inference when the data is perfect.

Phylogeny: introduction and counting.

Chap. 7: p.166-170. If you have Gusfield's book, you may also read Sect. 17.1, 17.2 and 17.4.1.


Chap. 7: p. 161-165.

9
MSA with profile HMM. Star alignment approximation. A little on progressive alignment.

Discussion of project 1. MSA: branch and bound.
Chap. 6: p. 145-157.
See Gusfield's book, Sect. 14.6.2, if you have it. Otherwise, you can read the paper by Gusfield.


Chap. 6: p.143.
HW3
8
Profile HMM (cont.). MSA: scoring and dynamic programming.

Profile HMM

Chap. 5: sect. 5.5 and 5.7. Chap. 6:  p.135-143.


Chap. 5: sect. 5.1-5.3.

7
Pairwise alignment with HMM.

EM and Baum-Welch. More on HMM.
Chap. 4.

Sect. 3.4-3.5: p. 69-73.  Sect. 11.6.
New test data
Note: read the README-new file carefully.
6
10/8: HMM parameter estimation: Baum-Welch algorithm.

10/6: Algorithms for HMM: Viterbi, Forward/Backward. Numerical issues.

Sect. 3.3. Also p. 312-313.


Sect. 3.2 (p.56-62) and Sect. 3.6.

5
10/1: Markov models. Hidden Markov models: what is hidden?

9/29:  Significance of alignment scores

Sect. 3.1-3.2 (p. 47-55)


Sect. 2.7 (also Sect. 11.1)

HW2
4
9/24: Linear-space sequence alignment. A little bit on BLAST.

9/22:  MUM revisited. Repeated matches. More complex gap penalty models.

The space-saving algorithm explained by Gusfield (PDF). Also read Sect. 2.6.


p. 25-26, and Sect. 2.4.

Project 1
Test data
3
9/17: Local alignment: the Smith-Waterman algorithm and the expected score of random matches. Overlap matches.

9/15: Pairwise sequence alignment. Statistical justification of the scoring model.

Sect. 2.3: p.22-p.25, p.27.



Sect. 2.1-2.3 (up to p. 22)


2
9/10: Searching for a pattern efficiently in a suffix array. Two applications of suffix trees. A little bit on MUMs.

9/8: Algorithm for building a suffix tree. Application in text compression.
My notes on suffix trees and suffix arrays (PDF), now updated with a section on pattern search in suffix arrays.
My notes on two applications we discussed today (PDF).

If you want, you can also read
the original paper on the linear-time suffix array algorithm (PDF).
HW1
Updated 9/15 to fix an off-by-one error in Problem 3.
1
9/3: Suffix tree and suffix array algorithms.

9/1: Introduction to bioinformatics. Exact string matching: a simple linear-time method.
An introduction to suffix trees by Dan Gusfield (PDF).
A simple introduction to suffix arrays (link).
Explanation of the linear-time algorithm for LCP array construction by Dan Gusfield (PDF). Note that it gives an argument for the claim I made in class but did not prove.

Explanation of the Z-algorithm by Gusfield (PDF).
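
For the exact string matching topic from 9/1, here is a minimal Python sketch of matching with Z-values. It is only an illustration and is not taken from Gusfield's notes: the function names `z_array` and `find_all` are made up for this example, and the separator character `$` is assumed not to occur in the pattern or the text.

```python
def z_array(s):
    """Z[i] = length of the longest substring starting at i that matches a prefix of s."""
    n = len(s)
    z = [0] * n
    if n == 0:
        return z
    z[0] = n
    # [left, right) is the rightmost window found so far that matches a prefix of s.
    left, right = 0, 0
    for i in range(1, n):
        if i < right:
            # Reuse the value computed for the mirrored position inside the window.
            z[i] = min(right - i, z[i - left])
        # Extend the match by explicit character comparisons.
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        if i + z[i] > right:
            left, right = i, i + z[i]
    return z


def find_all(pattern, text, sep="$"):
    """Return all start positions of pattern in text (assumes sep is in neither string)."""
    if not pattern:
        return []
    combined = pattern + sep + text
    z = z_array(combined)
    m = len(pattern)
    # A Z-value of m at position i means a full copy of the pattern starts there.
    return [i - m - 1 for i in range(m + 1, len(combined)) if z[i] >= m]


if __name__ == "__main__":
    print(z_array("aabxaab"))
    print(find_all("aba", "ababa"))
```

Each character comparison either extends `right` or ends the inner loop, which is why the total work is linear in the combined length of pattern and text.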