Week 
Topics 
References 
14 
Student presentations. 5/2: Approximate string matching (presented by L. Nazaryan). 4/30: Extension to BWT (presented by R. Jiang) and tandem repeat finding (presented by S. Mirzaei).
Extension to BWT: "An Extension of the Burrows Wheeler Transform and Applications to Sequence Comparison and Data Compression" (http://link.springer.com/chapter/10.1007%2F11496656_16). Tandem repeats: http://bioinformatics.oxfordjournals.org/content/18/4/634.short, http://online.liebertpub.com/doi/abs/10.1089/cmb.2005.12.928, http://www.waset.org/journals/ijmbs/v6/v62.pdf
13 
Student presentations. 4/25: Hash-based read mapping (presented by C. Chu). 4/23: RNA-seq read mapping (presented by S. Saha).
Papers on read mapping: "Hobbes: optimized gram-based methods for efficient read alignment" and "Accelerating read mapping with FastHASH". Papers on RNA-seq:
12 
Student presentations. 4/18: An approximate matching algorithm by G. Myers (presented by G. Ilie). 4/16: Fast Lempel-Ziv data compression (presented by A. Mamun).
Paper on approximate string matching. Slides on data compression.
11 
No class. 

10 
4/4: Probability of strings and sequences.
The paper by Chvátal and Sankoff on longest common subsequences.
9 
3/28: Coding and compression of text.
Notes written by Gusfield on the unique decipherability problem. The RECOMB'10 paper on compression of sequence reads.
8 
3/14: Applications in high-throughput sequencing.
Two papers (paper 1 and paper 2) on genome assembly with paired reads. Pevzner et al.'s paper on the Eulerian path approach. BWT-based read mapping: the BWA paper.
7 
3/7: Burrows-Wheeler Transform.
Compressed suffix arrays: "Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching"; another paper on compressed suffix arrays. The main paper covered on BWT pattern matching: "Opportunistic data structures with applications" (FOCS 2000). There are many online references on the BWT, as well as books on it (e.g. link).
6 
2/28: More approximate string matching. BLAST: concepts and seeding. Introduction to sequence read mapping.
Sections 12.2 and 12.3. The original paper on spaced seeds (Link). The algorithm for computing the probability that a seed hits a region is on pages 9-10 of Keich et al. (Link); it differs slightly from the version covered in class but is very similar. My explanation of the seeding algorithm, written for another class, is a little different from what was presented this time and is for reference only (PDF). This tutorial on spaced seeds may also be useful. Sequence read mapping: the MAQ paper.
5 
2/21: Two more applications of suffix trees: MUMs and maximal substrings common to more than two strings. Introduction to sequence alignment. Approximate string matching.
Gusfield: Section 9.7, Chapter 11 (I did not go over these topics in detail in class because most students have already learned them; if you have not, you should study these sections carefully), Sections 12.2 and 12.7 (I only briefly mentioned the Four Russians approach; see more details in the link below), and Section 4.2. Introduction to sequence alignment by Gusfield. The Four Russians write-up by Gusfield.
4 
2/14: Suffix arrays. More applications of suffix trees and arrays: tandem repeats, longest prefix-suffix matches for multiple strings, and maximal unique matches.
Gusfield: Sections 7.14 and 7.10. For reference only: my notes on suffix arrays (written a few years ago). Notes on the LCP array. The paper on the three-partition linear-time suffix array construction. Gusfield's write-up on O(n log n) tandem repeat finding. HW2: will be posted on HuskyCT.
3 
2/7: String matching with wildcards. Suffix trees. Applications of suffix trees.
Gusfield: Sections 3.5, 5.1-5.4, 6.1, and 8.1-8.10.
2 
1/31: Boyer-Moore (cont.), Karp-Rabin, and Aho-Corasick.
Gusfield: Sections 3.4, 4.1, and 4.4. Lecture slides on Aho-Corasick from another institution. HW1: posted on HuskyCT.
1 
1/24: Introduction to string matching. Z algorithm. KMP. Boyer-Moore.
Gusfield: Chapter 1, Sections 2.1-2.3, and Section 3.2. Also some online references: Z algorithm; Introduction to Boyer-Moore.
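The Week 1 readings introduce the Z algorithm (Gusfield, Chapter 1). As an unofficial companion to those readings, here is a minimal Python sketch, assuming 0-based indexing and a separator character that occurs in neither the pattern nor the text. The function names `z_array` and `find_occurrences` are hypothetical and the code is not taken from the course materials.

```python
def z_array(s: str) -> list[int]:
    """z[i] = length of the longest substring starting at i that matches
    a prefix of s. For convenience this sketch sets z[0] = len(s)."""
    n = len(s)
    z = [0] * n
    if n == 0:
        return z
    z[0] = n
    l, r = 0, 0  # s[l:r] is the Z-box reaching furthest to the right so far
    for i in range(1, n):
        if i < r:
            # i lies inside the Z-box: reuse the value at the mirrored
            # position i - l, capped so we do not go past r
            z[i] = min(r - i, z[i - l])
        # extend the match by explicit character comparisons
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        if i + z[i] > r:
            l, r = i, i + z[i]
    return z


def find_occurrences(pattern: str, text: str, sep: str = "$") -> list[int]:
    """Exact matching with Z-values of pattern + sep + text.
    Assumes sep occurs in neither pattern nor text (hypothetical helper)."""
    m = len(pattern)
    if m == 0:
        return []
    z = z_array(pattern + sep + text)
    # a Z-value equal to m at position i marks an occurrence at i - m - 1
    return [i - m - 1 for i in range(m + 1, len(z)) if z[i] >= m]


print(z_array("aabxaab"))              # [7, 1, 0, 0, 3, 1, 0]
print(find_occurrences("ab", "abcab"))  # [0, 3]
```

Matching concatenates pattern, separator, and text, then reports every position whose Z-value reaches the pattern length; because the separator cannot match, no Z-value in the text part exceeds that length. The whole computation is linear, since the right end `r` of the Z-box only moves forward.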