5/2: Approximate string matching (presented by L. Nazaryan)
4/30: Extension to BWT (presented by R. Jiang), and tandem repeat finding (presented by S. Mirzaei)
|Extension to BWT:
An Extension of the Burrows Wheeler Transform and Applications to Sequence Comparison and Data Compression (http://link.springer.com/chapter/10.1007%2F11496656_16)
4/25: Hash-based reads mapping (presented by C. Chu)
4/23: RNA seq reads mapping (presented by S. Saha)
|Papers on reads mapping:
"Hobbes: optimized gram-based methods for efficient read alignment"
and "Accelerating read mapping with FastHASH"
Papers on RNA-seq:
4/18: An approximate matching algorithm by G. Myers (presented by G. Ilie).
4/16: Fast Lempel-Ziv data compression (presented by A. Mamun).
on approximate string matching.
Slides on data compression.
||4/4: Probability of strings
by Chvatal and Sankoff on longest common subsequences.
||3/28: Coding and
compression of text.
notes written by Gusfield on the Unique Decipherability problem.
The RECOMB'10 paper about sequence reads compression.
||3/14: Applications in
||Two papers (paper 1 and paper 2) on genome assembly with paired reads.
Pevzner et al.'s paper using Eulerian paths.
The BWT-based reads mapping: the BWA paper
||Compressed suffix array:
Suffix Arrays and Suffix Trees with Applications to Text Indexing and
Another paper on compressed suffix array
The main paper covered on BWT pattern matching is: Opportunistic data structures with applications (FOCS 2000)
There are many online references on BWT. There are also books on BWT (e.g. link).
||2/28: More approximate
string matching. BLAST: concepts and seeding. Introduction to sequence
||Sections 12.2 and 12.3.
The original paper about spaced seeds (Link).
The algorithm for computing the probability of a seed hitting a region can be found on pages 9-10 of Keich et al. (Link). It differs slightly from the version covered in class but is very similar.
My explanation of the seeding algorithm, written for another class. It differs slightly from what was presented this time, so it is for reference only (PDF).
This tutorial on spaced seeds can be useful.
Sequence reads mapping: the MAQ paper.
||2/21: Two more applications
of suffix trees: MUMs and maximal substrings of more than two strings.
Introduction to sequence alignment. Approximate string matching.
||Gusfield: section 9.7,
11 (I did not go over these topics in detail in class because most
students have already learned them, but if you have not, you should
study these sections carefully), sections 12.2 and 12.7 (I only very
briefly mentioned the Four Russians approach; see more details in the
link below). Section 4.2.
Introduction to sequence alignment by Gusfield.
Gusfield's write-up on the Four Russians technique.
||2/14: Suffix array. More
applications of suffix tree and array: tandem repeats, longest
prefix-suffix matches for multiple strings, and maximal unique matches.
||Gusfield: Sections 7.14,
For reference only: my notes on suffix array (written a few years ago).
Notes on LCP array
The paper of the three-partition linear-time suffix array construction.
Gusfield's writeup on O(n log n) tandem repeat finding.
HW2: will appear in HuskyCT.
||2/7: String matching
with wildcards. Suffix tree. Applications of suffix tree.
||Gusfield: Sections 3.5,
(cont.), Karp-Rabin and Aho-Corasick.
||Gusfield: Sections 3.4, 4.1
Lecture slides on Aho-Corasick from another institution
HW1: posted on HuskyCT.
Introduction to string matching. Z algorithm. KMP. Boyer-Moore.
1, Sections 2.1-2.3, Section 3.2. Also some online references:
Introduction to Boyer-Moore