||11/19: Brief introduction to sorting by
11/17: Unique decipherability
chapter by P. Pevzner on genome rearrangement by
Notes written by Gusfield on the Unique Decipherability problem.
||11/12: Text compression
11/10: Sequence error correction
|Papers and links to text compression:
ref 1, ref
2 and ref
Papers on sequence reads error correction: paper 1, paper 2 and paper 3.
||11/5: k-mer counting
11/3: Genome assembly
|Three papers on k-mer counting covered in
2, and paper
Genome assembly from paired-end reads
The IDBA paper
||10/29: Sequencing data analysis: genome
10/27: Sequencing data analysis: reads mapping
et al's paper using Eulerian path.
The BWT-based reads mapping: the BWA paper
Sequence reads mapping: the MAQ paper.
|Proposals for paper presentation and project
||10/22: Blast and Pattern Hunter.
See the Notes.
10/20: Approximate string matching.
The original paper about spaced seed (Link).
The algorithm for computing the probability that a seed hits a region can be found on pages 9-10 of Keich et al. (Link). It differs slightly from the algorithm covered in class, but the two are very similar.
My explanation of the seeding algorithm, as presented in class (PDF)
This tutorial on spaced seeds can be useful.
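To make the notion of a seed "hitting" a region concrete, here is a minimal sketch in Python. It is not the algorithm from Keich et al. or the one presented in class: it simply enumerates every match/mismatch pattern of the region, so it is only practical for short regions. It assumes each aligned position matches independently with probability p, and that a seed is written as a string of '1' (must match) and '0' (don't care). The function names seed_hits and hit_probability are made up for this illustration.

```python
from itertools import product


def seed_hits(seed, region):
    """Return True if the seed hits the region at some offset.

    seed   -- string over '1' (position must match) and '0' (don't care)
    region -- sequence of 0/1 flags, 1 meaning the aligned position matches
    """
    span = len(seed)
    required = [i for i, c in enumerate(seed) if c == "1"]
    for start in range(len(region) - span + 1):
        if all(region[start + i] for i in required):
            return True
    return False


def hit_probability(seed, length, p):
    """Exact hit probability by brute force over all 2**length regions.

    Assumes independent positions, each matching with probability p.
    """
    total = 0.0
    for region in product((0, 1), repeat=length):
        if seed_hits(seed, region):
            matches = sum(region)
            total += p ** matches * (1 - p) ** (length - matches)
    return total


if __name__ == "__main__":
    # Compare a small spaced seed with a contiguous seed of the same weight.
    for seed in ("1101", "111"):
        print(seed, hit_probability(seed, 12, 0.7))
```

The enumeration takes time exponential in the region length, which is why a dynamic programming formulation is preferable for realistic region lengths.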
||10/15: Approximate string matching
10/13: Compressed suffix array and a little 2D string matching
|Gusfield: 4.2 and 12.2.
This web page explains the 2-dimensional string matching.
Basic dynamic programming for comparing two strings. If you have not yet learned the basic DP for string comparison, you should read it carefully.
Compressed suffix array: Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching
Another paper on compressed suffix arrays. I covered only a small part of the first CSA paper and none of the second. It would be good if someone in the class could tell us more about CSA.
|HW4 is posted in HuskyCT. Due: 10/27.|
||9/29: Pattern match with BWT
The main paper covered on BWT pattern matching is: Opportunistic data structures with applications (FOCS 2000)
There are many online references on BWT. There are also books on BWT (e.g. link).
||9/29: Suffix array
9/29: Suffix array
Gusfield: Sections 7.14, 7.10.
The paper of the three-partition linear-time suffix array construction.
For reference only: my notes on suffix array (written a few years ago).
Gusfield's notes on LCP array
|HW3 is posted in HuskyCT. Due: 10/13.
||9/24: More applications of suffix tree
9/22: Applications of suffix tree
|Gusfield: Chapters 7 and 9. I could not cover the chapters in full, but they are still worth reading.
Gusfield's writeup on O(n log n) tandem repeat finding.
|HW2. Due: 10/1.|
||9/17: Suffix tree.
9/15: Aho-Corasick algorithm
|Gusfield: Section 3.4. Chapter 5. Section
An introduction to suffix tree by Dan Gusfield (PDF).
The writing by Gusfield on Ukkonen's algorithm (PDF).
||9/10: Karp-Rabin and Aho-Corasick
9/8: Boyer-Moore algorithm and the linear time analysis
|Gusfield: Sections 3.4 and 4.4.
Gusfield: Section 3.2. Some links that might be useful:
Classic string matching algorithms: KMP and Boyer-Moore.
9/1: Introduction to string matching. Different kinds of string matching/comparison algorithms. Z algorithm.
Chapter 1, Sections 2.1-2.3, Section 3.2. Also some online
Introduction to Knuth-Morris-Pratt algorithm
Introduction to Boyer-Moore
Basic dynamic programming for comparing two strings.
|HW1. Due: 9/15.
Please submit electronically in HuskyCT. Please consider using LaTeX for writing up your solutions.