CSE 5095: String Algorithms and
- Spring 2013
235 ITEB, email@example.com
Office hours: ITEB 235, Tuesday/Thursday 9:30 am - 12 pm or by appointment.
This course is an algorithms course. Most of the topics concern combinatorial
algorithms for string processing. The goal is to survey the field
by covering important algorithmic ideas in string processing. We will also discuss
applications of string algorithms, especially in bioinformatics.
However, the focus of this course will be on algorithms.
This course is lecture-based. Students are required to read and present a
paper on string algorithms and their applications. Each student should also carry out
an empirical study by implementing some string algorithms.
In particular, the planned subjects are:
1) Classic exact string matching algorithms and applications. Topics include
Knuth–Morris–Pratt, Boyer-Moore, Karp-Rabin, Aho-Corasick, suffix trees
and suffix arrays.
2) Extensions to the classic string algorithms: approximate string matching
for the basic sequence alignment, other string match heuristics (e.g.
Probabilistic models of strings and patterns.
3) Burrows-Wheeler transform. Algorithms
in data compression and coding.
4) Applications in bioinformatics. Topics may include read mapping in
sequencing and genome assembly.
Prerequisites. Essentially no biology background is assumed. Some knowledge
of probability may help. The most relevant background is a graduate
course on algorithms, but a serious student who has only had an undergraduate
algorithms course, or a smart, mathematically mature student who has had neither,
might also be able to follow the course.
The following book is required. At least half of the lectures will be
based on this book.
We will also cover some topics outside this book.
Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology,
by Dan Gusfield, 1997. An
excellent survey of string algorithms and bioinformatics applications.
There will be written homework assignments from time to time.
Each student needs to select a particular subject in string algorithms
in bioinformatics to present to the class. The student should contact
the instructor about the topic/paper first. I prefer presentations that cover some
interesting algorithmic/technical aspects.
Projects. Each student should do a project on string algorithms and then
write a report that summarizes the findings. I prefer that each student
works on his/her own project, but team projects are possible with permission.
There are different kinds of projects. Ideally, a student will choose to conduct
some research: design a faster algorithm for some string algorithmic problem,
develop new algorithms for solving some practical problem (e.g. in bioinformatics),
or develop a theoretical analysis of some string algorithmic problem.
Each project needs to be approved by the instructor first.
Currently I plan to have a final exam in this course. I believe this
will help me see how well the students in the class have learned the material.
Grading. The grade will be determined
by: homework (25%), project (30%), paper presentation
and final exam (35%). This is based on the current plan that there will
be a final exam.