CSE 5840: String Algorithms and
Applications in Bioinformatics - Fall 2015
235 ITEB, email@example.com
Office hours: ITEB 235, Monday/Wednesday 2:00 pm to 4:30 pm or by appointment.
This is an algorithms course. Most topics concern combinatorial
algorithms for string processing. The goal is to survey the field
of string algorithms by covering important algorithmic ideas in
string processing. We will also discuss applications of string
algorithms in bioinformatics, especially in analyzing
high-throughput sequencing data. Sequence data analysis has been a
major application of string algorithms. I expect to cover various
types of problems in analyzing sequence data.
This course is lecture-based. Students are required to read and
present a research paper on string algorithms and their
applications. Each student should also carry out an empirical study
by implementing some string algorithms.
In particular, the planned subjects are:
1) Classic exact string matching algorithms and applications.
Knuth–Morris–Pratt, Boyer-Moore, Aho-Corasick, suffix trees and
2) Extensions of the classic string algorithms: approximate string
matching, multiple sequence alignment, and other string matching
heuristics (e.g. BLAST).
3) Burrows-Wheeler transform. Algorithms in data
compression and coding.
4) Applications in bioinformatics. Topics may include read
mapping in high-throughput
sequencing, genome assembly, genetic variation calling, and
Prerequisites. Essentially no biology background is assumed. The
most relevant background is a graduate course on algorithms, but a
serious student who has only had an undergraduate algorithms course,
or a smart, mathematically mature student who has had neither,
might also be able to follow the course.
Textbook: The following book is recommended but not required. We
will cover some topics from this book, but the majority of topics
will be outside it. I will try to post relevant materials and links
on these topics.
Algorithms on Strings, Trees and
Sequences: Computer Science and Computational Biology
by Dan Gusfield, 1997. An excellent survey of string algorithms and
There will be written homework assignments from time to time.
Presentation. Each student needs to select a particular subject in
string algorithms and their applications in bioinformatics to
present to the class. The student should contact the instructor
about the topic/paper first. I prefer presentations that highlight
interesting algorithmic or technical aspects.
Project. Each student should do a project on string algorithms and
then write a project report that summarizes the findings. I prefer
that each student work on his/her own project, but team projects
are allowed with permission. There are different kinds of projects.
Ideally, a student will choose to conduct some research: design a
faster algorithm for some string problem, develop new algorithms
for solving practical problems (e.g. in
analyzing sequence data) or develop some theoretical analysis of some string
Alternatively, one may empirically evaluate the performance of
string processing algorithms.
Again, each project must first be approved by the instructor.
Exams. I have not yet decided whether to hold a final exam in this
course. If I do, it is likely to be a take-home exam.
Grading. The grade will be
based on: homework (25%), project (30%), paper presentation
and final exam (35%). This is based on the assumption that there
will be a final exam.