Yufeng Wu

235 ITEB, ywu@engr.uconn.edu

Office hours: ITEB 235, Tuesday/Thursday 9:30 am to 12 pm, or by appointment.

This is an algorithms course. Most of the topics concern combinatorial algorithms for string processing. The goal is to survey the field of string algorithms by covering the important algorithmic ideas in string processing. We will also discuss applications of string algorithms, especially in bioinformatics. However, the focus of this course will be on algorithms.

This course is lecture-based. Students are required to read and present a research paper on string algorithms and their applications. Each student should also carry out an empirical study by implementing some string algorithms.

In particular, the planned subjects are:

1) Classic exact string matching algorithms and applications. Topics include: Knuth-Morris-Pratt, Boyer-Moore, Karp-Rabin, Aho-Corasick, suffix trees and suffix arrays.

2) Extensions of the classic string algorithms: approximate string matching, extensions of basic sequence alignment, other string matching heuristics (e.g. BLAST). Probabilistic models of strings and patterns.

3) Burrows-Wheeler transform. Algorithms in data compression and coding.

4) Applications in bioinformatics. Topics may include read mapping in high-throughput sequencing and genome assembly.

Prerequisites. Essentially no biology background is assumed. Some knowledge of probability may help. The most relevant background is a graduate course on algorithms, but a serious student who has only had an undergraduate algorithms course, or a smart, mathematically mature student who has had neither, might also be able to follow the course.

Textbook. The following book is required. At least half of the lectures will be based on it, and we will also cover some topics outside it.

Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology, by Dan Gusfield, 1997. An excellent survey of string algorithms and bioinformatics applications.

Homework. There will be written homework assignments from time to time.

Presentation. Each student needs to select a particular subject in string algorithms and their applications in bioinformatics to present to the class. The student should contact the instructor about the topic/paper first. I prefer presentations that highlight interesting algorithmic or technical aspects.

Projects. Each student should do a project on string algorithms and then write a project report that summarizes the findings. I prefer that each student work on an individual project, but team projects are possible with permission. There are different kinds of projects. Ideally, a student will choose to conduct some research: design a faster algorithm for a string algorithmic problem, develop new algorithms for solving a practical problem (e.g. in bioinformatics), or develop a theoretical analysis of a string algorithmic problem. Again, each project must first be approved by the instructor.

Exams. Currently I plan to have a final exam in this course. I believe this will help me see how well the students in the class have learned the material.

Grading. The grade will be determined by: homework (25%), project (30%), paper presentation (10%) and final exam (35%). This is based on the current plan that there will be a final exam.
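As a rough illustration of how the weights above combine, here is a minimal Python sketch. It assumes each component is scored on a 0-100 scale, which this syllabus does not state, and the names WEIGHTS and weighted_score are made up for the example. How a combined score maps to a letter grade is not specified here.

```python
# Weights taken from the grading breakdown above.
WEIGHTS = {
    "homework": 0.25,
    "project": 0.30,
    "presentation": 0.10,
    "final_exam": 0.35,
}


def weighted_score(scores):
    """Combine per-component scores (assumed to be on a 0-100 scale)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    # Weighted sum over the four graded components.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores for one student, for illustration only.
example = {"homework": 90, "project": 80, "presentation": 100, "final_exam": 70}
print(weighted_score(example))
```

With these hypothetical scores, the final exam contributes the largest share (35% of 70) and the presentation the smallest (10% of 100).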