Kwok-Pui Choi (Mathematics, NUS, Singapore)

Title: Efficient homology search: consecutive seed versus spaced seeds

With more and more genomes being completely sequenced, the need for more
efficient comparison of multimegabase genomic DNA sequences has posed a
challenge to further improve the speed of sequence alignment. As the
popular programs such as FASTA (Lipman and Pearson, 1985),
BLAST (Altschul et al., 1990; Altschul et al., 1997; Gish, 2001),
SIM (Huang and Miller, 1991) are still too computationally demanding to
align mulitmegabase sequences even in a modern computer.

A common idea among most algorithms for achieving speed in sequence
comparisons is the idea of filtration: a two-stage process. First stage preselects
a set of positions in which given sequences are potentially similar, then
the second stage verifies each possible position using accurate method
rejecting potential matches that do not satisfy some specified similarity
criteria.

In this talk, we will
(1) review some of the existing algorithms;
(2) introduce a simple yet novel idea of spaced seed due to Ma et al 2002;
(3) discuss probability related problems arising from the spaced seeds idea; and
(4) mention some partial results and some open problems.