Title: Efficient homology search: consecutive seed versus spaced seeds
With more and more genomes being
completely sequenced, the need for more
efficient comparison of multimegabase
genomic DNA sequences has posed a
challenge to further improve the
speed of sequence alignment. As the
popular programs such as FASTA
(Lipman and Pearson, 1985),
BLAST (Altschul et al., 1990; Altschul
et al., 1997; Gish, 2001),
SIM (Huang and Miller, 1991) are
still too computationally demanding to
align mulitmegabase sequences even
in a modern computer.
A common idea among most algorithms
for achieving speed in sequence
comparisons is the idea of filtration:
a two-stage process. First stage preselects
a set of positions in which given
sequences are potentially similar, then
the second stage verifies each
possible position using accurate method
rejecting potential matches that
do not satisfy some specified similarity
criteria.
In this talk, we will
(1) review some of the existing
algorithms;
(2) introduce a simple yet novel
idea of spaced seed due to Ma et al 2002;
(3) discuss probability related
problems arising from the spaced seeds idea; and
(4) mention some partial results
and some open problems.