Comparing the genomes of different species—or different members of the same species—is the basis of a great deal of modern biology. DNA sequences that are conserved across species are likely to be functionally important, while variations between members of the same species can indicate different susceptibilities to disease.
The basic algorithm for determining how much two sequences of symbols have in common—the “edit distance” between them—is now more than 40 years old. And for more than 40 years, computer science researchers have been trying to improve upon it, without much success.
At the ACM Symposium on Theory of Computing (STOC) next week, MIT researchers will report that, in all likelihood, that’s because the algorithm is as good as it gets. If a widely held assumption about computational complexity is correct, then the problem of measuring the difference between two genomes—or texts, or speech samples, or anything else that can be represented as a string of symbols—can’t be solved more efficiently.
In a sense, that’s disappointing, since a computer running the existing algorithm would take 1,000 years to exhaustively compare two human genomes. But it also means that computer scientists can stop agonizing about whether they can do better.
“This edit distance is something that I’ve been trying to get better algorithms for since I was a graduate student, in the mid-’90s,” says Piotr Indyk, a professor of computer science and engineering at MIT and a co-author of the STOC paper. “I certainly spent lots of late nights on that—without any progress whatsoever. So at least now there’s a feeling of closure. The problem can be put to sleep.
” Moreover, Indyk says, even though the paper hasn’t officially been presented yet, it’s already spawned two follow-up papers, which apply its approach to related problems. “There is a technical aspect of this paper, a certain gadget construction, that turns out to be very useful for other purposes as well,” Indyk says. Source: 40-year-old algorithm proven the best possible