Conference Proceedings

Improving suffix array locality for fast pattern matching on disk

R Sinha, SJ Puglisi, A Moffat, A Turpin

Proceedings of the ACM SIGMOD International Conference on Management of Data | Published : 2008

Abstract

The suffix tree (or equivalently, the enhanced suffix array) provides efficient solutions to many problems involving pattern matching and pattern discovery in large strings, such as those arising in computational biology. Here we address the problem of arranging a suffix array on disk so that querying is fast in practice. We show that the combination of a small trie and a suffix array-like blocked data structure allows queries to be answered as much as three times faster than the best alternative disk-based suffix array arrangement. Construction of our data structure requires only modest processing time on top of that required to build the suffix tree, and requires negligible extra memory. C..

View full abstract