Conference Proceedings

An efficient technique for mining approximately frequent substring patterns

X Ji, J Bailey

Proceedings - IEEE International Conference on Data Mining, ICDM | Published: 2007


Sequential patterns are used to discover knowledge in a wide range of applications. In many scenarios, however, pattern quality can be low because of short lengths or low supports. Furthermore, for dense datasets such as proteins, most sequential pattern mining algorithms return a very large number of patterns, which are difficult to process and analyze. By relaxing the definition of frequency and allowing some mismatches, it is possible to discover higher-quality patterns. We call these patterns Frequent Approximate Substrings, or FAS-patterns, and we introduce an algorithm called FAS-Miner to handle the mining task efficiently. The experiments on real-world protein and D..
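The idea of counting a substring as frequent when it occurs with a few mismatches can be illustrated with a small brute-force sketch. This is not FAS-Miner. The abstract does not say how mismatches are measured or how support is counted, so the sketch assumes Hamming distance and counts support as the number of sequences containing at least one approximate occurrence. All function and parameter names are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def approximate_support(pattern, sequences, max_mismatches):
    """Count the sequences that contain at least one substring within
    max_mismatches of pattern (assumption: Hamming distance)."""
    m = len(pattern)
    count = 0
    for seq in sequences:
        # Slide a window of the pattern's length over the sequence.
        if any(hamming(pattern, seq[i:i + m]) <= max_mismatches
               for i in range(len(seq) - m + 1)):
            count += 1
    return count


def frequent_approximate_substrings(sequences, length, max_mismatches, min_support):
    """Brute force: take every substring of the given length that occurs
    in the data as a candidate and keep those whose approximate support
    reaches min_support. Illustrative only; cost grows quickly with data size."""
    candidates = {seq[i:i + length]
                  for seq in sequences
                  for i in range(len(seq) - length + 1)}
    result = {}
    for cand in sorted(candidates):
        sup = approximate_support(cand, sequences, max_mismatches)
        if sup >= min_support:
            result[cand] = sup
    return result


if __name__ == "__main__":
    seqs = ["ACGTAC", "ACCTAC", "TTGTAA"]
    # Exact matching finds "ACGT" in one sequence; allowing one mismatch
    # also matches "ACCT" in the second sequence.
    print(approximate_support("ACGT", seqs, 0))
    print(approximate_support("ACGT", seqs, 1))
    print(frequent_approximate_substrings(seqs, 4, 1, 2))
```

With exact matching the pattern "ACGT" is supported by only one of the three toy sequences; allowing one mismatch raises its support to two, which is the kind of relaxation the abstract describes.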

