TY - GEN

T1 - An efficient algorithm for pattern discovery in large text databases

AU - Li, Dan

AU - Wang, Kefei

AU - Deogun, Jitender S.

AU - Donis, Ruben O.

PY - 2003

Y1 - 2003

N2 - In this paper, we present novel text mining algorithms that are useful for pattern discovery in large gene sequence databases. Our approach allows us to work with a small subset of all possible patterns thus enhancing space and time complexity. We call this algorithm Generating All Frequent Patterns, GAFP. Representative subword association rules are introduced to express associations between subword patterns and user-specified target conditions. A rule is of the form P ⇒ C, where P is a subword association pattern in the form of (α1, α2, ⋯, αk, d), and C is a target condition. Pattern (α1, α2, ⋯, αk, d) is called a k-subword association pattern where αi are subwords from input text sequences, and d is the distance constraint which specifies the maximum distance between two subwords adjacent in the pattern. GAFP presents an efficient approach for computing frequent patterns that optimize the rule confidence.

AB - In this paper, we present novel text mining algorithms that are useful for pattern discovery in large gene sequence databases. Our approach allows us to work with a small subset of all possible patterns thus enhancing space and time complexity. We call this algorithm Generating All Frequent Patterns, GAFP. Representative subword association rules are introduced to express associations between subword patterns and user-specified target conditions. A rule is of the form P ⇒ C, where P is a subword association pattern in the form of (α1, α2, ⋯, αk, d), and C is a target condition. Pattern (α1, α2, ⋯, αk, d) is called a k-subword association pattern where αi are subwords from input text sequences, and d is the distance constraint which specifies the maximum distance between two subwords adjacent in the pattern. GAFP presents an efficient approach for computing frequent patterns that optimize the rule confidence.

UR - http://www.scopus.com/inward/record.url?scp=1642337790&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=1642337790&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:1642337790

SN - 1932415076

T3 - Proceedings of the International Conference on Information and Knowledge Engineering

SP - 96

EP - 102

BT - Proceedings of the International Conference on Information and Knowledge Engineering 2003

A2 - Goharian, N.

T2 - Proceedings of the International Conference on Information and Knowledge Engineering 2003

Y2 - 23 June 2003 through 26 June 2003

ER -