TY - GEN
T1 - An efficient algorithm for pattern discovery in large text databases
AU - Li, Dan
AU - Wang, Kefei
AU - Deogun, Jitender S.
AU - Donis, Ruben O.
PY - 2003
Y1 - 2003
N2 - In this paper, we present novel text mining algorithms that are useful for pattern discovery in large gene sequence databases. Our approach allows us to work with a small subset of all possible patterns, thus improving space and time complexity. We call this algorithm Generating All Frequent Patterns, GAFP. Representative subword association rules are introduced to express associations between subword patterns and user-specified target conditions. A rule is of the form P ⇒ C, where P is a subword association pattern in the form of (α1, α2, ⋯, αk, d), and C is a target condition. Pattern (α1, α2, ⋯, αk, d) is called a k-subword association pattern, where αi are subwords from input text sequences and d is the distance constraint, which specifies the maximum distance between two adjacent subwords in the pattern. GAFP presents an efficient approach for computing frequent patterns that optimize the rule confidence.
UR - http://www.scopus.com/inward/record.url?scp=1642337790&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=1642337790&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:1642337790
SN - 1932415076
T3 - Proceedings of the International Conference on Information and Knowledge Engineering
SP - 96
EP - 102
BT - Proceedings of the International Conference on Information and Knowledge Engineering 2003
A2 - Goharian, N.
T2 - Proceedings of the International Conference on Information and Knowledge Engineering 2003
Y2 - 23 June 2003 through 26 June 2003
ER -