N2 - In this paper, we present novel text mining algorithms for pattern discovery in large gene sequence databases. Our approach works with a small subset of all possible patterns, thereby reducing space and time requirements. We call the resulting algorithm Generating All Frequent Patterns (GAFP). Representative subword association rules are introduced to express associations between subword patterns and user-specified target conditions. A rule has the form P ⇒ C, where P is a subword association pattern of the form (α1, α2, ⋯, αk, d) and C is a target condition. The pattern (α1, α2, ⋯, αk, d) is called a k-subword association pattern, where each αi is a subword from the input text sequences and d is the distance constraint, which specifies the maximum distance between two subwords adjacent in the pattern. GAFP provides an efficient approach for computing frequent patterns that optimize the rule confidence.
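
To make the definitions concrete, the following Python sketch checks whether a sequence contains a k-subword association pattern (α1, α2, ⋯, αk, d) and estimates the confidence of a rule P ⇒ C. It is an illustration under stated assumptions and not the GAFP algorithm itself: the abstract does not say how distance is measured, so the sketch assumes it is the number of characters between the end of one subword and the start of the next, and it assumes confidence is the fraction of pattern-matching sequences that also satisfy C. The names `find_all`, `matches`, and `confidence` are hypothetical.

```python
from typing import Callable, List, Sequence


def find_all(text: str, sub: str) -> List[int]:
    """Return every start index at which `sub` occurs in `text` (overlaps allowed)."""
    starts = []
    i = text.find(sub)
    while i != -1:
        starts.append(i)
        i = text.find(sub, i + 1)
    return starts


def matches(text: str, subwords: Sequence[str], d: int) -> bool:
    """True if the subwords occur in order and each one starts at most `d`
    characters after the previous one ends (assumed distance definition)."""
    if not subwords:
        return True
    # End positions reachable after matching the subwords processed so far.
    ends = {s + len(subwords[0]) for s in find_all(text, subwords[0])}
    for sub in subwords[1:]:
        ends = {
            s + len(sub)
            for s in find_all(text, sub)
            if any(0 <= s - e <= d for e in ends)
        }
        if not ends:
            return False
    return bool(ends)


def confidence(
    sequences: Sequence[str],
    subwords: Sequence[str],
    d: int,
    target: Callable[[str], bool],
) -> float:
    """Assumed rule confidence for P => C: among the sequences matching
    pattern P, the fraction that also satisfy target condition C."""
    matched = [s for s in sequences if matches(s, subwords, d)]
    if not matched:
        return 0.0
    return sum(1 for s in matched if target(s)) / len(matched)


# Toy data, invented for illustration only.
seqs = ["ACGTTTGCA", "ACGGCA", "ACGTTTTTTTTGCA", "TTTT"]
rule_conf = confidence(seqs, ("ACG", "GCA"), 3, lambda s: len(s) > 6)
```

In this toy run, the 2-subword pattern (ACG, GCA, 3) matches the first two sequences only, because the third has a gap of eight characters between the subwords. This brute-force check inspects each candidate pattern directly; the abstract's point is that GAFP avoids enumerating all possible patterns.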

