TY - GEN
T1 - Using term extraction patterns to discover coherent relationships from open source intelligence
AU - Sousan, William L.
AU - Zhu, Qiuming
AU - Gandhi, Robin
AU - Mahoney, William
AU - Sharma, Anup
PY - 2010
Y1 - 2010
N2 - Unstructured open source information, especially the social, political, economic and cultural events described within web-based text/news articles, often contains possible motives for cyber security and trust issues. Automated processing of numerous open source intelligence sources requires the discovery of key domain terms, their conceptual hierarchies and the coherent relationships among them. A syntactic analysis of the word sequences in unstructured text documents allows for the extraction of subject-predicate-object triples, which form the basis for Term Extraction Patterns (TEP). In our research, we use TEPs to discover domain-specific multi-word entities which, in turn, can be arranged in a taxonomy based on their semiotic inter-relationships. We explore the use of this method within the cyber security domain and analyze a collection of related news articles gathered from various public web sources. In this paper, we describe our initial results of term extraction and the semantic coherence derived from the TEP analyses. Our work extends beyond current methods, and our contribution is a novel methodology to extract semantics from unstructured text in domain-specific open source information and its application to predict cyber attack outbreaks.
AB - Unstructured open source information, especially the social, political, economic and cultural events described within web-based text/news articles, often contains possible motives for cyber security and trust issues. Automated processing of numerous open source intelligence sources requires the discovery of key domain terms, their conceptual hierarchies and the coherent relationships among them. A syntactic analysis of the word sequences in unstructured text documents allows for the extraction of subject-predicate-object triples, which form the basis for Term Extraction Patterns (TEP). In our research, we use TEPs to discover domain-specific multi-word entities which, in turn, can be arranged in a taxonomy based on their semiotic inter-relationships. We explore the use of this method within the cyber security domain and analyze a collection of related news articles gathered from various public web sources. In this paper, we describe our initial results of term extraction and the semantic coherence derived from the TEP analyses. Our work extends beyond current methods, and our contribution is a novel methodology to extract semantics from unstructured text in domain-specific open source information and its application to predict cyber attack outbreaks.
KW - Conceptualization
KW - Open source intelligence
KW - Semantic relevance
KW - Term extraction
KW - Term extraction patterns
UR - http://www.scopus.com/inward/record.url?scp=78649297645&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=78649297645&partnerID=8YFLogxK
U2 - 10.1109/SocialCom.2010.143
DO - 10.1109/SocialCom.2010.143
M3 - Conference contribution
AN - SCOPUS:78649297645
SN - 9780769542119
T3 - Proceedings - SocialCom 2010: 2nd IEEE International Conference on Social Computing, PASSAT 2010: 2nd IEEE International Conference on Privacy, Security, Risk and Trust
SP - 967
EP - 972
BT - Proceedings - SocialCom 2010
T2 - 2nd IEEE International Conference on Social Computing, SocialCom 2010, 2nd IEEE International Conference on Privacy, Security, Risk and Trust, PASSAT 2010
Y2 - 20 August 2010 through 22 August 2010
ER -