Exploring database keyword search for association studies between genetic variants and diseases

Dhawal Verma, Hesham H. Ali, Zhengxin Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Keyword search plays a critical role in helping bioinformatics researchers retrieve structured, semi-structured, and unstructured data. In addition, data mining has drawn increasing attention from researchers seeking to fully exploit the rich repository of biological databases. An interesting issue is the possible relationship between database keyword search (DB KWS) and in-depth database exploration (or data mining) in the context of bioinformatics, and in particular the potential contribution of DB KWS to data mining. However, so far there is no known systematic investigation of this relationship. In this paper, we provide a preliminary discussion of how DB KWS can be used for in-depth exploration of biological databases, and describe a case study on the association between genetic variants and diseases. The case study is motivated by the fact that the advent of high-throughput sequencing technologies has facilitated the generation of a huge amount of genomic data. A wealth of genomic information in publicly available databases is underutilized as a potential resource for uncovering functionally relevant markers underlying complex human traits. The discovery of genetic associations is an important factor in understanding human illness: it helps derive disease pathways and other information, such as disease-gene associations and the variants associated with diseases. A database of genome-wide association studies was curated, and an algorithm inspired by DBXplorer was implemented in Java to perform keyword search over the database. The case study further proposes ways to apply association rule mining, a data mining technique useful for discovering interesting relationships hidden in large data sets, to investigate the results of keyword searches performed with different yet sensible combinations of diseases and genes. We believe that such an integrated study, exploring how both techniques can be combined in a single bioinformatics application, is of both theoretical and practical importance.
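
The abstract does not spell out the algorithm, so the following is only a rough illustration of the DBXplorer-style idea it mentions: a symbol table records which columns each keyword occurs in, and a join over the tables involved returns rows that contain all keywords. It is a minimal Python sketch, not the authors' Java implementation. The schema, the rows, and every function name (`build_db`, `build_symbol_table`, `keyword_search`) are hypothetical placeholders. The sketch also assumes a single fixed join, whereas DBXplorer enumerates candidate join trees.

```python
import sqlite3
from collections import defaultdict

# Hypothetical text columns to index, and table aliases used in the join.
TEXT_COLUMNS = {"disease": ["name"], "association": ["variant", "gene"]}
ALIASES = {"association": "a", "disease": "d"}
# Assumption: one fixed join; DBXplorer itself enumerates join trees.
JOIN_SQL = ("SELECT a.variant, a.gene, d.name FROM association AS a "
            "JOIN disease AS d ON a.disease_id = d.disease_id")


def build_db():
    """Create a toy in-memory database (placeholder rows, not real GWAS data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE disease (disease_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE association (
            variant TEXT, gene TEXT,
            disease_id INTEGER REFERENCES disease(disease_id));
        INSERT INTO disease VALUES (1, 'alpha syndrome'), (2, 'beta syndrome');
        INSERT INTO association VALUES
            ('var1', 'GENE1', 1), ('var2', 'GENE2', 1), ('var3', 'GENE1', 2);
    """)
    return conn


def build_symbol_table(conn):
    """Map each lowercased token to the (table, column) pairs where it occurs."""
    symbols = defaultdict(set)
    for table, columns in TEXT_COLUMNS.items():
        for column in columns:
            query = f"SELECT DISTINCT {column} FROM {table}"
            for (value,) in conn.execute(query):
                for token in value.lower().split():
                    symbols[token].add((table, column))
    return symbols


def keyword_search(conn, symbols, keywords):
    """Return joined rows in which every keyword matches one of its columns."""
    if not keywords:
        return []
    clauses, params = [], []
    for keyword in keywords:
        locations = symbols.get(keyword.lower())
        if not locations:
            return []  # keyword occurs nowhere, so no row can contain it
        alternatives = []
        for table, column in sorted(locations):
            # Simplification: LIKE does substring rather than token matching.
            alternatives.append(f"LOWER({ALIASES[table]}.{column}) LIKE ?")
            params.append(f"%{keyword.lower()}%")
        clauses.append("(" + " OR ".join(alternatives) + ")")
    sql = JOIN_SQL + " WHERE " + " AND ".join(clauses) + " ORDER BY a.variant"
    return conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    conn = build_db()
    symbols = build_symbol_table(conn)
    for row in keyword_search(conn, symbols, ["GENE1", "beta"]):
        print(row)
```

Consulting the symbol table first means a keyword that appears nowhere is rejected without querying the data tables, and each remaining keyword is checked only against the columns where it is known to occur.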

Original language: English (US)
Title of host publication: Procedia Computer Science
Pages: 206-213
Number of pages: 8
Volume: 17
State: Published - 2013
Event: 1st International Conference on Information Technology and Quantitative Management, ITQM 2013 - Suzhou, China
Duration: May 16 2013 to May 18 2013

Other

Other: 1st International Conference on Information Technology and Quantitative Management, ITQM 2013
Country/Territory: China
City: Suzhou
Period: 5/16/13 to 5/18/13

Keywords

  • Data Mining
  • Genome Wide Association Studies
  • Keyword Search

ASJC Scopus subject areas

  • General Computer Science
