On the impact of data integration and edge enrichment in mining significant signals from biological networks

Sean West, Hesham Ali

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The influx of high-throughput biotechnologies has produced considerable amounts of available and untapped data, useful for both interpretation and extrapolation. Because the noise-to-signal ratio in most biological databases is non-trivial, single-source analysis techniques may suffer from relatively high false-positive and false-negative rates. In addition, a single data source does not allow for the discovery of novel relationships that can only be derived from multiple sources. Recently, gene correlation networks have emerged to assist in the discovery of previously unknown genetic relationships and the identification of significant biological functions. Such networks provide a useful mechanism to model experimental results obtained from expression data and capture a snapshot of the expression as well as the temporal changes in various experiments. In addition, Gene Ontology is often integrated with biological networks within the analysis process as a source of domain knowledge. In this project, we evaluate the use of Gene Ontology, not simply as an assessment tool, but as a basic component in building the correlation networks. We implemented a network integration algorithm that uses both gene expression data (experimental knowledge) and Gene Ontology data (domain knowledge) to build a biologically rich correlation model. Then, we analyzed the resulting networks for changes in topology and in biological significance. Our main hypothesis is that the integrated networks would reduce the harmful effects of outliers from imperfect data while maintaining the high concentration of network substructures that are likely to reveal novel, biologically significant relationships. In addition, using the concept of "guilt by association", we analyzed the clusters of the integrated networks and found a significant increase in enrichment scores relative to the original networks. We show, through motif and pathway analysis, that integrated networks tend to cluster with higher biological significance.
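The abstract does not specify how the integration algorithm combines the two data sources, so the following is only a minimal sketch of the general idea under stated assumptions: Pearson correlation stands in for the expression-based edge weight, Jaccard overlap of annotation sets stands in for the Gene Ontology similarity, and the two are mixed with a hypothetical weight `alpha` and filtered by a hypothetical `cutoff`. The function names, thresholds, gene names, and term labels are all illustrative and are not taken from the paper.

```python
import numpy as np


def correlation_edges(expr, genes, threshold):
    """Return {(gene_a, gene_b): r} for pairs with |Pearson r| >= threshold.

    expr has one row per gene and one column per sample.
    """
    corr = np.corrcoef(expr)
    edges = {}
    for i in range(len(genes)):
        for j in range(i + 1, len(genes)):
            if abs(corr[i, j]) >= threshold:
                edges[(genes[i], genes[j])] = float(corr[i, j])
    return edges


def annotation_similarity(terms_a, terms_b):
    """Jaccard overlap of two annotation sets (an assumed similarity measure)."""
    union = terms_a | terms_b
    return len(terms_a & terms_b) / len(union) if union else 0.0


def integrate(edges, annotations, alpha=0.5, cutoff=0.6):
    """Keep edges whose combined expression/annotation score reaches cutoff.

    alpha and cutoff are hypothetical parameters chosen for illustration.
    """
    integrated = {}
    for (a, b), r in edges.items():
        sim = annotation_similarity(annotations.get(a, set()),
                                    annotations.get(b, set()))
        score = alpha * abs(r) + (1 - alpha) * sim
        if score >= cutoff:
            integrated[(a, b)] = score
    return integrated


# Toy input: four genes measured across five samples (made-up values).
genes = ["g1", "g2", "g3", "g4"]
expr = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],
    [5.0, 4.0, 3.0, 2.0, 1.0],
    [1.0, 3.0, 2.0, 5.0, 4.0],
])
# Placeholder term labels, not real Gene Ontology identifiers.
annotations = {
    "g1": {"term_a", "term_b"},
    "g2": {"term_a", "term_b"},
    "g3": {"term_c"},
}

edges = correlation_edges(expr, genes, threshold=0.9)
integrated = integrate(edges, annotations)
```

In this toy input, three gene pairs pass the correlation threshold, but only the g1/g2 edge also shares annotations and therefore survives integration. That is the intended effect of treating domain knowledge as a building block rather than a post hoc check: an edge supported by expression alone is down-weighted.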

Original language: English (US)
Title of host publication: ACM BCB 2014 - 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics
Publisher: Association for Computing Machinery, Inc
Pages: 760-767
Number of pages: 8
ISBN (Electronic): 9781450328944
DOIs: https://doi.org/10.1145/2649387.2660846
State: Published - Sep 20 2014
Event: 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, ACM BCB 2014 - Newport Beach, United States
Duration: Sep 20 2014 to Sep 23 2014

Publication series

Name: ACM BCB 2014 - 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics

Conference

Conference: 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, ACM BCB 2014
Country: United States
City: Newport Beach
Period: 9/20/14 to 9/23/14

Keywords

  • Co-regulation
  • Correlation networks
  • Data integration
  • Gene expression
  • Gene ontology
  • Hubs and clusters

ASJC Scopus subject areas

  • Health Informatics
  • Computer Science Applications
  • Software
  • Biomedical Engineering

Cite this

West, S., & Ali, H. (2014). On the impact of data integration and edge enrichment in mining significant signals from biological networks. In ACM BCB 2014 - 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (pp. 760-767). Association for Computing Machinery, Inc. https://doi.org/10.1145/2649387.2660846