SciMiner is a web-based literature mining and functional analysis tool that identifies genes and proteins using a context specific analysis of MEDLINE abstracts and full texts. SciMiner accepts a free text query (PubMed Entrez search) or a list of PubMed identifiers as input. SciMiner uses both regular expression patterns and dictionaries of gene symbols and names compiled from multiple sources. Ambiguous acronyms are resolved by a scoring scheme based on the co-occurrence of acronyms and corresponding description terms, which incorporates optional user-defined filters. Functional enrichment analyses are used to identify highly relevant targets (genes and proteins), GO (Gene Ontology) terms, MeSH (Medical Subject Headings) terms, pathways and protein-protein interaction networks by comparing identified targets from one search result with those from other searches or to the full HGNC [HUGO (Human Genome Organization) Gene Nomenclature Committee] gene set. The performance of gene-protein name identification was evaluated using the BioCreAtIvE (Critical Assessment of Information Extraction systems in Biology) version 2 (Year 2006) Gene Normalization Task as a gold standard. SciMiner achieved 87.1% recall, 71.3% precision and 75.8% F-measure. SciMiner's literature mining performance coupled with functional enrichment analyses provides an efficient platform for retrieval and summary of rich biological information from corpora of users' interests.
ASJC Scopus subject areas
- Statistics and Probability
- Molecular Biology
- Computer Science Applications
- Computational Theory and Mathematics
- Computational Mathematics