TY - GEN
T1 - MeSH Indexing Using the Biomedical Citation Network
AU - Gasper, William
AU - Chundi, Parvathi
AU - Ghersi, Dario
N1 - Funding Information:
This work was supported by the Office of Research and Creative Activity at the University of Nebraska at Omaha. Thanks to Laura Sherwin for assistance with proposal writing.
Publisher Copyright:
© 2020 ACM.
PY - 2020/9/21
Y1 - 2020/9/21
AB - PubMed contains over 30 million biomedical literature citations and is an invaluable resource for researchers, medical professionals, students, and curious individuals. The search and retrieval process is significantly enhanced by PubMed's Medical Subject Heading (MeSH) indexing process, which requires a significant manual component. It is difficult to effectively apply traditional machine learning methods to large scale semantic indexing problems, and this difficulty has impeded complete automation of the MeSH indexing process. PubMed citations are particularly challenging to index: documents are often indexed with a dozen or more terms, and most terms occur extremely infrequently in the document set. This work examines the biomedical literature citation network and MeSH vocabulary for viable signal that might benefit the indexing process. Simple predictive models utilizing features generated from the biomedical literature citation network proved useful and effective in recommending MeSH terms for document indexing. A neural network proved similarly effective to the simple model in terms of raw performance but produced qualitatively different term recommendations.
UR - http://www.scopus.com/inward/record.url?scp=85096951957&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85096951957&partnerID=8YFLogxK
U2 - 10.1145/3388440.3412466
DO - 10.1145/3388440.3412466
M3 - Conference contribution
AN - SCOPUS:85096951957
T3 - Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, BCB 2020
BT - Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, BCB 2020
PB - Association for Computing Machinery, Inc
T2 - 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, BCB 2020
Y2 - 21 September 2020 through 24 September 2020
ER -