Phase II Evaluation of Clinical Coding Schemes: Completeness, Taxonomy, Mapping, Definitions, and Clarity

James R. Campbell, Paul Carpenter, Charles Sneiderman, Simon Cohn, Christopher G. Chute, Judith Warren

Research output: Contribution to journalArticlepeer-review

130 Scopus citations


Objective: To compare three potential sources of controlled clinical terminology (READ codes version 3.1, SNOMED International, and Unified Medical Language System (UMLS) version 1.6) relative to attributes of completeness, clinical taxonomy, administrative mapping, term definitions and clarity (duplicate coding rate). Methods: The authors assembled 1929 source concept records from a variety of clinical information taken from four medical centers across the United States. The source data included medical as well as ample nursing terminology. The source records were coded in each scheme by an investigator and checked by the coding scheme owner. The codings were then scored by an independent panel of clinicians for acceptability. Codes were checked for definitions provided with the scheme. Codes for a random sample of source records were analyzed by an investigator for "parent" and "child" codes within the scheme. Parent and child pairs were scored by an independent panel of medical informatics specialists for clinical acceptability. Administrative and billing code mapping from the published scheme were reviewed for all coded records and analyzed by independent reviewers for accuracy. The investigator for each scheme exhaustively searched a sample of coded records for duplications. Results: SNOMED was judged to be significantly more complete in coding the source material than the other schemes (SNOMED* 70%; READ 57%; UMLS 50%; *p < .00001). SNOMED also had a richer clinical taxonomy judged by the number of acceptable first-degree relatives per coded concept (SNOMED* 4.56; UMLS 3.17; READ 2.14, *p < .005). Only the UMLS provided any definitions; these were found for 49% of records which had a coding assignment. READ and UMLS had better administrative mappings (composite score: READ* 40.6%; UMLS* 36.1%; SNOMED 20.7%, *p <. 00001), and SNOMED had substantially more duplications of coding assignments (duplication rate: READ 0%; UMLS 4.2%; SNOMED* 13.9%, *p <. 004) associated with a loss of clarity. Conclusion: No major terminology source can lay claim to being the ideal resource for a computer-based patient record. However, based upon this analysis of releases for April 1995, SNOMED International is considerably more complete, has a compositional nature and a richer taxonomy. It suffers from less clarity, resulting from a lack of syntax and evolutionary changes in its coding scheme. READ has greater clarity and better mapping to administrative schemes (ICD-10 and OPCS-4), is rapidly changing and is less complete. UMLS is a rich lexical resource, with mappings to many source vocabularies. It provides definitions for many of its terms. However, due to the varying granularities and purposes of its source schemes, it has limitations for representation of clinical concepts within a computer-based patient record.

Original languageEnglish (US)
Pages (from-to)238-251
Number of pages14
JournalJournal of the American Medical Informatics Association
Issue number3
StatePublished - 1997

ASJC Scopus subject areas

  • Health Informatics


Dive into the research topics of 'Phase II Evaluation of Clinical Coding Schemes: Completeness, Taxonomy, Mapping, Definitions, and Clarity'. Together they form a unique fingerprint.

Cite this