The assessment of inter-rater reliability is a topic that is infrequently addressed in Caenorhabditis elegans research, despite the existence of sophisticated statistical methods and the strong interest in the field in obtaining reliable and accurate data. This study applies statistical modeling as a robust means of analyzing the performance of worm researchers measuring the stage of worm development in terms of the two independent factors that comprise "agreement", which are (1) accuracy, representing trueness, a lack of systematic differences, or lack of bias, and (2) precision, representing reliability or the extent to which random differences are small. In our study, multiple raters assessed the same sample of worms to determine the developmental stage of each animal, and we collected data linking each scorer with their assessment for each worm. To describe the agreement of the raters, we developed a structural equation model with latent variables and thresholds, which assumes that all the raters are jointly scoring each worm. This common factor model separately quantifies the two aspects of agreement. The stage-specific thresholds examine accuracy and characterize the relative biases of each rater during the scoring process. The factor loadings for each rater examine the precision and characterizes the random error of the rater. Within our group, we found that the overall agreement was good, while certain adjustments in particular raters would have decreased systematic differences. Hence, the use of developmental stage as an experimental outcome can be both accurate and precise.
ASJC Scopus subject areas