Generalizability of automated scores of writing quality in grades 3-5

Joshua Wilson, Dandan Chen, Micheal P. Sandbank, Michael Hebert

Research output: Contribution to journal › Article › peer-review

24 Scopus citations


The present study examined issues pertaining to the reliability of writing assessment in the elementary grades, and among samples of struggling and nonstruggling writers. The present study also extended nascent research on the reliability and the practical applications of automated essay scoring (AES) systems in Response to Intervention frameworks aimed at preventing and remediating writing difficulties (RTI-W). Students in Grade 3 (n = 185), Grade 4 (n = 192), and Grade 5 (n = 193) responded to six writing prompts, two prompts each in the three genres emphasized in the Common Core and similar "Next Generation" academic standards: narrative, informative, and persuasive. Prompts were scored using an AES system called Project Essay Grade (PEG). Generalizability theory was used to examine the following sources of variation in PEG's quality scores: prompts, genres, and the interaction among those facets and the object of measurement: students. Separate generalizability and decision studies were conducted for each grade level and for subsamples of nonstruggling and struggling writers identified using a composite measure of writing skill. Low-stakes decisions (reliability ≥ .80) could be made by averaging scores from a single prompt per genre (i.e., 3 total) or 2 prompts per genre if administered to struggling writers (i.e., 6 total). High-stakes decisions (reliability ≥ .90) could be made by averaging across two prompts per genre (6 total) or 4-5 prompts per genre if administered to struggling writers (12-15 total). Implications for use of AES within RTI-W and the construct validity of AES writing quality scores are discussed.
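The decision-study logic described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' analysis: it assumes a design in which students are crossed with prompts nested within genres, treats both prompts and genres as random facets, and uses the standard relative-error form of the generalizability coefficient. The function names and the variance components in the usage note are hypothetical and are not taken from the study.

```python
def g_coefficient(var_student, var_student_genre, var_residual,
                  n_genres, n_prompts_per_genre):
    """Relative generalizability coefficient for a student x (prompt:genre) design.

    var_student        -- variance due to students (the object of measurement)
    var_student_genre  -- student-by-genre interaction variance
    var_residual       -- student-by-prompt-within-genre variance, confounded with error
    All three are assumed to come from a prior generalizability study.
    """
    # Relative error shrinks as scores are averaged over more genres and prompts.
    relative_error = (var_student_genre / n_genres
                      + var_residual / (n_genres * n_prompts_per_genre))
    return var_student / (var_student + relative_error)


def min_prompts_per_genre(var_student, var_student_genre, var_residual,
                          n_genres, target, max_prompts=20):
    """Smallest number of prompts per genre whose averaged score reaches `target`.

    Returns None if the target is not reached within `max_prompts`.
    """
    for n in range(1, max_prompts + 1):
        g = g_coefficient(var_student, var_student_genre, var_residual,
                          n_genres, n)
        if g >= target:
            return n
    return None
```

Calling `min_prompts_per_genre(0.60, 0.05, 0.35, n_genres=3, target=0.80)` with these made-up variance components asks how many prompts per genre would be needed for a low-stakes decision; changing `target` to `0.90` poses the high-stakes version of the same question. Because the student-by-genre term is divided only by the number of genres, adding prompts within a fixed set of genres cannot push the coefficient past a ceiling, which is why the helper may return `None`.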

Original language: English (US)
Pages (from-to): 619-640
Number of pages: 22
Journal: Journal of Educational Psychology
Issue number: 4
State: Published - May 2019


Keywords

  • Assessment
  • Automated essay scoring
  • Generalizability
  • Struggling writers
  • Writing

ASJC Scopus subject areas

  • Education
  • Developmental and Educational Psychology

