Cluster-based approach to analyzing crash injury severity at highway–rail grade crossings

Yashu Kang, Aemal Khattak

Research output: Contribution to journal › Article › peer-review

17 Scopus citations


The presence of unobserved heterogeneity in crash data can result in biased model parameter estimates and incorrect inferences. The research presented in this paper investigated the severity of crashes reported at highway–rail grade crossings by clustering the data appropriately to account for unobserved heterogeneity. A combination of data mining and statistical regression methods was used to partition the crash data into subsets and then to identify factors associated with crash injury severity levels. This research relied on highway–rail accident, incident, and crossing inventory databases for 2011 to 2015, obtained from the Federal Railroad Administration (FRA). Three clustering methods (K-means, traditional latent class clustering, and variational Bayesian latent class clustering) were considered, and the variational Bayesian latent class cluster method was chosen for partitioning the data set for model estimation. Ordered logit models of crash injury severity were estimated on the unclustered data as well as on each clustered subset. A comparison revealed that the cluster-based approach yielded more relevant model parameters and identified factors that were relevant only to certain clusters of the data.
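The abstract does not give the model specification. As a rough illustration only, a standard ordered logit model treats injury severity as an ordinal outcome with increasing thresholds μ_1 < μ_2 < ... < μ_{J-1}, so that P(y ≤ j | x) = Λ(μ_j − x′β), where Λ is the logistic CDF. The sketch below computes category probabilities under that common parameterization. The function names, the three severity levels, the covariates, and the coefficient values are all hypothetical and are not estimates from the paper. Under a cluster-based approach, a separate set of β and μ values would be estimated for each cluster.

```python
import math
from typing import List, Sequence


def logistic_cdf(z: float) -> float:
    """Standard logistic CDF, Lambda(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def ordered_logit_probs(
    x: Sequence[float],
    beta: Sequence[float],
    cutpoints: Sequence[float],
) -> List[float]:
    """Return P(y = j | x) for j = 0..J-1 given J-1 strictly increasing cutpoints.

    Assumes the common parameterization P(y <= j | x) = Lambda(mu_j - x'beta).
    """
    if len(x) != len(beta):
        raise ValueError("x and beta must have the same length")
    if any(hi <= lo for lo, hi in zip(cutpoints, cutpoints[1:])):
        raise ValueError("cutpoints must be strictly increasing")
    eta = sum(xi * bi for xi, bi in zip(x, beta))  # linear predictor x'beta
    # Cumulative probabilities P(y <= j); the top category closes at 1.0.
    cumulative = [logistic_cdf(mu - eta) for mu in cutpoints] + [1.0]
    # Differences of adjacent cumulative probabilities give category probabilities.
    return [cumulative[0]] + [
        cumulative[j] - cumulative[j - 1] for j in range(1, len(cumulative))
    ]


# Hypothetical example: three severity levels (e.g., no injury, injury, fatal)
# and two made-up indicator covariates; none of these values come from the paper.
beta_hypothetical = [0.8, -0.5]
cutpoints_hypothetical = [1.5, 3.0]
probs = ordered_logit_probs([1.0, 0.0], beta_hypothetical, cutpoints_hypothetical)
print([round(p, 3) for p in probs])
```

In this parameterization, increasing x′β lowers every cumulative probability P(y ≤ j), shifting probability mass toward the more severe categories, so a positive coefficient is read as being associated with higher severity.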

Original language: English (US)
Pages (from-to): 58-69
Number of pages: 12
Journal: Transportation Research Record
Issue number: 1
State: Published - 2017
Externally published: Yes

ASJC Scopus subject areas

  • Civil and Structural Engineering
  • Mechanical Engineering

