Principal variable selection to explain grain yield variation in winter wheat from features extracted from UAV imagery

Jiating Li, Arun Narenthiran Veeranampalayam-Sivakumar, Madhav Bhatta, Nicholas D. Garst, Hannah Stoll, P. Stephen Baenziger, Vikas Belamkar, Reka Howard, Yufeng Ge, Yeyin Shi

Research output: Contribution to journal › Article › peer-review

35 Scopus citations


Background: Automated phenotyping technologies are continually advancing the breeding process. However, collecting various secondary traits throughout the growing season and processing massive amounts of data still require considerable effort and time. Selecting a minimum number of secondary traits that have the maximum predictive power has the potential to reduce phenotyping efforts. The objective of this study was to select the principal features extracted from UAV imagery, and the critical growth stages, that contributed the most to explaining winter wheat grain yield. Multispectral images on five dates and RGB images on seven dates were collected by a UAV system during the spring growing season in 2018. Two classes of features (variables), totaling 172 variables, were extracted for each plot from the vegetation index and plant height maps: pixel statistics and dynamic growth rates. A parametric algorithm, LASSO regression (least absolute shrinkage and selection operator), and a non-parametric algorithm, random forest, were applied for variable selection. The regression coefficients estimated by LASSO and the permutation importance scores provided by random forest were used to determine the ten most important variables influencing grain yield from each algorithm.
Results: Both selection algorithms assigned the highest importance scores to variables related to plant height around the grain filling stage. Some variables related to vegetation indices were also selected, mainly at early to mid growth stages and during senescence. Compared with yield prediction using all 172 variables derived from measured phenotypes, prediction using only the selected variables performed comparably or even better. We also noticed that the prediction accuracy on the adapted NE lines (r = 0.58-0.81) was higher than on the other lines included in this study, which had different genetic backgrounds (r = 0.21-0.59).
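The LASSO ranking step described in the Background can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the abstract does not name a solver or say how the penalty was chosen, so the cyclic coordinate-descent solver, the fixed penalty `alpha=0.1`, and the helper names (`lasso_coordinate_descent`, `top_k_by_abs_coef`, `make_synthetic_plots`) are all hypothetical. Features are standardized first so that coefficient magnitudes are comparable, and variables are then ranked by the absolute value of their coefficients, which is how the abstract describes picking the ten most important variables.

```python
import random
import statistics


def standardize(columns):
    """Scale each feature column to zero mean and unit population variance."""
    scaled = []
    for col in columns:
        mu = statistics.fmean(col)
        sd = statistics.pstdev(col) or 1.0  # avoid dividing by zero for constant columns
        scaled.append([(v - mu) / sd for v in col])
    return scaled


def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values inside [-gamma, gamma] become exactly 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(columns, y, alpha, n_iter=200):
    """Cyclic coordinate descent for (1/2n)||y - mean(y) - Xb||^2 + alpha*||b||_1.

    `columns` must already be standardized, so each update has denominator 1.
    """
    n = len(y)
    y_mean = statistics.fmean(y)
    residual = [v - y_mean for v in y]  # residual when every coefficient is 0
    beta = [0.0] * len(columns)
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            # Correlation of feature j with the partial residual (its own term added back).
            rho = sum(x * (r + x * beta[j]) for x, r in zip(col, residual)) / n
            new_b = soft_threshold(rho, alpha)
            delta = new_b - beta[j]
            if delta != 0.0:
                residual = [r - x * delta for x, r in zip(col, residual)]
                beta[j] = new_b
    return beta


def top_k_by_abs_coef(names, beta, k=10):
    """Rank variables by |coefficient| and keep at most k with nonzero coefficients."""
    ranked = sorted(zip(names, beta), key=lambda pair: abs(pair[1]), reverse=True)
    return [name for name, b in ranked[:k] if b != 0.0]


def make_synthetic_plots(n=200, seed=0):
    """Hypothetical toy data: yield depends on two of three plot-level variables."""
    rng = random.Random(seed)
    height = [rng.gauss(0, 1) for _ in range(n)]
    ndvi = [rng.gauss(0, 1) for _ in range(n)]
    unrelated = [rng.gauss(0, 1) for _ in range(n)]
    grain_yield = [3 * h - 2 * v + rng.gauss(0, 0.1) for h, v in zip(height, ndvi)]
    names = ["height_grain_fill", "ndvi_early", "unrelated"]
    return names, [height, ndvi, unrelated], grain_yield


if __name__ == "__main__":
    names, columns, grain_yield = make_synthetic_plots()
    beta = lasso_coordinate_descent(standardize(columns), grain_yield, alpha=0.1)
    print(top_k_by_abs_coef(names, beta, k=2))
```

The L1 penalty drives coefficients of weakly informative variables to exactly zero, which is why LASSO doubles as a selector; in practice the penalty strength is usually tuned (for example by cross-validation) rather than fixed as it is here.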
Conclusions: With the ultra-high-resolution plot imagery obtained by UAS-based phenotyping, we are now able to derive more features, such as the within-plot variation of plant height or vegetation indices rather than just a plot average, that are potentially very useful for breeding. However, this approach can generate a very large number of features or variables. The promising results from this study suggest that a selected subset of these variables can predict grain yield with accuracy comparable to the full set, while possibly allowing a better allocation of effort and resources for phenotypic data collection and processing.
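The random forest branch of the selection ranks variables by permutation importance: shuffle one variable's values across plots, re-predict, and record how much the prediction error grows. The sketch below is model-agnostic and assumes a fitted model wrapped as a plain `predict(row)` callable, with mean squared error as the error measure; the function names and the stand-in model in the usage block are hypothetical. The abstract does not say which random forest implementation was used, and Breiman's original formulation computes this score on out-of-bag samples, which this simplified version does not.

```python
import random
import statistics


def mean_squared_error(predict, rows, y):
    """Average squared difference between predictions and observed values."""
    return statistics.fmean((predict(row) - target) ** 2 for row, target in zip(rows, y))


def permutation_importance(predict, rows, y, n_repeats=5, seed=0):
    """Mean increase in MSE when each feature column is shuffled across plots.

    `predict` is any fitted model wrapped as a function of one row (a list of
    feature values). A larger increase means the model relies more on that feature.
    """
    rng = random.Random(seed)
    baseline = mean_squared_error(predict, rows, y)
    scores = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in rows]
            rng.shuffle(column)  # break the link between feature j and yield
            permuted = [row[:j] + [value] + row[j + 1:] for row, value in zip(rows, column)]
            increases.append(mean_squared_error(predict, permuted, y) - baseline)
        scores.append(statistics.fmean(increases))
    return scores


def rank_features(names, scores, k=10):
    """Return up to k feature names ordered from most to least important."""
    order = sorted(range(len(names)), key=lambda j: scores[j], reverse=True)
    return [names[j] for j in order[:k]]


if __name__ == "__main__":
    rng = random.Random(1)
    rows = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(100)]
    grain_yield = [2.0 * row[0] for row in rows]

    def stand_in_model(row):
        # Hypothetical fitted model that uses only the first feature.
        return 2.0 * row[0]

    scores = permutation_importance(stand_in_model, rows, grain_yield)
    print(rank_features(["height_grain_fill", "ndvi_early"], scores))
```

Because the score only needs predictions, the same routine ranks variables for any fitted model; a feature the model ignores gets an importance of zero, since shuffling it leaves the error unchanged.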

Original language: English (US)
Article number: 123
Journal: Plant Methods
Issue number: 1
State: Published - Nov 1 2019


Keywords

  • Phenotyping
  • Random forest
  • Ridge regression
  • SVM
  • Unmanned aerial vehicle
  • Yield prediction

ASJC Scopus subject areas

  • Biotechnology
  • Genetics
  • Plant Science

