Development of feed composition tables using a statistical screening procedure

H. Tran, A. Schlageter-Tello, A. Caprez, P. S. Miller, M. B. Hall, W. P. Weiss, P. J. Kononoff

Research output: Contribution to journal › Article › peer-review

5 Scopus citations


Millions of feed composition records generated annually by testing laboratories are valuable assets that can be used to benefit the animal nutrition community. However, it is challenging to manage, handle, and process feed composition data that originate from multiple sources, lack standardized feed names, and contain outliers. Efficient methods that consolidate and screen such data are needed to develop feed composition databases with accurate means and standard deviations (SD). Considering the interest of the animal science community in data management and the importance of feed composition tables for the animal industry, the objective was to develop a set of procedures to construct accurate feed composition tables from large data sets. A published statistical procedure, designed to screen feed composition data, was employed, modified, and programmed to operate using Python and SAS. The 2.76 million records received from 4 commercial feed testing laboratories were used to develop procedures and to construct tables summarizing feed composition. Briefly, feed names and nutrients across laboratories were standardized, and erroneous and duplicated records were removed. Histogram, univariate, and principal component analyses were used to identify and remove outliers having key nutrients outside of the mean ± 3.5 SD. Clustering procedures identified subgroups of feeds within a large data set. All steps were programmed and run automatically in Python, except the clustering step, which was programmed in Python to execute automatically in SAS; the resulting mean Pearson correlation matrices of the clusters were then evaluated manually. The input data set contained 42, 94, 162, and 270 feeds from 4 laboratories and comprised 25 to 30 nutrients. The final database included 174 feeds and 1.48 million records.
The developed procedures effectively classified by-products (e.g., distillers grains and solubles as low or high fat), forages (e.g., legume or grass-legume mixture by maturity), and oilseeds versus meal (e.g., soybeans as whole raw seeds vs. soybean meal expellers or solvent extracted) into distinct sub-populations. Results from these analyses suggest that the procedure can provide a robust tool to construct and update large feed data sets. This approach can also be used by commercial laboratories, feed manufacturers, animal producers, and other professionals to process feed composition data sets and update feed libraries.
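
The name standardization and mean ± 3.5 SD screening steps described above can be illustrated with a minimal Python sketch. This is not the authors' program: the alias table `FEED_ALIASES`, the function names, and the `(feed, nutrient, value)` record layout are all hypothetical, and the sketch screens each feed and nutrient group in a single univariate pass, whereas the published procedure also uses histograms, principal component analysis, and clustering.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical alias table: maps laboratory-specific feed names to one
# standard name. The real mapping is not given in the abstract.
FEED_ALIASES = {
    "corn silage": "Corn silage",
    "silage, corn": "Corn silage",
}


def standardize_feed_name(name):
    """Normalize case and whitespace, then look up the standard name."""
    key = " ".join(name.lower().split())
    return FEED_ALIASES.get(key, name.strip())


def screen_records(records, k=3.5):
    """Drop values outside mean ± k SD within each (feed, nutrient) group.

    records: iterable of (feed_name, nutrient, value) tuples.
    Returns a list of retained tuples with standardized feed names.
    """
    # Group values by standardized feed name and nutrient.
    groups = defaultdict(list)
    for feed, nutrient, value in records:
        groups[(standardize_feed_name(feed), nutrient)].append(value)

    kept = []
    for (feed, nutrient), values in groups.items():
        if len(values) < 2:
            # The SD is undefined for a single value; keep it unscreened.
            kept.extend((feed, nutrient, v) for v in values)
            continue
        m, sd = mean(values), stdev(values)  # sample SD
        low, high = m - k * sd, m + k * sd
        kept.extend((feed, nutrient, v) for v in values if low <= v <= high)
    return kept


# Usage: 30 plausible crude protein (CP) values plus one extreme value.
records = [("silage, corn", "CP", 9.0 + 0.1 * (i % 5)) for i in range(30)]
records.append(("Corn silage", "CP", 60.0))
screened = screen_records(records)
```

Note that a single pass with a sample SD can only flag a value beyond 3.5 SD when the group is large enough (roughly 15 or more values), because an extreme value inflates the SD it is compared against.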

Original language: English (US)
Pages (from-to): 3786-3803
Number of pages: 18
Journal: Journal of Dairy Science
Issue number: 4
State: Published - Apr 2020


Keywords

  • clustering
  • database
  • nutrient
  • principal component analysis

ASJC Scopus subject areas

  • Food Science
  • Animal Science and Zoology
  • Genetics

