TY - JOUR
T1 - Evaluation of linkage disequilibrium in wheat with an L1-regularized sparse Markov network
AU - Morota, Gota
AU - Gianola, Daniel
N1 - Funding Information: The authors thank the anonymous reviewers for their valuable comments. This work was supported by the Wisconsin Agriculture Experiment Station and by a Hatch grant from the United States Department of Agriculture.
PY - 2013/8
Y1 - 2013/8
N2 - Linkage disequilibrium (LD) is defined as a stochastic dependence between alleles at two or more loci. Although understanding LD is important in the study of the genetics of many species, little attention has been paid to how a covariance structure between many loci distributed across the genome should be represented. Given that biological systems at the cellular level often involve gene networks, it is appealing to evaluate LD from a network perspective, i.e., as a set of associated loci involved in a complex system. We applied a Markov network (MN) to study LD using data on 1,279 markers derived from 599 wheat inbred lines. The MN attempts to account for association between two markers, conditionally on the remaining markers in the network model. In this study, the recovery of the structure of an LD network was done through two variants of pseudo-likelihoods subject to an L1 penalty on the MN parameters. It is shown that, while the L1-regularized Markov network preserves features of a Bayesian network (BN), the nodes in the resulting networks have fewer links. The resulting sparse network, encoding conditional independencies, provides a clearer picture of association than marginal LD metrics, and a sparse graph eases interpretation markedly, since it includes a smaller number of edges than a BN. Thus, an L1-regularized sparse Markov network seems appealing for representing conditional LD with high-dimensional genomic data, where variables, e.g., single nucleotide polymorphism markers, are expected to be sparsely connected.
UR - http://www.scopus.com/inward/record.url?scp=84880821392&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84880821392&partnerID=8YFLogxK
U2 - 10.1007/s00122-013-2112-y
DO - 10.1007/s00122-013-2112-y
M3 - Article
C2 - 23661079
AN - SCOPUS:84880821392
SN - 0040-5752
VL - 126
SP - 1991
EP - 2002
JO - Theoretical and Applied Genetics
JF - Theoretical and Applied Genetics
IS - 8
ER -