Multivariate phenotypes may be characterized collectively by a variety of low-level traits, as in the diagnosis of a disease that relies on multiple disease indicators. Such multivariate phenotypes are often used in genetic association studies. Identifying highly heritable components of a multivariate phenotype can increase the likelihood of finding genetic associations. Existing methods for phenotype refinement perform unsupervised cluster analysis on low-level traits and hence do not link to heritability. Existing methods for heritable component analysis either cannot utilize general pedigrees or have to estimate the entire covariance matrix of low-level traits from limited samples, which is often computationally prohibitive and leads to inaccurate estimates. Furthermore, with these methods it can be difficult to exclude the fixed effects of other covariates, such as age, sex and race, in order to identify truly heritable components. We propose to search for a combination of low-level traits and directly maximize the heritability of this combined trait. A quadratic optimization problem is thus derived, in which the objective function is formulated by decomposing the traditional maximum likelihood method for estimating the heritability of a quantitative trait. The proposed approach can generate linearly combined traits of high heritability even after correction for the fixed effects of covariates. The effectiveness of the proposed approach is demonstrated in simulations and in a case study of cocaine dependence. Our approach was computationally efficient and derived traits of higher heritability than those derived by other methods. Additional association analysis with our derived cocaine-use trait identified genetic markers that were replicated in an independent sample, further supporting the utility and advantage of the proposed approach.
Complex phenotypes may be characterized collectively by a variety of traits, such as a disease phenotype determined by a diagnosis that relies on multiple disease indicators. For such composite traits to be most useful in a genetic association analysis, they should reflect the heterogeneity of the phenotype and have high heritability, which increases the likelihood of finding genetic associations. The most sophisticated methods for phenotype refinement that are currently available perform unsupervised cluster analysis of low-level traits. Without theoretical guidance, unsupervised analysis may yield composite traits of little utility in the genetic analysis of the phenotype. We propose a quadratic optimization approach that directly maximizes heritability during the derivation of a quantitative composite trait. Previous methods for finding principal components of heritability for multi-dimensional traits consider only between-family and within-family effects in heritability estimation. In contrast, the quadratic objective function in our approach is formulated by solving the inverse problem of heritability estimation, which considers a variety of effects, such as additive and dominance genetic effects and environmental effects. Our approach is suitable for the analysis of general pedigrees. It can also generate composite traits that have high heritability even after correction for the fixed effects of covariates such as age, sex and race. We demonstrate the effectiveness of the proposed approach in simulations and in the analysis of real-world data from a study of cocaine dependence. A cross-validation test showed that the heritability of the derived trait was much higher than that of the standard disease phenotypes. A genome-wide association study with the derived trait identified new variants that were not detected by the association analysis with the commonly used symptom-count phenotype. The association findings with the trait were replicated in an independent sample.
Thus, the proposed approach produces models for composite traits of high heritability with excellent generalizability.
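
The idea of searching for a linear combination of low-level traits with maximal heritability can be illustrated with a deliberately simplified sketch. This is not the quadratic program proposed here: it ignores pedigree structure and fixed effects, and it assumes that a genetic covariance matrix G and a phenotypic covariance matrix P of the traits are already known, which is exactly the estimation burden the proposed approach is meant to avoid. Under those assumptions, the heritability of the combined trait w'x is the ratio w'Gw / w'Pw, and its maximizer is the leading generalized eigenvector of the pair (G, P). The function name and all numerical values below are hypothetical and chosen only for illustration.

```python
import numpy as np


def max_heritability_combination(G, P):
    """Return (w, h2) maximizing w'Gw / w'Pw.

    G: symmetric genetic covariance matrix (assumed known; hypothetical input).
    P: symmetric positive-definite phenotypic covariance matrix.
    """
    L = np.linalg.cholesky(P)          # P = L L^T
    L_inv = np.linalg.inv(L)
    M = L_inv @ G @ L_inv.T            # genetic covariance in whitened coordinates
    M = (M + M.T) / 2.0                # enforce symmetry against round-off
    eigvals, eigvecs = np.linalg.eigh(M)   # eigenvalues in ascending order
    w = L_inv.T @ eigvecs[:, -1]       # map the top eigenvector back to trait space
    w = w / np.linalg.norm(w)          # the ratio is scale-invariant; normalize for readability
    return w, float(eigvals[-1])


# Hypothetical covariance components for three low-level traits.
G = np.array([[0.30, 0.10, 0.05],
              [0.10, 0.20, 0.02],
              [0.05, 0.02, 0.10]])
E = np.array([[0.70, 0.05, 0.00],
              [0.05, 0.80, 0.10],
              [0.00, 0.10, 0.90]])
P = G + E                              # phenotypic = genetic + environmental

w, h2 = max_heritability_combination(G, P)
single_trait_h2 = np.diag(G) / np.diag(P)   # heritability of each trait taken alone

print("weights:", w)
print("heritability of combined trait:", h2)
print("heritability of individual traits:", single_trait_h2)
```

Because each individual trait corresponds to a unit weight vector, the combined trait found this way is at least as heritable as the best individual trait under the assumed G and P.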