Identifying Highly-Heritable Composite Traits or Subtypes for Complex Phenotypes

Investigators: Jinbo Bi1*, Henry R. Kranzler2, Joel Gelernter3

1Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA
2Department of Psychiatry, University of Pennsylvania, Perelman School of Medicine and Philadelphia VAMC, Philadelphia, PA, USA
3Department of Psychiatry, Yale University School of Medicine, New Haven, CT, USA
* Principal Investigator

This project is supported by the NIH National Institute on Drug Abuse, grant R01DA037349.

2. An Effective Method to Identify Heritable Components from Multivariate Phenotypes
Jiangwen Sun, Jinbo Bi, and Henry R. Kranzler
Under review at PLoS ONE, 2015. (A significantly extended version of the SIGKDD 2013 paper.)


Multivariate phenotypes may be characterized collectively by a variety of low-level traits, as in the diagnosis of a disease that relies on multiple disease indicators. Such multivariate phenotypes are often used in genetic association studies. Identifying highly heritable components of a multivariate phenotype can maximize the likelihood of finding genetic associations. Existing methods for phenotype refinement perform unsupervised cluster analysis on low-level traits and hence do not link to heritability. Existing heritable component analytics either cannot utilize general pedigrees or have to estimate the entire covariance matrix of low-level traits from limited samples, which is often computationally prohibitive and leads to inaccurate estimates. Furthermore, with these methods it can be difficult to exclude the fixed effects of other covariates, such as age, sex and race, in order to identify truly heritable components. We propose to search for a combination of low-level traits and directly maximize the heritability of this combined trait. A quadratic optimization problem is thus derived, in which the objective function is formulated by decomposing the traditional maximum likelihood method for estimating the heritability of a quantitative trait. The proposed approach can generate linearly combined traits of high heritability even after correction for the fixed effects of covariates. The effectiveness of the proposed approach is demonstrated in simulations and in a case study of cocaine dependence. Our approach was computationally efficient and derived traits of higher heritability than those derived by other methods. Additional association analysis with our derived cocaine-use trait identified genetic markers that were replicated in an independent sample, further demonstrating the utility and advantage of the proposed approach.
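
For readers unfamiliar with the terminology, the block below sketches what "the heritability of a combined trait" means under the standard additive polygenic variance-components model. It is background notation only, not the objective function derived in the paper. All symbols (Z, w, C, beta, Phi, Sigma_a, Sigma_e) are assumptions introduced here for illustration.

```latex
% Z: n x d matrix of low-level traits for n pedigree members (assumed notation)
% w: d x 1 weight vector defining the combined trait y = Z w
% C: covariate matrix (e.g., age, sex, race) with fixed effects \beta
% \Phi: n x n kinship matrix of the pedigree
\begin{align}
  y &= Z w = C\beta + g + e, \\
  g &\sim \mathcal{N}\!\left(0,\; 2\Phi\,\sigma_a^2\right), \qquad
  e \sim \mathcal{N}\!\left(0,\; I\,\sigma_e^2\right), \\
  h^2 &= \frac{\sigma_a^2}{\sigma_a^2 + \sigma_e^2}.
\end{align}
% If the low-level traits have additive genetic covariance \Sigma_a and
% environmental covariance \Sigma_e, the combined trait has
\begin{equation}
  h^2(w) = \frac{w^{\top}\Sigma_a\, w}{w^{\top}\Sigma_a\, w + w^{\top}\Sigma_e\, w}.
\end{equation}
```

The last expression is a ratio of quadratic forms in w. It also shows why approaches that first estimate the full d x d matrices Sigma_a and Sigma_e become costly and unstable when samples are limited, which is the difficulty the abstract attributes to existing methods.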

1. Quadratic Optimization to Identify Highly Heritable Quantitative Traits from Complex Phenotypic Features
Jiangwen Sun, Jinbo Bi and Henry R. Kranzler
Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD), pp. 811-819, 2013.


Complex phenotypes may be characterized collectively by a variety of traits, such as a disease phenotype determined by a diagnosis that relies on multiple disease indicators. For such composite traits to be most useful in a genetic association analysis, they should reflect the heterogeneity of the phenotype and have high heritability to maximize the likelihood of finding genetic associations. The most sophisticated methods for phenotype refinement that are currently available perform unsupervised cluster analysis of low-level traits. Without theoretical guidance, unsupervised analysis may yield composite traits of little utility in the genetic analysis of the phenotype. We propose a quadratic optimization approach that directly maximizes heritability during the derivation of a quantitative composite trait. Unlike previous methods for finding principal components of heritability for multi-dimensional traits, in which only between-family and within-family effects are considered in heritability estimation, the quadratic objective function in our approach is formulated by solving the inverse problem of heritability estimation, which considers a variety of effects, such as additive and dominant genetic effects and environmental effects. Our approach is suitable for the analysis of general pedigrees. It can also generate composite traits that have high heritability even after correction for the fixed effects of covariates such as age, sex and race. We demonstrate the effectiveness of the proposed approach in simulations and in the analysis of real-world data from a study of cocaine dependence. A cross-validation test showed that the heritability of the derived trait was much higher than that of the standard disease phenotypes. A genome-wide association study with the derived trait identified new variants that were not detected by association analysis with the commonly used symptom-count phenotype. The association findings with the trait were replicated in an independent sample. Thus, the proposed approach produces models for composite traits of high heritability with excellent generalizability.
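
To make the contrast with earlier work concrete, here is a minimal Python sketch of the simpler idea the abstract refers to as principal components of heritability: choosing trait weights that maximize between-family variance relative to within-family variance. It is not the quadratic optimization method of the paper and not the Matlab package mentioned on this page. The simulated sibling-pair data, the `loading` vector, and both function names are hypothetical choices made for illustration.

```python
import numpy as np

def between_within_cov(X, fam):
    """Between- and within-family covariance of traits X (n x p) given family labels."""
    X = X - X.mean(axis=0)                      # center each trait
    p = X.shape[1]
    B = np.zeros((p, p))
    W = np.zeros((p, p))
    families = np.unique(fam)
    for f in families:
        Xf = X[fam == f]
        m = Xf.mean(axis=0)                     # family mean vector
        B += len(Xf) * np.outer(m, m)           # between-family sum of squares
        D = Xf - m
        W += D.T @ D                            # within-family sum of squares
    return B / (len(families) - 1), W / (len(X) - len(families))

def top_generalized_eigvec(B, W):
    """Return unit-norm w maximizing (w' B w) / (w' W w), and the maximal ratio."""
    L = np.linalg.cholesky(W)                   # W = L L'
    Linv = np.linalg.inv(L)
    vals, vecs = np.linalg.eigh(Linv @ B @ Linv.T)   # eigenvalues in ascending order
    w = Linv.T @ vecs[:, -1]                    # map the top eigenvector back
    return w / np.linalg.norm(w), vals[-1]

# Simulated data (assumption): 500 sibling pairs, 4 low-level traits.
rng = np.random.default_rng(0)
n_fam, p = 500, 4
fam = np.repeat(np.arange(n_fam), 2)
loading = np.array([1.0, 0.5, 0.0, 0.0])        # only traits 0 and 1 carry a familial signal
family_effect = rng.normal(size=n_fam)          # shared within each family
X = family_effect[fam][:, None] * loading + rng.normal(size=(2 * n_fam, p))

B, W = between_within_cov(X, fam)
w, ratio = top_generalized_eigvec(B, W)
composite = X @ w                               # derived composite trait per individual
print("weights:", np.round(w, 3), "between/within ratio:", round(float(ratio), 3))
```

In this toy setting the recovered weights concentrate on the two traits that carry the familial signal. The sketch uses family resemblance as a stand-in for heritability, so it cannot separate additive, dominant and shared environmental effects, does not handle general pedigrees, and applies no covariate correction. Those are the limitations the abstract says the proposed approach addresses.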


The above papers are accompanied by a Matlab software package that implements the methods they describe.
This is an open source program for non-commercial use only. Please contact either Dr. Jinbo Bi or Dr. Jiangwen Sun to report bugs or to ask about ongoing progress. Please cite the latest paper if the code is used in your research.

Contact Jinbo Bi for information about this page.