Prediction and Validation of Gene-Disease Associations using Methods Inspired by Social Network Analyses

Abstract: Correctly identifying associations of genes with diseases has long been a goal in biology. With the emergence of large-scale gene-phenotype association datasets, we can leverage statistical and machine learning methods to help achieve this goal. In this paper, we present two methods for predicting gene-disease associations based on functional gene associations and gene-phenotype associations in model organisms. The first method, the Katz measure, is motivated by its success in social network link prediction, and is closely related to some of the recent methods proposed for gene-disease association inference. The second method, called CATAPULT (Combining dATa Across species using Positive-Unlabeled Learning Techniques), is a supervised machine learning method that uses a biased support vector machine whose features are derived from walks in a heterogeneous gene-trait network. We study the performance of the proposed methods and related state-of-the-art methods using two different evaluation strategies on two distinct data sets, namely OMIM phenotypes and drug-target interactions. Comparing results across the two evaluation strategies, we show that although both methods perform very well, the Katz measure is better at identifying associations between traits and poorly studied genes, whereas CATAPULT is better suited to correctly identifying gene-trait associations overall.
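As a rough illustration of the idea behind the Katz measure mentioned in the abstract, the sketch below scores node pairs by a damped count of the walks connecting them: walks of length l contribute beta^l each, summed up to a maximum length. This is a minimal sketch and not the authors' implementation; the toy network (gene_a, gene_b, trait_t), the function names, and the values of beta and max_len are all assumptions chosen for illustration.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def truncated_katz(adj, beta, max_len):
    """Return sum_{l=1..max_len} beta**l * adj**l (a truncated Katz score matrix).

    adj     -- symmetric adjacency matrix of the network (hypothetical toy data here)
    beta    -- damping factor; longer walks are down-weighted by beta per step
    max_len -- longest walk length included in the sum
    """
    n = len(adj)
    scores = [[0.0] * n for _ in range(n)]
    # Start from the identity matrix, i.e. adj**0.
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    weight = 1.0
    for _ in range(max_len):
        power = mat_mul(power, adj)   # adj**l: number of walks of length l
        weight *= beta                # beta**l
        for i in range(n):
            for j in range(n):
                scores[i][j] += weight * power[i][j]
    return scores


# Hypothetical toy network: gene_a -- gene_b -- trait_t (a simple path).
NODES = ["gene_a", "gene_b", "trait_t"]
ADJ = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]

if __name__ == "__main__":
    katz = truncated_katz(ADJ, beta=0.1, max_len=3)
    trait = NODES.index("trait_t")
    # Rank the genes by their score against the trait, highest first.
    genes = [i for i, name in enumerate(NODES) if name.startswith("gene")]
    for i in sorted(genes, key=lambda g: katz[g][trait], reverse=True):
        print(NODES[i], katz[i][trait])
```

In this toy network gene_b is directly linked to the trait and so ranks above gene_a, which reaches the trait only through a walk of length two. Truncating the sum at a small maximum walk length keeps the computation to a few matrix products; the untruncated series converges only when beta is smaller than the reciprocal of the largest eigenvalue of the adjacency matrix.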

Download: pdf

Citation

Prediction and Validation of Gene-Disease Associations using Methods Inspired by Social Network Analyses (pdf, software)
U. Singh-Blom, N. Natarajan, A. Tewari, J. Woods, I. Dhillon, E. Marcotte.
PLoS ONE 8(5): e58977, May 2013.

Bibtex:
@article{singh-blom2013prediction,
  author = "U. Martin Singh-Blom AND Nagarajan Natarajan AND Ambuj Tewari AND John O. Woods AND Inderjit S. Dhillon AND Edward M. Marcotte",
  title = "Prediction and Validation of Gene-Disease Associations using Methods Inspired by Social Network Analyses",
  journal = "PLoS ONE",
  volume = "8",
  issue = "5",
  year = "2013",
  month = "may",
  note = "; e58977",
  abstract = "Correctly identifying associations of genes with diseases has long been a goal in biology. With the emergence of large-scale gene-phenotype association datasets in biology, we can leverage statistical and machine learning methods to help us achieve this goal. In this paper, we present two methods for predicting gene-disease associations based on functional gene associations and gene-phenotype associations in model organisms. The first method, the Katz measure, is motivated from its success in social network link prediction, and is very closely related to some of the recent methods proposed for gene-disease association inference. The second method, called CATAPULT (Combining dATa Across species using Positive-Unlabeled Learning Techniques), is a supervised machine learning method that uses a biased support vector machine where the features are derived from walks in a heterogeneous gene-trait network. We study the performance of the proposed methods and related state-of-the-art methods using two different evaluation strategies, on two distinct data sets, namely OMIM phenotypes and drug-target interactions. Finally, by measuring the performance of the methods using two different evaluation strategies, we show that even though both methods perform very well, the Katz measure is better at identifying associations between traits and poorly studied genes, whereas CATAPULT is better suited to correctly identifying gene-trait associations overall."
}

Associated Projects

Gene-Disease Prediction: A Link Prediction Approach

Center for Big Data Analytics
