Seminar Details

Title: Adventures with large biomedical datasets: diseases, medical records, environment and genetics
Speaker: Andrey Rzhetsky, PhD
Affiliation: University of Chicago
Host: Fabrice Smieliauskas, PhD
Position/Department: Professor, Department of Human Genetics
Location: AMB W-229
Start: 3/7/2018 3:30:00 PM


I will attempt to cover several interrelated analysis topics, spending more time on parts that resonate with the audience. 

First, I will introduce our recent study analyzing phenotypic data harvested from over 150 million unique patients. Curiously, these non-genetic large-scale data can be used for genetic inferences. We discovered that complex diseases are associated with unique sets of rare Mendelian variants, referred to as the “Mendelian code.”  We found that the genetic loci indicated by this code were enriched for common risk alleles.  Moreover, we used probabilistic modeling to demonstrate for the first time that deleterious Mendelian variants likely contribute to complex disease risk in a non-additive fashion.   

The second topic I hope to cover is the analysis of apparent clusters of neurodevelopmental disorders. Disease clusters are defined as geographically compact areas where a particular disease, such as a cancer, shows a significantly increased rate. It is presently unclear how common such clusters are for neurodevelopmental maladies such as autism spectrum disorders (ASD) and intellectual disability (ID). As in the first study, using data covering one third of the US population, we demonstrated that (1) ASD and ID show strong clustering across US counties; (2) counties with high ASD rates also tend to have high ID rates; and (3) the spatial variation of both phenotypes appears to be driven by environment and, to a lesser extent, by economic incentives at the state level.

The third topic is the use of electronic medical record data to (1) estimate the heritability and familial environmental patterns of diseases, and (2) infer the genetic and environmental correlations between pairs of diseases drawn from a set of complex diseases. I am particularly interested in inferring objective classifications of diseases (based on a formal optimization criterion), derived separately from the environmental and the genetic factors.