I will attempt to cover several interrelated analysis topics, spending more time on parts that resonate with the audience.
First, I will introduce our recent study analyzing phenotypic data harvested from over 150 million unique patients. Curiously, these non-genetic large-scale data can be used for genetic inferences. We discovered that complex diseases are associated with
unique sets of rare Mendelian variants, referred to as the “Mendelian code.” We found that the genetic loci indicated by this code were enriched for common risk alleles. Moreover, we used probabilistic modeling to demonstrate for the first time that deleterious
Mendelian variants likely contribute to complex disease risk in a non-additive fashion.
The second topic that I hope to cover is analysis of apparent clusters of neurodevelopmental disorders. Disease clusters are defined as geographically compact areas where a particular disease, such as a cancer, shows a significantly increased rate. It is
presently unclear how common such clusters are for neurodevelopmental maladies, such as autism spectrum disorders (ASD) and intellectual disability (ID). As in the first story, examining data for one-third of the whole US population, we demonstrated that
(1) ASD and ID show strong clustering across US counties; (2) counties with high ASD rates also appear to have high ID rates; and (3) the spatial variation of both phenotypes appears to be driven by environment and, to a lesser extent, by economic
incentives at the state level.
The third topic is the use of electronic medical record data to (1) estimate the heritability and familial environmental patterns of diseases, and (2) infer the genetic and environmental correlations between disease pairs from a set of complex diseases. I
am particularly interested in inferring objective classifications of diseases (based on a formal optimization criterion), separately from environmental and genetic factors.