Research Statement Summary

My lab’s research goal is to develop open-source, integrative computational tools that analyze high-dimensional biological, clinical, and environmental exposure datasets in order to infer context-specific gene regulatory interactions and modules and to predict disease-associated genes and patient-specific drug response. With advances in high-throughput technologies in biology, numerous national and international consortia have generated vast amounts of genotype, phenotype, gene expression, and epigenetic data (collectively called multi-omics data) and made them available to the scientific community. Furthermore, ongoing large initiatives such as UK Biobank, Million Records Project, and the All of Us Research Program will contribute multi-omics datasets from millions of individuals. Each data modality (e.g., mRNA expression, DNA methylation, mutation, microRNA (miRNA) expression, and copy number alteration) describes one facet of the underlying biology. Consequently, there is a tremendous need for scalable methods that can integrate these layers of multi-omics data across millions of individuals from different backgrounds. Such methods would produce valuable insights into human diseases and pave the way for precision medicine. My research program is devoted to developing integrative computational tools that use artificial intelligence, machine learning, and data mining methods to analyze these multi-omics datasets.