Research Statement Summary

With the advent of high-throughput technologies in biomedical research, vast numbers of high-dimensional biological datasets have been generated to characterize biological systems and diseases. In particular, next-generation sequencing (NGS) technology has been employed to measure genetic, epigenetic and structural changes in DNA and RNA. Petabytes of data, such as DNA methylation, copy number alteration, mRNA expression and microRNA expression, are publicly available. The availability of these complex, high-throughput datasets requires new computational methods to analyze and integrate them in order to answer high-impact biological questions.

The research goal of my group is to develop open-source computational tools that integrate high-throughput biological datasets (i) to reverse engineer disease-specific gene regulatory networks and (ii) to build predictive models for biological processes and clinical outcomes.

Figure 1. The flowchart of ProcessDriver

In my group’s most recent work, we developed a tool called ProcessDriver to detect copy number-based drivers and associated biological processes in cancer (see Figure 1). Our work also aims to reverse engineer cancer-specific gene regulatory networks with high accuracy. In addition, we developed a Bioconductor package to analyze, cluster and visualize RNA-seq data. We recently conducted a study in which we analyzed RNA-seq data from prostate cancer cell lines to investigate the role of MST1 localization in prostate cancer development.