Bozdag Lab

 

Our lab is interested in analyzing high-throughput biological and clinical datasets to answer high-impact biological questions. We are particularly interested in developing algorithms to build predictive models of biological processes such as gene regulation, RNA surveillance, tumorigenesis, and aging.

Current Students

  • Brittany Baur, Ph.D. student (Computational Sciences)
  • Karl Stamm, Ph.D. student (Computational Sciences)
  • Damien Ready, M.S. student (Bioinformatics)

Former Students

  • Yemalin Godonou, M.S. student (Computing)

Some of our current/recent projects

1. Age-specific signatures of glioblastoma. Glioblastoma (GBM), one of the deadliest types of brain cancer, shows different survival rates in young and old patients: older GBM patients survive for a significantly shorter time than younger GBM patients. In this project, we analyze high-throughput genomic, genetic, and epigenetic datasets from several hundred GBM patients to find age-specific signatures of GBM. This work is in collaboration with Howard A. Fine at the NYU Cancer Institute.
Related papers

  • [DOI] S. Bozdag, A. Li, G. Riddick, Y. Kotliarov, M. Baysan, F. M. Iwamoto, M. C. Cam, S. Kotliarova, and H. A. Fine, “Age-specific signatures of glioblastoma at the genomic, genetic, and epigenetic levels,” PLoS One, vol. 8, iss. 4, p. e62982, 2013.
    [Bibtex]
    @article{Bozdag:PlosOne:2013,
      author = {Bozdag, Serdar  and Li, Aiguo  and Riddick, Gregory  and Kotliarov, Yuri  and Baysan, Mehmet  and Iwamoto, Fabio M.  and Cam, Margaret C.  and Kotliarova, Svetlana  and Fine, Howard A.},
      title = {Age-specific signatures of glioblastoma at the genomic, genetic, and epigenetic levels.},
      journal = {PLoS One},
      uuid = {8E5FF406-0B9B-47C1-9483-2937716FB6BB},
      volume = {8},
      number = {4},
      pages = {e62982},
      organization = {Neuro-Oncology Branch, National Cancer Institute, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland, USA. serdar.bozdag\@marquette.edu},
      address = {United States},
      year = {2013},
      doi = {10.1371/journal.pone.0062982},
      ISSN = {1932-6203},
      US_NLM_ID = {101285081},
      PII = {PONE-D-12-16519},
      pubmedid = {23658659},
      PMCID = {PMC3639162},
      url = {},
      keywords = {research support, n.i.h., intramural},
      Web_data_source = {PubMed},
      abstract = {Age is a powerful predictor of survival in glioblastoma multiforme (GBM) yet the biological basis for the difference in clinical outcome is mostly unknown. Discovering genes and pathways that would explain age-specific survival difference could generate opportunities for novel therapeutics for GBM. Here we have integrated gene expression, exon expression, microRNA expression, copy number alteration, SNP, whole exome sequence, and DNA methylation data sets of a cohort of GBM patients in The Cancer Genome Atlas (TCGA) project to discover age-specific signatures at the transcriptional, genetic, and epigenetic levels and validated our findings on the REMBRANDT data set. We found major age-specific signatures at all levels including age-specific hypermethylation in polycomb group protein target genes and the upregulation of angiogenesis-related genes in older GBMs. These age-specific differences in GBM, which are independent of molecular subtypes, may in part explain the preferential effects of anti-angiogenic agents in older GBM and pave the way to a better understanding of the unique biology and clinical behavior of older versus younger GBMs}
    }

2. Modeling of gene regulation from high-throughput biological datasets. The aim of this study is to integrate multiple types of biological data in a dynamic Bayesian framework to infer gene regulatory networks accurately. We integrate mRNA expression, DNA methylation, copy number, and several other biological datasets in our framework. Integrating these datasets should help us recover known regulatory interactions and avoid spurious ones.

Related papers

  • [DOI] S. Bozdag, A. Li, S. Wuchty, and H. A. Fine, “FastMEDUSA: a parallelized tool to infer gene regulatory networks,” Bioinformatics, vol. 26, iss. 14, pp. 1792-3, 2010.
    [Bibtex]
    @article{Bozdag:Bioinformatics:2010,
      author = {Bozdag, Serdar  and Li, Aiguo  and Wuchty, Stefan  and Fine, Howard A.},
      title = {FastMEDUSA: a parallelized tool to infer gene regulatory networks.},
      journal = {Bioinformatics},
      uuid = {DF9B00E5-15DD-4286-AC7A-70E0D51AF073},
      volume = {26},
      number = {14},
      pages = {1792-3},
      organization = {Neuro-Oncology Branch, National Cancer Institute, National Institute of Neurological Diseases and Stroke, Bethesda, MD 20892, USA.},
      address = {England},
      month = {7},
      year = {2010},
      doi = {10.1093/bioinformatics/btq275},
      ISSN = {1367-4811},
      US_NLM_ID = {9808944},
      PII = {btq275},
      pubmedid = {20513661},
      PMCID = {PMC2894517},
      url = {},
      keywords = {Sequence Analysis, DNA;Gene Expression Profiling;research support, n.i.h., intramural;Genomics;Software;research support, n.i.h., extramural;Gene Regulatory Networks},
      Web_data_source = {PubMed},
      abstract = {MOTIVATION: In order to construct gene regulatory networks of higher organisms from gene expression and promoter sequence data efficiently, we developed FastMEDUSA. In this parallelized version of the regulatory network-modeling tool MEDUSA, expression and sequence data are shared among a user-defined number of processors on a single multi-core machine or cluster. Our results show that FastMEDUSA allows a more efficient utilization of computational resources. While the determination of a regulatory network of brain tumor in Homo sapiens takes 12 days with MEDUSA, FastMEDUSA obtained the same results in 6 h by utilizing 100 processors.
    AVAILABILITY: Source code and documentation of FastMEDUSA are available at https://wiki.nci.nih.gov/display/NOBbioinf/FastMEDUS}
    }

3. Computing gene-centric DNA methylation from probe-level methylation arrays.
DNA methylation is a biochemical process that adds a methyl group to cytosine nucleotides of DNA. It plays a key role in transcriptional silencing and is therefore negatively correlated with gene expression. Recent advances in microarray technology allow genome-wide methylation levels to be measured in the DNA of higher organisms. These microarrays measure methylation levels at about 450,000 probes, producing large volumes of data. Each gene is associated with a number of probes, so computing the overall methylation level of a gene requires analyzing the methylation levels of its probes.
In this study, we will develop a method to compute gene-centric methylation levels from probe-level methylation data. We will also investigate regions where methylation may be more tightly correlated with gene expression. We will apply our method to different datasets to find out whether the results are specific to each dataset or more universal.
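As a rough illustration of the probe-to-gene step described above, the following sketch collapses probe-level values into one value per gene by taking a plain average. The averaging rule is an assumption chosen for simplicity and is not the method under development in this project; the probe IDs, gene names, function name, and values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def gene_level_methylation(probe_beta, probe_to_genes):
    """Average probe-level methylation values into one value per gene.

    probe_beta: dict mapping probe ID -> methylation level in [0, 1]
    probe_to_genes: dict mapping probe ID -> list of associated gene names
    Probes with no gene annotation are ignored.
    Note: simple averaging is an illustrative assumption only.
    """
    per_gene = defaultdict(list)
    for probe, beta in probe_beta.items():
        for gene in probe_to_genes.get(probe, []):
            per_gene[gene].append(beta)
    return {gene: mean(values) for gene, values in per_gene.items()}


# Hypothetical probe IDs, gene names, and methylation levels.
probe_beta = {"probe1": 0.25, "probe2": 0.75, "probe3": 0.125, "probe4": 0.9}
probe_to_genes = {
    "probe1": ["GENE_A"],
    "probe2": ["GENE_A"],
    "probe3": ["GENE_B"],
    # probe4 has no annotation and is skipped
}

gene_beta = gene_level_methylation(probe_beta, probe_to_genes)
print(gene_beta)
```

A plain mean treats every probe equally. Because some regions may be more tightly correlated with gene expression than others, a refined version could weight or select probes by region, which this sketch does not attempt.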

4. Genomic analysis of Caenorhabditis elegans during stress. In this project, we apply FastMEDUSA to gene expression datasets of C. elegans exposed to different levels of arsenic. We aim to identify significant regulators of the stress response against arsenic in C. elegans. We also analyzed the master regulators of the immune response in C. elegans after infection with Vibrio cholerae. This work is in collaboration with Dr. H. Nese Cinar at the Food and Drug Administration.
Related papers

  • [DOI] S. N. Sahu, J. Lewis, I. Patel, S. Bozdag, J. H. Lee, J. E. LeClerc, and H. N. Cinar, “Genomic analysis of immune response against Vibrio cholerae hemolysin in Caenorhabditis elegans,” PLoS One, vol. 7, iss. 5, p. e38200, 2012.
    [Bibtex]
    @article{Sahu:PlosOne:2012,
      author = {Sahu, Surasri N.  and Lewis, Jada  and Patel, Isha  and Bozdag, Serdar  and Lee, Jeong H.  and LeClerc, Joseph E.  and Cinar, Hediye Nese},
      title = {Genomic analysis of immune response against Vibrio cholerae hemolysin in Caenorhabditis elegans.},
      journal = {PLoS One},
      uuid = {A9703483-9A17-4EBA-950E-4AF514AA2AFC},
      volume = {7},
      number = {5},
      pages = {e38200},
      organization = {Division of Virulence Assessment, Food and Drug Administration, Laurel, Maryland, United States of America.},
      address = {United States},
      year = {2012},
      doi = {10.1371/journal.pone.0038200},
      ISSN = {1932-6203},
      US_NLM_ID = {101285081},
      PII = {PONE-D-11-24146},
      pubmedid = {22675448},
      PMCID = {PMC3364981},
      url = {},
      keywords = {Animals;Bacterial Proteins;Caenorhabditis elegans;Immunity, Innate;Virulence Factors;Bacterial Toxins;Unfolded Protein Response;Genomics;RNA Interference;Gene Expression Profiling;Hemolysin Proteins;Caenorhabditis elegans Proteins;Bacillus thuringiensis;Vibrio cholerae;Temperature;Gene Expression Regulation;Amino Acid Motifs;Inflammation;Transcription, Genetic},
      Web_data_source = {PubMed},
      abstract = {Vibrio cholerae cytolysin (VCC) is among the accessory V. cholerae virulence factors that may contribute to disease pathogenesis in humans. VCC, encoded by hlyA gene, belongs to the most common class of bacterial toxins, known as pore-forming toxins (PFTs). V. cholerae infects and kills Caenorhabditis elegans via cholerae toxin independent manner. VCC is required for the lethality, growth retardation and intestinal cell vacuolation during the infection. However, little is known about the host gene expression responses against VCC. To address this question we performed a microarray study in C. elegans exposed to V. cholerae strains with intact and deleted hlyA genes.Many of the VCC regulated genes identified, including C-type lectins, Prion-like (glutamine [Q]/asparagine [N]-rich)-domain containing genes, genes regulated by insulin/IGF-1-mediated signaling (IIS) pathway, were previously reported as mediators of innate immune response against other bacteria in C. elegans. Protective function of the subset of the genes up-regulated by VCC was confirmed using RNAi. By means of a machine learning algorithm called FastMEDUSA, we identified several putative VCC induced immune regulatory transcriptional factors and transcription factor binding motifs. Our results suggest that VCC is a major virulence factor, which induces a wide variety of immune response- related genes during V. cholerae infection in C. elegans}
    }
  • [DOI] S. N. Sahu, J. Lewis, I. Patel, S. Bozdag, J. H. Lee, R. Sprando, and H. N. Cinar, “Genomic Analysis of Stress Response against Arsenic in Caenorhabditis elegans,” PLoS One, vol. 8, iss. 7, p. e66431, 2013.
    [Bibtex]
    @article{Sahu:PlosOne:2013,
      author = {Sahu, Surasri N.  and Lewis, Jada  and Patel, Isha  and Bozdag, Serdar  and Lee, Jeong H.  and Sprando, Robert  and Cinar, Hediye Nese},
      title = {Genomic Analysis of Stress Response against Arsenic in Caenorhabditis elegans.},
      journal = {PLoS One},
      uuid = {97470070-4C38-4574-9BBE-C36C7B9F534E},
      volume = {8},
      number = {7},
      pages = {e66431},
      organization = {Division of Virulence Assessment, Food and Drug Administration, Laurel, Maryland, United States of America ; Oak Ridge Institute for Science and Education, Oak Ridge, Tennessee, United States of America.},
      address = {United States},
      year = {2013},
      doi = {10.1371/journal.pone.0066431},
      ISSN = {1932-6203},
      US_NLM_ID = {101285081},
      PII = {PONE-D-12-32432},
      pubmedid = {23894281},
      url = {},
      Web_data_source = {PubMed},
      abstract = {Arsenic, a known human carcinogen, is widely distributed around the world and found in particularly high concentrations in certain regions including Southwestern US, Eastern Europe, India, China, Taiwan and Mexico. Chronic arsenic poisoning affects millions of people worldwide and is associated with increased risk of many diseases including arthrosclerosis, diabetes and cancer. In this study, we explored genome level global responses to high and low levels of arsenic exposure in Caenorhabditis elegans using Affymetrix expression microarrays. This experimental design allows us to do microarray analysis of dose-response relationships of global gene expression patterns. High dose (0.03\%) exposure caused stronger global gene expression changes in comparison with low dose (0.003\%) exposure, suggesting a positive dose-response correlation. Biological processes such as oxidative stress, and iron metabolism, which were previously reported to be involved in arsenic toxicity studies using cultured cells, experimental animals, and humans, were found to be affected in C. elegans. We performed genome-wide gene expression comparisons between our microarray data and publicly available C. elegans microarray datasets of cadmium, and sediment exposure samples of German rivers Rhine and Elbe. Bioinformatics analysis of arsenic-responsive regulatory networks were done using FastMEDUSA program. FastMEDUSA analysis identified cancer-related genes, particularly genes associated with leukemia, such as dnj-11, which encodes a protein orthologous to the mammalian ZRF1/MIDA1/MPP11/DNAJC2 family of ribosome-associated molecular chaperones. We analyzed the protective functions of several of the identified genes using RNAi. Our study indicates that C. elegans could be a substitute model to study the mechanism of metal toxicity using high-throughput expression data and bioinformatics tools such as FastMEDUSA}
    }

5. Genome-wide analysis of regulation and surveillance of non-coding RNAs. This research, carried out in collaboration with investigators at Marquette University and the National Institutes of Health, concerns the role of antisense RNA and mRNA-like non-coding RNA in the regulation of mammalian gene expression. The work is directed at identifying and characterizing these non-coding RNAs. Recent studies have described many thousands of pairs of overlapping genes in mammalian genomes. A leading hypothesis is that many such overlaps act to regulate expression of proximal or overlapping mRNAs. The complexity of antisense and non-coding transcription poses many challenges that are fundamental to understanding the regulation of genes and their involvement in human genetic diseases.

Collaborators

  • Jim Anderson, Marquette University
  • Naveen Bansal, Marquette University
  • Bekir Cinar, University of California, Los Angeles
  • H. Nese Cinar, Food and Drug Administration
  • Mehdi Maadooliat, Marquette University
  • Steve Munroe, Marquette University
  • Lisa Petrella, Marquette University
  • Stefan Wuchty, University of Miami