In our modern world, power is everything. Whether it be political, social, or even statistical, humankind always thirsts for more. While political and social power may be a little beyond our scope, a new bioinformatic package has come forth to power up your ability to detect differentially methylated regions (DMRs) from whole-genome bisulfite sequencing (WGBS) data.
The identification of DMRs by WGBS is no easy task: the high cost of sequencing leads to small sample sizes and low coverage, which in turn demand complex statistical inference. Further complications arise from the correlated nature of neighboring CpG sites, which makes controlling the false discovery rate (FDR) challenging. However, as the price of sequencing continues to plummet, WGBS has emerged as a truly genome-wide method that can be applied to complex experimental designs.
To tackle the challenges of identifying DMRs from WGBS data, the lab of Rafael Irizarry at Harvard University (USA) has brought forth dmrseq. dmrseq builds on the data structure of the popular bsseq (BSmooth) package, which was also developed in the Irizarry lab, but offers a very different approach.
DMR identification with dmrseq involves two critical steps:
- DMR Detection: Differences in CpG methylation for the effect of interest are pooled and smoothed, giving CpG sites with higher coverage more weight, and candidate DMRs are then assembled from the smoothed signal.
- Statistical Analysis: For each candidate DMR, a region statistic that is comparable across the genome is estimated by fitting a generalized least squares (GLS) regression model with a nested autoregressive correlated error structure for the effect of interest. Permutation testing against a pooled null distribution then identifies significant DMRs. This approach accounts for both inter-individual and inter-CpG variability across the entire genome.
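The detection step can be pictured with a small Python sketch. It is not dmrseq's code (dmrseq is an R/Bioconductor package with its own smoother); here a Gaussian kernel stands in for that smoother, each CpG's kernel weight is multiplied by its coverage, and consecutive CpGs whose smoothed difference passes a cutoff with the same sign are grouped into candidate regions. The function names and the bandwidth, cutoff, max_gap, and min_cpgs parameters are hypothetical choices for illustration.

```python
import numpy as np


def smooth_differences(pos, diff, cov, bandwidth=500.0):
    """Coverage-weighted Gaussian kernel smoothing of per-CpG differences.

    Hypothetical stand-in for dmrseq's smoother: each smoothed value is a
    weighted mean of nearby differences, weight = kernel(distance) * coverage.
    """
    pos = np.asarray(pos, dtype=float)
    diff = np.asarray(diff, dtype=float)
    cov = np.asarray(cov, dtype=float)
    # Kernel weight between every pair of CpGs, based on genomic distance
    dist = pos[:, None] - pos[None, :]
    kernel = np.exp(-0.5 * (dist / bandwidth) ** 2)
    # Scale each neighbor's weight by its coverage
    weights = kernel * cov[None, :]
    return (weights @ diff) / weights.sum(axis=1)


def candidate_regions(pos, smoothed, cutoff=0.1, max_gap=1000, min_cpgs=3):
    """Return (start_bp, end_bp, n_cpgs) for runs of CpGs whose smoothed
    difference passes the cutoff with the same sign, at most max_gap bp apart."""
    regions = []
    run, run_sign = [], 0

    def close_run():
        # Keep the current run only if it contains enough CpGs
        if len(run) >= min_cpgs:
            regions.append((int(pos[run[0]]), int(pos[run[-1]]), len(run)))

    for i, value in enumerate(smoothed):
        sign = int(np.sign(value)) if abs(value) >= cutoff else 0
        if sign != 0 and run and sign == run_sign and pos[i] - pos[run[-1]] <= max_gap:
            run.append(i)  # extend the current run
            continue
        close_run()
        run, run_sign = ([i], sign) if sign != 0 else ([], 0)
    close_run()
    return regions
```

Because coverage enters the weights, a low-coverage CpG with an extreme difference moves the smoothed curve less than a well-covered neighbor would.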
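To make the region statistic concrete, the following Python sketch fits a GLS model to a single candidate region: one intercept per CpG, one shared coefficient for a two-group effect of interest, and errors that follow an AR(1) correlation across CpGs within each sample while staying independent between samples. This is a simplified, assumed version of the error structure described above: the correlation rho is supplied as a fixed, hypothetical value rather than estimated from the data, and gls_region_stat and ar1_corr are illustrative names, not part of dmrseq. The estimate is the textbook GLS solution, beta = (X' V^-1 X)^-1 X' V^-1 y.

```python
import numpy as np


def ar1_corr(m, rho):
    """AR(1) correlation matrix for m CpGs: corr(j, k) = rho ** |j - k|."""
    idx = np.arange(m)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def gls_region_stat(meth, group, rho=0.5):
    """Return (effect, statistic) for one region.

    meth:  (n_samples, m_cpgs) methylation values for the region
    group: length-n vector coding the effect of interest (e.g. 0/1)
    rho:   assumed, fixed AR(1) correlation between adjacent CpGs
    """
    meth = np.asarray(meth, dtype=float)
    group = np.asarray(group, dtype=float)
    n, m = meth.shape
    y = meth.reshape(-1)  # sample-major order: all CpGs of sample 0 first
    # Design: one intercept per CpG plus a shared effect-of-interest column
    cpg_intercepts = np.tile(np.eye(m), (n, 1))
    effect_col = np.repeat(group, m)[:, None]
    X = np.hstack([cpg_intercepts, effect_col])
    # Block-diagonal covariance: AR(1) within each sample, independent samples
    V_inv = np.linalg.inv(np.kron(np.eye(n), ar1_corr(m, rho)))
    XtV_inv = X.T @ V_inv
    cov_unscaled = np.linalg.inv(XtV_inv @ X)
    beta = cov_unscaled @ (XtV_inv @ y)
    # Residual variance estimate under the assumed correlation structure
    resid = y - X @ beta
    sigma2 = resid @ V_inv @ resid / (n * m - X.shape[1])
    se = np.sqrt(sigma2 * cov_unscaled[-1, -1])
    return beta[-1], beta[-1] / se
```

The returned statistic is the estimated group effect divided by its GLS standard error, which puts regions of different lengths and correlation on a comparable scale.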
Notably, because statistical testing is performed on regions rather than individual CpGs, dmrseq offers accurate FDR control. This approach also allows covariates to be adjusted for directly in the model, which is ideal for covariates that are continuous or have more than two groups. Covariates can alternatively be incorporated by balancing the permutations, which is ideal for two-group covariates such as sex. Finally, dmrseq supports multi-group comparisons and can identify DMRs with as few as two samples per group.
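Region-level testing with balanced permutations and a pooled null can also be sketched in Python. Here the two groups are relabeled only in ways that keep each level of a match covariate (such as sex) represented in the permuted group exactly as in the observed design, which is one plausible reading of balanced permutations rather than dmrseq's exact rule. Null statistics from every region under every permutation are pooled into a single distribution, each observed region gets an empirical p-value against that pool, and Benjamini-Hochberg q-values (an assumed stand-in FDR procedure) are computed across regions. A simple mean group difference replaces the GLS region statistic, and all function names are hypothetical.

```python
import itertools

import numpy as np


def region_stats(meth, regions, labels):
    """Stand-in region statistic: mean group difference over each region's CpGs."""
    labels = np.asarray(labels, dtype=bool)
    diff = meth[labels].mean(axis=0) - meth[~labels].mean(axis=0)
    return np.array([diff[start:end].mean() for start, end in regions])


def balanced_permutations(labels, match):
    """All relabelings keeping, for each level of `match`, the same number of
    samples in group 1 as the observed design (observed labeling excluded)."""
    labels = np.asarray(labels, dtype=bool)
    match = np.asarray(match)
    n, k = len(labels), int(labels.sum())
    target = {lvl: int(np.sum(labels & (match == lvl))) for lvl in np.unique(match)}
    perms = []
    for chosen in itertools.combinations(range(n), k):
        perm = np.zeros(n, dtype=bool)
        perm[list(chosen)] = True
        if np.array_equal(perm, labels):
            continue
        if all(np.sum(perm & (match == lvl)) == cnt for lvl, cnt in target.items()):
            perms.append(perm)
    return perms


def bh_qvalues(p):
    """Benjamini-Hochberg adjusted p-values (q-values)."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value
    q_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1.0)
    return q


def pooled_permutation_test(meth, regions, labels, match):
    observed = region_stats(meth, regions, labels)
    # Pool null statistics from every region under every balanced permutation
    null = np.concatenate(
        [region_stats(meth, regions, p) for p in balanced_permutations(labels, match)]
    )
    exceed = (np.abs(null)[None, :] >= np.abs(observed)[:, None]).sum(axis=1)
    pvals = (1 + exceed) / (1 + null.size)
    return observed, pvals, bh_qvalues(pvals)
```

For example, with four samples per group split evenly by sex, only 35 balanced relabelings exist, so in this sketch a null built from a single region could never yield a p-value below 1/36; pooling across regions is what gives the test usable resolution.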
By comparing dmrseq with bsseq, DSS, and metilene, and examining the differences in DMR identification in data from the Roadmap Epigenomics Project, mouse models, and simulations, the talented team demonstrated dmrseq's power to identify DMRs.