It turns out that the graphics cards in our computers aren’t just great for gaming; they can also push your epigenomic research to the next level. When industry meets academia, we can all win. Thanks to an exciting collaboration between the graphics processing unit (GPU) inventors at NVIDIA and the lab of ATAC-seq inventor Jason D. Buenrostro (Harvard), we have a new tool to level up ATAC-seq: AtacWorks.
Although we’ve seen a rise in studies using ATAC-seq to assess genome-wide chromatin accessibility, the method is not without its limitations. News of a new tool that brings down the cost, time, and input requirements is therefore a cause for celebration, particularly for studies of single cells and rare cell populations. To demonstrate the raw power of GPUs when it comes to machine learning, the tandem team developed a deep learning framework to clean up noisy, low-quality (low cell count or low coverage) ATAC-seq data. Specifically, AtacWorks:
- Uses a signal track as input to:
  - Denoise it at single-base-pair resolution, predicting a cleaner signal track
  - Perform peak calling, predicting the genomic locations of regulatory elements
- Learns chromatin accessibility features rather than cell-type patterns, as indicated by its identification of cell-type-specific peaks that aren’t in the training data
- Outperforms smoothing by linear regression and has superior peak calling to MACS2
- Is robust to mixed cell type data but performs best when also trained on mixed cell type data
- Enhances peak calling for single-cell ATAC-seq data (typically low cell count)
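To make the two outputs in the list above concrete (a denoised per-base signal track and a set of peak calls), here is a toy sketch in plain Python. It is not AtacWorks and does not use its API: a moving average stands in for the deep learning model, and a fixed threshold stands in for peak calling. The function names, window size, threshold, and example coverage values are all assumptions made up for illustration.

```python
from typing import List, Tuple


def smooth_track(coverage: List[float], window: int = 5) -> List[float]:
    """Moving-average smoothing over a per-base coverage track.

    A simple stand-in for the denoising step, NOT the AtacWorks model;
    it only illustrates the per-base input and output.
    """
    half = window // 2
    smoothed = []
    for i in range(len(coverage)):
        lo = max(0, i - half)
        hi = min(len(coverage), i + half + 1)
        smoothed.append(sum(coverage[lo:hi]) / (hi - lo))
    return smoothed


def call_peaks(
    signal: List[float], threshold: float, min_width: int = 3
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals where signal >= threshold.

    Intervals shorter than min_width bases are discarded.
    """
    peaks = []
    start = None
    for i, value in enumerate(signal):
        if value >= threshold and start is None:
            start = i  # a candidate peak opens here
        elif value < threshold and start is not None:
            if i - start >= min_width:
                peaks.append((start, i))
            start = None
    # Close a peak that runs to the end of the track.
    if start is not None and len(signal) - start >= min_width:
        peaks.append((start, len(signal)))
    return peaks


# Hypothetical low-coverage track: sparse reads around one accessible region.
noisy = [0, 0, 1, 0, 0, 0, 0, 0, 3, 0, 4, 2, 0, 3, 0, 0, 0, 0, 1, 0]

denoised = smooth_track(noisy, window=5)
peaks = call_peaks(denoised, threshold=1.0, min_width=3)
print(peaks)  # [(8, 14)]
```

With these made-up numbers, thresholding the raw track directly finds no peak of at least three bases, while the smoothed track yields a single peak spanning positions 8 to 13. Per the list above, the trained model is reported to outperform this kind of simple baseline.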
To show AtacWorks’ potential, the model was trained on bulk ATAC-seq data from FACS-isolated human blood-derived cells and then tested on erythroblast ATAC-seq data. It was able to identify differentially accessible regulatory regions associated with lineage-primed hematopoietic stem cells. Because AtacWorks doesn’t consider the underlying DNA sequence, it is generalizable across cell types or even species. Even more excitingly, AtacWorks can be used to predict other modalities from ATAC-seq data. In their example, the synergistic scientists used AtacWorks to identify transcription factor footprints, which typically require a high read depth. Using low-input ATAC-seq, the researchers also predicted ChIP-seq peaks for CTCF and H3K27ac (an active histone posttranslational modification), with a high level of concordance between the predictions and the actual data.
Senior author Jason Buenrostro shares, “With AtacWorks, we’re able to conduct single-cell experiments that would typically require 10 times as many cells. Denoising low-quality sequencing coverage with GPU-accelerated deep learning has the potential to significantly advance our ability to study epigenetic changes associated with rare cell development and diseases.” First author Avantika Lal concludes, “With very rare cell types, it’s not possible to study differences in their DNA using existing methods. AtacWorks can help not only drive down the cost of gathering chromatin accessibility data, but also open up new possibilities in drug discovery and diagnostics.”
Get your ATAC-seq game on with AtacWorks in Nature Communications, March 2021.