With all the new datasets available, making sense of epigenomics data can be like finding a needle in a hayfield, let alone a stack. Studies of epigenetics in human complex disease have been greatly aided by larger and larger databases, but identifying meaningful biological relationships using these databases is challenging given their size.
Cancer studies in particular generate very large datasets from which it can be difficult to draw meaningful conclusions, but lucky for you, two new tools have surfaced to make use of these databases. One is a downstream overlap analysis application for epigenome-wide association studies (EWAS), while the other is a suite of cancer -omics scripts. Let’s take a look at these, shall we?
eFORGE: Find Tissue Components in EWAS probes
EWAS examine the association of a given trait with normal variation in DNA methylation. These studies have taken off in the last decade, since DNA methylation correlations with a disease state are thought to provide much better therapeutic targets than DNA sequence. Cancer studies have proven particularly well suited to this line of investigation.
Since epigenetic states differ so greatly between tissues, a major point of interest for EWAS researchers is the tissue-specific regulatory components of the differentially methylated probes. This information is useful for predicting the sites of action in the EWAS signal.
Recently, researchers at the University College London Cancer Institute developed a tool to easily highlight such probes. eFORGE (experimentally-derived Functional element Overlap analysis of ReGions from EWAS) performs a functional overlap analysis that lets the user view the tissue-specific regulatory component of a given set of EWAS differentially methylated probes (DMPs).
Here are the key points:
- Data submitted as probe lists
- DNase I hotspots from either the ENCODE or Roadmap Epigenomics projects are used for overlap analysis
- Cell types with regulatory component enrichment are identified
- Results are output as either a table or a chart of overlap enrichment for each cell type
In addition to providing predictions for the possible mechanisms of EWAS signal generation, eFORGE output can support EWAS validity where a tissue-specific mechanism is known or expected. It can also reveal previously unknown tissue involvement.
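To make the idea of an overlap analysis concrete, here is a minimal Python sketch. It is not eFORGE's actual code or statistical procedure: the probe positions, hotspot intervals, cell-type names, and the simple binomial test against a background probe set are all assumptions chosen for illustration, and everything sits on a single toy chromosome.

```python
from bisect import bisect_right
from math import comb

def in_hotspot(pos, hotspots):
    """True if pos falls inside any (start, end) interval.
    Assumes hotspots are sorted and non-overlapping."""
    starts = [start for start, _ in hotspots]
    i = bisect_right(starts, pos) - 1
    return i >= 0 and hotspots[i][0] <= pos <= hotspots[i][1]

def overlap_count(probes, hotspots):
    """Number of probe positions that land in a hotspot."""
    return sum(in_hotspot(p, hotspots) for p in probes)

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def enrichment(dmps, background, hotspots_by_cell):
    """For each cell type, return (overlap count, binomial p-value),
    using the background overlap rate as the expected probability."""
    results = {}
    for cell, hotspots in hotspots_by_cell.items():
        hs = sorted(hotspots)
        k = overlap_count(dmps, hs)
        rate = overlap_count(background, hs) / len(background)
        results[cell] = (k, binomial_tail(k, len(dmps), rate))
    return results

# Toy data (hypothetical): DNase I hotspots per cell type, all on one chromosome.
hotspots_by_cell = {
    "blood": [(100, 200), (500, 600)],
    "liver": [(1000, 1100)],
}
dmps = [120, 150, 550, 580, 2000]      # EWAS probe list submitted by the user
background = list(range(0, 3000, 50))  # stand-in for all probes on the array

results = enrichment(dmps, background, hotspots_by_cell)
for cell, (k, pval) in results.items():
    print(f"{cell}: {k}/{len(dmps)} probes in hotspots, p = {pval:.5f}")
```

In this toy example the probe list overlaps the "blood" hotspots far more often than the background does, so that cell type would be flagged as enriched, while "liver" shows no overlap at all.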
New Cancer Genome Atlas Scripts for DNA Methylation and miRNA Analysis
There are many more types of changes that accumulate in cancers than just DNA methylation. Changes in DNA sequence, gene expression, miRNAs, and proteins are also key components of cancer biology. The Cancer Genome Atlas (TCGA) provides a comprehensive database of this information from 33 human cancers from over 11,000 patients. Working with this dataset can be challenging; the default interface provides raw, semi-processed, or processed data that each still require scripts and tools. Such tools exist for DNA sequence analysis, but have been lacking for epigenomic and transcriptomic analyses.
In a recent issue of Future Medicine, Aniruddha Chatterjee and colleagues report a suite of scripts called scan_tcga to retrieve and analyze TCGA data. These scripts provide patient-subgroup-specific data for any region of the genome. Here are the highlights:
- Three scripts retrieve TCGA level 3 data:
  - DNA methylation (tool: scan_tcga_methylation.awk)
  - mRNA expression (tool: scan_tcga_mRNA.awk)
  - miRNA expression (tool: scan_tcga_miRNAs.awk)
- Selection options for disease type, data type, batch numbers, and sample preservation
- Data are output as a text-format matrix with a value for each patient in the subgroup, for easy investigation and downstream analysis
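The subgroup-and-region selection described above can be pictured with a short Python sketch. This is not the scan_tcga code (the real tools are awk scripts) and does not reflect their command-line options; the record layout, the function name `subgroup_matrix`, and the toy values are all hypothetical.

```python
import csv
import io

# Toy records (hypothetical), one methylation value per patient and probe:
# (patient_id, disease, sample_type, probe_id, chrom, pos, beta)
records = [
    ("P1", "SKCM", "primary",    "cg0001", "chr1", 1500, 0.82),
    ("P1", "SKCM", "primary",    "cg0002", "chr1", 1800, 0.1),
    ("P2", "SKCM", "primary",    "cg0001", "chr1", 1500, 0.75),
    ("P3", "SKCM", "metastatic", "cg0001", "chr1", 1500, 0.3),
    ("P2", "SKCM", "primary",    "cg0003", "chr2", 900,  0.55),
    ("P4", "BRCA", "primary",    "cg0001", "chr1", 1500, 0.4),
]

def subgroup_matrix(records, disease, sample_type, chrom, start, end):
    """Select one patient subgroup and one genomic region, then return a
    tab-separated matrix: one row per probe, one column per patient."""
    rows = [
        r for r in records
        if r[1] == disease and r[2] == sample_type
        and r[4] == chrom and start <= r[5] <= end
    ]
    patients = sorted({r[0] for r in rows})
    probes = sorted({r[3] for r in rows})
    values = {(r[3], r[0]): r[6] for r in rows}

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["probe"] + patients)
    for probe in probes:
        # Missing measurements are written as "NA".
        writer.writerow([probe] + [values.get((probe, p), "NA") for p in patients])
    return out.getvalue()

matrix = subgroup_matrix(records, "SKCM", "primary", "chr1", 1000, 2000)
print(matrix)
```

A plain text matrix like this loads directly into R, pandas, or a spreadsheet, which is what makes the format convenient for downstream analysis.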
In their publication, the developers also analyzed the DNA methylation signatures of frequently deregulated cancer genes and miRNAs between primary and metastatic melanomas. Their data support previous work, highlighting the validity of the scripts.
They also found several novel relationships, such as upregulation of TET1 in tumor progression, and association of DNA methylation with increased gene expression at several genes. Overall, scan_tcga provides a useful tool for parsing patient subgroups in TCGA and facilitates easy downstream analysis.
Read the full report at Future Medicine, Sept 2016.