Pattern recognition and supervised learning techniques are increasingly being applied to epigenetics research. These computational techniques, which have proven useful in everything from homeland security to financial modeling, are now lending a hand with some of the challenges in the field.
Recently we’ve seen them applied to miRNA discovery and target prediction, and now they’re helping researchers studying DNA methylation focus their efforts on the oases of methylation hotspots in the epigenome.
De novo Methylation Prediction with PatMAn
Although there is increasing evidence of CpG methylation and gene silencing in cancer, exactly why certain CpG islands are more or less prone to de novo methylation remains poorly understood. The Vertino Lab at Emory University has been working on predicting aberrant CpG island methylation for years now. In 2003, the Vertino Lab, along with collaborators, identified seven novel sequence patterns (Figure 1) that proved capable of predicting a CpG island’s predisposition to aberrant methylation (Feltus et al, PNAS, 2003).
This classifier, now referred to as PatMAn (Pattern-based Methylation Analysis), has been given a boost by the addition of another useful feature based on the binding of SUZ12, a component of polycomb repressive complex 2 (PRC2). The new classifier? SUPER-PatMAn (SUZ12 Protein Enriched Regions-PatMAn) of course!
SUPER-PatMAn
SUPER-PatMAn takes into account both cis sequence elements and trans-acting factors, improving the accuracy of the approach considerably. We caught up with Mike McCabe from the Vertino Lab to hear a little more about SUPER-PatMAn and what’s up next.
Michael McCabe Interview
EpiGenie: What was the driving evidence for incorporating SUZ12 binding into the approach?
McCabe: There were multiple reasons for incorporating SUZ12 binding into our prediction algorithm. Drs. Vertino and Feltus collaborated with Drs. Christoph Plass and Joe Costello to perform RLGS in a model where de novo DNA methylation is driven by over-expression of the DNMT1 DNA methyltransferase. During analysis of the data, it was noted that there was an enrichment of homeobox genes among those CpG islands that were methylation-prone. Since it has long been known that polycomb plays a role in the developmental transcriptional repression of HOX genes and that HOX genes are frequently methylated in human cancers, this was our first hint that polycomb might be involved.
Second, a 2006 study by Vire et al (Nature) demonstrated a physical interaction between EZH2, a component of Polycomb Repressive Complex 2 (PRC2), and the DNA methyltransferases (DNMT1, DNMT3a, DNMT3b) further suggesting a direct link between polycomb targets and DNA methylation. Lastly, Lee et al (Cell, 2006) reported genome-wide mapping of PRC2 binding through ChIP-chip for SUZ12, another component of PRC2. When I mapped this dataset back to our database of methylation-prone and methylation-resistant CpG islands from the RLGS study, we noted that nearly 60% of methylation-prone CpG islands were marked by PRC2 components or the H3K27me3 modification, while less than 20% of methylation-resistant CpG islands were similarly marked. Taken together these data suggested that PRC2-binding may correlate with methylation-prone CpG islands. Since we’ve started working on this, several groups have now reported an association between genes targeted by PRC2 and those hypermethylated in human cancers.
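For readers who want a feel for the kind of comparison McCabe describes, here is a minimal Python sketch of mapping bound regions onto two sets of CpG islands and comparing the fraction marked in each. It is not the lab's actual pipeline: the `Region` type, the function names, and all coordinates are assumptions invented for illustration, and the toy numbers are not meant to reproduce the reported percentages.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """A genomic interval (hypothetical type; half-open coordinates)."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def overlaps(a: Region, b: Region) -> bool:
    """True if the two intervals share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def fraction_marked(islands: list[Region], bound: list[Region]) -> float:
    """Fraction of CpG islands overlapping at least one bound region."""
    if not islands:
        raise ValueError("island list is empty")
    marked = sum(1 for isl in islands if any(overlaps(isl, r) for r in bound))
    return marked / len(islands)


# Toy, made-up data: stand-ins for ChIP-chip bound regions and CpG islands.
bound_regions = [Region("chr1", 100, 200), Region("chr2", 500, 800)]
prone = [
    Region("chr1", 150, 400),
    Region("chr1", 900, 1200),
    Region("chr2", 700, 1000),
    Region("chr3", 100, 300),
]
resistant = [
    Region("chr1", 200, 350),
    Region("chr2", 100, 400),
    Region("chr2", 790, 900),
    Region("chr4", 0, 500),
]

prone_fraction = fraction_marked(prone, bound_regions)
resistant_fraction = fraction_marked(resistant, bound_regions)
print(f"prone marked: {prone_fraction:.2f}, resistant marked: {resistant_fraction:.2f}")
```

The nested loop is fine for a toy example; a real genome-wide dataset would usually call for an interval index or a dedicated tool.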
EpiGenie: Previously your lab used RLGS for methylation profiling in the model system. Are you still using this approach or have you moved towards a more global profiling approach?
McCabe: We are progressing towards additional genome-wide methylation profiling techniques. In collaboration with Dr. Paul Wade at the NIEHS, we have assessed the feasibility of several approaches including MeDIP-ChIP and MeDIP-Seq. However, we are currently moving toward the Illumina GoldenGate and Infinium platforms. The GoldenGate system is a 96-well format capable of screening approximately 500 CpG islands while the Infinium system covers roughly 10,000 CpG islands in a 12 sample format. We’ve chosen to focus on the Illumina platforms for now primarily due to financial considerations. Since the vast majority of CpG dinucleotides are located within repetitive elements and are normally methylated, MeDIP-Seq requires deeper sequencing than your average transcription factor ChIP-Seq which results in a higher per-sample price tag. Since we are primarily interested in the methylation status of CpG islands, the Illumina Infinium platform allows us to focus on just over a quarter of all human CpG islands meeting Takai and Jones criteria at a much more reasonable per-sample price.
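As an aside on the Takai and Jones criteria mentioned above: they are commonly summarized as a length of at least 500 bp, a G+C content of at least 55%, and an observed/expected CpG ratio of at least 0.65. The Python sketch below checks a single candidate sequence against those thresholds. It is a simplification under stated assumptions: the function names are hypothetical, and a real CpG island search scans the genome with sliding windows rather than testing one pre-cut sequence.

```python
def cpg_island_stats(seq: str) -> tuple[int, float, float]:
    """Return (length, GC fraction, observed/expected CpG ratio) for a sequence."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # CpG dinucleotides cannot overlap each other
    gc_fraction = (c + g) / n if n else 0.0
    # Observed/expected CpG = (#CpG * N) / (#C * #G)
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return n, gc_fraction, obs_exp


def meets_takai_jones(seq: str, min_len: int = 500,
                      min_gc: float = 0.55, min_obs_exp: float = 0.65) -> bool:
    """Check one candidate sequence against the thresholds given in the lead-in."""
    n, gc_fraction, obs_exp = cpg_island_stats(seq)
    return n >= min_len and gc_fraction >= min_gc and obs_exp >= min_obs_exp


# Synthetic sequences, chosen only to exercise each threshold.
cpg_rich = "CG" * 300            # long, GC-rich, CpG-dense
at_rich = "AT" * 300             # long but no G or C at all
too_short = "CG" * 100           # CpG-dense but only 200 bp
cpg_depleted = "C" * 300 + "G" * 300  # GC-rich but a single CpG

for name, s in [("cpg_rich", cpg_rich), ("at_rich", at_rich),
                ("too_short", too_short), ("cpg_depleted", cpg_depleted)]:
    print(name, meets_takai_jones(s))
```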
EpiGenie: How has the predictive accuracy been affected when expanding the interrogated regions?
McCabe: Unfortunately, we have not yet gotten these experiments to the point of testing them against our predictions. This will, however, be an important and interesting experiment.
EpiGenie: Are there plans to incorporate any additional trans-acting elements to further improve the predictive value?
McCabe: Absolutely. Thanks in large part to the explosion of genome-wide ChIP studies, we now have the capacity to incorporate many additional cis- and trans-acting elements into our next generation methylation classifiers. In addition to the thousands of DNA sequence patterns considered during the classifier training, these classifiers will also be provided with >1,000 additional cis- and trans-acting features that may be associated with DNA methylation. One of the benefits of the supervised learning techniques created by our collaborator, Dr. Eva Lee (Georgia Institute of Technology), is that you can provide the program with nearly unlimited features and only those features that are most efficient at discriminating methylation-prone and methylation-resistant CpG islands will be utilized to generate the final classifier. Through this unbiased approach we hope to identify novel cis-acting DNA patterns and trans-acting factors that may play a role in determining sites of aberrant DNA methylation in human cancers.
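To illustrate the general idea of handing a classifier many candidate features and keeping only the most discriminating ones, here is a deliberately simple Python sketch. It is not Dr. Lee's method and not the SUPER-PatMAn implementation: it swaps in a basic ranking that scores each binary feature by how differently it occurs in the two classes. The feature names and the toy data are hypothetical.

```python
def select_features(samples: list[dict[str, int]], labels: list[bool], k: int):
    """Rank binary features by |frequency in prone - frequency in resistant|.

    samples: one dict per CpG island mapping feature name -> 0 or 1.
    labels:  True for methylation-prone, False for methylation-resistant.
    Returns the k top-ranked feature names and the full score table.
    """
    prone = [s for s, y in zip(samples, labels) if y]
    resistant = [s for s, y in zip(samples, labels) if not y]
    if not prone or not resistant:
        raise ValueError("need at least one sample in each class")

    names = sorted({name for s in samples for name in s})

    def freq(group: list[dict[str, int]], name: str) -> float:
        # Missing features are treated as absent (0).
        return sum(s.get(name, 0) for s in group) / len(group)

    scores = {n: abs(freq(prone, n) - freq(resistant, n)) for n in names}
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(names, key=lambda n: (-scores[n], n))
    return ranked[:k], scores


# Toy, made-up data: four islands described by three hypothetical features.
samples = [
    {"suz12_bound": 1, "pattern_1": 1, "pattern_2": 1},
    {"suz12_bound": 1, "pattern_1": 0, "pattern_2": 1},
    {"suz12_bound": 0, "pattern_1": 1, "pattern_2": 1},
    {"suz12_bound": 0, "pattern_1": 0, "pattern_2": 1},
]
labels = [True, True, False, False]

selected, scores = select_features(samples, labels, k=1)
print(selected, scores)
```

Scoring each feature on its own ignores interactions between features, which is one reason more sophisticated supervised learning approaches are used in practice.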