As any weather forecaster can tell you, making accurate predictions is tough. But researchers now report that they’ve confirmed the power of chromatin features to predict gene expression, and they’ve come up with some new insights, too.
Sure, some histone mods are associated with transcriptional repression, whereas others typically mark active spots. But many of the studies linking mods with transcription looked at only a few cell lines or had little information about the RNA type.
So, researchers at the UMass Medical School developed a two-step model and combined it with data from the massive ENCODE project to see if these associations would hold true for 11 histone mods, one histone variant, and DNase I hypersensitivity in seven human cell lines. The data also include different types of RNA (with and without the polyA tail) from various cellular compartments, measured with different methods, such as CAGE, RNA-PET, and RNA-Seq. Here’s some of what they found:
- A strong correlation existed between predicted and measured gene expression levels with CAGE, RNA-PET, and RNA-Seq, confirming many known predictive mods.
- The model was successful at predicting gene expression from chromatin features for all the cell lines.
- As expected, promoter histone mods (H3K4me2 and the like) were the most predictive for CAGE data (which is transcription start site-based), and gene-body marks associated with elongation (like H3K36me3 and H3K79me2) were the most predictive for RNA-Seq transcript-based expression data. So, transcription initiation and elongation are represented by different chromatin features.
- The expression of genes with high-CpG promoters is more predictable than that of genes with low-CpG promoters.
- Polyadenylated and non-polyadenylated RNAs seem to be regulated by different mechanisms.
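
To make the first finding concrete, here is a minimal Python sketch of what "correlation between predicted and measured expression" means in practice. It is not the authors' two-step model: a plain least-squares fit stands in for it, the data are synthetic random numbers rather than ENCODE signals, and the function names (`fit_linear`, `predict`, `pearson_r`, `demo`) are made up for illustration. The only number borrowed from the study is the feature count, 13 (11 histone mods, one histone variant, and DNase I hypersensitivity).

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares with an intercept (a stand-in model, not the paper's)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    """Apply the fitted coefficients to a new feature matrix."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef


def pearson_r(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])


def demo(seed=0, n_genes=2000, n_features=13):
    """Fit on half of the synthetic genes, score on the held-out half."""
    rng = np.random.default_rng(seed)
    # Synthetic "chromatin signal" per gene: one column per feature.
    X = rng.normal(size=(n_genes, n_features))
    # Synthetic "expression": a hidden linear mix of the features plus noise.
    hidden_weights = rng.normal(size=n_features)
    y = X @ hidden_weights + rng.normal(scale=1.0, size=n_genes)

    half = n_genes // 2
    coef = fit_linear(X[:half], y[:half])
    y_hat = predict(coef, X[half:])
    return pearson_r(y_hat, y[half:])


if __name__ == "__main__":
    print(f"Pearson r on held-out genes: {demo():.2f}")
```

Scoring on genes the model never saw during fitting is the point of the split: a high correlation there suggests the features carry real predictive signal rather than just memorized noise.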
We predict you’ll find all the data you need in the paper at Genome Biology, September 2012.