Most primary miRNA transcripts remain a mystery because they’re generally degraded by the time they leave the comfort of the nucleus. Where do the transcripts start? Do they have a single “peak” of initiation, or are there a variety of sites where they’re likely to begin?
Limitations in the currently available molecular biology tool set make isolating these transcripts pretty tough. Using novel approaches like Cap Analysis of Gene Expression (CAGE), researchers at RIKEN have been challenging the way we think about promoter/transcriptional initiation over the last few years (Carninci et al. 2005; Carninci et al. 2006; Kapranov et al. 2007).
Recently, researchers from Duke and Penn used an algorithm trained on a large set of mRNA transcripts obtained from CAGE (which are sequenced from the 5’ end) to figure out which factors contribute to transcription initiation, and where the transcription factor binding sites are likely located relative to the transcription start site.
They then used this knowledge to predict with near-perfect accuracy where RNA Polymerase II is likely to latch on to genes known to have a single peak site of initiation.
When the algorithm was used on known and putative miRNA coding regions, it predicted that about 70% of these use a single major RNA Polymerase II start site. For all the details, check out Genome Research, January 2009.
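If the "single peak" idea feels abstract, here is a toy sketch of one way to think about it: given the 5’ positions of CAGE tags mapped near a gene, check how much of the signal piles up around the most-used start site. This is not the algorithm from the paper. The function name `classify_tss`, the `window` of 2 bp, and the `min_fraction` cutoff of 0.7 are all assumptions made up for illustration, as are the example tag positions.

```python
from collections import Counter


def classify_tss(tag_positions, window=2, min_fraction=0.7):
    """Label a set of CAGE tag 5' positions as 'single peak' or 'broad'.

    window and min_fraction are illustrative assumptions, not published values.
    """
    if not tag_positions:
        raise ValueError("need at least one tag position")

    counts = Counter(tag_positions)

    # Dominant start site: the position with the most tags
    # (ties are broken by taking the smallest coordinate).
    peak = min(counts, key=lambda pos: (-counts[pos], pos))

    # Fraction of all tags that fall within +/- window bp of the dominant site.
    near = sum(n for pos, n in counts.items() if abs(pos - peak) <= window)
    fraction = near / len(tag_positions)

    label = "single peak" if fraction >= min_fraction else "broad"
    return peak, fraction, label


# Hypothetical tag positions: one tight cluster, one spread-out region.
sharp = [100] * 8 + [101, 99, 130]
broad = [100, 105, 111, 118, 124, 130, 137, 143]

print(classify_tss(sharp))
print(classify_tss(broad))
```

With these made-up inputs, the first set is labeled a single peak at position 100 and the second is labeled broad. Real CAGE data would need normalization and far more careful thresholds than this.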