Don’t you just hate it when you’re trying to get from A to B and Google Maps can’t even figure out where you’re starting from? Well, just like your last trip to that hot new sushi place, the precise location of lncRNA origins has only been plotted with blurry lines. Thankfully, to aid the start of your next lncRNA voyage, an international team led by Alistair Forrest and Piero Carninci from the RIKEN FANTOM5 consortium (Japan) brings forth the most accurate atlas of their 5’ ends yet and helps show their functional potential.
Framing their study, Forrest shares, “There is strong debate in the scientific community on whether the thousands of long non-coding RNAs generated from our genomes are functional or simply byproducts of a noisy transcriptional machinery.” To answer this fundamental question, the team turned to a stockpile of data generated using their cap analysis of gene expression (CAGE) technique. CAGE involves sequencing short sequence tags that originate from the 5’ cap of RNA and offers a snapshot of precisely where transcription begins, which is much more accurate than standard RNA-seq.
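To make the CAGE idea concrete, here is a minimal sketch of how aligned tags can be collapsed into per-position transcription start counts. It is an illustration only, not the FANTOM5 pipeline: the tag tuples, the 0-based half-open coordinates, and the function names `five_prime_end` and `count_tss` are all assumptions made for this example.

```python
from collections import Counter

def five_prime_end(start, end, strand):
    """Return the 0-based position of a tag's 5' end.

    Assumes half-open [start, end) coordinates, so the 5' end of a
    minus-strand tag is its last covered base (end - 1).
    """
    return start if strand == "+" else end - 1

def count_tss(tags):
    """Count how many tags begin at each (chrom, position, strand)."""
    counts = Counter()
    for chrom, start, end, strand in tags:
        counts[(chrom, five_prime_end(start, end, strand), strand)] += 1
    return counts

# Hypothetical aligned CAGE tags: (chrom, start, end, strand)
tags = [
    ("chr1", 100, 127, "+"),
    ("chr1", 100, 125, "+"),
    ("chr1", 102, 129, "+"),
    ("chr1", 480, 507, "-"),
]

tss_counts = count_tss(tags)
# The most frequently used start position in this toy data set
top_site, top_count = tss_counts.most_common(1)[0]
```

Because every tag starts at the capped 5’ end, only one coordinate per tag matters; the rest of the tag just helps place it on the genome.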
Here’s what went down when they examined a total of 1,829 samples from major human primary cell types and tissues:
- By integrating their CAGE data with RNA-seq and the epigenetic modifications that occur precisely at the newly mapped transcription initiation regions, they forged an atlas of 27,919 human lncRNA genes, which they then examined for functionality.
- First author Chung-Chau Hon shares that many of them do not arise from promoters: “Intriguingly, the majority of (intergenic) long non-coding RNAs appear to be generated from enhancer elements.”
- By using evolutionary conservation as a proxy for lncRNA function, the team found that 13,896 have conserved exons and 13,228 have conserved transcription initiation regions.
- By integrating with genomic data, they found that:
  - 1,970 lncRNAs that overlap SNPs associated with traits and disease are specifically expressed in the relevant cell types.
  - 3,166 lncRNAs belonging to 5,264 lncRNA-mRNA pairs overlap the SNPs of expression quantitative trait loci (eQTL) and are co-expressed alongside the mRNA, suggesting a functional role in transcriptional regulation.
Forrest summarizes, “By integrating the improved gene models with data from gene expression, evolutionary conservation and genetic studies, we find compelling evidence that the majority of these long non-coding RNAs appear to be functional, and for nearly 2,000 of them we reveal their potential involvement in diseases and other genetic traits.”
Carninci concludes, “The improved gene models and the broad functional hints of human long non-coding RNAs derived from this atlas could serve as a Rosetta Stone for us to experimentally investigate their functional relevance as part of our ongoing work for the upcoming edition of the FANTOM consortium. We anticipate that these results could further push the boundary of our understanding of the functions of the non-coding portion of our genome.”
Overall, the study provides functional evidence for a whopping 69% (19,175 of 27,919) of the lncRNAs they identified in the human genome, while also sparking our curiosity about the remaining 31%.
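For anyone who wants to double-check the headline split, the arithmetic is quick to reproduce. The two counts come straight from the figures above; the variable names are just for this sketch.

```python
total_lncrnas = 27_919   # lncRNA genes in the atlas
with_evidence = 19_175   # lncRNAs with functional evidence

share = with_evidence / total_lncrnas
remaining = total_lncrnas - with_evidence

# Report both sides of the split, rounded to whole percentages
print(f"{share:.0%} with functional evidence")
print(f"{remaining:,} lncRNAs ({1 - share:.0%}) still unexplained")
```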
Go check out the online atlas and read the legend over at Nature, March 2017.