Our understanding of non-coding RNAs has ramped up like a quarter pipe at the X Games over the last several years and, if recent publications are any indication, we’ve only scratched the surface. The realization that just a sliver of our vast transcriptomes is protein-coding, that much of the genome is churned out in both directions, and that these transcripts often overlap is enough to get even the most jaded researcher excited. Add in that many of them appear to be functional and you get a pretty fired-up research community.
But when it’s time to get up close and personal with the transcriptome in your favorite tissue, you have a lot more to consider…which creates both challenges and opportunities. The good news is that there’s tons of low-hanging data and plenty of potential for breakthrough discoveries. Better yet…you don’t have to learn many new tricks to get going. The downside is that you have a whole transcriptome to sift through before you can figure out what is really meaningful. If you’re not a hard-core data cruncher, all of the mapping and annotation can seem like the world’s toughest Sudoku puzzle.
Luckily, there is some help out there. Before the miRNA movement took non-coding transcripts mainstream, and back when RNA was still the “other nucleic acid,” a few research groups, like the Mattick Lab at the University of Queensland in Australia, were busy interrogating non-coding RNAs. They’ve tackled many of the issues researchers face when getting into non-coding RNA analysis and have developed a couple of nice tools along the way. We caught up with Marcel Dinger, from the Mattick Lab, who filled us in on how they’ve been taking a closer look at these transcripts lately using microarrays that they helped develop (NCode™ Noncoding RNA Arrays, to be exact).
Non-Coding RNA Profiling Demystified
The arrays don’t have a gazillion features or anything, but they do have carefully picked sequences chosen by people who understand the landscape. “The array content was entirely designed from publicly available sequences, including RefSeq genes, UCSC genes, Mammalian Gene Collection cDNAs, FANTOM3 cDNAs, other full-length cDNAs from GenBank, and ESTs that were encompassed with any homeotic loci,” Marcel explains. The key feature of the array is the classification of the target transcripts. “The NCode™ Array gives the user a meaningful classification of transcript targets that is based on some fairly robust principles” (Genome Research, June 2008; PNAS, January 2008). The arrays, which are printed by Agilent Technologies, don’t require any new labeling or detection technologies. In addition:
- They contain over 17,000 long (>200 bases) non-coding RNAs in human and over 10,000 in mouse.
- They include over 20,000 coding RNAs, allowing simultaneous profiling of coding and non-coding RNAs.
ncRNA Array Data Analysis? Yeah, There’s an App for That
Generating the data isn’t much different from mRNA expression profiling, but as many early movers will tell you, the analysis can look a little tricky at first glance. “The tool we use for analysis of the NCode™ arrays (and soon RNA-seq data as well) is NRED. Presently, the public version of NRED allows the user to only examine our published data as well as the latest annotations of the NCode™ array. By next year we hope to provide an interface so that users can simply upload their expression data and undertake a largely automated analysis of the data,” says Dinger (Nucleic Acids Research, October 2008).
NRED allows users to:
- Filter their data by expression intensity, fold-change, significance, genomic context and coding classification
- Identify associated protein-coding genes or other transcripts
- Link out to the UCSC Genome Browser, so that relative probe positions, etc. can be easily visualized
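NRED handles this filtering through its own interface, and the sketch below is not NRED’s code or API. It is a minimal illustration, assuming a hypothetical table of already-normalized probe results, of how the filter criteria listed above combine: a probe survives only if it passes every threshold you switch on. The field names (`mean_intensity`, `log2_fold_change`, `adj_p_value`, `context`, `coding_class`), the context labels, and the default cutoffs (log2 intensity of 6, a 2-fold change, adjusted p ≤ 0.05) are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class ProbeResult:
    """One row of a processed array table. All field names are hypothetical."""
    probe_id: str
    mean_intensity: float    # normalized log2 signal
    log2_fold_change: float  # e.g. disease vs. control
    adj_p_value: float       # multiple-testing-corrected p-value
    context: str             # e.g. "intergenic", "intronic", "antisense" (assumed labels)
    coding_class: str        # "coding" or "noncoding" (assumed labels)


def filter_probes(rows: Iterable[ProbeResult],
                  min_intensity: float = 6.0,
                  min_abs_lfc: float = 1.0,
                  max_p: float = 0.05,
                  contexts: Optional[Set[str]] = None,
                  coding_class: Optional[str] = None) -> List[ProbeResult]:
    """Keep rows that pass every threshold; a filter set to None is skipped."""
    kept = []
    for r in rows:
        if r.mean_intensity < min_intensity:
            continue  # too close to background
        if abs(r.log2_fold_change) < min_abs_lfc:
            continue  # change too small in either direction
        if r.adj_p_value > max_p:
            continue  # not significant
        if contexts is not None and r.context not in contexts:
            continue  # wrong genomic context
        if coding_class is not None and r.coding_class != coding_class:
            continue  # wrong coding classification
        kept.append(r)
    return kept


if __name__ == "__main__":
    rows = [
        ProbeResult("p1", 8.2, 1.5, 0.01, "antisense", "noncoding"),
        ProbeResult("p2", 5.0, 2.0, 0.001, "intergenic", "noncoding"),  # too dim
        ProbeResult("p3", 9.1, 0.3, 0.02, "intronic", "noncoding"),     # small change
        ProbeResult("p4", 7.5, -1.2, 0.03, "intergenic", "coding"),
    ]
    print([r.probe_id for r in filter_probes(rows, coding_class="noncoding")])  # ['p1']
```

Treating an unset filter as “don’t filter” keeps one function usable for both a coding-plus-non-coding view and a non-coding-only view of the same array run.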
One group that has given this setup a whirl is a team from Hannover Medical School in Germany, led by Iyas Hamwi. So far, they’re glad they did. They recently used the NCode™ arrays to search for non-coding RNAs that could serve as outcome-prediction markers in patients. Although data analysis is ongoing, their early results are promising. “Without the array we would never be able to find differentially expressed ncRNA,” Hamwi claims. Dr. Hamwi’s experiments weren’t exactly a walk in the park, though: “The hardest thing about array experiments is always the analysis of the results. One problem we had to face is the annotation of the ncRNAs,” he commented. Even so, he hopes “…that more groups can work with them and share their experience and results to move medicine ahead much faster.”
ncRNA Annotation Hurdles
Dr. Hamwi’s case illustrates the current disconnect between the available tools and our knowledge of non-coding RNAs. Marcel Dinger explained some of the issues. “Overlapping transcripts are probably the single greatest challenge. Transcriptomic sequencing has made it clear that the genome does not comprise of conveniently discrete blocks, but rather comprises an interlaced overlapping network of sequences.”
“The next major problem is repeats – although we know repeats can be functional, it is very difficult to uniquely target transcripts that contain a lot of repetitive sequences. In terms of analysis, the approaches are quite different to how you would look at regular protein gene expression arrays. With non-coding RNAs, it’s all about their genomic context. For many of the long non-coding RNAs that have been functionally characterized, their proximity in the genome relative to other genes was important to their function.”
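To make “overlapping transcripts” and “genomic context” concrete, here is a minimal sketch of classifying a non-coding transcript by where it sits relative to protein-coding genes. It is not the Mattick Lab’s or NRED’s classification scheme; the labels (`sense_overlap`, `antisense`, `proximal`, `intergenic`), the 10 kb window, and the BED-style 0-based, half-open coordinates are assumptions chosen for illustration, and all gene names are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class Feature(NamedTuple):
    name: str
    chrom: str
    start: int   # 0-based, half-open [start, end), BED-style (assumption)
    end: int
    strand: str  # "+" or "-"


def overlaps(a: Feature, b: Feature) -> bool:
    """True if two half-open intervals share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def gap(a: Feature, b: Feature) -> int:
    """Bases separating two intervals on the same chromosome (0 if touching or overlapping)."""
    return max(0, b.start - a.end, a.start - b.end)


def genomic_context(ncrna: Feature, coding_genes: List[Feature],
                    window: int = 10_000) -> Tuple[str, Optional[str], Optional[int]]:
    """Return (label, partner_gene, distance) for a non-coding transcript.

    sense_overlap: overlaps a coding gene on the same strand
    antisense:     overlaps a coding gene only on the opposite strand
    proximal:      no overlap, but a coding gene lies within `window` bases
    intergenic:    nothing within `window` (partner and distance are None)
    """
    hits = sorted((g for g in coding_genes if overlaps(ncrna, g)), key=lambda g: g.name)
    if hits:
        same_strand = [g for g in hits if g.strand == ncrna.strand]
        if same_strand:
            return "sense_overlap", same_strand[0].name, 0
        return "antisense", hits[0].name, 0
    # No overlap: look for the closest coding gene on the same chromosome.
    nearby = sorted(
        (gap(ncrna, g), g.name)
        for g in coding_genes
        if g.chrom == ncrna.chrom and gap(ncrna, g) <= window
    )
    if nearby:
        distance, name = nearby[0]
        return "proximal", name, distance
    return "intergenic", None, None


if __name__ == "__main__":
    genes = [
        Feature("geneA", "chr1", 10_000, 20_000, "+"),
        Feature("geneB", "chr1", 50_000, 60_000, "-"),
    ]
    for nc in [
        Feature("lnc1", "chr1", 18_000, 25_000, "-"),  # ('antisense', 'geneA', 0)
        Feature("lnc2", "chr1", 12_000, 14_000, "+"),  # ('sense_overlap', 'geneA', 0)
        Feature("lnc3", "chr1", 42_000, 45_000, "+"),  # ('proximal', 'geneB', 5000)
        Feature("lnc4", "chr2", 0, 1_000, "+"),        # ('intergenic', None, None)
    ]:
        print(nc.name, genomic_context(nc, genes))
```

Checking overlap with an explicit interval test rather than `gap(...) == 0` matters: with half-open coordinates, two transcripts that merely touch end to start have a gap of 0 but share no bases. A real pipeline would also need to deal with the repeat problem Dinger raises, which this sketch ignores.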
Much of the genomic annotation out there was done years ago, without regard to non-coding RNAs, something that Marcel Dinger and others are working hard to change. But for now, when undertaking a set of experiments like this, you’ll want to buckle up your chin strap and be ready to really dig into the data analysis.
Why Bother?
Like any emerging research area, the challenges are offset by the promise. When asked what he sees coming down the chute for ncRNA analysis, Marcel Dinger is upbeat. “Naturally, we see the area absolutely exploding in coming years. It is clear now that the majority of disease-associated SNPs are located within non-coding regions of the genome. We know that many of these regions are expressed, so naturally to understand the basis of disease, we need to look closer at non-coding parts of the genome, and that means analysis of non-coding RNAs. It doesn’t take a crystal ball to see that there will be tremendous value in making sense of the massive amount of functional genetic material in the genome that doesn’t encode protein. I really believe we are at an amazing juncture in molecular biology and the coming years will bring some remarkable discoveries as well as incredible opportunities.” Hey, everybody loves an optimist!
Bone up on ncRNAs and their functions in Briefings in Functional Genomics and Proteomics, September 2009.
Check out the latest info on NCode™ Non-Coding RNA Arrays at the Life Technologies site.