Since 3’ UTRs were found to contain docking sites for miRNAs, labs with computational muscle around the world have been chipping away at the target prediction dilemma, but like any game, it’s hard to win when you don’t know the rules. It’s even harder when the rules are being rewritten weekly, which is why we have an assortment of prediction algorithms today.
Most of the current approaches take into account miRNA-mRNA sequence complementarity and miRNA-mRNA duplex thermodynamics, with the various algorithms weighting the relative importance of these differently. Conservation across species is often used as well, but not always. Some also take into account secondary structure changes after initial binding of the seed region. How each algorithm weights these factors can have a major impact on its predictions.
Let’s take a quick look at some of the popular approaches in this arena. They can be broken into those that look at sequence before structure and those that look at structure first, and we’ll finish off with some recent approaches that take a different road. We’ll be brief though, because too much algorithm talk can be pretty mind numbing. Check out the recommended publications and papers for the grit.
Sequence Before Structure
Many of the initial algorithms share a common approach: evaluate miRNA-target complementarity first, then evaluate the binding site thermodynamics to further prioritize. The putative target pool is then filtered, often by requiring species conservation. Some of the more widely used include:
miRanda
Predicting targets that have the right to be silenced is tricky, no doubt. miRanda was introduced back in 2003 by researchers at Rockefeller University, Memorial Sloan-Kettering, and the Columbia Genome Center. A three-step algorithm, miRanda performed the following tasks:
- Searched for complementarity between miRNAs and 3′ UTRs, with extra weight given to complementarity near the 5’ miRNA seed region
- Calculated the thermodynamics of the binding sites
- Filtered results using species conservation
Using these parameters, miRanda was able to identify many known targets in Drosophila and showed a false positive rate in the range of 24-39%. When researchers took into account the multiple sites miRNAs often have in their mRNA targets, this rate improved. Updates to the original miRanda use stricter seed pairing rules that improve the output, and more recent updates, including the integration of a statistical model, further improve specificity, making miRanda one of the more widely used programs.
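For the code-minded, here is a minimal sketch of how a three-step filter like the one listed above might be wired together. It is not miRanda's actual code: the pair scores, the seed weight, the thresholds, and the `toy_energy` stand-in (which simply counts paired positions instead of computing a real free energy) are all assumptions made for illustration, as are the function names.

```python
# Toy sketch of a three-step, miRanda-style filter. All scores, weights and
# thresholds here are illustrative assumptions, not miRanda's real parameters.

PAIR_SCORES = {
    ("A", "U"): 5, ("U", "A"): 5,
    ("G", "C"): 5, ("C", "G"): 5,
    ("G", "U"): 2, ("U", "G"): 2,   # wobble pairs score lower
}
MISMATCH = -3
SEED_WEIGHT = 2.0                   # extra weight for miRNA positions 2-8


def complementarity_score(mirna, site):
    """Step 1: score an ungapped, antiparallel miRNA:site duplex.

    Both strings are written 5'->3' and have equal length, so miRNA base i
    faces site base len(site) - 1 - i.
    """
    score = 0.0
    for i, base in enumerate(mirna):
        partner = site[len(site) - 1 - i]
        s = PAIR_SCORES.get((base, partner), MISMATCH)
        if 1 <= i <= 7:             # 0-based indices 1..7 = positions 2-8
            s *= SEED_WEIGHT
        score += s
    return score


def toy_energy(mirna, site):
    """Step 2 stand-in: -1 per paired position (NOT a real free energy)."""
    n = len(site)
    return -float(sum((b, site[n - 1 - i]) in PAIR_SCORES
                      for i, b in enumerate(mirna)))


def passing_sites(mirna, utr, min_score, max_energy, energy_fn=toy_energy):
    """Return (start, score, energy) for windows that pass steps 1 and 2."""
    n = len(mirna)
    hits = []
    for start in range(len(utr) - n + 1):
        site = utr[start:start + n]
        score = complementarity_score(mirna, site)
        if score < min_score:
            continue
        energy = energy_fn(mirna, site)
        if energy <= max_energy:
            hits.append((start, score, energy))
    return hits


def conserved_target(mirna, utrs_by_species, min_score, max_energy):
    """Step 3: call a target only if every species' UTR keeps a passing site."""
    per_species = {sp: passing_sites(mirna, utr, min_score, max_energy)
                   for sp, utr in utrs_by_species.items()}
    return all(per_species.values()), per_species


let7a = "UGAGGUAGUAGGUUGUAUAGUU"
perfect_site = "AACUAUACAACCUACUACCUCA"   # reverse complement of let7a
utrs = {"species_1": "CCCC" + perfect_site + "GG",   # made-up UTRs
        "species_2": "GG" + perfect_site}
print(conserved_target(let7a, utrs, min_score=100, max_energy=-15))
```

In a real pipeline the `energy_fn` argument would be swapped for a proper duplex free-energy calculation, and conservation would be judged on aligned orthologous UTRs rather than on the simple "a site exists somewhere in each species" test used here.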
TargetScan(S)
TargetScan deviates from miRanda in that it addresses some of the filtering at an earlier stage by requiring perfect complementarity to the seed region of the miRNA and by selecting for species conservation. TargetScan then follows a path similar to miRanda’s, evaluating the predicted targets by their thermodynamic stability using programs from the Vienna RNA Package.
The first algorithm to be applied to human target prediction, TargetScan showed a slightly improved false positive rate (22-31%) and could predict novel targets relatively well. Updates to the algorithm surfaced in TargetScanS, a simplified version of TargetScan, and today the required miRNA-target complementarity is limited to six nucleotides of the seed region (bases 2-7). Although TargetScanS’s seed complementarity requirement reduces false positives significantly, it does so at the risk of losing some of the “looser” miRNA targets, like the 3’ compensatory sites, which often exhibit mismatches in the seed region but stronger pairing in the 3’ region of the miRNA.
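To make the seed requirement concrete, here is a small sketch of a perfect seed-match search over bases 2-7. It only illustrates the idea and is not TargetScanS itself: the function names and the example UTR are made up, and conservation filtering is left out entirely.

```python
# Toy seed-match search: find perfect Watson-Crick matches to miRNA
# bases 2-7 in a 3' UTR. The example UTR is invented for illustration.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA string written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_sites(mirna, utr, first=2, last=7):
    """Return 0-based UTR positions that perfectly match the miRNA seed.

    `first` and `last` are 1-based, inclusive miRNA positions.
    """
    seed = mirna[first - 1:last]
    site = reverse_complement(seed)   # what the seed looks like on the mRNA
    hits = []
    pos = utr.find(site)
    while pos != -1:
        hits.append(pos)
        pos = utr.find(site, pos + 1)  # allow overlapping matches
    return hits


let7a = "UGAGGUAGUAGGUUGUAUAGUU"
utr = "AAACUACCUCAGGGUACCUCUU"        # made-up UTR with two seed matches
print(seed_sites(let7a, utr))          # [4, 14]
```

A single mismatch inside the six-nucleotide window makes the site invisible to this kind of search, which is exactly why 3’ compensatory sites can slip through.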
PicTar
Though it sounds more like a graphics program or Hollywood studio, PicTar is an interesting algorithm that predicts miRNA targets by first aligning input orthologous 3’ UTRs and a search set of co-expressed miRNAs, mapping target sites, then filtering them by their predicted free energy. The initial version of PicTar demonstrated prediction efficacy similar to that of miRanda and TargetScan(S), but an interesting added value PicTar brings to the table is the ability to identify targets that may be regulated by multiple miRNAs. The validation of PicTar actually proved very useful, as it illustrated the coordinate regulation of the Mtpn gene by three microRNAs.
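The "multiple miRNAs, one target" idea can be sketched in a few lines. This is only a toy under loud assumptions: the miRNA and gene names and sequences below are invented, a site is taken to be a perfect match to bases 2-8, and PicTar's UTR alignment, free energy filtering and scoring are not modeled at all.

```python
# Toy look at combinatorial targeting: which UTRs carry seed matches for
# more than one miRNA from a co-expressed set? All names and sequences
# below are hypothetical.

from collections import defaultdict

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_site(mirna):
    """mRNA-side sequence (5'->3') that pairs with miRNA bases 2-8."""
    seed = mirna[1:8]
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def co_targeted(utrs, mirnas, min_mirnas=2):
    """Map gene -> sorted miRNA names, keeping genes hit by >= min_mirnas."""
    hits = defaultdict(set)
    for gene, utr in utrs.items():
        for name, seq in mirnas.items():
            if seed_site(seq) in utr:
                hits[gene].add(name)
    return {gene: sorted(names)
            for gene, names in hits.items()
            if len(names) >= min_mirnas}


mirnas = {"miR-a": "UGAGGUAGUAGG", "miR-b": "UAAGGCACGCGG"}
utrs = {"geneX": "AACUACCUCAAGUGCCUUAA",   # sites for both miRNAs
        "geneY": "AACUACCUCAAAA"}          # site for miR-a only
print(co_targeted(utrs, mirnas))
```

Only `geneX` survives the filter here, since it is the only made-up UTR carrying a site for both miRNAs.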
Structure Before Sequence
Unlike the algorithms discussed above, thermodynamics-based algorithms like DIANA-microT and RNAHybrid place more emphasis on target structure than on seed complementarity.
DIANA
The DIANA-microT algorithm first investigates target thermodynamics by sliding a 38 nt frame across 3’ UTRs to evaluate the minimum binding energy between miRNAs and the sequences within the 3’ UTRs. The algorithm then applies seed matching requirements similar to those of the previously described algorithms but, in a departure from some of them, also requires a level of binding in the 3’ region of the miRNA.
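The sliding-frame step is easy to picture in code. The sketch below moves a 38 nt frame along a made-up UTR and scores each frame, but note the big assumption: instead of a real minimum binding energy, it uses the number of paired positions in the best ungapped placement of the miRNA as a crude proxy. Function names are hypothetical, and the seed and 3’ binding checks described above are not included.

```python
# Toy sliding-window scan. The pair count is a crude stand-in for the
# minimum binding energy a real tool would compute.

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"),
         ("G", "U"), ("U", "G")}
FRAME = 38  # frame length quoted in the text


def best_pair_count(mirna, frame):
    """Most paired positions over all ungapped antiparallel placements."""
    rev = frame[::-1]            # read the frame 3'->5' to face the miRNA
    best = 0
    for offset in range(len(rev) - len(mirna) + 1):
        count = sum((m, rev[offset + i]) in PAIRS
                    for i, m in enumerate(mirna))
        best = max(best, count)
    return best


def sliding_scan(mirna, utr, frame_len=FRAME, step=1):
    """Return (frame_start, best_pair_count) for each frame along the UTR."""
    results = []
    last_start = max(len(utr) - frame_len, 0)
    for start in range(0, last_start + 1, step):
        frame = utr[start:start + frame_len]
        if len(frame) >= len(mirna):     # skip frames too short to bind
            results.append((start, best_pair_count(mirna, frame)))
    return results


let7a = "UGAGGUAGUAGGUUGUAUAGUU"
site = "AACUAUACAACCUACUACCUCA"      # perfect complement of let7a
utr = "C" * 20 + site + "C" * 20    # made-up UTR
scan = sliding_scan(let7a, utr)
top = max(score for _, score in scan)
print(top, [start for start, score in scan if score == top])
```

Frames that fully contain the planted site reach the maximum of 22 paired positions; a real implementation would rank frames by computed free energy instead.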
RNAHybrid
With gas prices soaring, hybrids are more popular than ever but we never thought we’d see them penetrate the miRNA prediction field so quickly! RNAHybrid was one of the first programs to bring a strong statistical package into the mix. The program finds regions in the 3’ UTR that have potential to form duplexes with miRNAs and produces information on the putative target site quality, the quantity of sites available, their conservation and the significance of the conservation.
Arrival of the Departures
Although the algorithms all work reasonably well, one aspect of targets that is not typically taken into account is the sequence context surrounding the target and its impact on the secondary structure of the putative target site. Increasing evidence suggests that the secondary structure of the mRNA may prove as important as, or more important than, miRNA-target sequence complementarity and free energy, since the folding of the target may affect the accessibility of the target site.
STarMir
Despite being based in Southern California, EpiGenie is rarely star struck, but we have to say this approach was impressive and makes a lot of sense to us. Long and colleagues used Sfold to predict probable secondary structures of mRNA targets in a structure-based, 2-step approach to target prediction:
- Step One: miRNA binds 4 continuous bases in the mRNA target
- Step Two: Remainder of miRNA pairs with mRNA target and secondary structure is disrupted
This approach is an interesting departure from the strict 5’ miRNA seed pairing requirements of some other programs and does not rule out pairing events in the 3’ region of the miRNA. One of the most interesting aspects of STarMir’s performance was that it was able to predict the varying sensitivity to let-7 repression that reporter systems exhibited as a result of sequence variation outside the miRNA binding site. To illustrate this, Long’s group carefully chose sequence modifications outside the miRNA binding site that altered the secondary structure of the target and thus the accessibility of the site.
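Step one of the two-step idea described above can be sketched as a search for accessible 4-base footholds. Everything here is an assumption for illustration: the `unpaired_prob` list stands in for per-nucleotide accessibility that a structure predictor such as Sfold would supply, the 0.5 cutoff is arbitrary, the sequences are made up, and step two (elongating the hybrid and accounting for the disrupted structure) is not modeled.

```python
# Toy nucleation search: find stretches of k contiguous target bases that
# are (a) likely single-stranded and (b) perfectly complementary to k
# contiguous miRNA bases. Accessibility values are invented.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def nucleation_sites(mirna, target, unpaired_prob, k=4, min_access=0.5):
    """Return (target_start, mirna_start) pairs, both 0-based."""
    # Index every miRNA k-mer by the target sequence it would pair with.
    kmers = {}
    for j in range(len(mirna) - k + 1):
        kmers.setdefault(reverse_complement(mirna[j:j + k]), []).append(j)

    sites = []
    for i in range(len(target) - k + 1):
        if min(unpaired_prob[i:i + k]) < min_access:
            continue                      # foothold is buried in structure
        for j in kmers.get(target[i:i + k], []):
            sites.append((i, j))
    return sites


mirna = "UGAGGUAG"                        # toy 8-mer
target = "AAAACUCAAAAACUCAAAA"            # two identical candidate footholds
unpaired_prob = [0.9] * 10 + [0.1] * 9    # second foothold is inaccessible
print(nucleation_sites(mirna, target, unpaired_prob))   # [(4, 0)]
```

The two candidate footholds have identical sequence, but only the accessible one is reported, which is the point of putting structure ahead of sequence.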
RNA22
When we think of IBM, we think of ThinkPads (now distributed by Lenovo, but you get the point), servers and the like, but tucked away in the Thomas Watson Research Center in New York, the Bioinformatics and Pattern Discovery Group has been applying its pattern recognition mojo to miRNA discovery and target prediction for the last few years and coming up with some really interesting data.
RNA22 first identifies binding sites based on multiple, statistically significant patterns, using conserved sequence features of known miRNAs. Then it predicts miRNAs that are likely to target the binding sites. This eliminates the requirement of knowing the identity of a given miRNA.
One of the most distinctive aspects of RNA22’s approach to miRNA target prediction is that it eliminates the use of cross-species conservation filtering of targets. This has a huge and immediate impact on the number of targets, as does the fact that the algorithm starts with miRNA sequences rather than input 3’ UTRs. This also leads to putative target sites in 5’ UTRs and coding regions of genes, areas that have seen minimal experimental focus to date.
According to RNA22, 30-50% of 5’ UTRs contain at least one target, and every amino acid coding sequence contains at least a single target site as well.
“We think that there’s a substantial organism-specific component, and there are regulatory motifs that are present within a genome that you are not going to find in neighboring genomes,” stated Isidore Rigoutsos, manager of the Bioinformatics and Pattern Discovery Group at IBM, in a recent interview. To drive home the point that single miRNAs can target thousands of UTRs, Rigoutsos’s team backed up their predictions by rolling up their sleeves, generating over 200 luciferase reporter assays and testing 3 miRNAs against them. The result? Over half the tested predictions showed repression between 40% and 80%.
miRNA Target Prediction Algorithm References
- miRanda: Enright, A.J. et al. (2003) MicroRNA targets in Drosophila. Genome Biol. 5, R1
- TargetScan: Lewis, B.P. et al. (2003) Prediction of mammalian microRNA targets. Cell 115, 787–798
- TargetScanS: Lewis, B.P. et al. (2005) Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell 120, 15–20
- DIANA microT: Kiriakidou, M. et al. (2004) A combined computational–experimental approach predicts human microRNA targets. Genes Dev. 18, 1165–1178
- PicTar: Krek, A. et al. (2005) Combinatorial microRNA target predictions. Nat. Genet. 37, 495–500
- RNAHybrid: Rehmsmeier, M. et al. (2004) Fast and effective prediction of microRNA/target duplexes. RNA 10, 1507–1517
- STarMir: Long, D. et al. (2007) Potent effect of target structure on microRNA function. Nat. Struct. Mol. Biol. 14, 287–294
- RNA22: Miranda, K.C. et al. (2006) A pattern-based method for the identification of microRNA-target sites and their corresponding RNA/RNA complexes. Cell 126, 1203–1217