Epigenetic Tools and Databases

Tools, where would mankind be without them? The EpiGenie team decided to search out and compile a list of the best tools and databases (that require no coding knowledge) that epigenetics researchers can’t live without.

Here we go:

The Browsers:

Go ahead, get your feet wet.

The Databases:

While you’re a pioneer, you’re certainly not the first one to tread these waters. Go have a look at lifetimes of work:

Chromatin

  • Histome: A database dedicated to information about human histone variants, the sites of their post-translational modifications, and the various histone-modifying enzymes.

DNA Methylation

  • MethylomeDB: The Brain Methylome Database! This database includes genome-wide DNA methylation profiles for human and mouse brains.
  • DiseaseMeth: A web-based resource focused on the aberrant methylomes of human diseases.
  • NGSmethDB: A dedicated database for the storage, browsing and data mining of whole-genome, single-base-pair resolution methylomes. It collects NGS bisulfite sequencing data from the literature and public repositories and generates high-quality chromosome methylation maps for many different tissues, pathological conditions and species.
  • MethBase: A central reference methylome database created from public BS-seq datasets. It contains hundreds of methylomes from well-studied organisms. For each methylome, MethBase provides the methylation level at individual sites, hypo- or hyper-methylated regions, partially methylated regions, allele-specifically methylated regions, and detailed metadata and summary statistics.
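Curious what "methylation level at individual sites" actually means? In bisulfite sequencing, methylated cytosines stay C while unmethylated ones read as T, so the level at a site is usually reported as C / (C + T) over the reads covering it. Here's a minimal sketch of that standard definition (not MethBase's own code); the function names and the coverage-weighted pooling used for regions are our own illustrative choices:

```python
from typing import List, Optional, Tuple


def methylation_level(c_reads: int, t_reads: int) -> Optional[float]:
    """Methylation level at one cytosine from bisulfite-seq read counts.

    Bisulfite converts unmethylated C to U (read as T), while methylated C
    stays C, so the level is C / (C + T) over reads covering the site.
    """
    total = c_reads + t_reads
    if total == 0:
        return None  # no coverage: the level is undefined, not zero
    return c_reads / total


def pooled_region_level(sites: List[Tuple[int, int]]) -> Optional[float]:
    """Coverage-weighted level for a region: pool (C, T) counts across sites."""
    c_total = sum(c for c, _ in sites)
    t_total = sum(t for _, t in sites)
    return methylation_level(c_total, t_total)


print(methylation_level(9, 3))                # 0.75
print(pooled_region_level([(9, 3), (1, 3)]))  # 0.625
```

Pooling counts means well-covered sites weigh more than sparsely covered ones; averaging per-site levels instead is another common (and different) choice.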

Noncoding RNA

  • miRBase: A searchable database of published miRNA sequences and annotation.
  • TarBase: The largest manually curated target database, indexing more than 65,000 miRNA-gene interactions. The database includes targets for 21 species.
  • miRNEST: An integrative collection of animal, plant, and virus microRNA data.
  • RNA 22 v2: Get your miRNA targeting on with an unbiased algorithm that considers not only 3′UTR binding but also 5′UTR binding.
  • NonCode: A database of all kinds of noncoding RNAs (except tRNAs and rRNAs).
  • Human lincRNA Catalog: A unifying catalogue that merges previously existing annotation sources with transcripts assembled from ~4 billion RNA-Seq reads across 24 tissues and cell types. Each lincRNA is characterized by a panorama of more than 30 properties, including sequence, structural, transcriptional, and orthology features.

The Repositories:

Some of the ‘raw’ sources that databases draw on:

Top R-packages:

Now I know we said no coding, but guess what? Unfortunately, sometimes a little R can be useful. Here are a few essentials from the Bioconductor repository to get you started with data that won’t analyze the easy way:

  • BayMeth: For anyone using MBD-seq, MeDIP-seq or any other capture-then-sequence method for DNA methylation mapping, this is a must-use.
  • MOABS: Bioinformatic method for detecting differential DNA methylation from bisulfite sequencing data.
  • ChAMP: call CNVs from your Infinium 450k methylation datasets and process away.
  • minfi: Take cellular heterogeneity on your 450k arrays into account; after all, variety is the spice of life.
  • DMAP: A C-based tool for RRBS and WGBS data that includes a suite of statistical tools and a distinct approach to analysing DNA methylation data. It also maps any list of regions to the genome and annotates them with gene and CpG features. It now offers a novel fragment-based analysis for RRBS, which has not been shown before.

Exploring your Data:

Results are great! Now go do something with them.

  • ZENBU: Japanese for "all, entire, whole, altogether." This browser lets you integrate and interact with your multiple omics data sets in a nice visual environment.
  • EpiExplorer: Import your very own data and compare it to ENCODE.
  • Podbat: A positioning database and analysis tool that incorporates data from various sources and allows detailed dissection of the entire range of chromatin modifications simultaneously. Podbat can be used to analyze, visualize, store and share epigenomics data. Also be sure to check out our coverage of Podbat.
  • CosBI: The histone code, dare you crack it? Learn more about CosBI from EpiGenie.
  • miRSNP: Linking sequence to trait, check out what polymorphisms in your microRNA can do.
  • MethPipe: A computational pipeline for analyzing bisulfite sequencing data (BS-seq, WGBS and RRBS).
  • ALEA: Lets you analyze ChIP-seq or RNA-seq data to correlate allele-specific differences with epigenomic status.

Gene Expression Lists, Networks, and Enrichment Analysis for OMIC technologies:

Yes! You’ve got your brand spanking new sequencing/arrays/omics data. But the real question is: what are you going to do with it? How do you turn those piles of digital reads back into biology?

  • GeneMania: Gene Networks in a beautiful minimalist style.
  • Enrichr: Check out the biological impact by looking for enriched functions.
  • DAVID: Pathway analysis and functional annotation clustering. Most people love it!
  • GOrilla: a tool for identifying and visualizing enriched GO terms in ranked lists of genes.
  • Expression Atlas: provides information on gene expression patterns under different biological conditions.
  • WebGestalt: a “WEB-based GEne SeT AnaLysis Toolkit”. It is designed for functional genomic, proteomic and large-scale genetic studies, which continuously generate large numbers of gene lists.
  • iHOP: A gene network for navigating literature.
  • ArrayMining: Ensemble and Consensus Analysis Methods for Gene Expression Data.
  • Genevestigator: Globally explore public (and/or proprietary) expression data for research and clinical applications.
  • BioModels: A repository of computational models of biological processes, hosting models described in the peer-reviewed literature.
  • GenomeSpace: A cloud-based framework for integrative genomics analysis through an easy-to-use Web interface.
  • Magia: A web tool for integrated miRNA-gene analysis.
  • mirConnX: A web server that analyzes mRNA and microRNA gene regulatory networks. mirConnX combines sequence information with gene expression data analysis to create a disease-specific, genome-wide regulatory network.
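Wondering what "enriched" means under the hood? The classic idea behind over-representation analysis: if your list of n genes contains k genes carrying some annotation, how surprising is that given K annotated genes among N background genes? Here's a minimal sketch of the one-sided hypergeometric test. Enrichr, DAVID and GOrilla each use their own variants and corrections (GOrilla, for instance, works on ranked lists), so treat this as the basic idea rather than any tool's exact method. The function name and the numbers in the example are hypothetical:

```python
from math import comb


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the background (e.g. all genes measured)
    K: background genes annotated with the term (e.g. one GO term)
    n: genes in your list
    k: genes in your list that carry the annotation
    """
    upper = min(n, K)
    # Sum the probabilities of seeing k, k+1, ..., upper annotated genes.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical numbers: 20,000 background genes, 200 annotated with a term,
# and a 100-gene list containing 8 of them (about 1 expected by chance).
print(enrichment_p_value(8, 100, 200, 20000))
```

Real tools test thousands of terms at once, so they also correct for multiple testing; a raw p-value like this one is only the starting point.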

Promoter and Transcription Factor Sequence and Tools:

Promoter sequence on its own isn’t so great, but when you add the magic of transcription factor binding, suddenly the regulation of gene expression begins to make sense:

  • Homer: A novel motif discovery algorithm designed for regulatory element analysis in genomics applications (DNA only, no protein). It is a differential motif discovery algorithm, which means that it takes two sets of sequences and tries to identify the regulatory elements that are specifically enriched in one set relative to the other. The art is just a bonus.
  • MpromDB: Mammalian Promoter Database.
  • The MEME Suite: Motif Based Sequence Analysis Tools.
  • CTCF Insulator Database: in silico prediction for all your genomic insulation needs!
  • CENTDIST: a novel web application for identifying co-localized transcription factors around ChIP-seq peaks. Unlike traditional motif-scanning programs, it does not require user-specified parameters or a background set. It automatically learns the best set of parameters for different motifs and ranks them based on the skewness of their distribution around ChIP-seq peaks.
  • ChIP-Array: A platform that combines ChIP-seq/ChIP-chip transcription factor binding sites with gene expression data. It takes ChIP-chip or ChIP-seq data together with expression data to construct a regulatory network around a transcription factor of interest in human, mouse, yeast, fly, and Arabidopsis.
  • cisRED: A database that holds conserved sequence motifs identified by genome scale motif discovery, similarity, clustering, co-occurrence and coexpression calculations.
  • CARRIE: Takes two-condition microarray data and applies promoter analysis to infer the stimulated/repressed transcriptional regulatory network.
  • DiRE: A web server for predicting distant (outside of proximal promoter regions) regulatory elements (DiRE) in higher eukaryotic genomes using gene co-expression data, comparative genomics, and transcription factor binding site information. DiRE allows users to start analysis with raw microarray expression data.
  • FatiGO: Gene expression and functional profiling analysis suite.
  • Melina: (Motif Elucidator in Nucleotide Sequence Assembly) can run multiple motif prediction tools simultaneously. Graphical results can be used to compare predictions of potential DNA motifs (such as transcription factor binding sites, TFBS) in promoter regions.
  • oPOSSUM: a web-based system for the detection of over-represented conserved transcription factor binding sites and binding site combinations in sets of genes or sequences.
  • Pscan-ChIP: a web server that scans ChIP-seq genomic region data for over-represented transcription factor binding site motifs.
  • Regulatory Sequence Analysis Tools: detects regulatory signals in non-coding sequences.
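What does "differential" motif discovery boil down to? Compare how often a short word shows up in your target sequences versus a background set, and rank the words that are over-represented. Here's a toy sketch using exact k-mers. This is NOT Homer's algorithm (Homer uses proper statistics, degenerate motifs and refinement steps); it ignores reverse complements, and all names and sequences are made up for illustration:

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def kmers_in(seq: str, k: int) -> Set[str]:
    """Distinct k-mers in one sequence (forward strand only, for simplicity)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def fraction_containing(seqs: List[str], k: int) -> Dict[str, float]:
    """For each k-mer, the fraction of sequences that contain it at least once."""
    if not seqs:
        return {}
    counts: Counter = Counter()
    for s in seqs:
        counts.update(kmers_in(s, k))
    return {kmer: c / len(seqs) for kmer, c in counts.items()}


def differential_kmers(target: List[str], background: List[str], k: int,
                       pseudo: float = 0.01, top: int = 5) -> List[Tuple[str, float]]:
    """Rank k-mers by (target fraction + pseudo) / (background fraction + pseudo)."""
    t = fraction_containing(target, k)
    b = fraction_containing(background, k)
    scores = {kmer: (f + pseudo) / (b.get(kmer, 0.0) + pseudo) for kmer, f in t.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top]


peaks = ["AAGATAACC", "TTGATAAGG", "CGATAAC"]     # toy "bound" sequences
shuffled = ["AAAACCCC", "TTTTGGGG", "ACACACAC"]  # toy background
print(differential_kmers(peaks, shuffled, k=5)[0][0])  # GATAA
```

The pseudocount keeps words that never appear in the background from producing a division by zero; the choice of background set matters as much as the scoring.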

Genomically Imprinted Genes:

Genomic imprinting is a nifty process, one of epigenetics’ breakout moments. Since then, a devoted few have catalogued a big chunk of the genes that are imprinted. Go check whether your favorite is:

Sodium Bisulfite Primer Design:

When bisulfite reduces the complexity of a sequence, it increases the complexity of its primer design. These programs help take the pain out of bisulfite primer design:
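Why is it so painful? Here's a minimal in-silico bisulfite conversion of one strand, under the simplifying assumption that only CpG cytosines can be methylated (the usual mammalian case; real primer design tools handle more). Unmethylated Cs become Ts, so the converted strand is left with an almost three-letter alphabet, which makes primers less specific, and any primer overlapping a CpG has to pick between C and T. The function name and parameter are hypothetical:

```python
def bisulfite_convert(seq: str, methylated_cpg: bool = True) -> str:
    """Simulate bisulfite conversion of one DNA strand.

    Assumption: only cytosines in a CpG context can be methylated.
    Unmethylated C reads as T after conversion and PCR; methylated C stays C.
    """
    seq = seq.upper()
    out = []
    for i, base in enumerate(seq):
        if base == "C":
            is_cpg = i + 1 < len(seq) and seq[i + 1] == "G"
            out.append("C" if (is_cpg and methylated_cpg) else "T")
        else:
            out.append(base)
    return "".join(out)


print(bisulfite_convert("ACGTCCAGCG"))                        # ACGTTTAGCG
print(bisulfite_convert("ACGTCCAGCG", methylated_cpg=False))  # ATGTTTAGTG
```

Comparing the two outputs shows the dilemma: the same locus yields different template sequences depending on methylation state, which is exactly what bisulfite primers must either avoid or exploit.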

Protein and Gene Information:

The end products of molecular inheritance:

  • GeneCards: Your favorite genes, on cards!
  • Uniprot: A go to database for details on proteins.
  • UGene: Protein structure.
  • BioMart: A federated database system that provides unified access to disparate, geographically distributed data sources. It is designed to be data agnostic and platform independent, such that existing databases can easily be incorporated into the BioMart framework.

Venn Diagram Creators:

A great big-picture perspective for comparing multiple experimental conditions in a circular manner:
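If you'd rather see the numbers behind the circles, each region of a Venn diagram is just the set of items found in exactly one combination of lists. Here's a minimal sketch with plain Python sets; the function name and gene lists are hypothetical examples:

```python
from typing import Dict, Set, Tuple


def venn_regions(named_sets: Dict[str, Set[str]]) -> Dict[Tuple[str, ...], Set[str]]:
    """Partition elements into Venn regions.

    Each key is the tuple of set names that contain an element (in input
    order); each value holds the elements found in exactly those sets.
    """
    names = list(named_sets)
    regions: Dict[Tuple[str, ...], Set[str]] = {}
    for elem in set().union(*named_sets.values()):
        key = tuple(n for n in names if elem in named_sets[n])
        regions.setdefault(key, set()).add(elem)
    return regions


lists = {
    "treated": {"TP53", "MYC", "DNMT1"},
    "knockout": {"MYC", "DNMT1", "EZH2"},
    "rescue": {"DNMT1", "KDM5B"},
}
for region, genes in sorted(venn_regions(lists).items()):
    print(" & ".join(region), sorted(genes))
```

Because every item lands in exactly one region, the region sizes add up to the total number of distinct items, which is a handy sanity check on any Venn diagram.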

Protocols:

Basic Bioinformatic Suite

DNA/Protein Sequence Formatting Cleaner:

  • Sequence Cleaner: Sequence databases are often great, but moving between them is a pain when your format ain’t FASTA.
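Curious what that cleanup involves? Here's a minimal sketch that strips the numbers, spaces and line breaks you get when copying a GenBank-style sequence block and re-wraps it as FASTA. It assumes the sequence is letters only; the function name and example header are hypothetical:

```python
import re
from typing import List


def to_fasta(raw: str, header: str, width: int = 60) -> str:
    """Strip everything except letters from a pasted sequence and emit FASTA.

    Removes digits, spaces, line breaks and punctuation (as found in
    GenBank-style numbered blocks). Note: this also drops gap characters
    ('-') and stop symbols ('*'), which you may want to keep for alignments.
    """
    seq = re.sub(r"[^A-Za-z]", "", raw).upper()
    lines: List[str] = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + header.lstrip(">") + "\n" + "\n".join(lines)


pasted = "        1 acgtacgtac gtacgt\n       17 tt"
print(to_fasta(pasted, "demo_sequence", width=10))
```

Sixty characters per line is a common FASTA convention, but most tools accept any line length, so `width` is left adjustable.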

Simple Chemoinformatics plus Google?!:

  • Chemicalize: A combined text and chemical structure search engine.

Multi-Step Genomic Analysis Platform:

  • GenePattern: A powerful genomic analysis platform that provides access to hundreds of tools for gene expression analysis, proteomics, SNP analysis, flow cytometry, RNA-seq analysis, and common data processing tasks. A web-based interface provides easy access to these tools and allows the creation of multi-step analysis pipelines that enable reproducible in silico research.

Looking to learn how to code? Try CodersCrowd for a community perspective. Taking too long to learn? Look busy coding.

Still can’t find what you want? Dive into this bioinformatic monster.

Got a tool you dig? Let us know about it so we can share it in the spirit of open science!