Here we go:
Epigenetic Data Browsers and Repositories:
Go ahead, get your feet wet:
- IHEC Data Portal: The International Human Epigenome Consortium (IHEC) makes available comprehensive sets of reference epigenomes relevant to health and disease. The IHEC Data Portal can be used to view, search and download the data already released by the different IHEC-associated projects.
- ROADMAP Epigenomics: The NIH Roadmap Epigenomics Mapping Consortium provides high-quality, genome-wide maps of several key histone modifications, chromatin accessibility, DNA methylation and mRNA expression across hundreds of human cell types and tissues.
- CEEHRC Platform: A reference epigenome project for human cells, and not the typical stem cell lines.
- DeepBlue: Store and work with genomic and epigenomic data from a number of international consortiums.
- Epigenome Browser: For the UCSC genome browser fans.
- VizHub: Displaying sequencing data from Roadmap Epigenomics project, powered by a local mirror of UCSC Genome Browser.
- WashU Epigenome Browser: A new-generation genome browser for integrative visualization of genomic information. It hosts a high volume of tracks from the ENCODE and Roadmap Epigenomics projects, supports multiple organisms, visualizes chromatin-interaction data (e.g. Hi-C), and offers gene set view, gene plot, and many other functions. All delivered on the web at high performance.
- Ensembl: Featuring ENCODE.
- GenExp: A web-based visualization tool to interactively explore a genomic database.
- GEO: The granddaddy of epigenomics data repositories.
- The Epigenome Atlas: It includes human reference epigenomes and the results of their integrative and comparative analyses.
- SampleBrowser: Part of NCBI’s epigenomics.
- Classification of Human Transcription Factors: The mother list of transcription factors and their binding sites.
While you’re a pioneer, you’re certainly not the first one to tread these waters. Go have a look at lifetimes of work:
- 4DGenome: A database that contains records on millions of chromatin interactions in five species. It covers all major experimental and computational technologies for detecting chromatin interactions including 3C, 4C, 5C, ChIA-PET, Hi-C, Capture-C, and IM-PET.
- 3CDB: A manually curated chromosome conformation capture (3C) database.
- Histome: a database dedicated to displaying information about human histone variants, sites of their post-translational modifications and about various histone modifying enzymes.
- MethylomeDB: The Brain Methylome Database! This database includes genome-wide DNA methylation profiles for human and mouse brains.
- DiseaseMeth: A web based resource focused on the aberrant methylomes of human diseases.
- NGSmethDB: A dedicated database for the storage, browsing and data mining of whole-genome, single-base-pair resolution methylomes. We collect NGS data from high-throughput sequencing together with bisulfite conversion of DNA from literature and public repositories, then generate high-quality chromosome methylation maps for many different tissues, pathological conditions and species.
- MethBase: A central reference methylome database created from public BS-seq datasets. It contains hundreds of methylomes from well-studied organisms. For each methylome, MethBase provides methylation level at individual sites, hypo- or hyper-methylated regions, partially methylated regions, allele-specifically methylated regions, and detailed metadata and summary statistics.
- miRWalk 2.0: It not only provides miRNA binding sites within the complete sequence of a gene, but also combines this information with a comparison of binding sites resulting from 12 existing miRNA-target prediction programs.
- miRBase: A searchable database of published miRNA sequences and annotation.
- TarBase: The largest manually curated target database, indexing more than 65,000 miRNA-gene interactions. The database includes targets for 21 species.
- miRNEST: An integrative collection of animal, plant, and virus microRNA data.
- NonCode: A database of all kinds of noncoding RNAs (except tRNAs and rRNAs).
- Human lincRNA Catalog: A unifying catalogue of previously existing annotation sources with transcripts assembled from RNA-Seq data collected from ~4 billion RNA-Seq reads across 24 tissues and cell types. Each lincRNA is characterized by a panorama of more than 30 properties, including sequence, structural, transcriptional, and orthology features.
- miRSNP: Linking sequence to trait, check out what polymorphisms in your microRNA can do.
- CircNET: Tissue-specific circular RNA (circRNA) expression profiles and circRNA–miRNA-gene regulatory networks.
- circBase: Explore public circRNA datasets and download the custom python scripts needed to discover circRNAs in your own (ribominus) RNA-seq data.
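Many of the miRNA target resources above build on some notion of seed pairing. As a rough illustration only (not the method of miRWalk, TarBase, or any other tool listed), the sketch below looks for perfect matches to the canonical 7-nt seed (miRNA nucleotides 2 to 8) in a UTR sequence. The function names and the example UTR are made up for illustration; real predictors also weigh conservation, site context, and pairing energy.

```python
# Complement table for RNA bases.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna: str) -> str:
    """Return the mRNA site (5'->3') complementary to miRNA nucleotides 2-8."""
    seed = mirna.upper().replace("T", "U")[1:8]  # positions 2..8, 1-based
    return seed.translate(COMPLEMENT)[::-1]      # reverse complement


def find_seed_matches(mirna: str, utr: str) -> list[int]:
    """Return 0-based start positions of perfect seed matches in the UTR."""
    site = seed_site(mirna)
    utr = utr.upper().replace("T", "U")
    return [
        i
        for i in range(len(utr) - len(site) + 1)
        if utr[i:i + len(site)] == site
    ]


if __name__ == "__main__":
    let7a = "UGAGGUAGUAGGUUGUAUAGUU"   # hsa-let-7a-5p mature sequence
    toy_utr = "AAACUACCUCAAA"          # hypothetical UTR fragment
    print(seed_site(let7a))
    print(find_seed_matches(let7a, toy_utr))
```

A perfect seed match is neither necessary nor sufficient for real targeting, so treat hits from a scan like this as a starting point for the databases above rather than a prediction.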
Other Useful Databases:
- MpromDB: Mammalian Promoter Database.
- cisRED: A database that holds conserved sequence motifs identified by genome scale motif discovery, similarity, clustering, co-occurrence and coexpression calculations.
Epigenetic Tools for Statistical Data Analysis and Visualization:
Results are great! Now go do something with them. Use these data analysis and visualization tools to help decipher your data. Need a little intro to biostatistics? Go learn some R and check out these essentials from Bioconductor to get you started with data that won’t analyze the easy way. Here are a few of the best data analysis and visualization packages out there:
- epiGbs: a reference genome free RRBS method that enables cost-effective analysis of DNA methylation and genetic variation in hundreds of samples.
- M3D: A kernel-based test for spatially correlated changes in methylation profiles.
- DMRcate: A software package to identify differentially methylated regions from 450k array data.
- DaVIE: An intuitive user interface to perform visual comparisons across all your large DNA methylation data sets.
- MOABS: Bioinformatic method for detecting differential DNA methylation from bisulfite sequencing data.
- DMAP: A (C-based) tool for RRBS and WGBS data that includes a suite of statistical tools and an alternative approach to analysing DNA methylation data. It also links any list of regions to the genome and provides gene and CpG features, and it now features a novel fragment-based analysis for RRBS.
- MethPipe: A computational pipeline for analyzing bisulfite sequencing data (BS-seq, WGBS and RRBS).
- ChAMP: A Bioconductor (R) package that lets you call CNVs from your Infinium 450k methylation datasets and process away in general.
- minfi: A Bioconductor (R) package that takes cellular heterogeneity on your 450k arrays into account; after all, variety is the spice of life.
- FEM: A Bioconductor (R) package that performs a systems-level integrative analysis of DNA methylation and gene expression data. It seeks modules of functionally related genes which exhibit differential promoter DNA methylation and differential expression, where an inverse association between promoter DNA methylation and gene expression is assumed.
- BEAT: A Bioconductor (R) package that serves as a BS-Seq Epimutation Analysis Toolkit. It allows for model-based analysis of single-cell methylation data.
- coMET: A Bioconductor (R) package that enables visualisation of EWAS results in a genomic region. In addition to phenotype-association P-values, coMET also generates plots of co-methylation patterns and provides a series of annotation tracks. It can be applied to other omic-wide association scans, and to any species, as long as the data can be translated to the genomic level.
- Repitools: A Bioconductor (R) package that gives you the tools for the analysis of enrichment-based epigenomic data. Features include summarization and visualization of epigenomic data across promoters according to gene expression context, finding regions of differential methylation/binding, BayMeth for quantifying methylation etc.
- methylPipe: A Bioconductor (R) package that enables memory efficient analysis of base resolution DNA methylation data in both the CpG and non-CpG sequence context and also the integration of DNA methylation data derived from any methodology providing base- or low-resolution data.
- RnBeads: an R package for comprehensive analysis of DNA methylation data obtained with any experimental protocol that provides single-CpG resolution. Supported assays include Infinium 450K microarray and bisulfite sequencing protocols, and also MeDIP-seq and MBD-seq once the data have been preprocessed with DNA methylation level inference software.
- SMITE: A Bioconductor (R) package for Significance-based Modules Integrating the Transcriptome and Epigenome.
- Bsseq: A Bioconductor (R) package that provides a collection of tools for analyzing and visualizing bisulfite sequencing data.
- MACS: Model-based Analysis of ChIP-seq (MACS) is a go-to peak-finding algorithm.
- PAVIS: PAVIS (Peak Annotation and Visualization) is a tool for facilitating ChIP-seq data analysis and hypotheses generation. It offers two main functions: annotation and visualization.
- EaSeq: EaSeq is a software environment developed for interactive exploration, visualization and analysis of ChIP-seq data combined with a comprehensive toolset. EaSeq is controlled by a graphical user interface and runs on a typical PC.
- ODIN: A ChIP-seq tool that not only detects peaks but can also call and provide statistics on differential peaks between two conditions.
- MMDiff: A package using peak shape to detect statistically significant differences in read enrichment profiles from ChIP-seq data.
- ALEA: Lets you analyze ChIP-seq or RNA-seq data to correlate allele-specific differences with epigenomic status.
- CENTDIST: A novel web application for identifying co-localized transcription factors around ChIP-seq peaks. Unlike traditional motif scanning programs, it does not require any user-specified parameters or background. It automatically learns the best set of parameters for different motifs and ranks them based on the skewness of their distribution around ChIP-seq peaks.
- ChIP-Array: A platform combining ChIP-seq/ChIP-chip transcription factor binding sites with gene expression data. It takes ChIP-seq or ChIP-chip data together with gene expression data to construct a regulatory network around a transcription factor of interest in human, mouse, yeast, fly, and Arabidopsis.
- CosBI: The histone code, dare you crack it? Learn more about CosBI from Epigenie.
- Pscan-ChIP: A web server that scans ChIP-seq genomic region data for over-represented transcription factor binding site motifs.
- chroGPS: A Bioconductor (R) package aimed at integration, visualization, and functional analysis of epigenomics data, based on multidimensional scaling techniques.
- Epigenomix: A Bioconductor (R) package for the integrative analysis of RNA-seq or microarray based gene transcription and histone modification data obtained by ChIP-seq. The package provides methods for data preprocessing and matching as well as methods for fitting Bayesian mixture models in order to detect genes with differences in both data types.
- Epigram: An analysis pipeline that predicts histone modification and DNA methylation patterns from DNA motifs. Check out our coverage.
- Homer: A novel motif discovery algorithm that was designed for regulatory element analysis in genomics applications (DNA only, no protein). It is a differential motif discovery algorithm, which means that it takes two sets of sequences and tries to identify the regulatory elements that are specifically enriched in one set relative to the other. The art is just a bonus.
- The MEME Suite: Motif Based Sequence Analysis Tools.
- DiRE: A web server for predicting distant (outside of proximal promoter regions) regulatory elements (DiRE) in higher eukaryotic genomes using gene co-expression data, comparative genomics, and transcription factor binding site information. DiRE allows users to start analysis with raw microarray expression data.
- Melina: (Motif Elucidator in Nucleotide Sequence Assembly) can run multiple motif prediction tools simultaneously. Graphical results can be used to compare predictions of potential DNA motifs (such as transcription factor binding sites, TFBS) in promoter regions.
- RNA22v2: Get your miRNA targeting on with an unbiased algorithm that not only considers 3’UTR binding but also 5’UTR binding.
- BioWardrobe: The BioWardrobe Experiment Management System allows you to store, visualize and analyze epigenomic and transcriptomic next-generation sequencing data using a biologist-friendly, web-based graphical user interface without the need for programming expertise.
- Galaxy: An open, web-based platform for data intensive biomedical research.
- GCRMA: Pre-processing algorithm for Affymetrix arrays.
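Several of the 450k array packages above work from per-CpG methylation levels. As a minimal sketch (not the implementation of any package listed), here are the two summaries you will most often run into: the beta value and the M-value, both computed from methylated and unmethylated probe intensities. The function names are hypothetical, the offset of 100 is a commonly used default for beta values, and the M-value offset of 1 is an assumption.

```python
import math


def beta_value(meth: float, unmeth: float, offset: float = 100.0) -> float:
    """Fraction methylated, in [0, 1); the offset stabilises low intensities."""
    return meth / (meth + unmeth + offset)


def m_value(meth: float, unmeth: float, alpha: float = 1.0) -> float:
    """Log2 ratio of methylated to unmethylated intensity."""
    return math.log2((meth + alpha) / (unmeth + alpha))


if __name__ == "__main__":
    # Hypothetical intensities for a single probe.
    methylated, unmethylated = 4200.0, 700.0
    print(round(beta_value(methylated, unmethylated), 3))
    print(round(m_value(methylated, unmethylated), 3))
```

Beta values are easier to read as a percentage methylated, while M-values tend to behave better in statistical tests, which is why many pipelines compute both.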
Other Useful Tools:
- SeqPlots: Interactive software for exploratory data analyses, pattern discovery, and visualization in genomics.
- GAT: Genomic Association Tester (GAT) lets you compute the significance of the overlap between all your fancy data sets.
- ZENBU: Japanese for all, entire, whole, altogether. This browser lets you integrate and interact with your multiple omic data sets in a nice visual environment.
- compEpiTools: A Bioconductor (R) package that provides tools for the analysis, integration, and simultaneous visualization of various (epi)genomics data types across multiple genomic regions in multiple samples.
- EpiExplorer: Import your very own data and compare it to ENCODE.
- Podbat: A positioning database and analysis tool that incorporates data from various sources and allows detailed dissection of the entire range of chromatin modifications simultaneously. Podbat can be used to analyze, visualize, store and share epigenomics data. Also be sure to check out our coverage on Podbat.
- CTCF Insulator Database: in silico prediction for all your genomic insulation needs!
- Regulatory Sequence Analysis Tools: Detects regulatory signals in non-coding sequences.
- CARRIE: It takes two-condition microarray data and applies promoter analysis to infer the stimulated/repressed transcriptional regulatory network.
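Wondering what "significance of the overlap" means in practice? Here is a deliberately simplified sketch of a simulation-based overlap test in the spirit of tools like GAT. It is not GAT's algorithm: it assumes a single contig, half-open `(start, end)` intervals, and uniformly random placement of segments, and all names are made up for illustration.

```python
import random


def count_overlapping(segments, annotations):
    """Number of segments that overlap at least one annotation interval."""
    return sum(
        any(s < a_end and a_start < e for a_start, a_end in annotations)
        for s, e in segments
    )


def overlap_p_value(segments, annotations, genome_length, n_sim=1000, seed=0):
    """Empirical p-value for seeing at least the observed overlap by chance."""
    rng = random.Random(seed)
    observed = count_overlapping(segments, annotations)
    at_least = 0
    for _ in range(n_sim):
        shuffled = []
        for s, e in segments:
            length = e - s
            # Drop each segment at a random position, keeping its length.
            start = rng.randint(0, genome_length - length)
            shuffled.append((start, start + length))
        if count_overlapping(shuffled, annotations) >= observed:
            at_least += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (at_least + 1) / (n_sim + 1)


if __name__ == "__main__":
    peaks = [(1100, 1200), (1300, 1400), (1500, 1600)]  # hypothetical peaks
    islands = [(1000, 2000)]                            # hypothetical annotation
    print(overlap_p_value(peaks, islands, genome_length=1_000_000))
```

Real genomes are not uniform, so dedicated tools restrict the simulation to a sensible workspace and can account for features like GC content; use this only to build intuition.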
Gene Ontologies and Pathways
So, you’ve got that wonderful omic data down to a nice little list of genes with alterations. Now it’s time to have fun figuring out just what functions they’re all up to.
- Enrichr: Find out about transcriptional regulation, pathways, ontologies, and much more from this neat little tool.
- ConsensusPathwayDB: A molecular functional interaction database, integrating information on protein interactions, genetic interactions, signaling, metabolism, gene regulation, and drug-target interactions in humans, mice, and yeast across a number of databases.
- GREAT: Genomic Regions Enrichment of Annotations Tool (GREAT) assigns biological meaning to a set of non-coding genomic regions by analyzing the annotations of the nearby genes. Thus, it is particularly useful in studying cis functions of sets of non-coding genomic regions. It’s great for analyzing genomic coordinates from your ChIP-seq and DNA methylation data.
- WGCNA: an R package for weighted correlation network analysis that can be used for finding clusters (modules) of highly correlated genes.
- Gene Set Enrichment Analysis (GSEA): The name says it all, this pioneering program lets you compare against the Molecular Signatures Database (MSigDB).
- The Database for Annotation, Visualization and Integrated Discovery (DAVID): The granddaddy of them all.
- STRING: STRING is a database of known and predicted protein interactions that allows for visualization into interacting networks.
- GeneMANIA: GeneMANIA finds other genes that are related to a set of input genes, using a very large set of functional association data. Association data include protein and genetic interactions, pathways, co-expression, co-localization and protein domain similarity. You can use GeneMANIA to find new members of a pathway or complex, find additional genes you may have missed in your screen or find new genes with a specific function, such as protein kinases. Your question is defined by the set of genes you input.
- GO Elite: GO-Elite is designed to identify a minimal non-redundant set of biological Ontology terms or pathways to describe a particular set of genes or metabolites.
- FatiGO: Gene expression and functional profiling analysis suite.
- oPOSSUM: Web-based system for the detection of over-represented conserved transcription factor binding sites and binding site combinations in sets of genes or sequences.
- REVIGO: Summarize long lists of Gene Ontology terms by removing redundant GO terms. The remaining terms can be visualized in semantic similarity-based scatterplots, interactive graphs, or tag clouds.
- g:Profiler: a public web server for characterising and manipulating gene lists of high-throughput genomics. Currently available for 80+ species, including mammals, plants, fungi, insects, etc from Ensembl and Ensembl Genomes. g:Profiler is normally updated every two months in sync with Ensembl.
- ToppGene: A one-stop portal for gene list enrichment analysis and candidate gene prioritization based on functional annotations and protein interactions network.
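Under the hood, many gene list enrichment tools rely on some variant of an over-representation test. The sketch below is a generic one-sided hypergeometric test, not the exact statistic used by any tool listed above, and the function and parameter names are made up for illustration. It asks: given a background of genes, how surprising is it that this many genes from your list carry a given annotation?

```python
from math import comb


def enrichment_p_value(hits: int, list_size: int, term_size: int, background: int) -> float:
    """P(X >= hits) when drawing list_size genes from the background.

    hits       -- genes in your list annotated with the term
    list_size  -- number of genes in your list
    term_size  -- genes in the background annotated with the term
    background -- total number of genes in the background
    """
    upper = min(list_size, term_size)
    total = comb(background, list_size)
    tail = sum(
        comb(term_size, k) * comb(background - term_size, list_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: 12 of 200 listed genes hit a 150-gene term
    # in a 20,000-gene background.
    print(enrichment_p_value(hits=12, list_size=200, term_size=150, background=20_000))
```

When you test many terms at once, which is the usual case, the raw p-values need a multiple-testing correction; the tools above handle that for you.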
Sodium Bisulfite Primer Design:
When bisulfite reduces the complexity of a sequence, it increases the complexity of its primer design. These programs help take the pain out of bisulfite primer design:
- BiSearch: BiSearch is a primer-design algorithm for DNA sequences. It may be used for both bisulfite-converted and original, unmodified sequences. You can search various genomes with the designed primers to avoid non-specific PCR products by our fast ePCR method.
- MethPrimer: A program for designing bisulfite-conversion-based methylation PCR primers. Currently, it can design primers for two types of bisulfite PCR: 1) Methylation-Specific PCR (MSP) and 2) Bisulfite-Sequencing PCR (BSP) or Bisulfite-Restriction PCR. MethPrimer can also predict CpG islands in DNA sequences.
- Bisulfite Primer Seeker: Zymo Research’s handy online bisulfite primer design tool, designed by experts.
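To see why bisulfite treatment makes primer design harder, it helps to convert a sequence in silico. The sketch below is a toy model, not how any of the tools above work: it assumes a single strand, complete conversion, and an all-or-nothing methylation state at CpG sites. The function name and example sequence are made up for illustration.

```python
def bisulfite_convert(seq: str, methylated_cpg: bool = True) -> str:
    """Simulate bisulfite conversion (as read after PCR) of one strand.

    Unmethylated cytosines read as T. If methylated_cpg is True, every C
    in a CpG is treated as methylated and stays C; all other Cs become T.
    """
    seq = seq.upper()
    out = []
    for i, base in enumerate(seq):
        if base == "C":
            in_cpg = i + 1 < len(seq) and seq[i + 1] == "G"
            out.append("C" if (methylated_cpg and in_cpg) else "T")
        else:
            out.append(base)
    return "".join(out)


if __name__ == "__main__":
    template = "ATCGCCGTACCAGT"  # hypothetical template
    print(bisulfite_convert(template))                        # ATCGTCGTATTAGT
    print(bisulfite_convert(template, methylated_cpg=False))  # ATTGTTGTATTAGT
```

After conversion the sequence is mostly a three-letter alphabet, and the methylated and unmethylated versions differ only at CpG sites, so primers are both harder to make specific and sensitive to exactly where they sit relative to CpGs.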
Got a tool you dig? Let us know about it so we can share it in the spirit of open science!