Tools, where would humankind be without them? The EpiGenie team decided to search out and compile a list of the best tools and databases that epigenetics researchers can’t live without.
Here we go:
Data is great! Now go do something with it. Use these data analysis and visualization tools to help decipher your data. Need a little intro to biostatistics? Go learn some R and check out the essentials from Bioconductor to get you started with data that won’t analyze the easy way. Here are a few of the best data analysis and visualization packages out there:
- Pipelines can be so much work, so if you’ve got WGBS, RRBS, or enzymatic methyl-seq (EM-seq) data, why not check out these two repos by our very own Epigenetics Editor, who made them with you in mind:
- CpG_Me: A whole genome bisulfite sequencing (WGBS) pipeline for the alignment and QC of DNA methylation that goes from raw reads (FastQ) to a CpG count matrix (Bismark cytosine reports).
- DMRichR: An R package and executable for the statistical analysis and visualization of differentially methylated regions (DMRs) of CpG count matrices (Bismark cytosine reports). It primarily utilizes the dmrseq and bsseq algorithms and provides upstream pre-processing as well as downstream analyses and data visualization.
- methylKit: If you love single CpG statistics, then this is the Bioconductor (R) package for you. It’s focused on high-throughput bisulfite sequencing methods, such as high-coverage WGBS, RRBS and its variants, target-capture methods, as well as 5hmC protocols such as oxBS-Seq and TAB-Seq. All it needs are your Bismark-aligned BAM files.
- RnBeads: A Bioconductor (R) package for comprehensive analysis of DNA methylation data from Illumina Infinium arrays (450K and EPIC) and BS-seq. MeDIP-seq and MBD-seq are also supported after some external processing.
- MEDIPS: A Bioconductor (R) package for methylated DNA immunoprecipitation (MeDIP) experiments followed by sequencing (MeDIP-seq).
- MethPipe: Analyzes your WGBS and RRBS data to identify DMRs, allele-specific methylation, and partially methylated domains.
- Minfi: A Bioconductor (R) package for your Illumina Infinium arrays (450K and EPIC) that provides comprehensive analysis and takes cellular heterogeneity into account. After all, variety is the spice of life.
- DMRcate: A Bioconductor (R) package for DMR identification from the human genome using WGBS and Illumina Infinium array (450K and EPIC) data.
- ChAMP: A Bioconductor (R) package that offers QC/QA metrics and a number of normalization methods in order to identify DMRs and copy number variations in Illumina Infinium array (450K and EPIC) data.
- FEM: A Bioconductor (R) package that offers integrative analysis of DNA methylation and gene expression data.
- coMET: A Bioconductor (R) package for the visualisation of Epigenome-Wide Association Study (EWAS) from a genomic region perspective.
- Repitools: A Bioconductor (R) package for the analysis of enrichment-based DNA methylation data.
- ELMER: Use DNA methylation array and gene expression data to discover the regulatory element landscape and transcription factor network.
- nf-core/chipseq: A pipeline for chromatin immunoprecipitation sequencing (ChIP-seq) data, built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner.
- BWA: The ChIP-seq aligner.
- MACS: Model-based Analysis of ChIP-seq (MACS) is a go-to peak-finding algorithm.
- deepTools: A suite of Python tools that tackle a lot of that complex ChIP-seq pipeline.
- DESeq2: A Bioconductor (R) package for detecting differential peaks in your ChIP-seq data.
- edgeR: A Bioconductor (R) package for detecting differential peaks in your ChIP-seq data.
- PAVIS: PAVIS (Peak Annotation and Visualization) lets you annotate and visualize your ChIP-seq and BS-seq data.
- EaSeq: Lets you analyze and visualize your ChIP-seq data with a graphical user interface that runs on a typical PC.
- ALEA: Lets you analyze ChIP-seq or RNA-seq data to correlate allele-specific differences with epigenomic status.
- CENTDIST: A web-application that identifies transcription factors hanging around your ChIP-seq peaks.
- ChIP-Array v2.0: Integrate your ChIP-seq or ChIP-chip data with gene expression to build a regulatory network. Works for human, mouse, yeast, fly, and Arabidopsis data.
- CosBI: The histone code, dare you crack it? Learn more about CosBI from EpiGenie.
- Epigenomix: A Bioconductor (R) package that lets you integrate your RNA-seq or microarray data with your ChIP-seq data. It lets you preprocess and create differential gene lists for both data sets.
- HMCan: A tool to call peaks in ChIP-seq/ATAC-seq data generated from cancer cells. It corrects for GC-content bias and DNA copy number aberrations.
- HMCan-diff: A tool to detect differential chromatin modifications in cancer ChIP-seq data with a correction for copy number aberrations.
- LILY: A method to call super-enhancers in cancer cells with DNA copy number aberrations.
- nf-core/rnaseq: A bioinformatic analysis pipeline used for RNA sequencing data built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner.
- STAR: RNA-seq aligner that performs simultaneous read mapping and counting. While you can probably get away without adapter trimming in most cases with a local aligner like STAR, it’s probably still worth the effort to remove the biases. So why not add a Trim Galore run to your pipeline?
- Kallisto: A program for quantifying abundances of transcripts from RNA-seq data. Uses pseudoalignment to skip the alignment step.
- Sleuth: A program for analysis of RNA-Seq experiments for which transcript abundances have been quantified with kallisto.
- Salmon: A tool for quantifying the expression of transcripts using RNA-seq data.
- DESeq2: A Bioconductor (R) package for detecting differential expression of transcripts in your RNA-seq data.
- edgeR: A Bioconductor (R) package for differential expression analysis of RNA-seq data.
- RNA22v2: Get your miRNA targeting on with an unbiased algorithm that not only considers 3’UTR binding but also 5’UTR binding.
- DIANA Tools: A suite of tools that include target prediction algorithms and experimentally verified miRNA targets.
- TargetScan: Predicts miRNA targets by searching for conserved sites that match the seed of each miRNA. There are different versions for human, mouse, worm, fly, and fish.
- miRWalk: Reports miRNA binding sites within the complete sequence of a gene, along with a comparison of binding sites from 12 existing miRNA-target prediction programs.
- GCRMA: Pre-processing algorithm for Affymetrix arrays.
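Curious what the seed matching behind TargetScan-style predictions looks like? Here’s a minimal Python sketch. It is not TargetScan’s actual code: it assumes a plain 7-nt site complementary to miRNA positions 2-8 and ignores conservation, site context, and the other site types that real predictors score. The function names and the toy UTR are made up for illustration.

```python
# Watson-Crick complement table for RNA (assumes only A, C, G, U are present).
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna: str) -> str:
    """Return the 7-nt target site complementary to miRNA positions 2-8."""
    seed = mirna.upper().replace("T", "U")[1:8]  # positions 2-8, 1-based
    return seed.translate(COMPLEMENT)[::-1]      # reverse complement


def find_seed_matches(utr: str, mirna: str) -> list[int]:
    """Return 0-based start positions of perfect seed sites in a UTR sequence."""
    site = seed_site(mirna)
    utr = utr.upper().replace("T", "U")
    return [
        i
        for i in range(len(utr) - len(site) + 1)
        if utr[i : i + len(site)] == site
    ]


if __name__ == "__main__":
    let7a = "UGAGGUAGUAGGUUGUAUAGUU"   # hsa-let-7a-5p
    toy_utr = "AAACUACCUCAGGG"          # hypothetical 3'UTR fragment
    print(seed_site(let7a))
    print(find_seed_matches(toy_utr, let7a))
```

A perfect seed match is only a starting point; the tools above add conservation, context scoring, or experimental evidence to cut down on false positives.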
Downstream Analyses and Visualization:
So, you’ve got that wonderful omic data down to a nice little gene list. Now it’s time to have fun figuring out just what they’re all up to.
Gene Ontology (GO) and Pathway Analyses:
- GOfuncR: The Bioconductor (R) package for GO analyses, which provides an R interface to FUNC. This program has options that allow you to work with pretty much any method, so you can remove all those pesky limitations that come with a basic GO run, like genomic background and gene length.
- REVIGO: Shorten your long list of Gene Ontology terms by removing redundant ones, so you can get a systems level perspective that isn’t driven by the top term.
- Enrichr: Find out about transcriptional regulation, pathways, ontologies, and much more from this neat little web tool.
- ConsensusPathwayDB: A master tool that pulls from a large number of databases to provide ontology and pathway analysis for humans, mice, and yeast.
- GREAT: Genomic Regions Enrichment of Annotations Tool (GREAT) gives biological context to non-coding genomic regions by analyzing the annotations of the nearby genes. It’s great for analyzing genomic coordinates from your ChIP-seq and DNA methylation data.
- WGCNA: An R package for weighted correlation network analysis that can be used to find correlated gene clusters.
- Gene Set Enrichment Analysis (GSEA): The name says it all. This pioneering program lets you compare against the Molecular Signatures Database (MSigDB).
- The Database for Annotation, Visualization and Integrated Discovery (DAVID): The granddaddy of them all.
- STRING: STRING is a database of known and predicted protein interactions that lets you visualize interacting networks.
- GeneMANIA: GeneMANIA lets you visualize your gene lists and finds other related genes by using a very large set of functional association data.
- g:Profiler: Ontologies, pathways, and more from your gene list. Currently available for 200+ species.
- ToppGene: A suite of tools to see what is enriched for in your gene list.
- Homer: Discover motifs critical to the differences between your sample groups. The art and jokes are just a bonus. Also, don’t forget to check out the MARGE R package.
- Lisa: epigenetic Landscape In Silico deletion Analysis determines the transcription factors and chromatin regulators behind your gene list.
- The MEME Suite: Motif Based Sequence Analysis Tools.
- MethMotif: A cell-type specific database with transcription factor binding site motifs and accompanying DNA Methylation profiles.
- Epigram: An analysis pipeline that predicts histone modification and DNA methylation patterns from DNA motifs. Check out our coverage of Epigram.
- oPOSSUM: Analyze your gene list for transcription factor binding sites from a number of species.
- Classification of Human Transcription Factors: A large database that classifies human transcription factors.
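Under the hood, many over-representation analyses of a gene list rest on some flavor of a hypergeometric (one-sided Fisher’s exact) test, which is why the choice of background matters so much. The sketch below is a bare-bones Python version under that assumption. It is not the implementation of any tool listed above, the function name and numbers are hypothetical, and real tools add their own corrections (multiple testing, gene length, custom backgrounds) on top.

```python
from math import comb


def enrichment_p_value(background: int, annotated: int,
                       list_size: int, overlap: int) -> float:
    """One-sided hypergeometric p-value, P(X >= overlap).

    background: number of genes in the background set
    annotated:  background genes carrying the term of interest
    list_size:  number of genes in your gene list
    overlap:    genes in your list that carry the term
    """
    max_overlap = min(annotated, list_size)
    total = comb(background, list_size)
    tail = sum(
        comb(annotated, k) * comb(background - annotated, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers: 10 background genes, 4 annotated, a list of 3 with 2 hits.
    print(enrichment_p_value(10, 4, 3, 2))
    # The same overlap looks far more surprising against a larger background.
    print(enrichment_p_value(1000, 4, 3, 2))
```

Running the two toy calls side by side shows how much the p-value depends on what you declare as the background, so it pays to use the set of genes you could actually have detected.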
Other Useful Tools:
- GAT: Genomic Association Tester (GAT) lets you compute the significance of the overlap between all your fancy data sets.
- GeneOverlap: A Bioconductor (R) package to statistically test and then visualize gene overlaps between multiple gene lists.
- LOLA: A Bioconductor (R) package that lets you test your genomic coordinates for enrichments in a large variety of internal and external datasets.
- COCOA: A Bioconductor (R) package for understanding epigenetic variation among samples. It works with genomic coordinates, such as those from DNA methylation and chromatin accessibility data.
- ngs.plot: Visualize your results at functional genomic regions.
- ZENBU: Japanese for all, entire, whole, altogether. This browser lets you integrate and interact with your data in a nice visual environment.
- CTCF Insulator Database: In silico prediction for all your genomic insulation needs!
- EpiExplorer: Import your very own data and compare it to ENCODE.
- Podbat: A positioning database and analysis tool that takes data from a number of sources to allow for the detailed dissection of a range of chromatin modifications. It can be used to analyze, visualize, store and share your data. Check out our coverage on Podbat.
- BioWardrobe: Lets you store, visualize, analyze and integrate epigenomic and transcriptomic data using a web-based graphical user interface that doesn’t require programming expertise.
- Galaxy: Provides an interface to help you with all the fancy code needed for genomic and transcriptomic analyses.
- Babelomics 5: A user-friendly interface for a suite of tools for gene expression and genomic data.
Sodium Bisulfite Primer Design:
When bisulfite reduces the complexity of a sequence, it increases the complexity of its primer design. These programs help take the pain out of bisulfite primer design:
- AmpliconDesign: A primer design web tool for targeted DNA methylation analysis. It supports EpiTYPER MassARRAY or targeted Amplicon Bisulfite Sequencing.
- BiSearch: A primer-design algorithm that can be used with both bisulfite converted and non-converted sequences.
- MethPrimer: A program that lets you design primers for bisulfite PCR that also predicts CpG islands in DNA sequences. It lets you design primers for Methylation-Specific PCR (MSP), Bisulfite-Sequencing PCR (BSP), and Bisulfite-Restriction PCR.
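To see why bisulfite treatment makes primer design harder, it helps to convert a sequence in silico. The Python sketch below is a simplified illustration, not how any of the tools above work internally: it handles only the strand you give it, reports the sequence as read after PCR (converted cytosines appear as T), and assumes CpG cytosines are either all methylated or all unmethylated. The function name is hypothetical.

```python
def bisulfite_convert(seq: str, methylated_cpg: bool = True) -> str:
    """Simulate bisulfite conversion of one DNA strand, read 5' to 3'.

    Unmethylated C is read as T after conversion and PCR. If methylated_cpg
    is True, every C followed by G is treated as methylated and is kept.
    """
    seq = seq.upper()
    converted = []
    for i, base in enumerate(seq):
        if base == "C":
            in_cpg = i + 1 < len(seq) and seq[i + 1] == "G"
            converted.append("C" if (in_cpg and methylated_cpg) else "T")
        else:
            converted.append(base)
    return "".join(converted)


if __name__ == "__main__":
    toy = "ACGTCCAG"  # hypothetical sequence
    print(bisulfite_convert(toy, methylated_cpg=True))
    print(bisulfite_convert(toy, methylated_cpg=False))
```

After conversion the sequence is mostly A, G, and T, so primers become less specific, and any CpG inside a primer site can differ between methylated and unmethylated templates.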
Why not stand on the shoulders of giants?
- IHEC Data Portal: The International Human Epigenome Consortium (IHEC) brings forth reference epigenomes relevant to health and disease. View, search, and download all the data.
- CEEHRC Data Portal: The Canadian Epigenetics, Environment and Health Research Consortium (CEEHRC) portion of the IHEC reference epigenome project.
- ENCODE: Encyclopedia of DNA Elements. Also available on Ensembl and the UCSC genome browser.
- ROADMAP Epigenomics: The NIH Roadmap Epigenomics Mapping Consortium offers maps of histone modifications, chromatin accessibility, DNA methylation, and mRNA expression across 100s of human cell types and tissues.
- DeepBlue: Store and work with genomic and epigenomic data from a number of international consortiums.
- Epigenome Browser: For the UCSC genome browser fans.
- WashU Epigenome Browser: A web browser that offers tracks from ENCODE and Roadmap Epigenomics projects.
- 4DGenome: A database of chromatin interactions across five species. Includes data from 3C, 4C, 5C, ChIA-PET, Hi-C, Capture-C, and IM-PET.
- EWAS Atlas: A knowledgebase of epigenome-wide association studies.
- NGSmethDB: Whole-genome bisulfite sequencing (WGBS) database for many different tissues, pathological conditions, and species.
- MethBase: Hundreds of methylomes from well studied organisms.
- miRBase: Published miRNA sequences.
- NonCode: A database of all kinds of noncoding RNAs (except tRNAs and rRNAs) for 16 species.
- PolymiRTS: Linking sequence to trait, check out what polymorphisms in your microRNA can do.
- circBase: Public circRNA data and custom Python scripts for circRNA discovery in your own (ribominus) RNA-seq data.
Got a tool or database you dig? Let us know about it so we can share it in the spirit of open science!