Here we go:
Epigenetic Data Browsers and Repositories:
Go ahead, get your feet wet:
- IHEC Data Portal: The International Human Epigenome Consortium (IHEC) brings forth reference epigenomes relevant to health and disease. View, search, and download all the data.
- ROADMAP Epigenomics: The NIH Roadmap Epigenomics Mapping Consortium offers maps of histone modifications, chromatin accessibility, DNA methylation, and mRNA expression across 100s of human cell types and tissues.
- CEEHRC Platform: A reference epigenome project that focuses on human cells rather than the typical stem cell lines.
- DeepBlue: Store and work with genomic and epigenomic data from a number of international consortia.
- Epigenome Browser: For the UCSC genome browser fans.
- WashU Epigenome Browser: A web browser that offers tracks from the ENCODE and Roadmap Epigenomics projects.
- Ensembl: Featuring ENCODE.
- GenExp: A web-based visualization tool to interactively explore a genomic database.
- The Epigenome Atlas: Human reference epigenomes.
- Classification of Human Transcription Factors: The mother list of transcription factors and their binding sites.
While you’re a pioneer, you’re certainly not the first one to tread these waters. Go have a look at lifetimes of work:
- 4DGenome: A database of chromatin interactions across five species. Includes data from 3C, 4C, 5C, ChIA-PET, Hi-C, Capture-C, and IM-PET.
- 3CDB: A manually curated chromosome conformation capture (3C) database.
- Histome: A database of human histone variants, sites of post-translational modifications, and histone modifying enzymes.
- cisRED: A motif database.
- MethylomeDB: The Brain Methylome Database provides DNA methylation profiles from humans and mice.
- DiseaseMeth: Methylomes of human disease.
- NGSmethDB: Whole-genome bisulfite sequencing (WGBS) database for many different tissues, pathological conditions, and species.
- MethBase: Hundreds of methylomes from well studied organisms.
- miRWalk 2.0: miRNA binding sites within the complete sequence of a gene and a comparison of binding sites from 12 existing miRNA-target prediction programs.
- RNA22v2: Get your miRNA targeting on with an unbiased algorithm that considers not only 3’UTR binding but also 5’UTR binding.
- miRBase: Published miRNA sequences.
- DIANA Tools: A suite of tools that include target prediction algorithms and experimentally verified miRNA targets.
- TargetScan: Predicts miRNA targets by searching for conserved sites that match the seed of each miRNA. There are different versions for human, mouse, worm, fly, and fish.
- MicroCosm Targets: Predicted targets for microRNAs across many species.
- miRNEST 2.0: A database of animal, plant, and virus microRNAs.
- NonCode: A database of all kinds of noncoding RNAs (except tRNAs and rRNAs) for 16 species.
- Human lincRNA Catalog: A detailed human lincRNA catalogue based on ~4 billion RNA-Seq reads across 24 tissues and cell types.
- PolymiRTS: Linking sequence to trait, check out what polymorphisms in your microRNA can do.
- CircNET: Offers profiles of tissue-specific circular RNA (circRNA) expression as well as circRNA–miRNA-gene regulatory networks.
- circBase: Public circRNA data and custom Python scripts for circRNA discovery in your own (ribominus) RNA-seq data.
- starBase v2.0: Decode interaction networks of lncRNAs, miRNAs, circRNAs, RNA-binding proteins (RBPs), and mRNAs from large-scale CLIP-Seq data.
- Circ2Traits: Associate your circRNAs with diseases or traits.
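Curious what "sites that match the seed" means in practice? Here is a minimal Python sketch of the idea. It assumes the seed is nucleotides 2 to 8 of the mature miRNA and only looks for perfect complementary matches in a UTR. The function names and the toy UTR are made up for illustration, and real predictors such as TargetScan also weigh conservation and site context, which this sketch ignores.

```python
def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence (A/U/G/C only)."""
    complement = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(rna))


def find_seed_sites(mirna, utr, seed_start=2, seed_end=8):
    """Return 0-based positions in `utr` that perfectly pair with the miRNA seed.

    Assumption: the seed spans nucleotides seed_start..seed_end (1-based,
    inclusive) of the mature miRNA. No conservation or context scoring.
    """
    seed = mirna[seed_start - 1:seed_end]
    site = reverse_complement(seed)  # what the seed pairs with on the mRNA
    hits = []
    pos = utr.find(site)
    while pos != -1:
        hits.append(pos)
        pos = utr.find(site, pos + 1)  # allow overlapping matches
    return hits


# Toy example: a let-7a-like miRNA against a made-up UTR fragment.
mirna = "UGAGGUAGUAGGUUGUAUAGUU"
utr = "AAACUACCUCAGGGCUACCUC"
print(find_seed_sites(mirna, utr))
```

Swap in your own miRNA and UTR sequences to get a quick first look before reaching for the dedicated tools above.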
Epigenetic Tools for Statistical Data Analysis and Visualization:
Results are great! Now go do something with them. Use these data analysis and visualization tools to help decipher your data. Need a little intro to biostatistics? Go learn some R and check out the essentials from the Bioconductor repository to get you started with data that won’t analyze the easy way. Here are a few of the best data analysis and visualization packages out there:
- Bsseq: A Bioconductor (R) package that offers a suite of tools for analyzing and visualizing your very own WGBS data towards your goal of identifying differentially methylated regions (DMRs).
- epiGbs: A reference genome free reduced representation bisulfite sequencing (RRBS) method that enables cost-effective analysis of DNA methylation and genetic variation in hundreds of samples.
- MethPipe: Analyzes your WGBS and RRBS data to identify DMRs, allele-specific methylation, and partially methylated domains.
- M3D: A Bioconductor (R) package that uses kernel methods to identify DMRs.
- MOABS: Bioinformatic method for aligning your WGBS data and detecting DMRs.
- DMAP: A C-based tool for RRBS and WGBS data that includes a suite of statistical tools and a distinct investigative approach for analysing DNA methylation data. It also links any list of regions to the genome and provides gene and CpG features. It now features a novel fragment-based analysis for RRBS, which has not been shown before.
- BEAT: A Bioconductor (R) package that lets you analyze single-cell BS-seq data.
- methylPipe: A Bioconductor (R) package for the analysis of CpG and non-CpG methylation from WGBS data that also enables integration with other epigenomic data sets.
- compEpiTools: A Bioconductor (R) package that helps you analyze, integrate, and visualize multiple epigenomic data sets.
- DMRcate: A Bioconductor (R) package for DMR identification from the human genome using WGBS and Illumina Infinium array (450K and EPIC) data.
- Minfi: A Bioconductor (R) package for your Illumina Infinium arrays (450K and EPIC) that provides comprehensive analysis and takes cellular heterogeneity into account. After all, variety is the spice of life.
- ChAMP: A Bioconductor (R) package that offers QC/QA metrics and a number of normalization methods in order to identify DMRs and copy number variations in Illumina Infinium array (450K and EPIC) data.
- FEM: A Bioconductor (R) package that offers integrative analysis of DNA methylation and gene expression data.
- coMET: A Bioconductor (R) package for the visualisation of Epigenome-Wide Association Study (EWAS) results from a genomic region perspective.
- Repitools: A Bioconductor (R) package for the analysis of enrichment-based DNA methylation data.
- RnBeads: An R package for comprehensive analysis of DNA methylation data from Illumina Infinium arrays (450K and EPIC) and BS-seq. MeDIP-seq and MBD-seq are also supported after some external processing.
- SMITE: A Bioconductor (R) package for Significance-based Modules Integrating the Transcriptome and Epigenome.
- DaVIE: An intuitive user interface to perform visual comparisons across all your large DNA methylation data sets.
- MACS: Model-based Analysis of ChIP-seq (MACS) is a go-to peak-finding algorithm.
- PAVIS: PAVIS (Peak Annotation and Visualization) lets you annotate and visualize your ChIP-seq and BS-seq data.
- EaSeq: Lets you analyze and visualize your ChIP-seq data with a graphical user interface that runs on a typical PC.
- ODIN: A ChIP-seq tool that not only detects peaks but can also call and provide statistics on differential peaks between two conditions.
- MMDiff: A Bioconductor (R) package that detects differential peaks in your ChIP-seq data.
- ALEA: Lets you analyze ChIP-seq or RNA-seq data to correlate allele-specific differences with epigenomic status.
- CENTDIST: A web-application that identifies transcription factors hanging around your ChIP-seq peaks.
- ChIP-Array v2.0: Integrate your ChIP-seq or ChIP-chip data with gene expression to build a regulatory network. Works for human, mouse, yeast, fly, and Arabidopsis data.
- CosBI: The histone code, dare you crack it? Learn more about CosBI from Epigenie.
- Pscan-ChIP: A web server that scans ChIP-seq peak coordinates for over-represented transcription factor binding site motifs.
- chroGPS: A Bioconductor (R) package aimed at integration, visualization, and functional analysis of epigenomics data.
- Epigenomix: A Bioconductor (R) package that lets you integrate your RNA-seq or microarray data with your ChIP-seq data. It lets you preprocess and create differential gene lists for both data sets.
- Epigram: An analysis pipeline that predicts histone modification and DNA methylation patterns from DNA motifs. Check out our coverage.
- Homer: Discover motifs critical to the differences between your sample groups. The art is just a bonus.
- The MEME Suite: Motif Based Sequence Analysis Tools.
- DiRE: A web server for predicting distant (outside of proximal promoter regions) regulatory elements (DiRE) of co-regulated genes.
- Melina: Motif Elucidator in Nucleotide Sequence Assembly (Melina) can run multiple motif prediction tools simultaneously.
- BioWardrobe: Lets you store, visualize, analyze, and integrate epigenomic and transcriptomic data using a web-based graphical user interface that doesn’t require programming expertise.
- Galaxy: Provides an interface to help you with all the fancy code needed for RNA-seq.
- Babelomics 5: A user-friendly interface for a suite of tools for gene expression and genomic data.
- Samtools: A suite of programs for interacting with high-throughput sequencing data and formats that include SAM/BAM/CRAM as well as BCF2/VCF/gVCF.
- Picard: A set of command line tools for file types such as SAM/BAM/CRAM and VCF.
- Super-Deduper: Remove PCR duplicates from your paired-end reads.
- FLASH 2: Merge paired-end reads.
- Scythe: Remove sequencing adapters from your single-end reads.
- Sickle: Trim low quality regions from your RNA-seq reads.
- Bowtie 2: Align your sequencing reads to long reference sequences.
- Tophat: A splice junction mapper for RNA-seq reads that uses Bowtie.
- STAR: RNA-seq aligner that performs simultaneous read mapping and counting.
- Cufflinks: Assembles transcripts, estimates their abundances, and tests for differential expression in RNA-seq data. It accepts aligned RNA-seq reads.
- CummeRbund: An R package that helps with analyzing Cufflinks RNA-seq output.
- DESeq2: Detect differential expression of transcripts in your RNA-seq data.
- edgeR: A Bioconductor (R) package for differential expression analysis of RNA-seq data.
- Kallisto: A program for quantifying abundances of transcripts from RNA-seq data. Uses pseudoalignment to skip the alignment step.
- Salmon: A tool for quantifying the expression of transcripts using RNA-seq data.
- GCRMA: Pre-processing algorithm for Affymetrix arrays.
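DMRs come up a lot in the list above, so here is a deliberately naive Python sketch of the concept: compute a methylation level per CpG (methylated reads over total reads) in two samples, flag CpGs whose levels differ by a threshold, and merge nearby flagged CpGs into regions. The function names, thresholds, and input layout are all assumptions for illustration. None of the packages above work this simply; they add proper statistics, smoothing, and replicate handling.

```python
def methylation_level(methylated, total):
    """Fraction of reads supporting methylation, or None without coverage."""
    return methylated / total if total else None


def naive_dmrs(sites, min_delta=0.3, max_gap=200, min_cpgs=3):
    """Merge neighbouring differentially methylated CpGs into regions.

    sites: list of (position, meth_a, total_a, meth_b, total_b) tuples,
           sorted by position on a single chromosome (assumed layout).
    Returns a list of (start, end) tuples.
    Caveat: the direction of the difference is ignored, so hyper- and
    hypomethylated CpGs could end up in the same region.
    """
    regions = []
    current = []  # positions of the region being built

    def flush():
        if len(current) >= min_cpgs:
            regions.append((current[0], current[-1]))

    for pos, meth_a, total_a, meth_b, total_b in sites:
        level_a = methylation_level(meth_a, total_a)
        level_b = methylation_level(meth_b, total_b)
        differs = (
            level_a is not None
            and level_b is not None
            and abs(level_a - level_b) >= min_delta
        )
        if differs and current and pos - current[-1] <= max_gap:
            current.append(pos)
        else:
            flush()
            current = [pos] if differs else []
    flush()
    return regions


# Toy counts: three nearby CpGs differ, one does not, one differs alone.
sites = [
    (100, 9, 10, 1, 10),
    (150, 8, 10, 2, 10),
    (220, 10, 10, 3, 10),
    (900, 5, 10, 5, 10),
    (1000, 9, 10, 1, 10),
]
print(naive_dmrs(sites))
```

Treat this as a way to build intuition about what a DMR caller reports, not as a substitute for one.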
Other Useful Tools:
- SeqPlots: Exploratory data analysis and visualization tool bundled in a nice app that lets you generate some pretty pictures of your functional genomic features.
- ngs.plot: Visualize your results at functional genomic regions.
- GAT: Genomic Association Tester (GAT) lets you compute the significance of the overlap between all your fancy data sets.
- GeneOverlap: A Bioconductor (R) package to statistically test and then visualize gene overlaps between multiple gene lists.
- ZENBU: Japanese for all, entire, whole, altogether. This browser lets you integrate and interact with your data in a nice visual environment.
- EpiExplorer: Import your very own data and compare it to ENCODE.
- Podbat: A positioning database and analysis tool that takes data from a number of sources to allow for the detailed dissection of a range of chromatin modifications. It can be used to analyze, visualize, store, and share your data. Check out our coverage on Podbat.
- CTCF Insulator Database: In silico prediction for all your genomic insulation needs!
- Regulatory Sequence Analysis Tools: Detects regulatory signals in non-coding sequences.
- CARRIE: Takes your two-condition microarray data and applies a promoter analysis to infer a regulatory network.
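Wondering how the "significance of an overlap" between two gene lists can be scored? One common approach is a one-sided hypergeometric test, sketched below in plain Python. The function name and the toy gene lists are made up for illustration, and the universe size (how many genes could have been picked at all) is something you have to supply. The tools listed above may use different statistics, such as simulation-based tests.

```python
from math import comb


def overlap_pvalue(list_a, list_b, universe_size):
    """One-sided hypergeometric test for the overlap of two gene lists.

    Returns (overlap_size, p_value), where p_value is the probability of
    seeing an overlap at least this large if list_b were drawn at random
    from a universe of `universe_size` genes containing list_a.
    """
    genes_a, genes_b = set(list_a), set(list_b)
    overlap = len(genes_a & genes_b)
    n_a, n_b = len(genes_a), len(genes_b)
    total = comb(universe_size, n_b)  # all ways to draw list_b
    tail = sum(
        comb(n_a, i) * comb(universe_size - n_a, n_b - i)
        for i in range(overlap, min(n_a, n_b) + 1)
    )
    return overlap, tail / total


# Toy example: two 3-gene lists sharing 2 genes, in a 10-gene universe.
size, p = overlap_pvalue(["g1", "g2", "g3"], ["g1", "g2", "g4"], 10)
print(size, round(p, 4))
```

The choice of universe matters a lot: using the whole genome when only expressed genes were testable will make overlaps look more significant than they are.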
Gene Ontology and Pathway Analysis:
So, you’ve got that wonderful omic data down to a nice little gene list. Now it’s time to have fun figuring out just what they’re all up to.
- Enrichr: Find out about transcriptional regulation, pathways, ontologies, and much more from this neat little web tool.
- ConsensusPathwayDB: A master tool that pulls from a large number of databases to provide ontology and pathway analysis for humans, mice, and yeast.
- GREAT: Genomic Regions Enrichment of Annotations Tool (GREAT) gives biological context to non-coding genomic regions by analyzing the annotations of the nearby genes. It’s great for analyzing genomic coordinates from your ChIP-seq and DNA methylation data.
- WGCNA: An R package for weighted correlation network analysis that can be used to find correlated gene clusters.
- Gene Set Enrichment Analysis (GSEA): The name says it all. This pioneering program lets you compare against the Molecular Signatures Database (MSigDB).
- The Database for Annotation, Visualization and Integrated Discovery (DAVID): The granddaddy of them all.
- STRING: STRING is a database of known and predicted protein interactions that lets you visualize interacting networks.
- GeneMANIA: GeneMANIA lets you visualize your gene lists and finds other related genes by using a very large set of functional association data.
- GO Elite: Ontology and pathway analysis from multiple databases.
- oPOSSUM: Analyze your gene list for transcription factor binding sites.
- REVIGO: Shorten your long list of Gene Ontology terms by removing redundant ones.
- g:Profiler: Ontologies, pathways, and more from your gene list. Currently available for 200+ species.
- ToppGene: A suite of tools to see what is enriched for in your gene list.
Sodium Bisulfite Primer Design:
When bisulfite reduces the complexity of a sequence, it increases the complexity of its primer design. These programs help take the pain out of bisulfite primer design:
- BiSearch: A primer-design algorithm that can be used with both bisulfite converted and non-converted sequences.
- MethPrimer: A program that lets you design primers for bisulfite PCR that also predicts CpG islands in DNA sequences. It lets you design primers for Methylation-Specific PCR (MSP), Bisulfite-Sequencing PCR (BSP), and Bisulfite-Restriction PCR.
- Bisulfite Primer Seeker: Zymo Research’s handy online bisulfite primer design tool, designed by experts.
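To see why bisulfite treatment makes primer design harder, here is a minimal Python sketch of an in silico conversion. It assumes unmethylated cytosines read as T after conversion and PCR, while methylated cytosines stay C, and it treats CpG methylation as all or nothing on a single strand. The function name and flag are made up for illustration; the primer design tools above handle both strands and much more.

```python
def bisulfite_convert(seq, methylated_cpg=True):
    """Simulate bisulfite conversion of one DNA strand.

    Every C becomes T, except a C followed by G (a CpG) when
    `methylated_cpg` is True, which is assumed to be protected.
    """
    seq = seq.upper()
    converted = []
    for i, base in enumerate(seq):
        if base == "C":
            in_cpg = i + 1 < len(seq) and seq[i + 1] == "G"
            converted.append("C" if in_cpg and methylated_cpg else "T")
        else:
            converted.append(base)
    return "".join(converted)


original = "ACGTCCAGCG"
print(bisulfite_convert(original))                        # CpGs methylated
print(bisulfite_convert(original, methylated_cpg=False))  # CpGs unmethylated
```

Most of the Cs disappear, leaving a sequence built largely from three bases, which is why finding specific primers in converted DNA is so much harder.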
Got a tool you dig? Let us know about it so we can share it in the spirit of open science!