Genome-wide association studies (GWASs) have linked a large number of genetic variants to traits and diseases. But GWASs suffer from the weakness captured in every good scientist's mantra: correlation does not imply causation. The problem gets worse once you learn that most GWAS variants lie in non-coding regions of DNA, meaning they likely have subtler regulatory effects than, for example, breaking a protein outright.
But we now live in the gene-editing era, which means we should be able to test candidate regulatory SNPs (single-nucleotide polymorphisms, i.e., single-base variants) with genome and epigenome editing to see which ones actually matter.
Now, a new paper from a team in Boston demonstrates a workflow, called CAUSEL, for pinpointing causal SNPs. First authors Sándor Spisák and Kate Lawrenson demonstrated their five-step program by characterizing a prostate cancer risk locus.
Fine Mapping
First, the team identified all potentially causal gene variants in a known prostate cancer risk locus using sequence data from 35,000 people. This pointed to 27 SNPs that were associated with prostate cancer.
Epigenomic Profiling
To narrow their focus even further, they next compared this SNP map with an epigenetic map of a prostate cancer cell line, finding one SNP in particular that overlapped with several epigenetic features, including methylation and DNase sensitivity.
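The overlap step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the authors' pipeline: the SNP names, coordinates, and feature tracks below are made up, and the function names `overlapping_tracks` and `rank_candidates` are hypothetical. It assumes each epigenomic feature is a half-open interval on a single chromosome and ranks SNPs by how many distinct tracks cover them.

```python
def overlapping_tracks(snps, features):
    """Map each SNP name to the sorted list of feature tracks covering it.

    snps: dict of SNP name -> position (same chromosome assumed)
    features: list of (track, start, end) tuples, half-open [start, end)
    """
    hits = {name: set() for name in snps}
    for name, pos in snps.items():
        for track, start, end in features:
            if start <= pos < end:
                hits[name].add(track)
    return {name: sorted(tracks) for name, tracks in hits.items()}


def rank_candidates(snps, features):
    """Order SNPs by number of overlapping tracks (most first), then by name."""
    overlaps = overlapping_tracks(snps, features)
    return sorted(overlaps.items(), key=lambda kv: (-len(kv[1]), kv[0]))


# Hypothetical example data, NOT taken from the paper.
example_snps = {"snp_a": 1050, "snp_b": 2300, "snp_c": 5100}
example_features = [
    ("histone_methylation", 1000, 1200),
    ("dnase_sensitivity", 900, 1100),
    ("histone_methylation", 5000, 5200),
]

ranked = rank_candidates(example_snps, example_features)
for name, tracks in ranked:
    print(name, len(tracks), tracks)
```

A SNP that sits under several independent marks at once, like `snp_a` in this toy data, is the kind of candidate that gets carried forward to the editing steps.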
Epigenome Editing
With one candidate SNP highlighted, the team used a TALE-LSD1 fusion protein to specifically remove H3K4 methylation from the region, which reduced expression of the nearby gene RFX6 by two-thirds. Conversely, when the team fused the same locus-targeting TALE domains to a VP64 activation domain, RFX6 expression more than doubled.
Genome Editing
To connect local epigenetic effects to the actual SNP, the team next used TALENs to produce otherwise identical cell lines with all three possible genotypes (CC, CT, and TT) at their SNP. This step actually required some pretty clever high-throughput screening, because homology-directed repair (HDR) was only about 0.3% successful. When the appropriate genotypes were finally in hand, they found the T version of the SNP dramatically increased RFX6 expression.
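To get a feel for why a 0.3% HDR rate calls for high-throughput screening, here is a rough back-of-the-envelope sketch. It assumes, as a simplification, that each screened clone is an independent trial with the same success probability; the paper's actual screening scheme is not described here, and the function names and the 95% target are illustrative choices. Under that assumption, the chance of at least one correctly edited clone among n is 1 - (1 - p)^n.

```python
import math


def prob_at_least_one(p_success, n_clones):
    """Probability that at least one of n independent clones is correctly edited."""
    return 1.0 - (1.0 - p_success) ** n_clones


def clones_needed(p_success, target_prob):
    """Smallest n such that prob_at_least_one(p_success, n) >= target_prob."""
    if not (0.0 < p_success < 1.0 and 0.0 < target_prob < 1.0):
        raise ValueError("both probabilities must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target_prob) / math.log(1.0 - p_success))


hdr_rate = 0.003   # "about 0.3%" from the text, treated as a per-clone rate (assumption)
target = 0.95      # illustrative confidence level, not from the paper

n = clones_needed(hdr_rate, target)
print(f"clones to screen: {n}")
print(f"chance of at least one edited clone: {prob_at_least_one(hdr_rate, n):.3f}")
```

With these inputs the answer comes out at roughly a thousand clones per desired genotype, which is well beyond what is practical to pick and genotype by hand.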
Phenotyping
Finally, the team circled back to phenotype, finding that the TT cell lines were more adherent and took a different shape than the CC clones. Transcriptome sequencing also showed that the SNP clearly changed global gene expression.
GWAS data has been maligned for lack of causal information, but this paper demonstrates that this no longer has to be the case. With new genome and epigenome editing technologies, candidate SNPs identified by GWASs can be systematically tested to see which are really important, what they do, and which SNPs are merely along for the correlational ride.
To get started finding causation in your own high-throughput correlations, just check out the paper in Nature Medicine, 2015.