Consensus Genotyper for Exome Sequencing (CGES, pronounced “sea-guess”) is a tool designed to increase the fidelity of genotypes identified by exome sequencing. CGES uses a machine-learning ensemble approach, combining the output of multiple variant-calling models to dramatically improve classifier performance. Specifically, it applies a two-stage voting scheme across four algorithm implementations. Although our ensemble method can accept variants generated by any variant-calling algorithm, we built CGES around GATK 2.8, SAMtools, FreeBayes and Atlas-SNP2 because of their performance, widespread adoption, and diverse but complementary algorithms. You can read more about CGES and its uses in this article from Science Life.
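The general idea of a two-stage vote can be illustrated with a small Python sketch. It assumes a hypothetical in-memory representation in which each caller's output is a mapping from a normalized variant site to per-sample genotype strings. It also assumes hypothetical rules: stage 1 keeps a site if at least `site_min_votes` callers report it, and stage 2 keeps a sample's genotype only if at least `geno_min_votes` callers agree on it, marking it missing otherwise. The function name `consensus_genotypes` and both thresholds are illustrative assumptions, not the published CGES criteria or implementation.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

Site = Tuple[str, int, str, str]   # (chrom, pos, ref, alt); assumes callers use normalized alleles
Genotypes = Dict[str, str]         # sample -> genotype string, e.g. "0/1"
CallSet = Dict[Site, Genotypes]    # one caller's output


def consensus_genotypes(
    callsets: Dict[str, CallSet],
    site_min_votes: int = 3,   # hypothetical threshold, not the published CGES rule
    geno_min_votes: int = 3,   # hypothetical threshold, not the published CGES rule
) -> Dict[Site, Dict[str, Optional[str]]]:
    """Two-stage majority vote over several variant callers (illustrative sketch)."""
    # Stage 1: site-level vote. Count how many callers report each site.
    site_votes = Counter(site for calls in callsets.values() for site in calls)
    kept_sites = [s for s, n in site_votes.items() if n >= site_min_votes]

    result: Dict[Site, Dict[str, Optional[str]]] = {}
    for site in sorted(kept_sites):
        # Every sample genotyped at this site by at least one caller.
        samples = {smp for calls in callsets.values() for smp in calls.get(site, {})}
        result[site] = {}
        for smp in sorted(samples):
            # Stage 2: genotype-level vote among the callers that genotyped this sample here.
            votes = Counter(
                calls[site][smp]
                for calls in callsets.values()
                if smp in calls.get(site, {})
            )
            gt, n = votes.most_common(1)[0]
            # Keep the genotype only if enough callers agree; otherwise mark it missing.
            result[site][smp] = gt if n >= geno_min_votes else None
    return result


if __name__ == "__main__":
    callers = {
        "gatk":      {("1", 100, "A", "G"): {"s1": "0/1", "s2": "0/0"}},
        "samtools":  {("1", 100, "A", "G"): {"s1": "0/1", "s2": "0/1"}},
        "freebayes": {("1", 100, "A", "G"): {"s1": "0/1", "s2": "0/0"},
                      ("1", 200, "C", "T"): {"s1": "1/1"}},
        "atlas":     {("1", 100, "A", "G"): {"s1": "1/1", "s2": "0/0"}},
    }
    print(consensus_genotypes(callers))
    # -> {('1', 100, 'A', 'G'): {'s1': '0/1', 's2': '0/0'}}
```

Here the site at position 200 is dropped in stage 1 because only one caller reports it. Choosing thresholds above half the number of callers (for example 3 of 4) means two different genotypes can never both reach the stage-2 threshold, so no tie-breaking rule is needed.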
CGES tool repository: Public GitHub repository containing the source code for CGES and its Galaxy wrapper
CGES-QC tool repository: Public GitHub repository containing the source code for CGES-QC and its Galaxy wrapper
Trubetskoy V, Rodriguez A, Dave U, Campbell N, Crawford EL, Cook EH, Sutcliffe JS, Foster I, Madduri R, Cox NJ, Davis LK. Consensus Genotyper for Exome Sequencing (CGES): improving the quality of exome variant genotypes. Bioinformatics. 2014;btu591.
Copy number integrated GWAS (cni-GWAS) is a method developed by Dr. Davis and Eric Gamazon to integrate both SNP allelic content and copy number dosage in a single model and to estimate their joint effects on phenotype. In contrast to the traditional eQTL mapping approach, which assumes diploidy at each candidate eQTL SNP or assumes no SNPs at a CNV locus, we assume that CNVs and SNPs may co-localize anywhere in the genome. We thus fit the following regression model:

$$Y = b_1 C + b_2 S + Xb + e$$
where Y is a gene expression trait, C is the CNV genotype, S is the SNP genotype, b1 is the CNV genotype effect, b2 is the SNP genotype effect, Xb is the effect of non-genotype covariates (e.g., age, sex, or principal components), and e is the residual. The residuals e are assumed to be independent and identically (normally) distributed. Note that in the absence of a CNV, C is the same for every individual, so the b1 term contributes only a constant and the model reduces to the simple model that tests only for a SNP effect. Likewise, in the absence of a SNP at a CNV locus, S is constant and the model reduces to a regression that tests for a CNV eQTL effect. Thus, this approach contains the traditional single-variant approaches as special cases.
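To make the model concrete, here is a minimal Python sketch that fits Y = b1·C + b2·S + Xb + e by ordinary least squares on simulated data. It uses plain Python (the normal equations solved by Gauss-Jordan elimination) so it runs without extra packages. The function names `ols` and `fit_joint_cnv_snp`, the simulated effect sizes, and the use of age as the single covariate are all hypothetical. The sketch assumes an intercept column is included among the covariate terms Xb, and it simulates CNV dosage and SNP allele counts independently for simplicity, which real co-localized CNV/SNP data would not satisfy. It is not the authors' implementation; in practice an established regression library would be the better choice.

```python
import random
from typing import Dict, List, Sequence


def ols(X: List[List[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares: solve the normal equations (X'X) b = X'y."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(p)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
    # The left block is now diagonal, so each coefficient is a single division.
    return [A[i][p] / A[i][i] for i in range(p)]


def fit_joint_cnv_snp(
    y: Sequence[float],
    cnv: Sequence[float],
    snp: Sequence[float],
    covariates: Sequence[Sequence[float]],
) -> Dict[str, object]:
    """Fit Y = b1*C + b2*S + Xb + e, with an intercept column included in X."""
    X = [[1.0, float(c), float(s), *map(float, x)]
         for c, s, x in zip(cnv, snp, covariates)]
    coef = ols(X, y)
    return {"intercept": coef[0], "b1_cnv": coef[1],
            "b2_snp": coef[2], "b_cov": coef[3:]}


if __name__ == "__main__":
    rng = random.Random(1)
    n = 2000
    cnv = [rng.choice([0, 1, 2, 3, 4]) for _ in range(n)]  # copy number dosage
    snp = [rng.choice([0, 1, 2]) for _ in range(n)]        # alternate-allele count
    age = [[rng.uniform(20, 70)] for _ in range(n)]         # one non-genotype covariate
    # Simulated expression with true b1 = 0.8, b2 = -0.3 and age effect 0.01.
    y = [0.5 + 0.8 * c - 0.3 * s + 0.01 * a[0] + rng.gauss(0, 0.5)
         for c, s, a in zip(cnv, snp, age)]
    fit = fit_joint_cnv_snp(y, cnv, snp, age)
    # Estimates should land close to 0.8 and -0.3.
    print(round(fit["b1_cnv"], 2), round(fit["b2_snp"], 2))
```

If C is constant across individuals (no CNV), its column is collinear with the intercept and X'X becomes singular, so in practice that column is dropped and the fit reduces to the SNP-only model; the same applies to S at a CNV locus without a SNP.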
For more information about the method and paper, check out this write-up in the AJHG Editors’ Corner!