Genepi Regensburg Software
Easy2
Easy2 is a combination of the previous EasyQC and EasyStrata R-packages and provides the latest advanced functionality.
The latest version and a Wiki on Easy-commands are accessible through GitHub:
https://github.com/winkusch/Easy2
For questions, please contact: thomas.winkler(at)ur.de
If you use Easy2
- for GWAS quality control, please cite: "Winkler et al.: Quality control and conduct of genome-wide association meta-analyses. Nature Protocols 2014"
- for GWAS evaluation, please cite: "Winkler et al.: EasyStrata: evaluation and visualization of stratified genome-wide association meta-analysis data. Bioinformatics 2015"
KidneyGPS
The results of Gene PrioritiSation (GPS) based on GWAS meta-analyses (GWAMA) and post-GWAMA are high-dimensional and require expertise for interpretation. To provide easier access to the relevant results from GWAMAs and post-GWAMAs for kidney function and kidney function decline to experts from other fields (e.g. medicine, physiology, biology), we have developed KidneyGPS as a user-friendly web-application. KidneyGPS enables easy access by search functions on genes, variants, and regions, to prioritize genes and variants likely relevant for kidney function in humans for functional follow-up. Several options allow for customizing the presented output according to the specific needs of the user.
The current version of KidneyGPS is accessible here: https://kidneygps.ur.de/gps/
If you use KidneyGPS, please cite: Stanzick KJ et al.: KidneyGPS: a user-friendly web application to help prioritize kidney function genes and variants based on evidence from genome-wide association studies. BMC Bioinformatics 2023
EasyQC (Winkler et al. Nat Protoc 2014)
Description
EasyQC is an R-package that provides advanced functionality
(i) to perform file-level QC of single genome-wide association (GWA) data-sets;
(ii) to conduct quality control across several GWA data-sets (meta-level QC);
(iii) to simplify data-handling of large-scale GWA data-sets.
Put differently, it can serve as a nonsense detector for study-specific GWA data-sets.
Download
If you want to use EasyQC, please download and use the latest version of the Easy2 R package, which is available on GitHub:
https://github.com/winkusch/Easy2
All EasyQC functionality is available in Easy2, which is actively maintained.
Download – Early AMD meta-analysis cleaning material
The following EasyQC ecf-script was used for quality control of 11 Early AMD GWAS prior to meta-analysis (Winkler et al. BMC Medical Genomics 2020). The cleaning script was developed for binary outcome GWAS that were conducted with Rvtests:
studyqc-earlyamd.ecf
The script is compatible with Rvtests output. Further details and instructions on the individual steps are given as comments in the script.
Download – 1000 Genomes / HRC cleaning material
The following material can be used for quality control of 1000 Genomes or HRC imputed GWAS result data sets.
Scripts:
fileqc_1000G.ecf
This script can be used with the 1000G or HRC reference files below and incorporates QC steps such as sanity checks, filtering, allele coding harmonization, marker harmonization, allele frequency checks, and QQ plots. In particular, allele coding and marker harmonization are essential steps prior to meta-analysis.
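To make the allele coding harmonization and allele frequency check more concrete, the following is a minimal Python sketch of the general idea. It is not code from EasyQC or Easy2: the function names, the argument layout, and the 0.2 frequency-difference threshold are assumptions chosen for illustration, and strand flips are not handled.

```python
# Illustrative sketch only; this is not code from EasyQC or Easy2.
# Function names and the 0.2 threshold are assumptions made for this example.

def align_to_reference(effect_allele, other_allele, eaf, ref_a1, ref_a2):
    """Express the study effect-allele frequency in terms of ref_a1.

    Returns None when the allele pair does not match the reference
    (strand flips are deliberately not handled in this sketch).
    """
    study = (effect_allele.upper(), other_allele.upper())
    if study == (ref_a1, ref_a2):
        return eaf            # same coding as the reference
    if study == (ref_a2, ref_a1):
        return 1.0 - eaf      # swapped coding: frequency refers to the other allele
    return None               # allele mismatch: candidate for exclusion


def is_frequency_outlier(study_af, ref_af, max_diff=0.2):
    """Flag variants whose aligned frequency deviates strongly from the reference."""
    return abs(study_af - ref_af) > max_diff


# Study reports G as effect allele; the reference lists A first.
aligned = align_to_reference("g", "a", 0.25, "A", "G")
print(aligned, is_frequency_outlier(aligned, 0.30))
```

A variant flagged in this way is not necessarily wrong, but a large deviation from the reference frequency often points to an allele coding or strand problem worth inspecting.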
Allele frequency reference data (all based on NCBI build 37):
The provided allele frequency reference files use the cptid format for marker identifiers. The cptid format is automatically generated by the EasyQC function CREATECPTID. Please see the EasyQC manual for more detailed information on the format.
Allele frequency reference data for 1000G phase1 version3 imputed GWAS (based on allele frequencies given in the "legend" files from the IMPUTE website):
Excluding X-Chr variants:
allelefreq.1000G_EUR_p1v3.impute_legends.noDup.noX.gz
allelefreq.1000G_AFR_p1v3.impute_legends.noDup.noX.gz
allelefreq.1000G_AMR_p1v3.impute_legends.noDup.noX.gz
allelefreq.1000G_ASN_p1v3.impute_legends.noDup.noX.gz
Excluding X-Chr variants, excluding monomorphic variants:
Allele frequency reference data for 1000G phase3 version5 imputed GWAS (based on allele frequencies given in the "legend" files from the IMPUTE website):
Excluding monomorphic variants, excluding CNVs:
Allele frequency reference data for Haplotype Reference Consortium (HRC) imputed GWAS (based on allele frequencies given in the reference file provided by Will Rayner, http://www.well.ox.ac.uk/~wrayner/tools/#Checking):
Excluding variants with mac<5 or maf<0.1%:
HRC.r1-1.GRCh37.wgs.mac5.sites.tab.cptid.maf001.gz
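The exclusion rule "mac<5 or maf<0.1%" can be read as a filter on the minor allele count (MAC) and the minor allele frequency (MAF). The Python sketch below illustrates that reading; the function name and the input layout (alternate allele count plus total number of alleles) are assumptions for this example and do not describe how the reference file was actually produced.

```python
# Illustrative sketch only; names and input layout are assumptions.

def keep_variant(alt_count, total_alleles, min_mac=5, min_maf=0.001):
    """Keep a variant only if MAC >= min_mac and MAF >= min_maf (0.1%)."""
    # The minor allele is whichever of the two alleles is rarer.
    mac = min(alt_count, total_alleles - alt_count)
    maf = mac / total_alleles
    return mac >= min_mac and maf >= min_maf


# With a hypothetical panel of 50,000 alleles, the MAF criterion is the stricter one:
print(keep_variant(10, 50000))   # MAC is 10, but MAF is 0.02%
print(keep_variant(100, 50000))  # MAC is 100, MAF is 0.2%
```

In large reference panels the MAF threshold usually dominates, while the MAC threshold matters mainly for small panels.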
Mapping files (all based on NCBI build 37):
The mapping files provide chromosome and position for marker identifiers (e.g., rsIDs) whose names do not themselves contain this information (unlike names such as "chr1:123:AT_A"). The files are based on imputation reference files from the MACH and the IMPUTE websites. They can be used with the EasyQC function CREATECPTID, which harmonizes marker names across studies by compiling unique cptids. Please see the EasyQC manual for more detailed information on the cptid format.
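The idea behind such a mapping can be sketched in a few lines of Python. This is not the CREATECPTID implementation, and the "chromosome:position" key built here is a simplified, hypothetical stand-in for the actual cptid format, which is defined in the EasyQC manual. The function name and the in-memory dictionary are likewise assumptions.

```python
# Illustrative sketch only; this does not reproduce CREATECPTID or the real cptid format.

def position_key(marker, rsid_map):
    """Return a 'chromosome:position' key for a marker name, or None if unknown.

    rsid_map is a hypothetical dict: marker name -> (chromosome, position).
    """
    if marker in rsid_map:
        chrom, pos = rsid_map[marker]
        return f"{chrom}:{pos}"
    # Names such as "chr1:123:AT_A" already carry chromosome and position.
    parts = marker.split(":")
    if len(parts) >= 2 and parts[1].isdigit():
        chrom = parts[0][3:] if parts[0].lower().startswith("chr") else parts[0]
        return f"{chrom}:{parts[1]}"
    return None  # marker cannot be placed; candidate for exclusion


rsid_map = {"rs123": ("1", 1000)}          # toy mapping, not real data
for name in ("rs123", "chr1:123:AT_A", "rs999"):
    print(name, position_key(name, rsid_map))
```

Note that this simplified key drops the alleles, so different variants at the same position would collide; a real harmonization scheme has to account for that.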
Mapping file for 1000G phase1 version3 imputed GWAS:
Mapping file for 1000G phase3 version5 imputed GWAS:
Mapping file for HRC imputed GWAS:
HRC.r1-1.GRCh37.wgs.mac5.sites.tab.rsid_map.gz
Change log for mapping files: CHANGE_map.log
Download – GIANT QC paper (Winkler et al.) material:
The following material has been used for quality control in several projects of the Genetic Investigation of ANthropometric Traits (GIANT) consortium.
Scripts:
File-level QC scripts:
1_filelevel_qc.gwa.ecf (for HapMap imputed data)
1_filelevel_qc.metabochip.ecf (for genotyped Metabochip data)
Meta-level QC script:
2_metalevel_qc.ecf
Meta-Analysis script (to be used with metal):
3_metal_metaanalysis.txt
Meta-Analysis QC scripts:
4_metaanalysis_qc.compare.ecf
4_metaanalysis_qc.compare_logfiles.r (R-script)
4_metaanalysis_qc.studymeta.ecf
Reference data:
Allele frequency reference data (based on NCBI build 36):
AlleleFreq_HapMap_CEU.v2.txt.gz (for CEU HapMap imputed data)
AlleleFreq_1000G_EUR_Metabochip.v1.txt.gz (for CEU genotyped Metabochip data)
Marker harmonization reference data (based on NCBI build 36):
SNPID_to_ChrPosID.b36_v2.txt.gz
QT interval SNPs reference data (based on NCBI build 36):
QTSNPs_AEL_TW.txt
Please see our QC paper "Winkler et al.: Quality control and conduct of genome-wide association meta-analyses. Nature Protocols 2014" for further details regarding these scripts and material.
Download – Exomechip cleaning material
Scripts:
Cleaning scripts for Rvtests output:
clean_rvtests.ecf (for Rvtests association output)
clean_rvtests_cov.ecf (for Rvtests *Cov* output)
Cleaning scripts for Raremetalworker output:
clean_raremetalworker.ecf (for Raremetalworker association output)
clean_raremetalworker_cov.ecf (for Raremetalworker *cov* output)
Reference data:
Exomechip Allele frequency reference data:
AFR.frequencies
AMR.frequencies
EasyStrata (Winkler et al. Bioinformatics 2015)
Description
EasyStrata is an R-package that provides advanced functionality
(i) for the evaluation of stratified GWAS;
(ii) for plotting GWAS results with a specific focus on stratification;
(iii) to simplify data-handling of large-scale GWA data-sets.
Download
If you want to use EasyStrata, please download and use the latest version of the Easy2 R package, which is available on GitHub:
https://github.com/winkusch/Easy2
All EasyStrata functionality is available in Easy2, which is actively maintained.
Download – Example scripts and data
The following scripts have been developed and can be used for the evaluation of stratified GWAMA results from the Genetic Investigation of ANthropometric Traits (GIANT) consortium.
Scripts:
Plotting scripts:
easystrata_figure1_miami.ecf (Miami-Plot for contrasting two strata)
easystrata_supplfigure3_qqplot.ecf (QQ-Plot of multiple strata)
easystrata_supplfigure4_scatter.ecf (Scatter-Plot of strata-specific effect sizes)
easystrata_supplfigure5_qq_omitreported.ecf (QQ-Plot excluding known loci)
easystrata_supplfigure6_plotspeed.ecf (Increasing plot speed)
easystrata_supplfigure7_break_yaxis.ecf (Breaking up y-axis of Manhattan-plot)
easystrata_supplfigure8_panel.ecf (Panel of QQ and scatter plots)
Evaluation scripts:
easystrata_supplpipe2A_sexdiff.ecf (Difference between 2 strata)
easystrata_supplpipe2B_sexdiff_filt.ecf (Difference between 2 strata + overall filter)
easystrata_supplpipe2C_joint.ecf (Joint main+interaction effect)
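As background to the "difference between 2 strata" evaluation, the Python sketch below shows a standard z-test for the difference between two stratum-specific effect estimates. It is a generic illustration under stated assumptions, not the code of the scripts listed above: the function name is hypothetical, and the optional correlation r between the two estimates defaults to 0 (independent strata).

```python
# Illustrative sketch only; a generic difference test, not the EasyStrata implementation.
import math


def strata_difference(beta1, se1, beta2, se2, r=0.0):
    """Two-sided z-test for beta1 - beta2.

    r is the assumed correlation between the two estimates
    (0 for independent strata such as non-overlapping men and women).
    """
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2 - 2.0 * r * se1 * se2)
    z = (beta1 - beta2) / se_diff
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


z, p = strata_difference(beta1=0.05, se1=0.01, beta2=0.01, se2=0.01)
print(f"z = {z:.3f}, p = {p:.4g}")
```

Genome-wide, such a test is applied to every variant, so the resulting p-values need a multiple-testing threshold, or a prior filter on the overall association to reduce the number of tests.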
Integrative genome screen script (Winkler et al. NatComm 2018):
integrative_screen.ecf (The integrative screen script requires EasyStrata v18.1 or later, which can be downloaded here: EasyStrata_18.1.tar.gz)
Data:
Example mapping file:
hapmap36.map (Hapmap b36 mapping file: SNPID, Chromosome, Position)
Example locus annotation file:
WAIST_2009_2010_14_reported.txt (Known waist-hip ratio loci, published by Lindgren et al 2009, Heid et al 2010)
Citation
If you use EasyStrata, please cite: "Winkler et al.: EasyStrata: evaluation and visualization of stratified genome-wide association meta-analysis data. Bioinformatics 2015"
MLA-bilateral (Günther et al. 2020)
mcblog
The mcblog R-package provides an implementation of the maximum likelihood approach to adjust worse-entity logistic regression for bilateral disease for entity-specific misclassification using validation data, as introduced in Guenther et al. (2020). This approach can be used, for example, to adjust genetic association estimates for bilateral disease phenotypes (e.g., age-related macular degeneration) for misclassification of the disease status due to error-prone or suboptimal entity-specific disease classifications when gold-standard classifications are available for a subset of entities.
Download
mcblog_0.0.0.9000.tar.gz
Example code
Please see the following vignette for an introduction to the usage of the R-package and illustrative examples:
introduction_mcblog.html
Reference:
Guenther, F., Brandl, C., Winkler, T. W., Wanner, V., Stark, K., Küchenhoff, H., & Heid, I. M. (2020). Chances and challenges of machine learning based disease classification in genetic association studies illustrated on age-related macular degeneration. Genetic Epidemiology.
Contact
Felix.Guenther(at)stat.uni-muenchen.de, thomas.winkler(at)ur.de