Regensburg GEM Platform - Development of genetic-epidemiologic methods (GEM) and their realization in software (GWAS data quality control, interaction analyses, stratified approaches, imputation)
Prof. Dr. Iris Heid, Dr. Thomas Winkler, Dr. Mathias Gorski, Dr. Felix Günther, Kira Stanzick M.Sc.
Here you can download software that was developed for various aspects of genome-wide association study (GWAS) analyses.
The results of Gene PrioritiSation (GPS) based on GWAS meta-analyses (GWAMA) and post-GWAMA are high-dimensional and require expertise for interpretation. To give experts from other fields (e.g., medicine, physiology, biology) easier access to the relevant results from GWAMAs and post-GWAMAs for kidney function and kidney function decline, we have developed KidneyGPS as a user-friendly web application. KidneyGPS offers search functions on genes, variants, and regions to prioritize genes and variants likely relevant for kidney function in humans for functional follow-up. Several options allow users to customize the presented output according to their specific needs.
Click here to access our Gene PrioritiSation tool
The mcblog R-package provides an implementation of the maximum likelihood approach to adjust worse-entity logistic regression for bilateral disease for entity-specific misclassification using validation data, as introduced in Guenther et al. (2020). This approach can be used, e.g., to adjust genetic association estimates for bilateral disease phenotypes (e.g., age-related macular degeneration) for misclassification in the disease status due to error-prone or suboptimal entity-specific disease classifications when gold-standard classifications are available for a subset of entities.
Please see the following vignette for an introduction to the usage of the R-package and illustrative examples:
Guenther, F., Brandl, C., Winkler, T. W., Wanner, V., Stark, K., Küchenhoff, H., & Heid, I. M. (2020). Chances and challenges of machine learning based disease classification in genetic association studies illustrated on age-related macular degeneration. Genetic Epidemiology.
Felix.Guenther@stat.uni-muenchen.de
EasyStrata is an R-package that provides advanced functionality
(i) for the evaluation of stratified GWAS;
(ii) for plotting GWAS results with a specific focus on stratification;
(iii) to simplify data-handling of large-scale GWA data-sets
Version 8.6: EasyStrata_8.6.tar.gz
Command Reference / Manual: EasyStrata_8.6_Commands_140615.pdf
Alternatively, you can access the package via the CRAN R package repository: http://cran.r-project.org/web/packages/EasyStrata/
The following scripts have been developed and can be used for the evaluation of stratified GWAMA results from the Genetic Investigation of ANthropometric Traits (GIANT) consortium.
Scripts:
Plotting scripts:
easystrata_figure1_miami.ecf (Miami-Plot for contrasting two strata)
easystrata_supplfigure3_qqplot.ecf (QQ-Plot of multiple strata)
easystrata_supplfigure4_scatter.ecf (Scatter-Plot of strata-specific effect sizes)
easystrata_supplfigure5_qq_omitreported.ecf (QQ-Plot excluding known loci)
easystrata_supplfigure6_plotspeed.ecf (Increasing plot speed)
easystrata_supplfigure7_break_yaxis.ecf (Breaking up y-axis of Manhattan-plot)
easystrata_supplfigure8_panel.ecf (Panel of QQ and scatter plots)
Evaluation scripts:
easystrata_supplpipe2A_sexdiff.ecf (Difference btw. 2 strata)
easystrata_supplpipe2B_sexdiff_filt.ecf (Difference btw. 2 strata + overall filter)
easystrata_supplpipe2C_joint.ecf (Joint main+interaction effect)
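The idea behind a difference test between two strata can be sketched outside of EasyStrata as follows. This is a minimal Python illustration of a standard Z-test on two stratum-specific effect estimates, not code taken from EasyStrata or from the ecf scripts above; the function name `difference_test` and the optional correlation parameter `r` are assumptions made for this example.

```python
import math


def difference_test(beta1, se1, beta2, se2, r=0.0):
    """Z-test for the difference between two stratum-specific effects.

    beta1, se1: effect estimate and standard error in stratum 1 (e.g. men)
    beta2, se2: effect estimate and standard error in stratum 2 (e.g. women)
    r: assumed correlation between the two estimates (0 for independent strata)
    Returns (z, two-sided p-value).
    """
    var_diff = se1 ** 2 + se2 ** 2 - 2.0 * r * se1 * se2
    if var_diff <= 0:
        raise ValueError("variance of the difference must be positive")
    z = (beta1 - beta2) / math.sqrt(var_diff)
    # Two-sided p-value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


if __name__ == "__main__":
    z, p = difference_test(0.10, 0.02, 0.04, 0.02)
    print(f"z = {z:.3f}, p = {p:.4f}")
```

With `r = 0` the strata are treated as independent samples; a non-zero `r` would only be appropriate if the two estimates are known to be correlated.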
Integrative genome screen script (Winkler et al. NatComm 2018):
integrative_screen.ecf (The integrative screen script requires EasyStrata v18.1 or greater that can be downloaded here: EasyStrata_18.1.tar.gz)
Data:
Example mapping file:
hapmap36.map (Hapmap b36 mapping file: SNPID, Chromosome, Position)
Example locus annotation file:
WAIST_2009_2010_14_reported.txt (Known waist-hip ratio loci, published by Lindgren et al 2009, Heid et al 2010)
R 2.13 or higher. R packages 'Cairo' and 'plotrix'.
If you use EasyStrata please cite
"Winkler et al.: EasyStrata: evaluation and visualization of stratified genome-wide
association meta-analysis data. Bioinformatics 2014"
and (if possible) reference our webpage "www.genepi-regensburg.de/easystrata".
Thank you.
EasyStrata is licensed under the GNU General Public License, version 3.
Copyright © 2012 by Thomas Winkler.
Although we hope that EasyStrata will be very useful, it is published WITHOUT ANY WARRANTY.
If you require support for a different platform or have any further questions, please e-mail Thomas Winkler.
date of last update: 2018-04-19
EasyQC is an R-package that provides advanced functionality
(i) to perform file-level QC of single genome-wide association (GWA) data-sets;
(ii) to conduct quality control across several GWA data-sets (meta-level QC);
(iii) to simplify data-handling of large-scale GWA data-sets
Put differently, it can serve as a nonsense detector for study-specific GWA data-sets.
Current version 23.8: EasyQC_23.8.tar.gz
Previously distributed version: EasyQC_9.2.tar.gz
Manual: EasyQC_9.0_Commands_140918_2.pdf
ChangeLog: EASYQC_CHANGE.log
The following EasyQC ecf-script was used for quality control of 11 early AMD GWAS prior to meta-analysis (Winkler et al. BMC Medical Genomics 2020). The cleaning script was developed for binary outcome GWAS that were conducted with Rvtests:
The script is compatible with Rvtests output. Further details and instructions on the individual steps are given as comments in the script.
The following material can be used for quality control of 1000 Genomes or HRC imputed GWAS result data sets.
Scripts:
This script can be used with the 1000G or HRC reference files below and incorporates different QC steps such as sanity checks, filtering, allele coding harmonization, marker harmonization, allele frequency checks, QQ plots, etc. In particular, the allele coding and the marker harmonization are essential steps prior to meta-analysis.
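To illustrate what allele coding harmonization means in practice, here is a minimal Python sketch. It is not EasyQC code and does not reproduce the ecf script; the function name `harmonize` and its arguments are hypothetical, and strand flips (e.g. A/G reported as T/C) are deliberately ignored to keep the example short.

```python
def harmonize(effect_allele, other_allele, eaf, beta, ref_a1, ref_a2):
    """Align one study variant to the allele coding of a reference.

    effect_allele, other_allele: alleles as reported by the study
    eaf: frequency of the effect allele in the study
    beta: effect estimate for the effect allele
    ref_a1, ref_a2: reference alleles; the output is coded for ref_a1
    Returns (effect_allele, other_allele, eaf, beta), or None if the
    alleles do not match the reference in either orientation.
    """
    ea, oa = effect_allele.upper(), other_allele.upper()
    a1, a2 = ref_a1.upper(), ref_a2.upper()
    if (ea, oa) == (a1, a2):
        # Already coded like the reference: nothing to change.
        return ea, oa, eaf, beta
    if (ea, oa) == (a2, a1):
        # Swapped coding: switch alleles, complement the frequency,
        # and flip the sign of the effect estimate.
        return oa, ea, 1.0 - eaf, -beta
    # Allele mismatch (or strand issue, which this sketch does not handle).
    return None


if __name__ == "__main__":
    print(harmonize("g", "a", 0.30, 0.05, "A", "G"))
    print(harmonize("A", "C", 0.30, 0.05, "A", "G"))
```

After this step all studies report the effect for the same allele, so effect sizes and allele frequencies become comparable across studies and against the reference.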
Allele frequency reference data (all based on NCBI build 37):
The provided allele frequency reference files are using the cptid format for marker identifiers. The cptid format is automatically generated by the EasyQC function CREATECPTID. Please see the EasyQC manual for more detailed information on the format.
Allele frequency reference data for 1000G phase1 version3 imputed GWAS (based on allele frequencies given in the "legend" files from the IMPUTE website):
Excluding X-Chr variants:
allelefreq.1000G_EUR_p1v3.impute_legends.noDup.noX.gz
allelefreq.1000G_AFR_p1v3.impute_legends.noDup.noX.gz
allelefreq.1000G_AMR_p1v3.impute_legends.noDup.noX.gz
allelefreq.1000G_ASN_p1v3.impute_legends.noDup.noX.gz
Excluding X-Chr variants, excluding monomorphic variants:
allelefreq.1000G_EUR_p1v3.impute_legends.noMono.noDup.noX.v2.gz
allelefreq.1000G_AFR_p1v3.impute_legends.noMono.noDup.noX.v2.gz
allelefreq.1000G_AMR_p1v3.impute_legends.noMono.noDup.noX.v2.gz
allelefreq.1000G_ASN_p1v3.impute_legends.noMono.noDup.noX.v2.gz
Allele frequency reference data for 1000G phase3 version5 imputed GWAS (based on allele frequencies given in the "legend" files from the IMPUTE website):
Excluding monomorphic variants, excluding CNVs:
1000GP_p3v5_legends_rbind.noDup.noMono.noCnv.noCnAll.afref.ALL.txt.gz
1000GP_p3v5_legends_rbind.noDup.noMono.noCnv.noCnAll.afref.EUR.txt.gz
1000GP_p3v5_legends_rbind.noDup.noMono.noCnv.noCnAll.afref.AFR.txt.gz
1000GP_p3v5_legends_rbind.noDup.noMono.noCnv.noCnAll.afref.AMR.txt.gz
1000GP_p3v5_legends_rbind.noDup.noMono.noCnv.noCnAll.afref.EAS.txt.gz
1000GP_p3v5_legends_rbind.noDup.noMono.noCnv.noCnAll.afref.SAS.txt.gz
Allele frequency reference data for Haplotype Reference Consortium (HRC) imputed GWAS (based on allele frequencies given in the reference file provided by Will Rayner, http://www.well.ox.ac.uk/~wrayner/tools/#Checking):
Excluding variants with mac<5 or maf<0.1%:
HRC.r1-1.GRCh37.wgs.mac5.sites.tab.cptid.maf001.gz
Mapping files (all based on NCBI build 37):
The mapping files contain chromosome and position information for marker identifiers (e.g., rsIDs) that do not carry this information within the marker name (unlike, e.g., "chr1:123:AT_A"). The files are based on imputation reference files from the MACH and the IMPUTE websites. They can be used with the EasyQC function CREATECPTID, which allows for harmonization of marker names across studies by compiling unique cptids. Please see the EasyQC manual for more detailed information on the cptid format.
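The role of a mapping file in marker harmonization can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the file layout (tab-separated, no header, columns marker ID, chromosome, position) is assumed, the functions `load_map` and `position_id` are hypothetical, and the simplified "chromosome:position" key produced here is not the cptid format itself, which is defined in the EasyQC manual.

```python
import csv
import io


def load_map(handle):
    """Read a mapping table into a dict: marker ID -> (chromosome, position).

    Assumes tab-separated rows without header: ID, chromosome, position.
    """
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 3:
            continue  # skip empty or incomplete rows
        marker, chrom, pos = row[0], row[1], row[2]
        mapping[marker] = (chrom, int(pos))
    return mapping


def position_id(marker, mapping):
    """Return a simplified 'chromosome:position' key, or None if unmapped."""
    hit = mapping.get(marker)
    if hit is None:
        return None
    return f"{hit[0]}:{hit[1]}"


if __name__ == "__main__":
    # Small in-memory stand-in for a mapping file (made-up positions).
    example = io.StringIO("rs1\t1\t123\nrs2\t2\t456\n")
    mapping = load_map(example)
    for name in ("rs1", "rs2", "rs999"):
        print(name, "->", position_id(name, mapping))
```

Once every study uses position-based keys of the same form, the same variant can be matched across studies even if the original marker names differ.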
Mapping file for 1000G phase1 version3 imputed GWAS:
rsmid_map.1000G_ALL_p1v3.merged_mach_impute.v3.mergeindels.txt.gz
Mapping file for 1000G phase3 version5 imputed GWAS:
rsmid_machsvs_mapb37.1000G_p3v5.merged_mach_impute.v3.corrpos.gz
Mapping file for HRC imputed GWAS:
HRC.r1-1.GRCh37.wgs.mac5.sites.tab.rsid_map.gz
Change log for mapping files: CHANGE_map.log
The following material has been used for quality control and for several projects of the Genetic Investigation of ANthropometric Traits (GIANT) consortium.
Scripts:
File-level QC scripts:
1_filelevel_qc.gwa.ecf (for HapMap imputed data)
1_filelevel_qc.metabochip.ecf (for genotyped Metabochip data)
Meta-level QC script:
Meta-Analysis script (to be used with metal):
Meta-Analysis QC scripts
4_metaanalysis_qc.compare_logfiles.r (R-script)
4_metaanalysis_qc.studymeta.ecf
Reference data:
Allele frequency reference data (based on NCBI build 36):
AlleleFreq_HapMap_CEU.v2.txt.gz (for CEU HapMap imputed data)
AlleleFreq_1000G_EUR_Metabochip.v1.txt.gz (for CEU genotyped Metabochip data)
Marker harmonization reference data (based on NCBI build 36):
SNPID_to_ChrPosID.b36_v2.txt.gz
QT interval SNPs reference data (based on NCBI build 36):
Please see our QC paper "Winkler et al.: Quality control and conduct of genome-wide association meta-analyses. Nature Protocols 2014" for further details regarding these scripts and material.
Scripts:
Cleaning scripts for Rvtests output:
clean_rvtests.ecf (for Rvtests association output)
clean_rvtests_cov.ecf (for Rvtests *Cov* output)
Cleaning scripts for Raremetalworker output:
clean_raremetalworker.ecf (for Raremetalworker association output)
clean_raremetalworker_cov.ecf (for Raremetalworker *cov* output)
Reference data:
Exomechip Allele frequency reference data:
R 2.13 or higher.
Only UNIX/LINUX systems are supported.
If you use EasyQC please cite
"Winkler et al.: Quality control and conduct of genome-wide association meta-analyses. Nature Protocols 2014"
and (if possible) reference our webpage "www.genepi-regensburg.de/easyqc".
Thank you.
EasyQC is licensed under the GNU General Public License, version 3.
Copyright © 2012 by Thomas Winkler.
Although we hope that EasyQC will be very useful, it is published WITHOUT ANY WARRANTY.
If you require support for a different platform or have any further questions, please e-mail Thomas Winkler.
date of last update: 2017-02-20
idGenerator is an automated tool to generate identifiers (IDs) with multiple features, particularly for modern epidemiological or clinical studies. The software enables the generation of structured IDs to facilitate study organization, layered IDs to enhance data protection, and check digits to detect entry errors. It is easy to use thanks to a user-friendly graphical user interface, and practical because it provides IDs both as standard text and as 128B barcodes. idGenerator is aimed at small to medium-sized epidemiologic or clinical studies in need of a simple yet secure concept and tool for ID creation management. The software can be used by study personnel without programming training on a standard Windows computer.
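How a check digit detects entry errors can be shown with a short Python sketch. The check-digit scheme that idGenerator actually uses is not stated here, so the well-known Luhn algorithm is used as a stand-in; the ID layout (a study prefix followed by a zero-padded number and one check digit), the prefix "RGB", and all function names are assumptions made for illustration.

```python
def luhn_check_digit(digits: str) -> int:
    """Compute the Luhn check digit for a string of decimal digits."""
    total = 0
    # Walk from right to left and double every second digit,
    # starting with the rightmost digit of the payload.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10


def make_id(prefix: str, number: int, width: int = 5) -> str:
    """Build a structured ID: prefix + zero-padded number + check digit."""
    body = f"{number:0{width}d}"
    return f"{prefix}{body}{luhn_check_digit(body)}"


def is_valid(identifier: str, prefix: str) -> bool:
    """Check prefix and check digit of an ID created by make_id."""
    if not identifier.startswith(prefix):
        return False
    digits = identifier[len(prefix):]
    if len(digits) < 2 or not (digits.isascii() and digits.isdigit()):
        return False
    return luhn_check_digit(digits[:-1]) == int(digits[-1])


if __name__ == "__main__":
    new_id = make_id("RGB", 42)
    print(new_id, is_valid(new_id, "RGB"))
    # A single mistyped digit is caught by the check digit.
    print(is_valid("RGB000432", "RGB"))
```

The Luhn scheme catches every single-digit typing error and most swaps of adjacent digits, which is why check digits of this kind are common for manually entered identifiers.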
Download:
If you require support for a different platform or have any further questions please e-mail Matthias Olden.
date of last update: 2021-02-22
Here you can find the scripts for the parallel processing imputation pipeline along with a detailed description.
Description:
MetaMega_pipeline_parallel_phasing_imputing_v2.pdf
Scripts:
01_phasing.pbs
02_imputing.pbs
03_generate_phasing_pbs_scripts.R
04_generate_imputing_pbs_scripts.R
05_submit.sh
Please contact mathias.gorski@ukr.de if you have questions or problems running the pipeline.