This function wraps the standard GWAS procedures used in the Juenger lab. A singular value decomposition (SVD) of the SNPs provides principal components (PCs) for population structure correction; the 'best' number of PCs is the one that brings lambda_GC, the Genomic Control coefficient, closest to 1. (Use the lambdagc parameter to set this yourself.) Genome-wide association is then conducted, and the GWAS output can be saved, along with Manhattan plots, QQ-plots, and annotation information for the top SNPs for each phenotype.
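The PC-selection rule described above can be sketched in base R. This is a minimal illustration, not the package's implementation: lambda_gc() is a hypothetical helper that uses the common definition of the genomic inflation factor (median observed 1-df chi-square statistic divided by its expected median), and the lambda_table values and trait names are made up for the example.

```r
# Hypothetical helper: genomic inflation factor from a vector of p-values,
# assuming a 1-df chi-square test statistic.
lambda_gc <- function(pvals) {
  chisq <- qchisq(pvals, df = 1, lower.tail = FALSE)
  median(chisq, na.rm = TRUE) / qchisq(0.5, df = 1)
}

# Made-up table of lambda_GC values: one row per number of PCs tried,
# one column per phenotype (trait_a and trait_b are placeholder names).
lambda_table <- data.frame(
  NumPCs  = 0:5,
  trait_a = c(1.62, 1.31, 1.12, 1.04, 0.97, 0.91),
  trait_b = c(1.20, 1.02, 0.95, 0.90, 0.88, 0.85)
)

# For each phenotype, keep the number of PCs whose lambda_GC is closest to 1.
best_pcs <- sapply(lambda_table[-1], function(lgc) {
  lambda_table$NumPCs[which.min(abs(lgc - 1))]
})
best_pcs
```

The result is a named vector giving one PC count per phenotype, which is why the correction can differ between phenotypes in the same run.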

pvdiv_standard_gwas(
  snp,
  df = switchgrassGWAS::pvdiv_phenotypes,
  type = c("linear", "logistic"),
  ncores = nb_cores(),
  outputdir = ".",
  covar = NULL,
  lambdagc = TRUE,
  savegwas = FALSE,
  savetype = c("rds", "fbm", "both"),
  suffix = "",
  saveplots = TRUE,
  saveannos = FALSE,
  txdb = NULL,
  minphe = 200,
  ...
)

Arguments

snp

A "bigSNP" object; load with bigsnpr::snp_attach(). Here, genomic information for Panicum virgatum. SNP data is available at doi:10.18738/T8/ET9UAU

df

Dataframe of phenotypes where the first column is PLANT_ID.

type

Character string. Type of univariate regression to run for GWAS. Options are "linear" or "logistic".

ncores

Number of cores to use. Default is nb_cores(), as shown in the usage above.

outputdir

String or file.path() to the output directory. Default is the working directory.

covar

Optional covariance matrix to include in the regression. You can generate these using pvdiv_autoSVD().

lambdagc

Logical or data frame. Should lambda_GC be used to find the best population structure correction? Default is TRUE. Alternatively, you can provide a data frame containing a "NumPCs" column and one column of lambda_GC values per phenotype, named for that phenotype. This table is saved to the output directory by pvdiv_standard_gwas and is saved or generated by pvdiv_lambda_GC.

savegwas

Logical. Should the GWAS output be saved to the working directory? These files are typically quite large. Default is FALSE.

savetype

Character string. Type of GWAS save file. Options are 'rds', which saves individual rds files for each GWAS; 'fbm', which saves one filebacked big matrix (using the bigsnpr package), or 'both', which saves both file types. These files are typically quite large.

suffix

Optional character vector to give saved files a unique search string/name.

saveplots

Logical. Should Manhattan and QQ-plots be generated and saved to the working directory? Default is TRUE.

saveannos

Logical. Should annotation tables for top SNPs be generated and saved to the working directory? Default is FALSE. Can take additional arguments; requires a txdb.sqlite object used in AnnotationDbi.

txdb

A txdb object such as 'Pvirgatum_516_v5.1.gene.txdb.sqlite'. Load this into your environment with AnnotationDbi::loadDb.

minphe

Integer. What's the minimum number of phenotyped individuals to conduct a GWAS on? Default is 200. Use lower values with caution.

...

Other arguments to pvdiv_lambda_GC or pvdiv_table_topsnps.
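Before a long run, it can help to check the phenotype table against the df and minphe requirements described above. The sketch below is an assumption-laden illustration in base R: the phenotypes table, its trait names, and the small threshold are hypothetical, and it does not reproduce how pvdiv_standard_gwas itself treats phenotypes that fall below minphe.

```r
# Hypothetical phenotype table: PLANT_ID first, then one column per phenotype.
phenotypes <- data.frame(
  PLANT_ID = paste0("plant_", 1:6),
  trait_a  = c(1.2, NA, 3.4, 2.2, NA, 0.7),
  trait_b  = c(NA, NA, NA, 5.1, 4.8, NA)
)

# Illustrative threshold only; the documented default for minphe is 200.
minphe <- 3

# The first column is expected to be PLANT_ID.
stopifnot(names(phenotypes)[1] == "PLANT_ID")

# Count phenotyped (non-missing) individuals for each phenotype column.
n_phenotyped <- colSums(!is.na(phenotypes[-1]))

# Names of phenotypes that meet the threshold.
enough <- names(n_phenotyped)[n_phenotyped >= minphe]
enough
```

Running the same count on your own data frame with minphe = 200 shows which phenotypes have enough individuals before any SVD or association step starts.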

Value

A big_SVD object.

Examples

if (FALSE) {
# Here we specify that we do want to generate and save the gwas dataframes,
# the Manhattan and QQ-plots, and the annotation tables.
pvdiv_standard_gwas(snp, df = pvdiv_phenotypes, type = "linear", covar = svd,
    ncores = nb_cores(), lambdagc = TRUE, savegwas = TRUE, saveplots = TRUE,
    saveannos = TRUE, txdb = txdb)
}
# In this example, we run GWAS on all the phenotypes in pvdiv_phenotypes
# using an example SNP set of ~1800 SNPs.
snpfile <- system.file("extdata", "example_bigsnp.rds", package = "switchgrassGWAS")
library(bigsnpr)
snp <- snp_attach(snpfile)
pvdiv_standard_gwas(snp, df = pvdiv_phenotypes, type = "linear", savegwas = FALSE,
    saveplots = FALSE, ncores = 1)
#> 'lambdagc' is TRUE, so lambda_GC will be used to find the best population structure correction using the covariance matrix.
#> 'savegwas' is FALSE, so the gwas results will not be saved to disk.
#> Covariance matrix (covar) was not supplied - this will be generated using pvdiv_autoSVD().
#> 'saveoutput' is FALSE, so the svd will not be saved to the working directory.
#> Now starting GWAS pipeline for GWAS_CT.
#> Now determining lambda_GC for GWAS models with 16 sets of PCs. This will take some time.
#> saveoutput is FALSE, so lambda_GC values won't be saved to a csv.
#> Finished Lambda_GC calculation for GWAS_CT using 0 PCs.
#> Finished Lambda_GC calculation for GWAS_CT using 1 PCs.
#> Finished Lambda_GC calculation for GWAS_CT using 2 PCs.
#> Finished Lambda_GC calculation for GWAS_CT using 3 PCs.
#> Finished Lambda_GC calculation for GWAS_CT using 4 PCs.
#> Finished Lambda_GC calculation for GWAS_CT using 5 PCs.
#> Finished Lambda_GC calculation for GWAS_CT using 6 PCs.
#> Finished Lambda_GC calculation for GWAS_CT using 7 PCs.
#> Finished Lambda_GC calculation for GWAS_CT using 8 PCs.
#> Finished Lambda_GC calculation for GWAS_CT using 9 PCs.
#> Finished Lambda_GC calculation for GWAS_CT using 10 PCs.
#> Finished Lambda_GC calculation for GWAS_CT using 11 PCs.
#> Finished Lambda_GC calculation for GWAS_CT using 12 PCs.
#> Finished Lambda_GC calculation for GWAS_CT using 13 PCs.
#> Finished Lambda_GC calculation for GWAS_CT using 14 PCs.
#> Finished Lambda_GC calculation for GWAS_CT using 15 PCs.
#> Finished phenotype 1: GWAS_CT
#> Now running GWAS with the best population structure correction.
#> [1] "saveoutput is FALSE so GWAS object will not be saved to disk."
#> Now starting GWAS pipeline for MAT.
#> Now determining lambda_GC for GWAS models with 16 sets of PCs. This will take some time.
#> saveoutput is FALSE, so lambda_GC values won't be saved to a csv.
#> Finished Lambda_GC calculation for MAT using 0 PCs.
#> Finished Lambda_GC calculation for MAT using 1 PCs.
#> Finished Lambda_GC calculation for MAT using 2 PCs.
#> Finished Lambda_GC calculation for MAT using 3 PCs.
#> Finished Lambda_GC calculation for MAT using 4 PCs.
#> Finished Lambda_GC calculation for MAT using 5 PCs.
#> Finished Lambda_GC calculation for MAT using 6 PCs.
#> Finished Lambda_GC calculation for MAT using 7 PCs.
#> Finished Lambda_GC calculation for MAT using 8 PCs.
#> Finished Lambda_GC calculation for MAT using 9 PCs.
#> Finished Lambda_GC calculation for MAT using 10 PCs.
#> Finished Lambda_GC calculation for MAT using 11 PCs.
#> Finished Lambda_GC calculation for MAT using 12 PCs.
#> Finished Lambda_GC calculation for MAT using 13 PCs.
#> Finished Lambda_GC calculation for MAT using 14 PCs.
#> Finished Lambda_GC calculation for MAT using 15 PCs.
#> Finished phenotype 1: MAT
#> Now running GWAS with the best population structure correction.
#> [1] "saveoutput is FALSE so GWAS object will not be saved to disk."
#> Now starting GWAS pipeline for bio17.
#> Now determining lambda_GC for GWAS models with 16 sets of PCs. This will take some time.
#> saveoutput is FALSE, so lambda_GC values won't be saved to a csv.
#> Finished Lambda_GC calculation for bio17 using 0 PCs.
#> Finished Lambda_GC calculation for bio17 using 1 PCs.
#> Finished Lambda_GC calculation for bio17 using 2 PCs.
#> Finished Lambda_GC calculation for bio17 using 3 PCs.
#> Finished Lambda_GC calculation for bio17 using 4 PCs.
#> Finished Lambda_GC calculation for bio17 using 5 PCs.
#> Finished Lambda_GC calculation for bio17 using 6 PCs.
#> Finished Lambda_GC calculation for bio17 using 7 PCs.
#> Finished Lambda_GC calculation for bio17 using 8 PCs.
#> Finished Lambda_GC calculation for bio17 using 9 PCs.
#> Finished Lambda_GC calculation for bio17 using 10 PCs.
#> Finished Lambda_GC calculation for bio17 using 11 PCs.
#> Finished Lambda_GC calculation for bio17 using 12 PCs.
#> Finished Lambda_GC calculation for bio17 using 13 PCs.
#> Finished Lambda_GC calculation for bio17 using 14 PCs.
#> Finished Lambda_GC calculation for bio17 using 15 PCs.
#> Finished phenotype 1: bio17
#> Now running GWAS with the best population structure correction.
#> [1] "saveoutput is FALSE so GWAS object will not be saved to disk."
#> Now starting GWAS pipeline for bio4.
#> Now determining lambda_GC for GWAS models with 16 sets of PCs. This will take some time.
#> saveoutput is FALSE, so lambda_GC values won't be saved to a csv.
#> Finished Lambda_GC calculation for bio4 using 0 PCs.
#> Finished Lambda_GC calculation for bio4 using 1 PCs.
#> Finished Lambda_GC calculation for bio4 using 2 PCs.
#> Finished Lambda_GC calculation for bio4 using 3 PCs.
#> Finished Lambda_GC calculation for bio4 using 4 PCs.
#> Finished Lambda_GC calculation for bio4 using 5 PCs.
#> Finished Lambda_GC calculation for bio4 using 6 PCs.
#> Finished Lambda_GC calculation for bio4 using 7 PCs.
#> Finished Lambda_GC calculation for bio4 using 8 PCs.
#> Finished Lambda_GC calculation for bio4 using 9 PCs.
#> Finished Lambda_GC calculation for bio4 using 10 PCs.
#> Finished Lambda_GC calculation for bio4 using 11 PCs.
#> Finished Lambda_GC calculation for bio4 using 12 PCs.
#> Finished Lambda_GC calculation for bio4 using 13 PCs.
#> Finished Lambda_GC calculation for bio4 using 14 PCs.
#> Finished Lambda_GC calculation for bio4 using 15 PCs.
#> Finished phenotype 1: bio4
#> Now running GWAS with the best population structure correction.
#> [1] "saveoutput is FALSE so GWAS object will not be saved to disk."
#> Now starting GWAS pipeline for bio16.
#> Now determining lambda_GC for GWAS models with 16 sets of PCs. This will take some time.
#> saveoutput is FALSE, so lambda_GC values won't be saved to a csv.
#> Finished Lambda_GC calculation for bio16 using 0 PCs.
#> Finished Lambda_GC calculation for bio16 using 1 PCs.
#> Finished Lambda_GC calculation for bio16 using 2 PCs.
#> Finished Lambda_GC calculation for bio16 using 3 PCs.
#> Finished Lambda_GC calculation for bio16 using 4 PCs.
#> Finished Lambda_GC calculation for bio16 using 5 PCs.
#> Finished Lambda_GC calculation for bio16 using 6 PCs.
#> Finished Lambda_GC calculation for bio16 using 7 PCs.
#> Finished Lambda_GC calculation for bio16 using 8 PCs.
#> Finished Lambda_GC calculation for bio16 using 9 PCs.
#> Finished Lambda_GC calculation for bio16 using 10 PCs.
#> Finished Lambda_GC calculation for bio16 using 11 PCs.
#> Finished Lambda_GC calculation for bio16 using 12 PCs.
#> Finished Lambda_GC calculation for bio16 using 13 PCs.
#> Finished Lambda_GC calculation for bio16 using 14 PCs.
#> Finished Lambda_GC calculation for bio16 using 15 PCs.
#> Finished phenotype 1: bio16
#> Now running GWAS with the best population structure correction.
#> [1] "saveoutput is FALSE so GWAS object will not be saved to disk."
#> Now starting GWAS pipeline for AHM.
#> Now determining lambda_GC for GWAS models with 16 sets of PCs. This will take some time.
#> saveoutput is FALSE, so lambda_GC values won't be saved to a csv.
#> Finished Lambda_GC calculation for AHM using 0 PCs.
#> Finished Lambda_GC calculation for AHM using 1 PCs.
#> Finished Lambda_GC calculation for AHM using 2 PCs.
#> Finished Lambda_GC calculation for AHM using 3 PCs.
#> Finished Lambda_GC calculation for AHM using 4 PCs.
#> Finished Lambda_GC calculation for AHM using 5 PCs.
#> Finished Lambda_GC calculation for AHM using 6 PCs.
#> Finished Lambda_GC calculation for AHM using 7 PCs.
#> Finished Lambda_GC calculation for AHM using 8 PCs.
#> Finished Lambda_GC calculation for AHM using 9 PCs.
#> Finished Lambda_GC calculation for AHM using 10 PCs.
#> Finished Lambda_GC calculation for AHM using 11 PCs.
#> Finished Lambda_GC calculation for AHM using 12 PCs.
#> Finished Lambda_GC calculation for AHM using 13 PCs.
#> Finished Lambda_GC calculation for AHM using 14 PCs.
#> Finished Lambda_GC calculation for AHM using 15 PCs.
#> Finished phenotype 1: AHM
#> Now running GWAS with the best population structure correction.
#> [1] "saveoutput is FALSE so GWAS object will not be saved to disk."
#> Now starting GWAS pipeline for bio2.
#> Now determining lambda_GC for GWAS models with 16 sets of PCs. This will take some time.
#> saveoutput is FALSE, so lambda_GC values won't be saved to a csv.
#> Finished Lambda_GC calculation for bio2 using 0 PCs.
#> Finished Lambda_GC calculation for bio2 using 1 PCs.
#> Finished Lambda_GC calculation for bio2 using 2 PCs.
#> Finished Lambda_GC calculation for bio2 using 3 PCs.
#> Finished Lambda_GC calculation for bio2 using 4 PCs.
#> Finished Lambda_GC calculation for bio2 using 5 PCs.
#> Finished Lambda_GC calculation for bio2 using 6 PCs.
#> Finished Lambda_GC calculation for bio2 using 7 PCs.
#> Finished Lambda_GC calculation for bio2 using 8 PCs.
#> Finished Lambda_GC calculation for bio2 using 9 PCs.
#> Finished Lambda_GC calculation for bio2 using 10 PCs.
#> Finished Lambda_GC calculation for bio2 using 11 PCs.
#> Finished Lambda_GC calculation for bio2 using 12 PCs.
#> Finished Lambda_GC calculation for bio2 using 13 PCs.
#> Finished Lambda_GC calculation for bio2 using 14 PCs.
#> Finished Lambda_GC calculation for bio2 using 15 PCs.
#> Finished phenotype 1: bio2
#> Now running GWAS with the best population structure correction.
#> [1] "saveoutput is FALSE so GWAS object will not be saved to disk."
#> Now starting GWAS pipeline for bio5.
#> Now determining lambda_GC for GWAS models with 16 sets of PCs. This will take some time.
#> saveoutput is FALSE, so lambda_GC values won't be saved to a csv.
#> Finished Lambda_GC calculation for bio5 using 0 PCs.
#> Finished Lambda_GC calculation for bio5 using 1 PCs.
#> Finished Lambda_GC calculation for bio5 using 2 PCs.
#> Finished Lambda_GC calculation for bio5 using 3 PCs.
#> Finished Lambda_GC calculation for bio5 using 4 PCs.
#> Finished Lambda_GC calculation for bio5 using 5 PCs.
#> Finished Lambda_GC calculation for bio5 using 6 PCs.
#> Finished Lambda_GC calculation for bio5 using 7 PCs.
#> Finished Lambda_GC calculation for bio5 using 8 PCs.
#> Finished Lambda_GC calculation for bio5 using 9 PCs.
#> Finished Lambda_GC calculation for bio5 using 10 PCs.
#> Finished Lambda_GC calculation for bio5 using 11 PCs.
#> Finished Lambda_GC calculation for bio5 using 12 PCs.
#> Finished Lambda_GC calculation for bio5 using 13 PCs.
#> Finished Lambda_GC calculation for bio5 using 14 PCs.
#> Finished Lambda_GC calculation for bio5 using 15 PCs.
#> Finished phenotype 1: bio5
#> Now running GWAS with the best population structure correction.
#> [1] "saveoutput is FALSE so GWAS object will not be saved to disk."
#> Now starting GWAS pipeline for FRAC_SRV_THREE.
#> Now determining lambda_GC for GWAS models with 16 sets of PCs. This will take some time.
#> saveoutput is FALSE, so lambda_GC values won't be saved to a csv.
#> Finished Lambda_GC calculation for FRAC_SRV_THREE using 0 PCs.
#> Finished Lambda_GC calculation for FRAC_SRV_THREE using 1 PCs.
#> Finished Lambda_GC calculation for FRAC_SRV_THREE using 2 PCs.
#> Finished Lambda_GC calculation for FRAC_SRV_THREE using 3 PCs.
#> Finished Lambda_GC calculation for FRAC_SRV_THREE using 4 PCs.
#> Finished Lambda_GC calculation for FRAC_SRV_THREE using 5 PCs.
#> Finished Lambda_GC calculation for FRAC_SRV_THREE using 6 PCs.
#> Finished Lambda_GC calculation for FRAC_SRV_THREE using 7 PCs.
#> Finished Lambda_GC calculation for FRAC_SRV_THREE using 8 PCs.
#> Finished Lambda_GC calculation for FRAC_SRV_THREE using 9 PCs.
#> Finished Lambda_GC calculation for FRAC_SRV_THREE using 10 PCs.
#> Finished Lambda_GC calculation for FRAC_SRV_THREE using 11 PCs.
#> Finished Lambda_GC calculation for FRAC_SRV_THREE using 12 PCs.
#> Finished Lambda_GC calculation for FRAC_SRV_THREE using 13 PCs.
#> Finished Lambda_GC calculation for FRAC_SRV_THREE using 14 PCs.
#> Finished Lambda_GC calculation for FRAC_SRV_THREE using 15 PCs.
#> Finished phenotype 1: FRAC_SRV_THREE
#> Now running GWAS with the best population structure correction.
#> [1] "saveoutput is FALSE so GWAS object will not be saved to disk."
#> Now starting GWAS pipeline for CLMB_BIOMASS.
#> Now determining lambda_GC for GWAS models with 16 sets of PCs. This will take some time.
#> saveoutput is FALSE, so lambda_GC values won't be saved to a csv.
#> Finished Lambda_GC calculation for CLMB_BIOMASS using 0 PCs.
#> Finished Lambda_GC calculation for CLMB_BIOMASS using 1 PCs.
#> Finished Lambda_GC calculation for CLMB_BIOMASS using 2 PCs.
#> Finished Lambda_GC calculation for CLMB_BIOMASS using 3 PCs.
#> Finished Lambda_GC calculation for CLMB_BIOMASS using 4 PCs.
#> Finished Lambda_GC calculation for CLMB_BIOMASS using 5 PCs.
#> Finished Lambda_GC calculation for CLMB_BIOMASS using 6 PCs.
#> Finished Lambda_GC calculation for CLMB_BIOMASS using 7 PCs.
#> Finished Lambda_GC calculation for CLMB_BIOMASS using 8 PCs.
#> Finished Lambda_GC calculation for CLMB_BIOMASS using 9 PCs.
#> Finished Lambda_GC calculation for CLMB_BIOMASS using 10 PCs.
#> Finished Lambda_GC calculation for CLMB_BIOMASS using 11 PCs.
#> Finished Lambda_GC calculation for CLMB_BIOMASS using 12 PCs.
#> Finished Lambda_GC calculation for CLMB_BIOMASS using 13 PCs.
#> Finished Lambda_GC calculation for CLMB_BIOMASS using 14 PCs.
#> Finished Lambda_GC calculation for CLMB_BIOMASS using 15 PCs.
#> Finished phenotype 1: CLMB_BIOMASS
#> Now running GWAS with the best population structure correction.
#> [1] "saveoutput is FALSE so GWAS object will not be saved to disk."
#> Now starting GWAS pipeline for KBSM_BIOMASS.
#> Now determining lambda_GC for GWAS models with 16 sets of PCs. This will take some time.
#> saveoutput is FALSE, so lambda_GC values won't be saved to a csv.
#> Finished Lambda_GC calculation for KBSM_BIOMASS using 0 PCs.
#> Finished Lambda_GC calculation for KBSM_BIOMASS using 1 PCs.
#> Finished Lambda_GC calculation for KBSM_BIOMASS using 2 PCs.
#> Finished Lambda_GC calculation for KBSM_BIOMASS using 3 PCs.
#> Finished Lambda_GC calculation for KBSM_BIOMASS using 4 PCs.
#> Finished Lambda_GC calculation for KBSM_BIOMASS using 5 PCs.
#> Finished Lambda_GC calculation for KBSM_BIOMASS using 6 PCs.
#> Finished Lambda_GC calculation for KBSM_BIOMASS using 7 PCs.
#> Finished Lambda_GC calculation for KBSM_BIOMASS using 8 PCs.
#> Finished Lambda_GC calculation for KBSM_BIOMASS using 9 PCs.
#> Finished Lambda_GC calculation for KBSM_BIOMASS using 10 PCs.
#> Finished Lambda_GC calculation for KBSM_BIOMASS using 11 PCs.
#> Finished Lambda_GC calculation for KBSM_BIOMASS using 12 PCs.
#> Finished Lambda_GC calculation for KBSM_BIOMASS using 13 PCs.
#> Finished Lambda_GC calculation for KBSM_BIOMASS using 14 PCs.
#> Finished Lambda_GC calculation for KBSM_BIOMASS using 15 PCs.
#> Finished phenotype 1: KBSM_BIOMASS
#> Now running GWAS with the best population structure correction.
#> [1] "saveoutput is FALSE so GWAS object will not be saved to disk."
#> Now starting GWAS pipeline for PKLE_BIOMASS.
#> Now determining lambda_GC for GWAS models with 16 sets of PCs. This will take some time.
#> saveoutput is FALSE, so lambda_GC values won't be saved to a csv.
#> Finished Lambda_GC calculation for PKLE_BIOMASS using 0 PCs.
#> Finished Lambda_GC calculation for PKLE_BIOMASS using 1 PCs.
#> Finished Lambda_GC calculation for PKLE_BIOMASS using 2 PCs.
#> Finished Lambda_GC calculation for PKLE_BIOMASS using 3 PCs.
#> Finished Lambda_GC calculation for PKLE_BIOMASS using 4 PCs.
#> Finished Lambda_GC calculation for PKLE_BIOMASS using 5 PCs.
#> Finished Lambda_GC calculation for PKLE_BIOMASS using 6 PCs.
#> Finished Lambda_GC calculation for PKLE_BIOMASS using 7 PCs.
#> Finished Lambda_GC calculation for PKLE_BIOMASS using 8 PCs.
#> Finished Lambda_GC calculation for PKLE_BIOMASS using 9 PCs.
#> Finished Lambda_GC calculation for PKLE_BIOMASS using 10 PCs.
#> Finished Lambda_GC calculation for PKLE_BIOMASS using 11 PCs.
#> Finished Lambda_GC calculation for PKLE_BIOMASS using 12 PCs.
#> Finished Lambda_GC calculation for PKLE_BIOMASS using 13 PCs.
#> Finished Lambda_GC calculation for PKLE_BIOMASS using 14 PCs.
#> Finished Lambda_GC calculation for PKLE_BIOMASS using 15 PCs.
#> Finished phenotype 1: PKLE_BIOMASS
#> Now running GWAS with the best population structure correction.
#> [1] "saveoutput is FALSE so GWAS object will not be saved to disk."