Annotations for Top Association

Load annotation information for switchgrass

The function pvdiv_table_topsnps creates dataframes containing annotation information for the switchgrass diversity panel. To use this function, first load the provided annotation information from wherever you have saved it (currently, this is version 5.1 of the annotation information for Panicum virgatum):

library(AnnotationDbi)
library(VariantAnnotation)
txdb <- loadDb(file = "Pvirgatum_516_v5.1.gene.txdb.sqlite")

Load SNP dataset and GWAS results

We will use the same toy SNP dataset that we used for genome-wide association, and find annotations for the same phenotype, GWAS_CT.

library(bigsnpr)
#> Loading required package: bigstatsr

snpfile <- system.file("extdata", "example_bigsnp.rds", package = "switchgrassGWAS")
gwasfile <- system.file("extdata", "GWAS_datatable_example_GWAS_CT.rds", 
                        package = "switchgrassGWAS")
                        
gwas <- readRDS(gwasfile)
snp <- snp_attach(snpfile)

Then, specify the number of top SNPs you want to find annotation information for, a FDR threshold to find annotation information for, and a set of distances away from the associated SNP (in bp) for which to pull annotations. Here, we find genes 1kb and 20kb away from 1) top 10 associations and 2) associations above a FDR of 10%.

anno_tables <- pvdiv_table_topsnps(df = gwas, type = "bigsnp", n = 10, 
                                  FDRalpha = 0.1, rangevector = c(1000, 20000),
                                  snp = snp, txdb = txdb)

Let’s look at the annotations for genes within 1000bp of SNPs above the 10% FDR cutoff. In our toy example, there are seven of these:

kable(anno_tables[[2]])
CHR region_start region_end width region_strand Annotation QUERYID TXID Gene ID POS SNP Effect SNP standard error bigsnpscore p value log10p FDR Adjusted p value gene_start gene_end gene_width gene_strand pacId transcriptName peptideName Pfam Panther Categories KOG ec KO GO Arabidopsis thaliana homolog A. thaliana gene name A. thaliana gene annotation Rice homolog Rice gene name Rice gene annotation d2_start d2_end distance from gene
Chr01N 34181686 34182686 1001 * intergenic 1 NA NA 34182186 0.9888194 0.2562594 3.858665 0.0001261 3.899256 0.0756650 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Chr02K 68243567 68244567 1001 * intergenic 2 NA NA 68244067 -0.9049649 0.2237918 -4.043781 0.0000593 4.226830 0.0533842 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Chr03N 31494893 31495893 1001 * intergenic 3 NA NA 31495393 1.2009167 0.2758415 4.353647 0.0000157 4.804299 0.0209074 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Chr05K 49122041 49123041 1001 - promoter 4 66970 Pavir.5KG551800 49122541 -1.1140440 0.2251390 -4.948250 0.0000010 6.013622 0.0034888 49122303 49122857 555 - 41544150 Pavir.5KG551800.1 Pavir.5KG551800.1.p PF01167 PTHR16517,PTHR16517:SF32 NA NA NA NA AT1G53320.1 AtTLP7,TLP7 tubby like protein 7 LOC_Os05g43850.1 NA OsFBT9 - F-box and tubby domain containing protein, expressed 238 -316 238
Chr07N 34540063 34541063 1001 - promoter 5 96920 Pavir.7NG229717 34540563 -0.7609417 0.1757385 -4.329965 0.0000174 4.758882 0.0209074 34539192 34539791 600 - 41627406 Pavir.7NG229717.1 Pavir.7NG229717.1.p NA NA NA NA NA NA NA NA NA ChrUn.fgenesh.mRNA.22 NA expressed protein 1371 772 772
Chr07N 34540063 34541063 1001 * intergenic 5 NA NA 34540563 -0.7609417 0.1757385 -4.329965 0.0000174 4.758882 0.0209074 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Chr08K 4378182 4379182 1001 * intergenic 6 NA NA 4378682 -0.6237182 0.1604175 -3.888093 0.0001121 3.950457 0.0756650 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

pvdiv_table_topsnps will return a list of all of the tables you requested, named according to the criteria used to create the table. You can then save these tables individually as csv or any other type of file.

If the resultant tables are small, say, less than 2000 entries apiece, I favor saving these tables as individual sheets in an Excel file using the R package XLConnect.

library(XLConnect)

wb1 <- loadWorkbook(filename = "Annotation_tables_GWAS_CT.xlsx", 
                    create = TRUE)
for(j in seq_along(anno_tables)){
  createSheet(wb1, name = names(anno_tables)[j])
  writeWorksheet(wb1, anno_tables[[j]], sheet = names(anno_tables)[j])
  }
saveWorkbook(wb1)